File Transfer with Socket.io and RethinkDB
RethinkDB lets you store binary objects directly in the database. While this is perfect for small files, storing large files or accessing many small files simultaneously would strain system memory: because you cannot yet stream binary data directly from RethinkDB, you have to load the entire file content into memory. The ReGrid module, inspired by MongoDB's GridFS, tackles this issue, but I needed a simpler version without any ‘revisioning’ or ‘buckets’ that would also integrate easily with socket.io without using any socket.io streaming library.
The full demo is available at https://github.com/hassansin/rethinkdb-fileupload
Schema:
The schema is very similar to that of GridFS. Two tables are used:
- `messages`: Used to store file meta information: file size, name, type, `chunkSize`, etc. A `stream()` method is defined on the model, which streams file chunks in sequence so that we don't have to load the entire file into memory. We can then easily pipe this stream into an express.js response object to download the file.
- `fs_chunks`: Used to store the binary data. `chunkSize` is the maximum size of a file chunk in bytes; this example uses a chunk size of 255KB. Each chunk has an `index` value that represents its position in the stored file.
Following are the schemas, defined using thinky.io, a light ORM for Node.js:
models/messages.js
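A sketch of what this model might look like. The field names, the shared thinky instance at `lib/thinky.js`, and the `fileId_index` compound index are assumptions for illustration, not necessarily the demo's exact code:

```js
// models/messages.js (sketch)
const { Readable } = require('stream');
const thinky = require('../lib/thinky'); // assumed shared thinky instance
const type = thinky.type;
const r = thinky.r;

const Message = thinky.createModel('messages', {
  id: type.string(),        // the upload uuid
  name: type.string(),      // original file name
  type: type.string(),      // MIME type
  size: type.number(),      // total size in bytes
  chunkSize: type.number(), // max chunk size, 255KB in this example
  createdAt: type.date().default(r.now())
});

// stream(): fetch chunks one at a time in index order, so the whole
// file never sits in memory.
Message.define('stream', function () {
  const fileId = this.id;
  let index = 0;
  return new Readable({
    read() {
      r.table('fs_chunks')
        .getAll([fileId, index], { index: 'fileId_index' }) // assumed compound index
        .nth(0).default(null)
        .run()
        .then(chunk => {
          index += 1;
          this.push(chunk ? chunk.data : null); // push(null) ends the stream
        })
        .catch(err => this.destroy(err));
    }
  });
});

module.exports = Message;
```

Piping the stream into an express.js response is then a one-liner:

```js
app.get('/files/:id', (req, res) => {
  Message.get(req.params.id).run()
    .then(msg => {
      res.set('Content-Type', msg.type);
      msg.stream().pipe(res);
    })
    .catch(() => res.sendStatus(404));
});
```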
models/fs-chunks.js
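And a matching sketch for the chunks model, again with assumed field names:

```js
// models/fs-chunks.js (sketch)
const thinky = require('../lib/thinky'); // assumed shared thinky instance
const type = thinky.type;

const FsChunk = thinky.createModel('fs_chunks', {
  id: type.string(),
  fileId: type.string(), // uuid returned by upload.start
  index: type.number(),  // chunk position within the file
  data: type.buffer()    // the binary chunk itself
});

// compound secondary index so chunks can be read back in order
FsChunk.ensureIndex('fileId_index', doc => [doc('fileId'), doc('index')]);

module.exports = FsChunk;
```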
Socket.io events:
Four events are defined: `upload.start`, `upload.data`, `upload.finish`, and `upload.delete`, for starting a file upload, transferring file chunks, finishing a file upload, and cancelling a file upload, respectively (a sketch of the server-side handlers follows the list):

- `upload.start`: The client initiates a transfer by emitting this event. The server returns a `uuid` in response, and the client uses this id for the rest of the transfer.
- `upload.data`: After receiving the `uuid` from the server, the client transfers file chunks in sequence along with this unique id. The server stores each chunk in the `fs_chunks` table.
- `upload.finish`: The client informs the server about EOF and sends some meta information. The server creates a new file record in the `messages` table with the `uuid` and the metadata.
- `upload.delete`: The client can abort an ongoing upload or delete an existing file by sending this event with the `uuid` of the upload/file.
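A rough sketch of the server-side handlers for these four events. The payload shapes, acknowledgement callbacks, and the standalone server setup are assumptions; the demo's actual implementation may differ:

```js
const io = require('socket.io')(3000); // assumed standalone server on port 3000
const { v4: uuidv4 } = require('uuid');
const Message = require('./models/messages');
const FsChunk = require('./models/fs-chunks');
const r = require('./lib/thinky').r; // assumed shared thinky instance

io.on('connection', socket => {
  // upload.start: hand the client an id to tag every chunk with
  socket.on('upload.start', (meta, ack) => ack(uuidv4()));

  // upload.data: persist one chunk; the client waits for this ack
  // before sending the next chunk
  socket.on('upload.data', ({ fileId, index, data }, ack) => {
    new FsChunk({ fileId, index, data: Buffer.from(data) })
      .save()
      .then(() => ack(null))
      .catch(err => ack(err.message));
  });

  // upload.finish: create the meta record once all chunks are stored
  socket.on('upload.finish', ({ fileId, name, type, size, chunkSize }, ack) => {
    new Message({ id: fileId, name, type, size, chunkSize })
      .save()
      .then(() => ack(null))
      .catch(err => ack(err.message));
  });

  // upload.delete: drop the meta record (if any) and all of its chunks
  socket.on('upload.delete', ({ fileId }, ack) => {
    r.table('messages').get(fileId).delete().run()
      .then(() => r.table('fs_chunks').filter({ fileId }).delete().run())
      .then(() => ack(null))
      .catch(err => ack(err.message));
  });
});
```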
A changefeed is attached to the `messages` table, where the file meta is stored. The feed emits a socket.io event whenever a new document is inserted.
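A minimal sketch using thinky's changefeed support; the `file.new` event name is an assumption:

```js
// broadcast newly inserted file records to every connected client
Message.changes().then(feed => {
  feed.each((err, doc) => {
    if (err) return console.error(err);
    // a saved doc whose old value is null is a fresh insert,
    // not an update or a delete
    if (doc.isSaved() && doc.getOldValue() === null) {
      io.emit('file.new', doc);
    }
  });
});
```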
Limitations
Some limitations that I would like to address soon:
- File chunks are transmitted sequentially from the browser using the `FileReader` API: the next chunk is sent only after the previous chunk has been transferred and stored in the DB (see the client-side sketch below). I need to explore the possibility of sending multiple chunks in parallel.
- Any interrupted upload would leave orphan chunks in the `fs_chunks` table.
- Calculate an MD5 checksum to detect corrupted files.
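For reference, a browser-side sketch of the sequential loop described in the first limitation. The payloads mirror the assumed server handlers above:

```js
const CHUNK_SIZE = 255 * 1024; // 255KB, matching the server's chunkSize

function uploadFile(socket, file) {
  socket.emit('upload.start', { name: file.name }, fileId => {
    const reader = new FileReader();
    let index = 0;

    const sendNext = () => {
      const slice = file.slice(index * CHUNK_SIZE, (index + 1) * CHUNK_SIZE);
      reader.readAsArrayBuffer(slice);
    };

    reader.onload = () => {
      socket.emit('upload.data', { fileId, index, data: reader.result }, () => {
        index += 1;
        if (index * CHUNK_SIZE < file.size) {
          sendNext(); // strictly sequential: wait for the ack, then continue
        } else {
          socket.emit('upload.finish', {
            fileId,
            name: file.name,
            type: file.type,
            size: file.size,
            chunkSize: CHUNK_SIZE
          });
        }
      });
    };

    sendNext();
  });
}
```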