File Transfer with Socket.io and RethinkDB
RethinkDB lets you store binary objects directly in the database. While this works well for small files, storing large files or accessing many small files simultaneously would impact system memory: because you cannot yet stream binary data directly from RethinkDB, you have to load the entire file content into memory. The ReGrid module, inspired by MongoDB's GridFS, tackles this issue, but I needed a simpler version without any ‘revisioning’ or ‘buckets’ that is also easy to integrate with socket.io without using any socket.io streaming library.
The full demo is available at https://github.com/hassansin/rethinkdb-fileupload
The schema is very similar to that of GridFS. Two tables are used:

- `messages`: used to store file meta information, such as file size, name, type and `chunkSize`. A `stream()` method is defined on it, which streams the file chunks in sequence so that we don’t have to load the entire file into memory. We can then easily pipe this stream into an express.js response object and download the file.
- `fs_chunks`: used for storing the binary data. `chunkSize` is the maximum size of a file chunk in bytes; this example uses a chunk size of 255KB. Each chunk has an `index` value that represents its position in the stored file.
Following are the schemas, defined using thinky, a light ORM for Node.js:
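A rough sketch of what the thinky model definitions might look like; field names beyond those mentioned above are illustrative, not taken verbatim from the demo:

```javascript
// Hypothetical schema definitions, assuming thinky's createModel API.
const thinky = require('thinky')();
const type = thinky.type;

// messages: one row per uploaded file, holding its metadata.
const Message = thinky.createModel('messages', {
  id: type.string(),        // the uuid handed out at upload start
  name: type.string(),      // original file name
  type: type.string(),      // MIME type
  size: type.number(),      // total file size in bytes
  chunkSize: type.number()  // maximum chunk size (255KB here)
});

// fs_chunks: one row per chunk of binary data.
const Chunk = thinky.createModel('fs_chunks', {
  id: type.string(),
  file_id: type.string(),   // references the messages row
  index: type.number(),     // position of the chunk within the file
  data: type.buffer()       // the binary payload
});

// Instance methods such as stream() can be attached via Message.define().
```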
Four events are defined, `upload.start`, `upload.data`, `upload.finish` and `upload.delete`, for starting a file upload, transferring file chunks, finishing a file upload and cancelling a file upload respectively:
- `upload.start`: the transfer is initiated by the client emitting this event. The server returns a `uuid` in response; the client uses this id for the rest of the transfer.
- `upload.data`: after receiving the `uuid` from the server, the client transfers the file chunks in sequence along with this unique id. The server stores each chunk in the `fs_chunks` table.
- `upload.finish`: the client informs the server about EOF and sends some meta information. The server creates a new file record in the `messages` table with the `uuid` and the metadata.
- `upload.delete`: the client can abort any ongoing upload, or delete any existing file, by sending this event with the `uuid` of the upload/file.
A changefeed is attached to the `messages` table, where the file meta is stored. This feed emits a socket.io event whenever a new document is inserted.
Some limitations that I would like to address soon:

- File chunks are transmitted sequentially from the browser using the `FileReader` API. The next chunk is sent only after the previous chunk has been transferred and stored in the DB. I need to explore the possibility of sending multiple chunks in parallel.
- Any interrupted upload results in orphan chunks in the `fs_chunks` table.
- Calculate an MD5 checksum to detect corrupted files.