How to use MongoDB or other document database to keep video files, with options of adding to existing binary files and parallel read/write


I've used MongoDB's GridFS to store media files for a messaging system we built on Mongo, so I can share what we ran into.

Before I get into the details: for your use case I would recommend not using GridFS, and instead using something like Amazon S3 (which has excellent REST APIs for multipart uploads) and storing the metadata in Mongo. This is the approach we settled on in our project after first implementing it with GridFS. It's not that GridFS isn't great; it's just not well suited to chunking/appending and rewriting small portions of files. For more info, here's a quick rundown of what GridFS is and isn't good for:

http://www.mongodb.org/display/DOCS/When+to+use+GridFS

Now, if you are bent on using GridFS, you need to understand how the driver and read/write concurrency work.

In Mongo (2.2) there is one write lock per schema/db. This means that while a write is in progress, other threads are locked out of that database. In real-life usage this is still very fast, because the lock yields once a chunk (256k) is written, so your reader thread can get some data back in between. Please look at this concurrency video/presentation for more details:

http://www.10gen.com/presentations/concurrency-internals-mongodb-2-2

So, going by my two links, we can essentially say question 2 is answered. You should also understand a little about how Mongo writes large data sets and how page faults give reader threads a chance to get information.

Now let's tackle your first question. The Mongo driver does not provide a way to append data to a GridFS file; a write is meant to be a fire-and-forget, atomic-style operation. However, if you understand how the data is stored in chunks and how the checksum is calculated, you can do it manually by working on the fs.files and fs.chunks collections directly, as this poster describes here:

Append data to existing gridfs file
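
To make the manual append a bit more concrete, here is a minimal in-memory sketch in Java of the bookkeeping involved. It does not use the MongoDB driver at all: `GridFsAppendModel` and its members are hypothetical names that only mirror the `length`, `chunkSize` and `md5` fields of an fs.files document and the `n`/`data` fields of fs.chunks documents. The idea it illustrates is: top up the last partial chunk, add new chunks after it, then update the length and checksum.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.TreeMap;

/** In-memory model of one GridFS file (assumption: not the driver's API). */
final class GridFsAppendModel {
    final int chunkSize;                                      // mirrors fs.files.chunkSize
    long length = 0;                                          // mirrors fs.files.length
    String md5;                                               // mirrors fs.files.md5
    final TreeMap<Integer, byte[]> chunks = new TreeMap<>();  // key mirrors fs.chunks.n

    GridFsAppendModel(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
        this.md5 = computeMd5();
    }

    /** Appends data: fills the last partial chunk first, then adds new chunks. */
    void append(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            int n = (int) (length / chunkSize);       // index of the chunk being filled
            int used = (int) (length % chunkSize);    // bytes already stored in that chunk
            int take = Math.min(chunkSize - used, data.length - offset);
            byte[] old = used == 0 ? new byte[0] : chunks.get(n);
            byte[] merged = Arrays.copyOf(old, used + take);
            System.arraycopy(data, offset, merged, used, take);
            chunks.put(n, merged);
            offset += take;
            length += take;
        }
        md5 = computeMd5();                           // rereads every chunk in this model
    }

    /** Concatenates all chunks in order of n. */
    byte[] readAll() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks.values()) out.write(chunk, 0, chunk.length);
        return out.toByteArray();
    }

    private String computeMd5() {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            for (byte[] chunk : chunks.values()) digest.update(chunk);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        GridFsAppendModel file = new GridFsAppendModel(4);    // tiny chunk size for the demo
        file.append("abcdef".getBytes(StandardCharsets.UTF_8));
        file.append("ghi".getBytes(StandardCharsets.UTF_8));
        System.out.println(file.length + " bytes in " + file.chunks.size() + " chunks, md5 " + file.md5);
    }
}
```

Against a real database, each `chunks.put` would become an insert or update on fs.chunks, followed by one update of the fs.files document. Those are separate writes, so a concurrent reader could see the file mid-append; that is part of why I'd call this extra work rather than a supported feature.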

So, going through those, you can see that it is possible to do what you want, but my general recommendation is to use a service (such as Amazon S3) that is designed for this type of interaction instead of doing extra work to make Mongo fit your needs. Of course you can go to the filesystem directly as well, which would be the poor man's choice, but you lose the redundancy, sharding, replication and so on that you get with GridFS or S3.
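
If you do end up on the filesystem for the binary data, the append and read-while-writing requirements are fairly easy to sketch with plain Java NIO. This is only an illustration under my own assumptions: `AppendableVideoFile`, `append` and `readFrom` are made-up names, the reader simply polls with the offset it has reached so far, and the 1 MiB cap per read is arbitrary.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one thread appends to a file, another reads what is there so far. */
final class AppendableVideoFile {
    private static final int MAX_READ = 1 << 20;  // arbitrary cap per call (1 MiB)

    /** Appends bytes to the end of the file, creating it if needed. */
    static void append(Path file, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(data);
        }
    }

    /** Returns the bytes from offset up to the current end of file (empty if none yet). */
    static byte[] readFrom(Path file, long offset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long available = channel.size() - offset;
            if (available <= 0) return new byte[0];
            ByteBuffer buffer = ByteBuffer.allocate((int) Math.min(available, MAX_READ));
            while (buffer.hasRemaining()) {
                int read = channel.read(buffer, offset + buffer.position());
                if (read < 0) break;              // file shrank underneath us
            }
            buffer.flip();
            byte[] result = new byte[buffer.remaining()];
            buffer.get(result);
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("video", ".bin");
        long offset = 0;
        append(file, new byte[] {1, 2, 3});
        offset += readFrom(file, offset).length;  // reader catches up with the first write
        append(file, new byte[] {4, 5});
        byte[] fresh = readFrom(file, offset);    // only the newly appended bytes
        System.out.println("read " + fresh.length + " new bytes after offset " + offset);
        Files.delete(file);
    }
}
```

In the setup from the question, "Thread A" would call `append` as data arrives and "Thread B" would call `readFrom` in a loop, advancing its offset by the number of bytes returned, while Mongo holds only the metadata (path, length so far, and so on).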

Hope that helps.

-Prasith


Author by

Al Mobix

Updated on September 15, 2022

Comments

  • Al Mobix
    Al Mobix about 1 year

    I'm working on a video server, and I want to use a database to keep video files. Since I only need to store simple video files with metadata I tried to use MongoDB in Java, via its GridFS mechanism to store the video files and their metadata.

    However, there are two major features I need, and that I couldn't manage using MongoDB:

    1. I want to be able to add to a previously saved video, since saving a video might be performed in chunks. I don't want to delete the binary I have so far, just append bytes at the end of an item.
    2. I want to be able to read from a video item while it is being written. "Thread A" will update the video item, adding more and more bytes, while "Thread B" will read from the item, receiving all the bytes written by "Thread A" as soon as they are written/flushed.

    I tried writing the straightforward code to do that, but it failed. It seems MongoDB doesn't allow multi-threaded access to the binary (even if one thread is doing all the writing), nor could I find a way to add to a binary file - the Java GridFS API only gives an InputStream from an already existing GridFSDBFile, I cannot get an OutputStream to write to it.

    • Is this possible via MongoDB, and if so how?
    • If not, do you know of any other DB that might allow this (preferably nothing too complex such as a full relational DB)?
    • Would I be better off using MongoDB to keep only the metadata of the video files, and manually handle reading and writing the binary data from the filesystem, so I can implement the above requirements on my own?

    Thanks,

    Al

  • Al Mobix
    Al Mobix about 11 years
    Thanks, that helps a lot. It seems clear to me now that I can use MongoDB only for the metadata. However, since I can't use a cloud solution such as Amazon for the binary data (I need to keep the files on my local machines), I'm wondering if there's another DB which is as easy to use and document-oriented as MongoDB, but which also supports binary data. Otherwise, my only recourse would be to use the local filesystem manually, which is less preferable.
  • Prasith Govin
    Prasith Govin about 11 years
    Mobix - I'm not aware of any other document DBs that handle this any better. You could look into a dedicated media server such as Adobe's Flash Server, but outside of that I'm not aware of any options other than the filesystem. If this answer helped then please mark it as answered. Thanks.