Storing Images in DB - Yea or Nay?


Solution 1

I'm in charge of some applications that manage many TB of images. We've found that storing file paths in the database works best.

There are a couple of issues:

  • database storage is usually more expensive than file system storage
  • you can super-accelerate file system access with standard off the shelf products
    • for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization (see the sketch after this list).
  • things like web servers, etc, need no special coding or processing to access images in the file system
  • databases win out where transactional integrity between the image and its metadata is important.
    • it is more complex to manage integrity between db metadata and file system data
    • it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem
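
To make the sendfile() point concrete, here is a minimal sketch (Python on Linux; the serve_image helper and the hard-coded JPEG content type are illustrative assumptions, not part of the answer) of handing a file straight from disk to a connected socket. A blob pulled out of a database cannot take this path: it has to be read into application memory first and written out again.

```python
import os
import socket

def serve_image(conn: socket.socket, path: str) -> None:
    """Send an image file straight from disk to a connected socket.

    os.sendfile() asks the kernel to copy file pages directly into the
    socket buffer, so the bytes never pass through user space. A blob
    stored in a database cannot use this shortcut: it must first be read
    into application memory and then written back out.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        header = (
            "HTTP/1.1 200 OK\r\n"
            f"Content-Length: {size}\r\n"
            "Content-Type: image/jpeg\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(header)      # headers still go through user space as usual
        offset = 0
        while offset < size:      # sendfile may send fewer bytes than requested
            sent = os.sendfile(conn.fileno(), f.fileno(), offset, size - offset)
            offset += sent
```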

Solution 2

As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.

  • You are storing images that change dynamically, say invoices, and you want to retrieve an invoice exactly as it was on 1 Jan 2007
  • The government wants you to maintain 6 years of history
  • Images stored in the database do not require a different backup strategy. Images stored on the filesystem do
  • It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images

On the other hand, there are problems associated with storing images in the database:

  • You need additional code to extract and stream the images (a minimal sketch follows this list)
  • Latency may be higher than with direct file access
  • Heavier load on the database server
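
As a rough illustration of that "additional code", here is a minimal sketch (Python with the standard-library sqlite3 module; the images(id, data) table and the stream_image helper are hypothetical, not anything from the answer) of pulling a blob out of the database and yielding it in chunks so a web handler can stream it to the client:

```python
import sqlite3
from typing import Iterator

CHUNK = 64 * 1024  # stream in 64 KiB pieces instead of one giant response body

def stream_image(db_path: str, image_id: int) -> Iterator[bytes]:
    """Fetch an image blob from the database and yield it in chunks.

    With files on disk the web server can serve the bytes itself; with
    blobs you have to write and maintain retrieval code like this, and
    every image travels through a database connection.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT data FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no image with id {image_id}")
        data = row[0]
        for start in range(0, len(data), CHUNK):
            yield data[start:start + CHUNK]
    finally:
        conn.close()
```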

Solution 3

File store. Facebook engineers gave a great talk about it. One takeaway was to know the practical limit on the number of files in a single directory (a sketch of the usual workaround follows the linked talk).

Needle in a Haystack: Efficient Storage of Billions of Photos
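
The usual low-tech answer to that per-directory limit is to shard files into hashed subdirectories, as several comments below also suggest. This is not how Haystack itself works (Haystack packs many photos into a few large files); the following is just a minimal sketch of the conventional workaround, with hypothetical names:

```python
import hashlib
from pathlib import Path

def shard_path(root: Path, image_key: str) -> Path:
    """Map an image key to a two-level hashed directory, e.g. ab/cd/<key>.jpg.

    Hashing spreads files evenly across 256 * 256 = 65,536 leaf directories,
    so even tens of millions of images keep each directory far below the
    point where listing or backing it up becomes painful.
    """
    digest = hashlib.sha1(image_key.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / f"{image_key}.jpg"

def store_image(root: Path, image_key: str, data: bytes) -> Path:
    """Write the image bytes under the sharded path and return that path."""
    path = shard_path(root, image_key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```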

Solution 4

This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.

FileStream solves most of the problems around storing the files in the DB:

  1. The Blobs are actually stored as files in a folder.
  2. The Blobs can be accessed using either a database connection or over the filesystem.
  3. Backups are integrated.
  4. Migration "just works".

However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.

From the MSDN Article:

Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
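
To illustrate the "accessed using a database connection" half of that, here is a minimal sketch using plain T-SQL from Python via pyodbc. The Photos(Id, Data) table, the connection string, and the save_photo/load_photo helpers are assumptions for illustration, and the Win32 streaming interface MSDN mentions is not shown.

```python
import pyodbc

# Hypothetical connection string and Photos(Id, Data) table, where Data is
# declared as VARBINARY(MAX) FILESTREAM; adjust both for a real deployment.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=ImageStore;Trusted_Connection=yes;"
)

def save_photo(photo_id: str, path: str) -> None:
    """Insert an image through plain T-SQL; SQL Server writes the bytes to
    its FILESTREAM container on disk while keeping the row transactional."""
    conn = pyodbc.connect(CONN_STR)
    try:
        with open(path, "rb") as f:
            cur = conn.cursor()
            cur.execute(
                "INSERT INTO Photos (Id, Data) VALUES (?, ?)",
                photo_id, f.read(),
            )
        conn.commit()
    finally:
        conn.close()

def load_photo(photo_id: str) -> bytes:
    """Read the image back over an ordinary database connection."""
    conn = pyodbc.connect(CONN_STR)
    try:
        cur = conn.cursor()
        cur.execute("SELECT Data FROM Photos WHERE Id = ?", photo_id)
        row = cur.fetchone()
        if row is None:
            raise KeyError(f"no photo with id {photo_id}")
        return bytes(row[0])
    finally:
        conn.close()
```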

Solution 5

File paths in the DB are definitely the way to go. I've heard story after story from customers with TB of images for whom it became a nightmare trying to store any significant number of images in a DB; the performance hit alone is too much.

Comments

  • James Hall
    James Hall over 4 years

    So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more the type to store the location in the filesystem than to store the image directly in the DB.

    What do you think are the pros/cons?

  • ethan
    ethan over 15 years
    Not having a separate backup strategy can be a big deal when you are writing applications that are installed on premises (like SharePoint). When you create a SharePoint backup, everything is in the DB, which makes it very easy.
  • SqlACID
    SqlACID over 15 years
    You would think so, but the issues are actually minor; I have an app with millions of files in one directory, accessed by hundreds of users, without a problem. It's not smart, but it works. The biggest issue is that if you use Explorer to browse the directory, you'll be watching the search flashlight forever.
  • Jon Cage
    Jon Cage over 15 years
    Security by obscurity is not really an access control strategy!
  • Chan Chiem Jeffery Saeteurn
    Chan Chiem Jeffery Saeteurn over 15 years
    If you were worried about this, it would be easy to use a system similar to DNS, where the root directory has a separate directory underneath it for the first character of the key. To balance disk space (or even load balancing), one can use mount points or links to spread them out.
  • Seun Osewa
    Seun Osewa over 15 years
    It's better to use a filesystem that has no problem with large directories
  • nickf
    nickf over 15 years
    do you mean inside the database?
  • Draemon
    Draemon over 15 years
    easy - you just do mke2fs --go-faster-stripes
  • andrewmu
    andrewmu about 15 years
    Absolutely. Apparently the database is a lot bigger now. Having the data in a database means that replicating the database at different sites is a lot easier too.
  • derobert
    derobert about 15 years
    While I only manage 3TB of files, I definitely agree. Databases are for structured data, not blobs.
  • Calvin
    Calvin about 15 years
    I don't see anyone claiming that a filesystem is faster than a DB 100% of the time (read Mark Harrison's answer). That's a bit of a strawman. There are probably situations in which it's preferable not to wear your seatbelt, but generally speaking, wearing a seatbelt is a good idea.
  • Arafangion
    Arafangion about 15 years
    What is the likelihood of having two simultaneous updates to a particular image?
  • Draemon
    Draemon about 15 years
    you don't need simultaneous updates to have problems - it can be a read and a write. In our case this is almost guaranteed to happen.
  • Beep beep
    Beep beep about 15 years
    When dealing with replication, storing the images in the database is far superior IMO.
  • Nils Weinander
    Nils Weinander almost 15 years
    @derobert: quite so, if you will never use a data element in a query, as a condition or for a join, it probably doesn't belong in the database. Then again, if you have a nice database function to query images for likeness...
  • Nico
    Nico almost 15 years
    I had an app with millions of files in one directory (server running RHEL 4). Even listing the directory contents (piping to a file) took days and created an output file hundreds of MB in size. Now that they are in a database, I have a single file that I can move or back up quite easily.
  • Nirvikalpa Samadhi
    Nirvikalpa Samadhi almost 15 years
    It's also more convenient to store images on the file system. Just think: a client phones up saying they can't view an image, but they have the image ID. It's much faster to locate and view the image on the file system than to pull it out of the DB (where the problem may be in the retrieval code itself).
  • Stu Thompson
    Stu Thompson almost 15 years
    I saw a demonstration of Oracle where they could actually mount a file system to the database, or something like that. Do you know if this is what you did? (Sorry, I am clueless with Oracle so maybe I am talking garbage.)
  • andrewmu
    andrewmu almost 15 years
    I don't think so - it was storing images in the database as a database. The database was aggressively tuned - I remember multiple discussions regarding the size of the images changing as fields were added and removed. Everything was boundary aligned.
  • faceclean
    faceclean almost 15 years
    what off the shelf products are available for "super-accelerating" the file system?
  • Nils Weinander
    Nils Weinander over 14 years
    You can put that like this: don't put the data in a database column if you cannot use it for a where condition or a join. That is unlikely for binary data.
  • puzzledbeginner
    puzzledbeginner over 13 years
    I don't think he's advocating security by obscurity - he's saying that putting images in the DB adds another layer of security. (I think... @Conrad, don't want to put words in your mouth)
  • Marijn Huizendveld
    Marijn Huizendveld over 13 years
    Very nice indeed. Your users can now easily increment your filename to access other files...
  • Guillaume
    Guillaume over 13 years
    @Seun Osewa: every file system has limitations... and if you know of one that has no problem storing millions of entries in the same directory, please let me know!
  • Seun Osewa
    Seun Osewa over 13 years
    Actually, no, you can. As long as image files are never deleted, changed, or overwritten once created, all image files are synced to disk before you attempt to commit the transaction, and there is no filesystem corruption, you can be sure that image files and metadata are in sync. For some applications, those are too many ifs, I guess.
  • Seun Osewa
    Seun Osewa over 13 years
    ext3 with the dir_index flag on handles large directories just fine. I have a directory with 288,000 large images. ls > /dev/null takes less than 2 seconds. Ext3 with dir_index stores the directory information in a btree.
  • Seun Osewa
    Seun Osewa over 13 years
    @Richard: Is your single-file db-with-images backup less than "hundreds of MB" in size? Does it take less time to back up than the directory of images?
  • Seun Osewa
    Seun Osewa over 13 years
    I would argue that the database is better for files that are frequently edited, since consistency can be a problem in that case.
  • Nico
    Nico over 13 years
    @Seun Osewa: the database is up to 28 GB now, with 5.4 M records. I ended up having to partition the database table, so I have several files to back up that are approx 5 GB in size. I'm moving the individual images onto Amazon S3 now so I only have to store the filename in the DB (and Amazon can do the backups)
  • Seun Osewa
    Seun Osewa over 13 years
    @Richard: My image directory is 19 GB on a single disk and I have no issues at all. I think your experience proves that the file approach was better. With files, you can do differential backups using rsync, which only copies files that are new or changed since the last backup. Works for me: no partitioning, and no need for Amazon S3. You should go back to it.
  • Alan Donnelly
    Alan Donnelly over 13 years
    Re: "super-accelerating" products: Most web servers can now take advantage of the sendfile() system call to deliver static files asynchronously to the client. It offloads to the operating system the task of moving the file from disk to the network interface. The OS can do this much more efficiently, operating in kernel space. This, to me, seems like a big win for file system vs. db for storing/serving images.
  • Mark Harrison
    Mark Harrison over 13 years
    re "super-accelerating": I'm thinking of products such as those from isilon, emc, netapp, etc, that can be configured to cluster, cache, etc data stored in file systems (in our case, NFS). Here's a presentation I made that discusses some of the issues. It was at a perforce conference so it doesn't go into detail about the database side, but it covers the gist of what we do: maillist.perforce.com/perforce/conferences/us/2009/…
  • Seun Osewa
    Seun Osewa over 13 years
    @Marijn: That's only if you expose the images to the world.
  • scunliffe
    scunliffe over 13 years
    @Seun Osewa - Although I agree with you on use of the FileSystem there can be issues with rsync if data gets corrupted. Ma.gnolia (an online bookmark site/tool) had a devastating blow with rsync vimeo.com/3205188 in their case it killed their live and backup DB's. Likely not so much an issue with images that don't change much (except for add/delete) but a not-so subtle reminder that one should have multiple backups ;-)
  • Bart van Heukelom
    Bart van Heukelom about 13 years
    I chose storing images in the database because of the single backup advantage (or more generally speaking, having all data in one place), but the problems you mention are true as well, which is why I cache the images on the filesystem. It's the best of both worlds, and I'm surprised none of the top answers here mention it.
  • Seun Osewa
    Seun Osewa about 13 years
    ext3's dir_index helps a lot.
  • dhara tcrails
    dhara tcrails almost 13 years
    +1 for FileStream. It actually stores the blobs as files on disk, but manages them transactionally.
  • dhara tcrails
    dhara tcrails almost 13 years
    Also, SQL Server allows FileStream blobs to be accessed directly off the disk, so you can avoid tying up the DB connection
  • Andrew Neely
    Andrew Neely almost 13 years
    We did something very similar with our imaged documents (our primary key is a composite key of three items.), but we added the date and time the document was scanned so that we can have multiple versions in the same directory.
  • Andrew Neely
    Andrew Neely almost 13 years
    @Osewa, How's that? Yes, to directly access the file, the end user would need access to the folder. You could have a process to serve the file via FTP based upon request, and the security would be on par with SQL server.
  • Andrew Neely
    Andrew Neely almost 13 years
    I would go even further and say that with a Journaling file system and some additional program logic, the ACID compliance can be achieved. The steps would be write the db record, write the file. If the file commits, commit the db transaction.
  • Andrew Neely
    Andrew Neely almost 13 years
    We have over 10 million imaged documents in our system. It is spread out so that in each sub-folder, there is no more than 60k images (or so.) We have close to a half a terabyte of images and have no issues.
  • Lilith River
    Lilith River over 12 years
    I work with clients (of the ImageResizing.Net library) that store images both ways, and the filesystem is much more scalable and performant. But cloud storage is a much better option for scalability. Also, on Windows, NTFS starts to crawl after 100,000 files, and ASP.NET doesn't like SANs. I've helped get customers with upwards of 5 million images working on Windows, but it can be painful.
  • Lilith River
    Lilith River over 12 years
    Are you, by chance, using the ImageResizing.Net library to handle your SQL->disk image caching? It's the most advanced, scalable, and robust disk cache you can get...
  • Lilith River
    Lilith River over 12 years
    Still, added latency between the DB and the web server... And the web server will have to load it into memory to stream it to the client instead of being able to stream it from disk, unless you're using disk caching.
  • wallyk
    wallyk over 12 years
    @Computer Linguist: When NTFS slows down, defragment file 0, $MFT (the master file table).
  • Deebster
    Deebster over 12 years
    +1 This also allows you to store the original image, delivering the cached/optimised version while allowing the size/compression to be altered later
  • Rajat Gupta
    Rajat Gupta over 12 years
    @Mark Harrison, does the retrieval performance in the two cases also depend on the size of the images? For example, if it's a user's profile pic, would it be recommended to store it in the DB?
  • Rajat Gupta
    Rajat Gupta over 12 years
    @Conrad: What about small images? I believe retrieval performance in the two cases also depends on the size of the images, right? For example, if it's a user's profile pic, would it be recommended to store it in the DB?
  • Mark Harrison
    Mark Harrison over 12 years
    @Marcos, yes, you're right. In that case, the convenience of keeping the small image in the same place as the other data about the user outweighs the other factors. Especially since the image is probably being accessed at the same time as the other data about the user.
  • Rajat Gupta
    Rajat Gupta over 12 years
    Thanks a lot Mark! Is performance also better for small images (75x75 px) stored in the DB, relative to the file system? I heard some time back that if a document is below 1 MB it may be better to store it in the DB than on the filesystem. Is that true?
  • Mark Harrison
    Mark Harrison over 12 years
    I think if the images are small enough then the time to serve the data becomes negligible and other factors (such as the convenience of keeping the image data as part of the row) become more important. Of course as in all performance-related questions, it's often a case of experimentation in the particular application/environment to see what works best, but I believe you're thinking along the right track. Good Luck!!