Monday, December 03, 2007 4:37 PM
by
loufranco
Images in Databases Part I: What to store
I've been working on a way to make it easier to automatically attach DotImage classes to images that are stored in a database.
In surveying our customers who use a RDBMS to store images, there seemed to basically two flavors.
- Store actual image in the database in a blob, varbinary or equivalent.
- Store a filename where the actual image is stored (usually an UNC path)
I'm going to run through the various pros and cons as I see them -- I think that both choices are acceptable.
Storing Image contents in the Database
PROS
- Don't have to maintain a separate image store. Everything is in one place.
- With a modern enterprise RDBMS, distributing the data among several physical disks and server machines might be automatically handled for you.
CONS
- Determining database sizing requirements can be tricky.
- Depending on the specific RDBMS you have, it won't be easy to split the database across multiple physical disks, and your database might grow quickly.
- The entire stream must be brought to the client process, even if only a portion is needed (this is because image decoding is not a forward-only stream read).
Storing filenames in the Database
PROS
- Easy to partition the images among several servers, even making that decision dynamically based on the image.
- Easy to save redundant copies on multiple servers.
- Easy to figure out size requirements of the database (as long as all other fields are not varbinary)
- Can use ACLs for image level security
- Your backup software might be able to handle incremental backups of the individual files better than it understands how to backup the database. This is assuming that you don't have RDBMS-aware backups.
CONS
- Images and associated business data are in two different places.
- If authentication and authorization is kept only in the database, then it will be hard to coordinate with the filesystem.
- Database operations need to be kept in sync with file operations and transactions that involve the image will be manual.
Store Metadata Separately
No matter what you do, a clear win is to store some image metadata separately in their own columns. This is the equivalent of de-normalizing the image data, since the information will be stored twice, and your extra copies need to be kept in sync with the images if they are updated.
If you go this route, the most important metadata to store is the number of frames in the image (for multi-framed image types like TIFF and PDF). The reason is that the number of total images is usually needed before the streams are needed -- for instance to layout the images. Similarly, sometimes it's nice to have pixel heights and widths available. If your layout algorithm determines that an image is not visible, the stream won't need to be read at all.
Another reason to store image metadata separately is that then you can use the information to filter, sort and group images. For instance, in SQL, there is no way to SELECT images with widths less than 100 pixels by reading the blob data, decoding it, and finding out the size.