LMDB files and how they are used for the Caffe deep learning network

Solution 1

There is no connection between LMDB files and MS Access files.

As I see it you have two options:

  1. Use the "convert_imageset" tool, located in Caffe's tools folder, to convert a list of image files and labels to LMDB.
  2. Instead of a "data layer", use an "image data layer" as the input to the network. This type of layer takes as its source a file listing image file names and labels, so you don't have to build a database. Another benefit for training: you can use the shuffle option, which may give slightly better training results.

To use an image data layer, just change the layer type from Data to ImageData. The source is the path to a text file in which each line contains the path of an image file and its label, separated by a space. For example:

/path/to/filename.png 23
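
A list file in this format can be generated with a few lines of Python. This is only a minimal sketch: the function name `write_image_list`, the image paths, and the labels are all made up for illustration, and it assumes the image paths contain no spaces.

```python
import os
import tempfile


def write_image_list(entries, list_path):
    """Write one "<image path> <integer label>" line per entry."""
    with open(list_path, "w") as f:
        for image_path, label in entries:
            f.write("{} {}\n".format(image_path, int(label)))


# Hypothetical image paths and labels, for illustration only.
entries = [("/path/to/cat_001.png", 0), ("/path/to/dog_001.png", 1)]

list_path = os.path.join(tempfile.mkdtemp(), "train.txt")
write_image_list(entries, list_path)

# Read the file back to show what the layer's source file looks like.
with open(list_path) as f:
    listing = f.read()
print(listing)
```

The same kind of list file is what the convert_imageset tool from option 1 takes as input.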

If you want to preprocess the data without saving the preprocessed files to disk, you can use the transformations provided by Caffe (mirroring and cropping; see http://caffe.berkeleyvision.org/tutorial/data.html for details) or implement your own DataTransformer.
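
As a rough illustration of how the layer type, the list file, and the built-in transformations fit together, the sketch below renders an ImageData layer definition as text that could be pasted into a .prototxt file. The helper name `render_image_data_layer`, the source path, the batch size, and the crop size are assumptions chosen for the example, not values taken from the question; check the field names against the caffe.proto of your Caffe version.

```python
# Template for an ImageData layer; doubled braces are literal braces in str.format.
LAYER_TEMPLATE = """layer {{
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  transform_param {{
    mirror: {mirror}
    crop_size: {crop_size}
  }}
  image_data_param {{
    source: "{source}"
    batch_size: {batch_size}
    shuffle: {shuffle}
  }}
}}
"""


def render_image_data_layer(source, batch_size=32, crop_size=224,
                            mirror=True, shuffle=True):
    """Return prototxt text for an ImageData layer (illustrative defaults)."""

    def flag(value):
        # prototxt booleans are written as lowercase true/false
        return "true" if value else "false"

    return LAYER_TEMPLATE.format(
        source=source,
        batch_size=batch_size,
        crop_size=crop_size,
        mirror=flag(mirror),
        shuffle=flag(shuffle),
    )


# Hypothetical path to the image list file.
layer_text = render_image_data_layer("/path/to/train.txt")
print(layer_text)
```

The two "top" entries are the image blob and the label blob, so the rest of the network can stay as it was with a Data layer.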

Solution 2

Caffe can read LevelDB, but it commonly uses LMDB, the 'Lightning' DB from Symas.

You can try using this Matlab LMDB wrapper. I personally have no experience using LMDB with Matlab, but there is a nice library for doing this from Python: py-lmdb.

An LMDB database is a key/value store (similar to a HashMap in Java or a dict in Python). In order to store 4D matrices, you need to understand the convention Caffe uses to save images in LMDB format.

This means that the best approach to convert images to LMDB for Caffe would be doing this with Caffe.

There are examples in Caffe on how to convert images into LMDB - I would try to repeat them and then modify scripts to use your images.
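
For reference, the key/value convention mentioned above can be sketched with py-lmdb as follows. This is a hedged outline rather than Caffe's own conversion code: it assumes py-lmdb and Caffe's Python protobuf bindings (`caffe.proto.caffe_pb2`) are installed, that each image is a uint8 NumPy array of shape (channels, height, width), and the names `make_key` and `write_lmdb` are invented for the example.

```python
def make_key(index):
    """Zero-padded ASCII key, so LMDB's sorted key order matches insertion order."""
    return "{:08d}".format(index).encode("ascii")


def write_lmdb(db_path, images, labels, map_size=1 << 40):
    """Store each image and its label as one serialized Datum record.

    Assumes every image is a uint8 array shaped (channels, height, width).
    """
    # Imported here so make_key can be used without these packages installed.
    import lmdb
    from caffe.proto import caffe_pb2

    # map_size is the maximum size the database may grow to (assumed: 1 TiB).
    env = lmdb.open(db_path, map_size=map_size)
    with env.begin(write=True) as txn:
        for i, (image, label) in enumerate(zip(images, labels)):
            datum = caffe_pb2.Datum()
            datum.channels, datum.height, datum.width = image.shape
            datum.data = image.tobytes()  # raw uint8 pixel bytes
            datum.label = int(label)
            txn.put(make_key(i), datum.SerializeToString())
    env.close()


# Keys compare as byte strings, so padding keeps 2 ahead of 10.
keys = [make_key(i) for i in (0, 2, 10)]
print(keys)
```

Each record holds both the image and its label, so a single database is enough for a classification dataset. For a very large set it may be preferable to commit in several smaller transactions instead of one.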

Author by

mad

Updated on June 08, 2022

Comments

  • mad
    mad almost 2 years

    I am quite new to deep learning and I am having some problems using the Caffe deep learning network. Basically, I couldn't find any documentation that answers the series of questions and problems I am dealing with right now.

    Please, let me explain my situation first.

    I have thousands of images and I must do a series of pre-processing operations on them. For each pre-processing operation, I have to save the pre-processed images as 4D matrices and also store a vector with the image labels. I will store this information as LMDB files that will be used as input for the Caffe GoogLeNet deep network.

    I tried to save my images as .HD5 files, but the final file size is 80GB, which is impossible to process with the memory I have.

    So, the other option is using LMDB files, right? I am quite new to this file format and I would appreciate your help in understanding how to create them in Matlab. Basically, my rookie questions are:

    1- These LMDB files have the extension .MDB, right? Is this the same extension used by Microsoft Access? Or is the right format .lmdb, and are they different?

    2- I found this solution for creating .mdb files (https://github.com/kyamagu/matlab-leveldb). Does it create the file format needed by Caffe?

    3- For Caffe, do I have to create one .mdb file for labels and another for images, or can both be fields of the same .mdb file?

    4- When I create an .mdb file I have to label the database fields. Can I label one field as image and the other as label? Does Caffe understand what each field means?

    5- What do the calls database.put('key1', 'value1') and database.put('key2', 'value2') (in https://github.com/kyamagu/matlab-leveldb) do? Do I have to save my 4D matrices in one field and the label vector in another?

  • Tal Darom
    Tal Darom almost 9 years
    Caffe can use either LMDB or LevelDB
  • mad
    mad almost 9 years
    Thanks for your answer, now I understand Caffe's file format. But I don't want to use the Caffe approach to create LMDB files, as I would have to store the images in folders. I will run a series of pre-processing operations on 245000 images and want to test each of them with the GoogLeNet deep network. My best option is to do the pre-processing operations without saving the results to disk and create the LMDB files directly.
  • mad
    mad almost 9 years
    Thank you so much for your answer. But neither answer helps me much. I don't want to read a list of files from directories, as I have 250k images and will try a series of pre-processing operations on them. I don't want to save them to disk, so what I want is to do the pre-processing operations in Matlab and save the pre-processed images and labels directly as LMDB. Is that possible?
  • mad
    mad almost 9 years
    Can you show me how to use the image data layer? What syntax must I use in my .prototxt file? I think this is the least expensive solution for me, given that it is difficult to generate the mdb files the way I want.
  • Tal Darom
    Tal Darom almost 9 years
    added some explanations to the answer