Reading a memory mapped block of data into a structure

13,339

Dealing with a memory mapped file is really no different than dealing with any other pointer to memory. The memory mapped file is just a block of data that any process can read from and write to by opening the mapping under the same name.

I'm assuming you want to load the file into a memory map, read and update it there at will, and dump it back to a file at some regular or known interval, right? If that's the case, just read from the file and copy the data to the memory map pointer, and that's it. Later you can read data from the map, cast it to your (memory-aligned) structure, and use your structure at will.

If I were you, I'd probably create a few helper methods like

data ReadData(void *ptr)

and

void WriteData(data *ptrToData, void *ptr)

Here ptr is the memory map address and ptrToData is a pointer to the data structure you want to write to memory. At this point it really doesn't matter whether the memory is mapped or not; if you wanted to read from the file loaded into local memory, you could do that too.
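The two helpers could look roughly like the following. This is a minimal sketch under stated assumptions: `Record` is a hypothetical flat struct with no pointer members (unlike the question's `Data`, whose `float *entries` cannot be copied this way), and a local byte array stands in for the mapped view.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical flat record: no pointer members, so its bytes can be copied as a unit.
struct Record {
    int  number;
    char character[512];
};

// Copy a Record out of the (mapped) memory at ptr.
Record ReadData(const void *ptr) {
    Record r;
    std::memcpy(&r, ptr, sizeof(Record));
    return r;
}

// Copy *ptrToData into the (mapped) memory at ptr.
void WriteData(const Record *ptrToData, void *ptr) {
    std::memcpy(ptr, ptrToData, sizeof(Record));
}

// Usage: a plain local byte array stands in for the mapped view.
void exampleUsage() {
    unsigned char view[sizeof(Record)];
    Record in = {};
    in.number = 42;
    std::strcpy(in.character, "hello");
    WriteData(&in, view);
    Record out = ReadData(view);
    assert(out.number == 42);
    assert(std::strcmp(out.character, "hello") == 0);
}
```

Copying through memcpy rather than dereferencing a cast pointer avoids any alignment requirement on the view address.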

You can read from and write to it exactly the same way you would with any other block of data: use memcpy to copy data from the source to the target, and use pointer arithmetic to advance the location in the data. Don't worry about the "memory map"; it's just a pointer to memory and you can treat it as such.
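For instance, if the view holds fixed-size records back to back, record k starts k * sizeof(record) bytes past the base pointer. The sketch below assumes a hypothetical `Sample` struct and treats the base as a `char *`, so adding n moves exactly n bytes (adding n to a `float *` would instead move n * sizeof(float) bytes).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size record; record k starts at base + k * sizeof(Sample).
struct Sample {
    int   id;
    float value;
};

// Reads record k by stepping a char* forward k * sizeof(Sample) bytes.
Sample readRecord(const char *base, std::size_t k) {
    Sample s;
    std::memcpy(&s, base + k * sizeof(Sample), sizeof(Sample));
    return s;
}

// Sums every value, advancing a cursor one record at a time.
float sumValues(const char *base, std::size_t count) {
    float total = 0.0f;
    const char *cursor = base;
    for (std::size_t k = 0; k < count; ++k) {
        Sample s;
        std::memcpy(&s, cursor, sizeof s);
        total += s.value;
        cursor += sizeof s;  // pointer arithmetic: move to the next record
    }
    return total;
}
```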

Also, since you are dealing with direct memory pointers, you don't need to write each element into the mapped file one by one; you can write them all in one batch like

memcpy(mapPointer, data->entries, sizeof(float)*number);

This copies sizeof(float)*number bytes (that is, number floats) from data->entries to the start address of the map pointer. Obviously you can copy whatever you want, wherever you want; this is just an example. See http://www.devx.com/tips/Tip/13291.
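To make the equivalence concrete, here is a sketch comparing an element-by-element copy with the single batched memcpy; both produce the same bytes. `copyOneByOne` and `copyBatch` are hypothetical names, while `mapPointer`, `entries`, and `number` mirror the snippet above.

```cpp
#include <cstring>

// Element by element: one memcpy per float, stepping sizeof(float) bytes each time.
void copyOneByOne(void *mapPointer, const float *entries, int number) {
    char *dst = static_cast<char *>(mapPointer);
    for (int i = 0; i < number; ++i) {
        std::memcpy(dst + i * sizeof(float), &entries[i], sizeof(float));
    }
}

// Batch: a single memcpy of sizeof(float) * number bytes.
void copyBatch(void *mapPointer, const float *entries, int number) {
    std::memcpy(mapPointer, entries, sizeof(float) * number);
}
```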

Reading the data back in is similar, but you want to explicitly copy the memory the pointers refer to into a known location, so imagine flattening your structure out. Instead of

data:
  int
  char * -> points to some address
  float * -> points to some address

where your pointers point to memory somewhere else, lay the memory out like this:

data:
  int
  char * -> copy of original ptr
  float * -> copy of original ptr
followed by:
  512 values of char array
  number of values of float array

This way you can "re-serialize" the data from the memory map into your local structure. Remember that a dynamically allocated array is reached through a pointer: the struct stores only the address, and the memory it points to doesn't have to be contiguous with the object, since it could have been allocated at another time. You need to make sure you copy the actual data the pointers point to into your memory map. A common way of doing this is to write the object straight into the memory map and then follow it with all the flattened arrays. To read it back, you first read the object, then advance the pointer by sizeof(object) and read in the next array, then advance the pointer again by that array's size in bytes, and so on.

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct data{
    int size;
    char items[512];
    float * dataPoints;
};

void writeToBuffer(data *input, char *buffer){
    int sizeOfData = sizeof(data);
    int dataPointsSize = sizeof(float) * input->size;

    printf("size of data %d\n", sizeOfData);

    // copy the struct itself (its dataPoints value is only an address)
    memcpy(buffer, input, sizeOfData);

    printf("pointer to dataPoints of original %p\n", (void *)input->dataPoints);

    // follow the struct with the array that dataPoints refers to
    memcpy(buffer + sizeOfData, input->dataPoints, dataPointsSize);
}

void readFromBuffer(data *target, char * buffer){
    memcpy(target, buffer, sizeof(data));

    printf("pointer to dataPoints of copy %p, same as original\n", (void *)target->dataPoints);

    // give ourselves a new array
    target->dataPoints = (float *)malloc(target->size * sizeof(float));

    // do a deep copy, since we just copied the same pointer from
    // the previous data into our local
    memcpy(target->dataPoints, buffer + sizeof(data), target->size * sizeof(float));

    printf("pointer to dataPoints of copy %p, now its own copy\n", (void *)target->dataPoints);
}

int main(int argc, char* argv[])
{
    data test;

    for(int i=0;i<512;i++){
        test.items[i] = (char)i;
    }

    test.size = 10;

    // create an array and populate the data
    test.dataPoints = new float[test.size];

    for(int i=0;i<test.size;i++){
        test.dataPoints[i] = (float)i * 1000.0f;
    }

    // print it out for demonstration
    for(int i=0;i<test.size;i++){
        printf("data point value %d: %f\n", i, test.dataPoints[i]);
    }

    // create a memory buffer; this is no different than the shared memory.
    // It needs room for the struct followed by the flattened float array.
    char * memBuffer = (char*)malloc(sizeof(data) + sizeof(float) * test.size);

    // create a target we'll load values into
    data test2;

    // write the original out to the memory buffer
    writeToBuffer(&test, memBuffer);

    // read from the memory buffer into the target
    readFromBuffer(&test2, memBuffer);

    // print for demonstration
    printf("copy number %d\n", test2.size);
    for(int i=0;i<test2.size;i++){
        printf("\tcopy value %d: %f\n", i, test2.dataPoints[i]);
    }

    // memory cleanup: release each block with the matching function
    free(memBuffer);            // malloc -> free
    free(test2.dataPoints);     // malloc -> free
    delete [] test.dataPoints;  // new[] -> delete[]

    return 0;
}

You'll probably also want to read up on data alignment when writing data from a struct to memory. Check out "Working with packing structures", "C++ struct alignment question", and "Data structure alignment".
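As a quick illustration of why alignment matters, the sketch below compares a struct the compiler is free to pad with one packed via `#pragma pack`. The exact offsets are implementation-defined, `#pragma pack` is a compiler extension (supported by MSVC, GCC, and Clang), and the struct names are hypothetical.

```cpp
#include <cstddef>

// The compiler may insert padding after c so that i starts on an
// alignof(int) boundary; offsetof(Padded, i) is commonly 4.
struct Padded {
    char c;
    int  i;
};

// Same members with padding removed (compiler extension).
#pragma pack(push, 1)
struct Packed {
    char c;
    int  i;
};
#pragma pack(pop)

std::size_t paddedOffset() { return offsetof(Padded, i); }
std::size_t packedOffset() { return offsetof(Packed, i); }
```

If one program writes Padded and another reads the bytes assuming Packed (or a different compiler's padding), every field after the gap is read from the wrong offset.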

If you don't know the size of the data ahead of time when reading, you should write the size of the data into a known position at the beginning of the memory map for later use.
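A minimal sketch of that idea, assuming a hypothetical layout where a 32-bit element count sits at offset 0 and the floats follow immediately after it:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Writes the element count at offset 0, then the floats right after it.
void writeWithHeader(char *map, const std::vector<float> &values) {
    std::uint32_t count = static_cast<std::uint32_t>(values.size());
    std::memcpy(map, &count, sizeof count);
    if (count > 0) {
        std::memcpy(map + sizeof count, values.data(), count * sizeof(float));
    }
}

// Reads the count first, so the reader needs no prior knowledge of the size.
std::vector<float> readWithHeader(const char *map) {
    std::uint32_t count;
    std::memcpy(&count, map, sizeof count);
    std::vector<float> values(count);
    if (count > 0) {
        std::memcpy(values.data(), map + sizeof count, count * sizeof(float));
    }
    return values;
}
```

The `count > 0` guards avoid passing a possibly null `data()` pointer to memcpy for an empty vector.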

Anyway, as for whether memory mapping is the right tool here, I think it is. From Wikipedia:

The primary benefit of memory mapping a file is increasing I/O performance, especially when used on large files. ... The memory mapping process is handled by the virtual memory manager, which is the same subsystem responsible for dealing with the page file. Memory mapped files are loaded into memory one entire page at a time. The page size is selected by the operating system for maximum performance. Since page file management is one of the most critical elements of a virtual memory system, loading page sized sections of a file into physical memory is typically a very highly optimized system function.

You're going to map the whole file into virtual memory, and the OS can then page the file in and out of physical memory for you as you need it, creating a "lazy loading" mechanism.

All that said, memory maps are shared, so if the map is used across process boundaries you'll want to synchronize access with a named mutex so that processes don't overwrite each other's data.

Author: foboi1122

Just starting out my journey as a programmer. Any help would be appreciated

Updated on June 24, 2022

Comments

  • foboi1122, about 2 years ago

    I've been playing around with memory mapping today in VC++ 2008, and I still haven't completely understood how to use it or whether it's right for my purposes. My goal here is to quickly read a very large binary file.

    I have a struct:

    typedef struct _data
    {
        int number;
        char character[512];
        float *entries;
    }Data;
    

    which is written many, many times into a file. The "entries" variable is an array of floating-point values. After writing this file (10000 Data structs, each "entries" array holding 90000 floats), I tried to memory map it with the following function so that I could read the data faster. Here's what I have so far:

    #include <windows.h>
    #include <stdio.h>

    int readDataMmap(const char *fname, //name of file containing my data
                     int arraySize,     //number of Data structs in the file
                     int entrySize)     //number of values in each "entries" array
    {
        //Open the file for reading (CreateFileA takes a char* name directly)
        HANDLE hFile = CreateFileA(fname,
                           GENERIC_READ,          // open for reading
                           0,                     // do not share
                           NULL,                  // default security
                           OPEN_EXISTING,         // existing file only
                           FILE_ATTRIBUTE_NORMAL, // normal file
                           NULL);                 // no template

        if (hFile == INVALID_HANDLE_VALUE)
        {
            printf("First CreateFile failed\n");
            return 1;
        }

        //PAGE_READONLY matches the GENERIC_READ file handle
        HANDLE hMapFile = CreateFileMapping(hFile,
             NULL,                    // default security
             PAGE_READONLY,
             0,                       // max. object size (high DWORD)
             0,                       // max. object size (low DWORD); 0,0 = whole file
             NULL);                   // unnamed mapping object

        if (hMapFile == NULL){        // CreateFileMapping returns NULL on failure
            printf("File Mapping failed\n");
            CloseHandle(hFile);
            return 2;
        }

        char *pBuf = (char*) MapViewOfFile(hMapFile,   // handle to map object
                            FILE_MAP_READ, // read-only access
                            0,
                            0,
                            0);            // 0 maps the whole file
        if (pBuf == NULL)
        {
          printf("Could not map view of file\n");
          CloseHandle(hMapFile);
          CloseHandle(hFile);
          return 1;
        }

        //Allocate data structure
        Data *inData = new Data[arraySize];
        for(int i = 0; i<arraySize; i++)inData[i].entries = new float[entrySize];

        int pos = 0;
        for(int i = 0; i < arraySize; i++)
        {
            //This is where I'm not sure what to do with the memory block
        }

        //Release the view and handles when done
        UnmapViewOfFile(pBuf);
        CloseHandle(hMapFile);
        CloseHandle(hFile);
        return 0;
    }
    

    At the end of the function, after the memory is mapped and I'm handed a pointer "pBuf" to the beginning of the memory block, I don't know how to read this memory block back into my data structure. Eventually I would like to transfer this block of memory back into my array of 10000 Data struct entries. Of course, I could be doing this completely wrong...

    • Harry Johnston, over 11 years ago
      It doesn't usually make sense to write a pointer into a file. What is the actual format of the contents of the file?
    • foboi1122, over 11 years ago
      @Harry Johnston: the file contains a binary set of Data structures
    • Harry Johnston, over 11 years ago
      So the actual floats aren't in the file anywhere? Can you show us the code used to write the file?
    • Thomas Matthews, over 11 years ago
      Don't try using memcpy into the class/structure, because the compiler is allowed to add padding between members, which can screw things up. Also remember about endianness for multibyte quantities. All the more reason to assign structure members individually from buffers / memory.
  • foboi1122, over 11 years ago
    How would I advance the address pointer of the memory block after I use memcpy?
  • devshorts, over 11 years ago
    The pointer to the memory block always points to the start address. To access later locations you do something like ptr+number; if ptr is a char*, that gives you a pointer number bytes past ptr. Check out some articles on pointer arithmetic: cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/pointer.html
  • devshorts, over 11 years ago
    foboi, I updated my answer to address your concerns about the memory mapped pointer. Hopefully this helps a little.
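Following Thomas Matthews's suggestion, a field can be decoded from raw bytes with an explicit byte order instead of memcpy-ing a whole struct. This sketch assumes the file stores 32-bit values little-endian (as x86 writes them) and that `float` is IEEE 754 single precision; the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "IEEE 754 float assumed");
static_assert(sizeof(float) == 4, "32-bit float assumed");

// Assembles a 32-bit value from 4 bytes stored least-significant first,
// independent of the host CPU's byte order.
std::uint32_t readU32LE(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Decodes the 4 bytes as an unsigned value, then reinterprets its bits as a float.
float readF32LE(const unsigned char *p) {
    std::uint32_t bits = readU32LE(p);
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

Reading field by field like this also sidesteps struct padding, since each member's offset in the file is chosen explicitly rather than by the compiler.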