Huge DataTable memory consumption


DataSet and its children (DataTable, DataRow, etc.) make up an in-memory relational database. There is a lot of overhead involved, though it does make some things very convenient.

If memory is an issue,

  • Build domain objects to represent each row in your CSV file with typed properties.
  • Create a custom collection (or just use IList<T>) to hold them.
  • Alternatively, build a light-weight class with the basic semantics of a DataTable (see the sketch after this list):
    • the ability to select a row by number
    • the ability to select a column within a row by row number and either column name or number
    • the ability to know the ordered set of column names
    • bonus: the ability to select a column by name or ordinal number and receive a list of its values, one per row
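
For instance, here is a minimal sketch of such a class, storing everything as strings. The name LightTable and all of its members are illustrative, not a framework type:

    using System;
    using System.Collections.Generic;

    // Minimal light-weight table: ordered column names, string-only rows.
    public class LightTable
    {
        private readonly List<string> _columns;          // ordered column names
        private readonly Dictionary<string, int> _index; // column name -> ordinal
        private readonly List<string[]> _rows = new List<string[]>();

        public LightTable(IEnumerable<string> columnNames)
        {
            _columns = new List<string>(columnNames);
            _index = new Dictionary<string, int>(_columns.Count);
            for (int i = 0; i < _columns.Count; i++)
                _index[_columns[i]] = i;
        }

        public IList<string> ColumnNames { get { return _columns; } }
        public int RowCount { get { return _rows.Count; } }

        public void AddRow(string[] values)
        {
            if (values.Length != _columns.Count)
                throw new ArgumentException("Wrong number of fields.");
            _rows.Add(values);
        }

        // select a row by number
        public string[] GetRow(int row) { return _rows[row]; }

        // select a cell by row number and column ordinal or name
        public string GetValue(int row, int column) { return _rows[row][column]; }
        public string GetValue(int row, string column) { return _rows[row][_index[column]]; }

        // bonus: one column's values, one per row
        public IEnumerable<string> GetColumn(string column)
        {
            int ordinal = _index[column];
            foreach (string[] row in _rows)
                yield return row[ordinal];
        }
    }

Each row is just one string[] here, so the per-row overhead is a single array object rather than a DataRow with versioning and change-tracking state.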

Are you sure you need an in-memory representation of your CSV files? Could you access them via an IDataReader like Sebastien Lorion's Fast CSV Reader?
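
A rough sketch of streaming with the Fast CSV Reader (assuming the API from Sebastien Lorion's CodeProject article; the file name is illustrative) -- only the current record is held in memory:

    using System;
    using System.IO;
    using LumenWorks.Framework.IO.Csv;

    class StreamCsv
    {
        static void Main()
        {
            // true = first line contains the headers
            using (var csv = new CsvReader(new StreamReader("test.csv"), true))
            {
                int fieldCount = csv.FieldCount;
                string[] headers = csv.GetFieldHeaders();

                // each record is read, processed, and then discarded
                while (csv.ReadNextRecord())
                {
                    for (int i = 0; i < fieldCount; i++)
                        Console.Write("{0} = {1}; ", headers[i], csv[i]);
                    Console.WriteLine();
                }
            }
        }
    }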


Comments

  • JohnnyBravo75
    JohnnyBravo75 almost 2 years

    I'm loading CSV data from files into a DataTable for processing.

    The problem is that I want to process several files, and my tests with the DataTable show huge memory consumption. I tested with a 37 MB CSV file and the memory grew up to 240 MB, which is way too much IMHO. I read that there is overhead in the DataTable, and I could live with about 70 MB, but not 240 MB, which is six times the original size. I read here that DataTables need more memory than POCOs, but the difference is way too much.

    I put a memory profiler on it and checked whether I have memory leaks and where the memory is. I found that the DataTable's columns each hold between 6 MB and 19 MB of strings, and the DataTable had about 20 columns. Are the values stored in the columns? Why is so much memory taken, and what can I do to reduce the consumption? With this memory consumption, DataTables seem to be unusable.

    Has anybody else had such problems with DataTables, or am I doing something wrong?

    PS: I tried a 70 MB file and the DataTable grew up to 500 MB!

    OK, here is a small test case: the 37 MB CSV file (21 columns) makes the memory grow up to 179 MB.

        // requires: using System.Data; using System.IO;
        private static DataTable ReadCsv()
        {
            DataTable table = new DataTable();
            table.BeginLoadData(); // suspend notifications and constraints during the bulk load

            using (var reader = new StreamReader(File.OpenRead(@"C:\Develop\Tests\csv-Data\testdaten\test.csv")))
            {
                int y = 0;
                int columnsCount = 0;
                while (!reader.EndOfStream)
                {
                    var line = reader.ReadLine();
                    var values = line.Split(',');

                    if (y == 0)
                    {
                        // first line: create one string column per header field
                        columnsCount = values.Length;
                        for (int x = 0; x < columnsCount; x++)
                        {
                            table.Columns.Add(new DataColumn(values[x], typeof(string)));
                        }
                    }
                    else if (values.Length == columnsCount)
                    {
                        // add the data, skipping malformed rows
                        table.Rows.Add(values);
                    }

                    y++;
                }

                table.EndLoadData();
                table.AcceptChanges();
            }

            return table;
        }
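
    One way to get comparable numbers from a test like this is to force a full collection before reading the heap size, so dead temporaries (the split arrays and line strings) don't inflate the result. A rough measurement sketch around the ReadCsv method above:

        long before = GC.GetTotalMemory(true);   // true = force a full collection first
        DataTable table = ReadCsv();
        long after = GC.GetTotalMemory(true);
        Console.WriteLine("Managed heap grew by {0:N0} bytes", after - before);
        GC.KeepAlive(table);                     // keep the table alive past the measurement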
    
  • JohnnyBravo75
    JohnnyBravo75 over 10 years
    Yes, but a factor of 6-7 cannot be right. There must be a problem in my code; otherwise DataTables are unusable crap. That's why I asked for statistics/experience from other users to compare against. No database has such an overhead. If I put the data into SQLite, it does not get so big. I wrote my own CSV reader, AdoAdapter, and XmlReader and designed the application to work with DataTables in memory, and with a calculated factor of two it would work. But this overhead will destroy my design. What is the experience of other users?
  • JohnnyBravo75
    JohnnyBravo75 over 10 years
    1. Already tried the Silverlight DataTable from Telerik (blogs.telerik.com/vladimirenchev/posts/09-04-23/…); this is NOT lightweight, it consumes much more memory. The problem is that if you want edit capabilities, it cannot be lightweight. Otherwise, using a list of dictionaries is more lightweight; this is my only hope.
  • JohnnyBravo75
    JohnnyBravo75 over 10 years
    2. POCO-based is not an alternative, because I have no fixed schema; every dataset has a different structure. 3. Yes, that's what I'm evaluating. The problem is that I need to process the data, build key columns/indexes, sort them... Perhaps a lightweight database like SQLite is the way to go.
  • Gary Walker
    Gary Walker over 10 years
    That sounds a lot like my option 3 -- there are a couple of database options. A lightweight one may be a good choice, or you might need something more scalable, but databases are pretty much one of the first things you should consider when your info does not fit conveniently into memory or when you need a persistent store. It's what they do. If you make the change now, the upside is that as you continue to add data you won't just suddenly "hit the wall".
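
    A rough sketch of the SQLite route (assuming the System.Data.SQLite provider; the file name and the fixed two-column schema are illustrative) -- rows stream straight to disk instead of accumulating in memory:

        using System.Data;
        using System.Data.SQLite;
        using System.IO;
        using System.Linq;

        class CsvToSqlite
        {
            static void Main()
            {
                using (var conn = new SQLiteConnection("Data Source=test.db"))
                {
                    conn.Open();
                    using (var create = new SQLiteCommand(
                        "CREATE TABLE IF NOT EXISTS csv (col0 TEXT, col1 TEXT)", conn))
                    {
                        create.ExecuteNonQuery();
                    }

                    // one transaction for the whole load; otherwise every INSERT commits separately
                    using (var tx = conn.BeginTransaction())
                    using (var insert = new SQLiteCommand(
                        "INSERT INTO csv (col0, col1) VALUES (@c0, @c1)", conn))
                    {
                        insert.Transaction = tx;
                        insert.Parameters.Add("@c0", DbType.String);
                        insert.Parameters.Add("@c1", DbType.String);

                        // Skip(1) drops the header line
                        foreach (var line in File.ReadLines("test.csv").Skip(1))
                        {
                            var values = line.Split(',');
                            insert.Parameters["@c0"].Value = values[0];
                            insert.Parameters["@c1"].Value = values[1];
                            insert.ExecuteNonQuery();
                        }
                        tx.Commit();
                    }
                }
            }
        }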
  • Gary Walker
    Gary Walker over 10 years
    Just a thought: did you call BeginLoadData() on your DataTable? I never compared the overhead personally, and I guess I just assumed you would have tried this. Don't forget EndLoadData().
  • JohnnyBravo75
    JohnnyBravo75 over 10 years
    Yes, I tried BeginLoadData() and EndLoadData(), but they did not make any difference. I added sample code above...
  • Gary Walker
    Gary Walker over 10 years
    One last thing: try calling the garbage collector after loading. Lots of unused temp objects might be the issue. Repeat the test, but call the GC after every 1000 rows and see if the results are the same -- this may look different, as the GC may not release memory back to the OS, but forced GC calls might prevent the extra memory being used in the first place. Repeated calls to the GC are definitely not recommended in practice, but you would know if this is the real issue.
  • JohnnyBravo75
    JohnnyBravo75 over 10 years
    Hi, also tried that, but the memory only got released when the DataTable was cleared and then disposed (clearing is needed; disposing alone doesn't help); then all memory gets freed.
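
A minimal teardown sketch consistent with that last observation (table being the DataTable returned by ReadCsv above):

    table.Clear();   // per the comments, clearing is what actually releases the row storage
    table.Dispose(); // disposing alone did not free the memory in these tests
    table = null;
    GC.Collect();    // optional: reclaim it immediately rather than at the next collection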