Disk backed dictionary/cache for c#

17,339

Solution 1

What you really want is a B-Tree. That's the primary data structure that a database uses. It's designed to enable you to efficiently swap portions of a data structure to and from disk as needed.

I don't know of any widely used, high quality standalone B-Tree implementations for C#.

However, an easy way to get one would be to use a Sql Compact database. The Sql Compact engine will run in-process, so you don't need a seperate service running. It will give you a b-tree, but without all the headaches. You can just use SQL to access the data.

Solution 2

Disclaimer - I am about to point you at a product that I am involved in.

I'm still working on the web site side of things, so there is not a lot of info, but Serial Killer would be a good fit for this. I have examples that use .Net serialization (can supply examples), so writing a persistent map cache for .Net serializable objects would be trivial.

Enough shameless self promotion - if interested, use the contact link on the website.

Solution 3

This is very similar to my question

Looking for a simple standalone persistant dictionary implementation in C#

I don't think a library that exactly fits what you want exists, maybe its time for a new project on github.

Solution 4

Here is a B-Tree implementation for .net: http://bplusdotnet.sourceforge.net/

Share:
17,339

Related videos on Youtube

Tjkoopa
Author by

Tjkoopa

For me, fun is taking a program, language, whatever and making it do things it's designer never envisioned it doing. It comes in handy when I'm asked to do something a little outside the norm as part of my job.

Updated on April 30, 2022

Comments

  • Tjkoopa
    Tjkoopa almost 2 years

    I'm looking for a drop in solution for caching large-ish amounts of data.

    related questions but for different languages:

    Close question in different terms:

    I don't need (or want to pay anything for) persistence, transactions, thread safety or the like and want something that is not much more complex to use than a List<> or Dictionary<>.

    If I have to write code, I'll just save everything off as files in the temp directory:

    string Get(int i)
    {
       File.ReadAllText(Path.Combine(root,i.ToString());
    }
    

    In my cases in index will be an int (and they should be consecutive or close enough) and the data will be a string so I can get away with treating both a POD and would rather go ultra-light and do exactly that.

    The usage is that I have a sequence of 3k files (as in file #1 to #3000) totaling 650MB and need to do a diff for each step in the sequence. I expect that to total about the same or a little more and I don't want to keep all that in memory (larger cases may come along where I just can't).


    A number of people have suggested different solutions for my problem. However none seem to be targeted at my little niche. The reasons that I'm looking at disk backed caching is because I'm expecting that my current use will use up 1/3 to 1/2 of my available address space. I'm worried that larger cases will just flat run out of space. I'm not worried about treading, persistence or replication. What I'm looking for is a minimal solution using a minimum of code, a minimal usage foot print, minimal in memory overhead and minimum complexity.

    I'm starting to think I'm being overly optimistic.

    • D'Arcy Rittich
      D'Arcy Rittich over 15 years
      Why not the filesystem? That is what is designed for...
    • Tjkoopa
      Tjkoopa over 15 years
      I will if someone can't point me to something better that is already written.
    • Tjkoopa
      Tjkoopa over 15 years
      Depends on where the data is coming from. If it's memoizing expensive computed value or data coming over a slow internet link it can be more than reasonable.
    • Nikhil
      Nikhil over 14 years
      I just ran into a similar problem... Still hunting for a solution
  • Tjkoopa
    Tjkoopa over 15 years
    +1 for related stuff but I'm looking more for ultra-light solutions (ideal would be where key & values are both POD and get stored as binary data blocks)
  • Tjkoopa
    Tjkoopa over 15 years
    I'm not liking the overhead. See my edits but I could get away with a single in memory array look up and a single disk read per load so the B-Tree is overkill... in my case.
  • Tjkoopa
    Tjkoopa over 15 years
    The naive, probably buggy and extendability version of what I'm looking for (skipping the eviction policy stuff) could be done in about 30 LOC. I'd be impressed if you could get even half your feature list in nder that.
  • Scott Wisniewski
    Scott Wisniewski over 15 years
    One advantage to using the in-proc DB is that it gives you access path independence. When you need to change what you data you store, or what keys you need to access it, you don't need to re-write a big chunk of your app
  • Scott Wisniewski
    Scott Wisniewski over 15 years
    However, if you really feel that the stuff you need to do with the data is that simple, then I would think you could something from scratch that used Dictionary(of int, string), where the string was a file name, in about 2-3 hours of work....
  • Tjkoopa
    Tjkoopa over 15 years
    skipping of iteration and recursion (unneeded in this case) LOC ~ execution time (for some values of LOC :)
  • Daniel Paull
    Daniel Paull over 15 years
    I disagree. Overcoming issues of fragmentation in the file system and caching strategies greatly affect execution time, so performance may be inversely proportional to LOC!
  • Tjkoopa
    Tjkoopa over 15 years
    looks like a neat project. OTOH it looks like overkill for me.
  • Tjkoopa
    Tjkoopa over 15 years
    Added link. How about you add a link the other way?
  • Tjkoopa
    Tjkoopa over 15 years
    OTOH the motivation is different. you were looking for persistence, I'm wanting to store stuff on disk rather than in memory. Large overlap, but not quite the same.
  • Sam Saffron
    Sam Saffron over 15 years
    No worries, I added a link from my post