In Python, how can I load a SQLite db completely into memory before connecting to it?


Solution 1

APSW is an alternative Python wrapper for SQLite that lets you back up an on-disk database into memory before running any operations on it.

From the docs:

###
### Backup to memory
###

# We will copy the disk database into a memory database

memcon=apsw.Connection(":memory:")

# Copy into memory
with memcon.backup("main", connection, "main") as backup:
    backup.step() # copy whole database in one go

# There will be no disk accesses for this query
for row in memcon.cursor().execute("select * from s"):
    pass

In the snippet above, connection is an apsw.Connection that is already open on your on-disk db.
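
The same backup idea can also be sketched with the standard library alone. This is not part of the original answer: it assumes Python 3.7 or later, where sqlite3.Connection.backup() is available, and the function name load_into_memory is made up for illustration.

```python
import sqlite3


def load_into_memory(path):
    """Return an in-memory connection holding a full copy of the db at `path`.

    Hypothetical helper; relies on sqlite3.Connection.backup (Python 3.7+).
    """
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    try:
        # With the default pages=-1 the whole database is copied in one step.
        disk.backup(mem)
    finally:
        disk.close()
    return mem
```

Because the result is an ordinary sqlite3 connection, features such as row_factory (for example mem.row_factory = sqlite3.Row) remain available.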

Solution 2

  1. Get an in-memory database running (standard stuff).
  2. Attach the disk database (file).
  3. Recreate the tables and indexes, and copy over the contents.
  4. Detach the disk database (file).

Here's an example (taken from here) in Tcl, which may be useful for getting the general idea:

proc loadDB {dbhandle filename} {

    if {$filename != ""} {
        #attach persistent DB to target DB
        $dbhandle eval "ATTACH DATABASE '$filename' AS loadfrom"
        #copy each table to the target DB
        foreach {tablename} [$dbhandle eval "SELECT name FROM loadfrom.sqlite_master WHERE type = 'table'"] {
            $dbhandle eval "CREATE TABLE '$tablename' AS SELECT * FROM loadfrom.'$tablename'"
        }
        #create indexes in loaded table
        foreach {sql_exp} [$dbhandle eval "SELECT sql FROM loadfrom.sqlite_master WHERE type = 'index'"] {
            $dbhandle eval $sql_exp
        }
        #detach the source DB
        $dbhandle eval {DETACH loadfrom}
    }
}
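
Since the question is about Python, here is a rough translation of the four steps using the standard sqlite3 module. It is a sketch under a few assumptions, not the original author's code: the names quote_ident and load_db_via_attach are invented, only tables and explicitly created indexes are copied (views and triggers are ignored), and the stored CREATE statements are replayed instead of using CREATE TABLE ... AS SELECT so that column types and constraints survive.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_db_via_attach(filename):
    """Copy tables and indexes from the file `filename` into a new in-memory db."""
    # 1. Get an in-memory database running.
    mem = sqlite3.connect(":memory:")
    # 2. Attach the disk database under the schema name "loadfrom".
    mem.execute("ATTACH DATABASE ? AS loadfrom", (filename,))

    # Read the stored CREATE statements; skip internal sqlite_* objects and
    # automatic indexes (which have no SQL text).
    objects = mem.execute(
        "SELECT type, name, sql FROM loadfrom.sqlite_master "
        "WHERE type IN ('table', 'index') AND sql IS NOT NULL "
        "AND name NOT LIKE 'sqlite_%'"
    ).fetchall()

    # 3a. Recreate each table in the in-memory db, then copy its rows.
    for kind, name, sql in objects:
        if kind == "table":
            mem.execute(sql)
            mem.execute(
                "INSERT INTO main.{0} SELECT * FROM loadfrom.{0}".format(
                    quote_ident(name)
                )
            )
    # 3b. Recreate the explicitly created indexes.
    for kind, name, sql in objects:
        if kind == "index":
            mem.execute(sql)

    # 4. Commit first (DETACH fails inside an open transaction), then detach.
    mem.commit()
    mem.execute("DETACH DATABASE loadfrom")
    return mem
```

Calling load_db_via_attach("file.db") returns a connection whose data lives entirely in memory; the file name here is only a placeholder.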

Solution 3

If you are using Linux, you can try tmpfs, which is a memory-based file system.

It's very easy to use:

  1. Mount tmpfs on a directory.
  2. Copy the SQLite db file to that directory.
  3. Open it as a normal SQLite db file.

Remember that anything in tmpfs is lost on reboot, so copy the db file back to disk if it has changed.
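
From Python, steps 2 and 3 might look like the sketch below. It assumes a tmpfs mount already exists; /dev/shm is used as the default because many Linux distributions mount tmpfs there, but that path is an assumption and not something the answer specifies. The function name open_from_tmpfs is hypothetical.

```python
import os
import shutil
import sqlite3


def open_from_tmpfs(db_path, tmpfs_dir="/dev/shm"):
    """Copy `db_path` into `tmpfs_dir` and open the copy.

    `tmpfs_dir` is assumed to be a tmpfs mount point; adjust it to
    wherever tmpfs is mounted on your system.
    Returns the connection and the path of the copy.
    """
    target = os.path.join(tmpfs_dir, os.path.basename(db_path))
    shutil.copy2(db_path, target)  # step 2: copy the db file into tmpfs
    return sqlite3.connect(target), target  # step 3: open it as usual
```

If the in-memory copy is modified, close the connection and copy the file at the returned path back to disk (for example with shutil.copy2) before the machine reboots.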

Solution 4

Note that you may not need to explicitly load the database into SQLite's memory at all. Simply prime your operating system's disk cache by copying the file to the null device.

Windows: copy file.db nul:
Unix/Mac:  cp file.db /dev/null

This has the advantage of the operating system taking care of memory management, especially discarding it if something more important comes along.
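
The same priming can be done from inside Python by reading the file once and discarding the data. This is a minimal sketch, not part of the original answer; the function name prime_file_cache and the 1 MiB chunk size are arbitrary choices, and whether the pages actually stay cached is up to the operating system.

```python
def prime_file_cache(path, chunk_size=1024 * 1024):
    """Read the whole file once so the OS can keep it in its disk cache.

    Returns the number of bytes read. The data itself is thrown away.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Call prime_file_cache("file.db") before sqlite3.connect("file.db"); the file name is a placeholder for your own database.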

Author: relima

Updated on June 11, 2022

Comments

  • relima
    relima almost 2 years

    I have a 100 MB SQLite db file that I would like to load into memory before performing SQL queries. Is it possible to do that in Python?

    Thanks

  • relima
    relima over 13 years
    I like your solution, but there is one problem: I make heavy use of pysqlite's row_factory feature, and it seems that apsw does not have it.
  • relima
    relima over 13 years
    This has really solved my problem. My queries are MUCH faster now.
  • relima
    relima over 13 years
    import apsw
    mem_db_loader = apsw.Connection(file_sqlite_db)
    connection = apsw.Connection(":memory:")
    connection.backup("main", mem_db_loader, "main").step()
    cursor = connection.cursor()
  • relima
    relima over 13 years
    It may be only my computer, but this technique didn't really improve my performance. (Win 7 x64, 8gb ram).
  • Roger Binns
    Roger Binns over 13 years
    It has worked for many other people on the SQLite mailing list in the past especially after a machine has just booted as it primes the file system cache. In your case it is most likely that file didn't end up in the file system cache. (Some copy tools tell the OS to bypass the cache so that they don't throw out existing "good" content in it.)
  • Peter Rust
    Peter Rust over 12 years
    The "nul:" trick didn't work for me on Win7, but a real copy (to temp.db) does. It's a little annoying b/c I have to delete the temp file to prevent taking excessive space on the HD, but it gets the file into the disk cache (makes the 1st query just as fast as subsequent queries).
  • kxr
    kxr almost 7 years
    Since you are in a programming language (Python) anyway, you could just dummy-read the whole file before doing work. Does anyone have experience with how this cache priming performs versus the :memory: backup method?