Using C# to Sync two tables in different databases

There are two solutions to this problem. You could try to be smart and transfer only the changes, but that requires some integration on the source database; I'm sure your DB admin can help you with that, potentially with triggers that keep track of all touched rows (e.g. keyed on the source table's primary key). That solution scales fairly well, but it's more complicated. I think you should look at the second option instead: simple brute force.

By the sounds of it, your entire table fits comfortably in 100 MB. That just isn't much data. Assuming you can get a transfer rate of 10 MB/s (which isn't outlandish at all), you could transfer everything in just 10 seconds. If, as you say, you only need a few columns, the total transfer could be considerably less. Going with the 10-second figure every five minutes, that's on the order of 3% load to keep the data up to date, and it's a trivial query for the source database that will likely not cause much load, especially since the data set is small enough to be cached entirely in memory.

Take a look at SqlBulkCopy. This article (Transferring data using SqlBulkCopy) is an example of using it to copy data from one database to another. The source data reader could be anything; I use it to insert computed data from objects, for instance, but a particularly easy case is a DbDataReader, which you can get for the select statement from PostgreSQL.
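On the PostgreSQL side, the usual way to get such a reader is the Npgsql ADO.NET provider. A minimal sketch, assuming Npgsql; the connection string, table, and column names are placeholders:

    using System.Data.Common;
    using Npgsql;

    using (var source = new NpgsqlConnection("Host=remote;Database=mydb;Username=me;Password=secret"))
    {
        source.Open();
        // Select only the columns you actually need on the remote side.
        using (var cmd = new NpgsqlCommand("SELECT id, col_a, col_b FROM source_table", source))
        using (DbDataReader reader = cmd.ExecuteReader())
        {
            // 'reader' can be handed straight to SqlBulkCopy.WriteToServer,
            // which streams the rows across without materializing them.
        }
    }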

Unfortunately, the default options aren't too brilliant, so you'll probably want to specify the SqlBulkCopyOptions useful to you; TableLock is probably not a bad one. Also, if you're doing this in parallel (i.e. multiple bulk inserts into one table), beware of indexes, which can cause deadlocks. If you play with the bulk copy's BatchSize, you can tune the trade-off between throughput and memory usage, though the default may work just fine (the sketch after the list below passes TableLock and sets a batch size).

Conceptually, then, I'd do this (a sketch follows the list):

  • open a connection to your source and target databases (use using)
  • start an SqlTransaction on the target database connection
  • delete all rows from the target table
  • bulk copy from the source to the target (don't forget to pass the transaction)
  • commit
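
A minimal sketch of those steps, again assuming Npgsql on the PostgreSQL side; connection strings, table, and column names are placeholders:

    using System.Data.SqlClient;
    using Npgsql;

    class TableSync
    {
        static void Sync(string pgConnString, string sqlConnString)
        {
            using (var source = new NpgsqlConnection(pgConnString))
            using (var target = new SqlConnection(sqlConnString))
            {
                source.Open();
                target.Open();

                using (SqlTransaction tx = target.BeginTransaction())
                {
                    // Clear the target table inside the transaction.
                    using (var delete = new SqlCommand("DELETE FROM dbo.TargetTable", target, tx))
                        delete.ExecuteNonQuery();

                    // Stream the fresh rows across, inside the same transaction.
                    using (var select = new NpgsqlCommand("SELECT id, col_a, col_b FROM source_table", source))
                    using (var reader = select.ExecuteReader())
                    using (var bulk = new SqlBulkCopy(target, SqlBulkCopyOptions.TableLock, tx))
                    {
                        bulk.DestinationTableName = "dbo.TargetTable";
                        bulk.BatchSize = 10000; // throughput vs. memory trade-off
                        bulk.WriteToServer(reader);
                    }

                    // Readers keep seeing the old data until this commit.
                    tx.Commit();
                }
            }
        }
    }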

This way, you'll atomically update the destination table.

I'm not sure what you're trying to do, but if this is some form of caching, consider doing away with the target SQL Server entirely and keeping the data in memory as an array of objects. In-memory access to read-only data is super fast, and your data set easily fits in memory.
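
A minimal sketch of that variant, with a hypothetical Row record standing in for your actual columns; each refresh builds a new list and swaps it in with a single reference assignment, so readers never see a half-updated cache:

    using System.Collections.Generic;
    using Npgsql;

    class RowCache
    {
        record Row(int Id, string ColA, string ColB);

        // Replaced wholesale on each refresh; readers just grab the current reference.
        static volatile IReadOnlyList<Row> current = new List<Row>();

        static void Refresh(string pgConnString)
        {
            var rows = new List<Row>();
            using (var source = new NpgsqlConnection(pgConnString))
            {
                source.Open();
                using (var cmd = new NpgsqlCommand("SELECT id, col_a, col_b FROM source_table", source))
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        rows.Add(new Row(reader.GetInt32(0), reader.GetString(1), reader.GetString(2)));
                }
            }
            // Atomic reference swap: in-flight readers keep their old snapshot.
            current = rows;
        }
    }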

Comments

  • Admin almost 2 years

    I have a remote PostgreSQL DB with a table (which I cannot alter) containing information (just a few columns, not the entire table) that I want to sync to a table in a local SQL Server 2008 Express database (which I can alter).

    Now I'm looking for an efficient way to do this. Since the sync will run every ~5 minutes, reloading the entire table every time would produce a lot of unnecessary traffic, which I'm trying to avoid.

    I thought about saving the most recent ID and just grabbing everything that's newer, but it is possible (not likely, but still possible) that older data gets altered, and going that way I would miss the altered data.

    The sync will be done by a C# program running on the same machine as the local SQL Server.