How to implement background/asynchronous write-behind caching in PHP?

Solution 1

You should be able to do 200 inserts relatively quickly, but it will depend on lots of factors. If you are using a transactional engine and doing each one in its own transaction, don't - that creates way too much I/O.
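
For example, with PDO on a transactional engine such as InnoDB, wrapping the whole batch in a single transaction means MySQL flushes once at COMMIT rather than once per statement (a sketch; $pdo, saveAllFields() and the data shape are placeholders):

    <?php
    // One transaction for the whole batch: one log flush at COMMIT
    // instead of one per statement.
    $pdo->beginTransaction();
    try {
        saveAllFields($pdo, $fields);   // your existing ~200 INSERT/UPDATE statements
        $pdo->commit();
    } catch (Exception $e) {
        $pdo->rollBack();               // undo the partial batch on failure
        throw $e;
    }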

If you are using a non-transactional engine, it's a bit trickier. Using a single multi-row insert is likely to be better as the flushing policy of MySQL means that it won't need to flush its changes after each row.
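
For instance, a single multi-row INSERT built with PDO might look like this (a sketch; the fields table and its columns are placeholders):

    <?php
    // Collapse ~200 single-row inserts into one statement.
    $placeholders = [];
    $params = [];
    foreach ($fields as $name => $value) {
        $placeholders[] = '(?, ?)';
        $params[] = $name;
        $params[] = $value;
    }
    $sql = 'INSERT INTO fields (name, value) VALUES ' . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($params);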

You really want to be able to reproduce this on your production-spec development box and analyse exactly why it's happening. It should not be difficult to fix.

Of course, another possibility is that your inserts are slow because of extremely large tables or a large number of indexes - in which case you should scale your database server appropriately. Inserting lots of rows into a table whose indexes don't fit into RAM (or where RAM isn't correctly configured to cache those indexes) generally gets pretty smelly.
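
For MyISAM, for example, you can compare a table's total index size against the key buffer (a sketch; $pdo and the table name are assumptions, and InnoDB uses innodb_buffer_pool_size instead):

    <?php
    // Rough check: do this table's indexes fit in the MyISAM key buffer?
    $buffer = $pdo->query("SHOW VARIABLES LIKE 'key_buffer_size'")->fetch();
    $status = $pdo->query("SHOW TABLE STATUS LIKE 'fields'")->fetch();
    printf("key_buffer_size: %s bytes, index size: %s bytes\n",
           $buffer['Value'], $status['Index_length']);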

BUT don't look for a way of complicating your application when there is a way of easily tuning the database instead, keeping the current algorithm.

Solution 2

One more solution you could use (instead of tuning MySQL :) ) is a JMS server plus a STOMP connection driver for PHP, to write data to the database server asynchronously. ActiveMQ has built-in support for the STOMP protocol, and there is the StompConnect project, a STOMP proxy for any JMS-compliant server (OpenMQ, JBossMQ, etc.).
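
A minimal sketch of the producer side, assuming ActiveMQ on its default STOMP port (61613) and the PECL stomp extension (the queue name is made up):

    <?php
    // Hand the write off to the broker and return immediately;
    // a consumer drains the queue and performs the actual INSERTs/UPDATEs.
    $stomp = new Stomp('tcp://localhost:61613');
    $stomp->send('/queue/db-writes', json_encode($fields), ['persistent' => 'true']);
    unset($stomp);   // closes the connection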

Solution 3

You can update your local cache (hopefully memcached) and then push the write requests through beanstalkd.
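
A sketch of the producer side, assuming the Memcached extension and the Pheanstalk client (whose API varies between versions); the cache key and tube names are made up:

    <?php
    use Pheanstalk\Pheanstalk;

    // Readers see the new values immediately from the cache...
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);
    $cache->set('fields:' . $userId, $fields);

    // ...while a separate worker pops jobs off beanstalkd and persists them.
    $queue = Pheanstalk::create('127.0.0.1');   // beanstalkd's default port is 11300
    $queue->useTube('db-writes');
    $queue->put(json_encode(['userId' => $userId, 'fields' => $fields]));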

Solution 4

I would suspect a problem with your SQL inserts - it really shouldn't take that long. Would prepared queries help? Does your MySQL server need some more memory dedicated to the keyspace? I think some more questions need to be asked.
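
Preparing once and executing many times does save re-parsing the statement for every row, for example (a sketch; the fields table and its columns are placeholders):

    <?php
    // Parse and plan the statement once, then reuse it for every field.
    $stmt = $pdo->prepare('UPDATE fields SET value = ? WHERE name = ?');
    foreach ($fields as $name => $value) {
        $stmt->execute([$value, $name]);
    }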

Solution 5

Note that if your table is INSERT-ONLY (no deletes, and no updates on variable-length columns), then inserts will not lock or block reads when using MyISAM.

This may or may not improve insert performance, but it could help if you are having concurrent insert/read issues.

I'm using this, and only purging old records daily, followed by OPTIMIZE TABLE.
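
The daily maintenance can be as simple as the following (a sketch; the table name and retention period are placeholders). OPTIMIZE TABLE also removes the holes left by the deletes, which is what keeps MyISAM's concurrent-insert path available:

    <?php
    // Run daily (e.g. from cron): purge old rows, then defragment.
    $pdo->exec("DELETE FROM log WHERE created < NOW() - INTERVAL 30 DAY");
    $pdo->query("OPTIMIZE TABLE log")->closeCursor();   // OPTIMIZE returns a result set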

Comments

  • Lordn__n
    Lordn__n almost 2 years

    I have a particular PHP page that, for various reasons, needs to save ~200 fields to a database. These are 200 separate insert and/or update statements. Now the obvious thing to do is reduce this number but, like I said, for reasons I won't bother going into I can't do this.

    I wasn't expecting this problem. Selects seem reasonably performant in MySQL but inserts/updates aren't (it takes about 15-20 seconds to do this update, which is naturally unacceptable). I've written Java/Oracle systems that can happily do thousands of inserts/updates in the same time (in both cases running local databases; MySQL 5 vs OracleXE).

    Now in something like Java or .Net I could quite easily do one of the following:

    1. Write the data to an in-memory write-behind cache (ie it would know how to persist to the database and could do so asynchronously);
    2. Write the data to an in-memory cache and use the PaaS (Persistence as a Service) model ie a listener to the cache would persist the fields; or
    3. Simply start a background process that could persist the data.

    The minimal solution is to have a cache that I can simply update, which will separately go and update the database in its own time (ie it'll return immediately after updating the in-memory cache; see the sketch below). This can either be a global cache or a session cache (although a global shared cache does appeal in other ways).

    Any other solutions to this kind of problem?
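
    One low-infrastructure sketch of option 1, assuming PHP-FPM (fastcgi_finish_request() is FPM-only; $cache, $pdo and saveAllFields() are placeholders):

        <?php
        // Update the in-memory cache, send the response, then persist.
        $cache->set('fields:' . $userId, $fields);    // readers get fresh values now
        echo $responseBody;
        fastcgi_finish_request();      // the browser receives the page immediately
        saveAllFields($pdo, $fields);  // runs after the response has been sent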

  • Lordn__n
    Lordn__n over 15 years
    Each field update is either an update if it exists already or an insert if it doesn't. I already know whether it exists because I've preloaded all the values. This'd help with bulk inserts, but the modifications are, and will remain, primarily updates.
  • Lordn__n
    Lordn__n over 15 years
    Thanks for your reply. I've had a look and there are a lot of moving parts: memcached, beanstalkd and presumably something else to read the work queues and persist the data?
  • Dustin
    Dustin over 15 years
    That is correct. The parts are really simple, though, and allow you to greatly scale your app. You'll find yourself wanting to put more stuff through the workers (such as sending out emails, updating search engines, etc.). memcached is orthogonal, but should be everywhere anyway.
  • MarkR
    MarkR over 15 years
    A gross oversimplification (and possibly an inaccurate one) of the performance characteristics of MySQL engines. Locking the entire table is probably faster than locking rows (because there's only one lock to take).
  • Till
    Till over 15 years
    What? MyISAM locks the entire table, making writes unavailable during the lock, which leads to a large backlog (= delay). InnoDB only locks the row it's working on - other processes can still write to the table.
  • Lordn__n
    Lordn__n over 15 years
    The "premature optimisation" bit is good advice. The tables aren't overly complicated but I've got a couple of things I can check now (indexes and the database engine).
  • Lordn__n
    Lordn__n over 15 years
    Any suggestions on what to use for the background process? Something will need to read from the work queue and process the requests. I've seen some blogs about people using Ruby with beanstalkd, possibly Java too. I'm not keen on having another server in the mix though.
  • Dustin
    Dustin over 15 years
    I often run stuff on the same machine. I don't write any PHP at all, but if you can run a background process that just sits on a queue and pops stuff off to process it, running it all even on the same machine would be beneficial.
  • mjn
    mjn about 15 years
    Also, OpenMQ, part of the GlassFish application server (but also runnable without it), is a JMS server which will support STOMP in version 4.4.
  • Johnny Graber
    Johnny Graber over 11 years
    How does this run the query in the background?
  • nightograph
    nightograph almost 11 years
    How does it answer the question regarding asynchronous inserts?
  • Dane Caswell
    Dane Caswell about 10 years
    cletus you can use UNIQUE INDEX in the database and ON DUPLICATE KEY UPDATE in your queries to solve that problem using this solution. Google for it :)
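
    For example (a sketch; the fields table, its columns and the UNIQUE index on name are assumptions):

        <?php
        // One statement per field: INSERT if the key is new, UPDATE if it exists.
        $stmt = $pdo->prepare(
            'INSERT INTO fields (name, value) VALUES (?, ?)
             ON DUPLICATE KEY UPDATE value = VALUES(value)'
        );
        foreach ($fields as $name => $value) {
            $stmt->execute([$name, $value]);
        }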