Best way to manage row expiration in mysql

18,340

You didn't mention which version of MySQL you are using. As long as you have 5.1, you have event scheduler.

CREATE EVENT clearExpired
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 2 MINUTES
DO
   UPDATE table SET status=0
    WHERE status <> 0
      AND TIMESTAMPDIFF(SECONDS, NOW(),table.timestamp)>120;

or

CREATE EVENT clearExpired
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 2 MINUTES
DO
   DELETE FROM table
    WHERE TIMESTAMPDIFF(SECONDS, NOW(),table.timestamp)>120;

In this case, you also set a trigger that will copy all the deleted rows into a history table before deleting them.

For measuring efficiency, create a table events (columns: timestamp and action), and generate huge number of rows in your table, then change your event to:

...
DO
   BEGIN
      INSERT INTO events SET action="clearExpired start";
      <insert or delete>
      INSERT INTO events SET action="clearExpired end";
   END

Then you can see how long it takes one way vs. the other.

However, this event will run only once every minute or two, while there will be many more inserts. You want to optimize the actions that happen most frequently.

Share:
18,340
smartcity
Author by

smartcity

Updated on June 20, 2022

Comments

  • smartcity
    smartcity almost 2 years

    An application does the following:

    • writes a row to a table that has a unique ID
    • read the table and find the unique ID and output the other variables (among which the timestamp).

    The question is: the application needs to read only the non-expired rows, which do expire every 2 minutes. There are a few alternatives to accomplish this: which has the best performance?

    Consider that reading the expired rows doesn't matter as it will be done sporadically. The number of non expired rows will always be below a few thousands, but may expect just a few hundreds.

    The cron job (run every minute) (or mysql event schedule) to do this can be one of the following (or any other idea?) based on the timestamp:

    A) add a BOL variable to index the table and then read WHERE is not expired (based on the boll variable of course) B) add a BOL variable to partition the table and then read only the relevant partition (i am new to partitioning so I'm not sure how this may work out) C) read the whole table and delete each row that is expired and then write the same row to another table D) when writing, write two rows contemporarily in two tables and then delete the expired ones on in one table

    OR

    E) not use a cron job at all and check the timestamp on every read. (but why would I scan the whole table? Expired row are basically useless to the application itself)

    Edit 1

    Let me rephrase the question:

    The objective is to retrieve all columns of a row only if the row was written less than 2 minutes ago.

    The table has this structure, but can be entirely redefined

    transactionID CHAR(8) NOT NULL, 
    PRIMARY KEY(transactionID),
    details VARCHAR(416),
    tel INT(10),
    time TIMESTAMP(),
    address VARCHAR(60),
    town VARCHAR(30),
    flat VARCHAR (5),
    supplier SMALLINT()
    

    also supplier is a foreign key

    Indexes are transactionID and eventually "status", an extra column, data type TO_BE_DEFINED_but_probably_SMALLINT_?()

    Solution 1: use an indexed column which indicates the status of the row (expired, active, used)

    Which is achieved running a cron job to change the field value from 1 (active) to 2 (expired) and the query would be the following:

    $transaction = //a POST or GET 
    
    $query = mysqli_query($con,"SELECT * FROM table WHERE transaction = '$transaction' AND status = 1");
    
    if(mysql_num_rows($query)== 0){
       echo "Transaction expired";
    }
    else{
    // I will show all the fields in the selected row;
    }
    
    mysqli_close($con);
    

    Solution 2: use a partition based on the timestamp of each row

    Which is achieved running a cron job to partition the column every 2 minutes, but then the query is probably faster:

    $transaction = //a POST or GET 
    
    $query = mysqli_query($con,"SELECT * FROM table WHERE transaction = '$transaction' AND PARTITION (active)");
    
    if(mysql_num_rows($query)== 0){
       echo "Transaction expired";
    }
    else{
    // I will show all the fields in the selected row;
    }
    
    mysqli_close($con);
    

    THe cron job would then be similar to this

    $date = date("Y-m-d H:i:s");
    $time = strtotime($date);
    $check = $time - (2);
    
    
    $query = mysqli_query($con,"PARTITION BY RANGE (timestamp_field) 
    (PARTITION active VALUES LESS THAN ($check)),
    ((PARTITION expired VALUES MORE THAN ($check));")
    

    Solution 3: forget all of this and copy the rows with a timestamp older than 2 minutes to another table with a cron job

    Simple to accomplish although the script would be probably heavy on the write side, altough write efficiency to the "expired" table is irrelevant, I want the "active" query to be fast.

    Solution 3B: When adding a row, write it also on another table so that it's only a DELETE function that the cron job had to perform.

    Solution 4: forget all about it and WHERE on the timestamp

    $transaction = //a POST or GET 
    
    $date = date("Y-m-d H:i:s");
    $time = strtotime($date);
    $check = $time - (2);
    
    $query = mysqli_query($con,"SELECT * FROM table WHERE transaction = '$transaction' AND timestamp > $check");
    
    if(mysql_num_rows($query)== 0){
       echo "Transaction expired";
    }
    else{
    // I will show all the fields in the selected row;
    }
    
    mysqli_close($con);