Best way to archive MySQL tables data (huge data)


It is unclear what you are asking.

"Archive" means to copy the data somewhere else, then delete it from here.

"Backup" means to copy the data elsewhere, but keep it here.

You say you want to "store the data", then "delete" it and use it for "historical...". If it is deleted, how can you use it? Maybe you mean that you have "archived" it out of the main table into another table?

Case 1: Backup daily, archive or delete after 30 days. See Partitioning and Backup.

Case 2: Keeping only one day. Well, one way is to use Case 1, but do things hourly instead of daily.

Case 3: You have two tables: the current 30 days, plus a bigger table of older data. This might involve partitioning and "transportable tablespaces".

Partitioning: See http://mysql.rjweb.org/doc.php/partitionmaint (see Cases 1 and 4)
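As a sketch of what that might look like (the table, column, and partition names here are assumptions, not from the question), monthly RANGE partitioning on the date column could be:

    CREATE TABLE history (
        item    INT NOT NULL,
        on_date DATE NOT NULL,
        value1  DECIMAL(10,2),
        -- value2 ... value10 omitted
        PRIMARY KEY (item, on_date)  -- the partition key must be in every unique key
    )
    PARTITION BY RANGE (TO_DAYS(on_date)) (
        PARTITION p202204 VALUES LESS THAN (TO_DAYS('2022-05-01')),
        PARTITION p202205 VALUES LESS THAN (TO_DAYS('2022-06-01')),
        PARTITION future  VALUES LESS THAN MAXVALUE
    );

    -- Carving off the oldest month is then a near-instant metadata change:
    ALTER TABLE history DROP PARTITION p202204;

Note that DROP PARTITION discards the rows outright; to archive rather than delete them, the partition can be exchanged out into a standalone table first.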

Backup: See mysqldump (in the manual) or Xtrabackup (Percona.com)

Note that it is possible to do

INSERT INTO db2.History
    SELECT * FROM db1.Current
        WHERE ...

That would copy rows from a table in one database to a table in another database. However, partitioning and using "transportable tablespaces" to move one partition would be a lot faster.
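To keep such a copy-then-delete from holding long locks on the live table, the delete can be done in chunks. A rough sketch (the date column and cutoff are assumptions):

    INSERT INTO db2.History
        SELECT * FROM db1.Current
            WHERE on_date < CURDATE() - INTERVAL 30 DAY;

    -- Then delete in small chunks, repeating until no rows are affected:
    DELETE FROM db1.Current
        WHERE on_date < CURDATE() - INTERVAL 30 DAY
        LIMIT 1000;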

(My opinion:) 3 million rows (86th percentile) in a month is "medium" sized. After a year (94th percentile), I might call it "large", but not yet "huge".

(after Comment...)

Plan A - zero disruption:

Probably pt-archiver would be the core tool for removing months-old data to some other machine. (See also http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks )
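A hypothetical invocation (the host, database, and table names are made up; check the pt-archiver documentation for your version):

    pt-archiver \
        --source h=live-host,D=db1,t=Current \
        --dest   h=archive-host,D=db2,t=History \
        --where  "on_date < CURDATE() - INTERVAL 30 DAY" \
        --limit 1000 --commit-each

The --limit and --commit-each options keep each transaction small, which is what makes pt-archiver low-impact on the live server.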

Plan B - convert to partitioning:

This involves a major one-time disruption to add partitioning to the table. The benefit is that carving off a month's data has virtually zero impact.

Decide how long to keep the data before archiving, and split on weeks or months such that you have about 20-50 partitions. Then use "transportable tablespaces" to break off the oldest partition and move it to another machine. And REORGANIZE PARTITION to add a new partition each week or month. Some details are in the first link above.
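As a rough sketch of that workflow (5.7-style; all table and partition names here are illustrative assumptions):

    -- 1. Swap the oldest partition out into a standalone table
    --    (history_old must already exist, empty, with identical structure):
    ALTER TABLE history EXCHANGE PARTITION p202204 WITH TABLE history_old;

    -- 2. Export its tablespace so the files can be copied to the archive machine:
    FLUSH TABLES history_old FOR EXPORT;
    -- (copy history_old.ibd and history_old.cfg away, then)
    UNLOCK TABLES;

    -- 3. On the archive server, with an identical empty table in place:
    ALTER TABLE history_old DISCARD TABLESPACE;
    -- (move the copied files into that server's datadir, then)
    ALTER TABLE history_old IMPORT TABLESPACE;

    -- 4. Back on the live server, open a partition for the next period:
    ALTER TABLE history REORGANIZE PARTITION future INTO (
        PARTITION p202207 VALUES LESS THAN (TO_DAYS('2022-08-01')),
        PARTITION future  VALUES LESS THAN MAXVALUE
    );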

Note that you need to rethink the indexes when adding (or removing) PARTITIONing. I suggest you experiment with a large set of data to make sure the indexing is optimal. (If you want to discuss it here, start a new Question and include the main queries, plus SHOW CREATE TABLE now and after adding PARTITION BY RANGE(..).)

Author: Mukesh Kumar Singh

Updated on June 04, 2022

Comments

  • Mukesh Kumar Singh
    Mukesh Kumar Singh almost 2 years

So here is the story.

    Table pattern

    `item` `on_date` `value1` `value2` `value3` `value4` ... `value10`
    

    All the values are important parameters in my table, which are stored every day for each item. Every day I capture data for 100,000 rows, which in turn becomes approximately 3,000,000 rows in a month.

    The table is fine, as the indexing and keys are created to handle the data, and I have no issues because I keep deleting the data after 30 days.

    Now, I would like to store the data that I delete in a MySQL table which would be purely used for historical data analysis.

    So, I tried a technique of concatenating all the columns into one column, pipe-separated (|), and storing that in a table.

    item | on_date |value1 | value2 .....| value10  
    

    But this didn't solve the problem, and the data was still huge.

    I browsed through the community and the MySQL documentation without much help. Kindly suggest what could be the best pattern or technology to solve my problem.

    • theMayer
      theMayer over 6 years
      Is there a specific concern here besides size of the data to be stored?
    • Sergio Tulentsev
      Sergio Tulentsev over 6 years
      I'd suggest that you move archived records to a separate database/server. Data size will still be huge, but at least, archived records will not take space in main server's RAM.
    • Mukesh Kumar Singh
      Mukesh Kumar Singh over 6 years
      @SergioTulentsev I am already doing the same, but I am looking for a more optimized way of storing historical data, as this use case is required in more cases now.
    • Mukesh Kumar Singh
      Mukesh Kumar Singh over 6 years
      @theMayer No, only size is the issue. I just want to store the data. I would require it once in a while, to do some analysis using R.
    • Namrata Das
      Namrata Das over 6 years
      I don't see a problem with keeping this in mysql as-is. It's not gonna take up RAM if it's not actively used.
    • Mukesh Kumar Singh
      Mukesh Kumar Singh over 6 years
      @Evert Size is the issue. I have just given an example; there are a few more such tables.
    • Namrata Das
      Namrata Das over 6 years
      How big are we talking (in gigabytes). And why is it an issue?
  • Nicholas
    Nicholas over 6 years
    Hi Rick, how would you approach this: a live DB with X months of data and an archive DB with all data? I want to keep binlogging for live DR, but I need a way to archive to another database. I may also introduce partitions (your index and InnoDB buffer size advice has been perfect) on the live DB to optimise queries (if needed). I need something that does not disrupt live, so I'm thinking pt-archiver may be good? Happy to ask a SE question.
  • Rick James
    Rick James over 6 years
    @Nicholas - I responded in my Answer.
  • Mukesh Kumar Singh
    Mukesh Kumar Singh almost 4 years
    Partitioning the tables solved my problem.
  • Rakmo
    Rakmo over 3 years
    @RickJames regarding my question that I asked separately, I reached here somehow. Looking for how to use transportable tablespaces. Can you please elaborate on the steps?
  • Rick James
    Rick James over 3 years
    @Rakmo - My partitionmaint link has a couple of links to info on Transportable Tablespaces. Be aware that 5.6 does it one way; 5.7 simplified it slightly.