MySql - WAMP - Huge Table is very slow (20 million rows)


Solution 1

Some answers:

  • 20 million rows is well within the capability of MySQL. I work on a database that has over 500 million rows in one of its tables. It can take hours to restructure a table, but ordinary queries aren't a problem as long as they're assisted by an index.

  • Your laptop is pretty out of date and underpowered for use as a high-scale database server. It's going to take a long time to do a table restructure. The small amount of memory and the typically slow laptop disk are probably constraining you. You're probably using default settings for MySQL too, which are designed to work on very old computers.

  • I wouldn't recommend using TEXT data type for every column. There's no reason you need TEXT for most of those columns.

  • Don't create an index on every column, especially if you insist on using TEXT data types. You can't even index a TEXT column unless you define a prefix index. In general, choose indexes to support specific queries.
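To see what "assisted by an index" means for a lookup, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table, column and index names are hypothetical. It asks the query planner how it would run the same equality lookup before and after an index exists. In MySQL the equivalent check is EXPLAIN SELECT ..., where type ALL indicates a full table scan. SQLite does not require a prefix length to index a TEXT column, so this sketch cannot show that MySQL-specific restriction.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical, much smaller version of the business table.
conn.execute("CREATE TABLE business (name TEXT NOT NULL, state TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO business (name, state) VALUES (?, ?)",
    [(f"Business {i}", "CA" if i % 2 else "NY") for i in range(10_000)],
)
conn.commit()

lookup = "SELECT * FROM business WHERE name = ?"
# No index yet: the planner has to SCAN every row.
before = query_plan(conn, lookup, ("Business 42",))
conn.execute("CREATE INDEX idx_business_name ON business (name)")
# With the index: the planner can SEARCH the index instead.
after = query_plan(conn, lookup, ("Business 42",))

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions. The first plan reports a SCAN of the table, and the second reports a SEARCH using idx_business_name. At 20 million rows, that difference separates reading every row from reading a handful of index pages.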

You probably have many other questions based on the above, but there's too much to cover in a single Stack Overflow post. If you're going to work with databases, you might want to take a training course or read a book.
I recommend High Performance MySQL, 2nd Edition.


Re your follow-up questions:

For MySQL tuning, here's a good place to start: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

Many ALTER TABLE operations cause a table restructure. This basically means locking the table, making a copy of the whole table with the changes applied, renaming the new and old tables, and then dropping the old table. If the table is very large, this can take a long time.
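The copy-and-swap steps can be sketched by hand. This minimal sketch assumes Python's built-in sqlite3 as a stand-in for MySQL, which performs the equivalent work internally during ALTER TABLE. The function and column names are hypothetical. It adds the kind of unique id column the question asks about.

```python
import sqlite3


def rebuild_with_id(conn, table):
    """Rebuild `table` with a new auto-numbered id column by copy-and-swap.

    Hypothetical helper: create a copy with the change applied, copy every
    row, swap the table names, then drop the old table.
    """
    new, old = f"{table}_new", f"{table}_old"
    # INTEGER PRIMARY KEY auto-numbers rows in SQLite, roughly like AUTO_INCREMENT.
    conn.execute(
        f"CREATE TABLE {new} ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back the open transaction on error
        # The expensive step: every row is read and written again.
        conn.execute(f"INSERT INTO {new} (name, state) SELECT name, state FROM {table}")
        conn.execute(f"ALTER TABLE {table} RENAME TO {old}")
        conn.execute(f"ALTER TABLE {new} RENAME TO {table}")
        conn.execute(f"DROP TABLE {old}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT NOT NULL, state TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO business VALUES (?, ?)",
    [("Acme", "CA"), ("Globex", "NY"), ("Initech", "TX")],
)
conn.commit()

rebuild_with_id(conn, "business")
print(conn.execute("SELECT id, name, state FROM business ORDER BY id").fetchall())
```

With three rows this finishes instantly. With 20 million rows and 2 GB of RAM, the INSERT ... SELECT step is where the hours go.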

A TEXT data type can store up to 64KB, which is overkill for a phone number or a state. I would use CHAR(10) for a typical US phone number. I would use CHAR(2) for a US state. In general, use the most compact and thrifty data type that supports the range of data you need in a given column.
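The "most compact type that fits" rule can be sketched as a small helper. This is a hypothetical Python function, not a MySQL feature. It proposes CHAR(n) when every sample value has the same length and VARCHAR(n) otherwise, and it builds a single ALTER TABLE with one MODIFY clause per column. The column names come from the question's schema, but the sample values are made up.

```python
def suggest_string_type(samples, headroom=0):
    """Suggest a compact MySQL string type from sample values (hypothetical helper).

    CHAR(n) when every sample has the same length (e.g. US state codes),
    otherwise VARCHAR(n) sized to the longest sample plus `headroom`.
    Lengths are counted in characters, as MySQL CHAR/VARCHAR lengths are.
    """
    lengths = {len(s) for s in samples}
    if not lengths:
        raise ValueError("need at least one sample value")
    longest = max(lengths)
    if len(lengths) == 1:
        return f"CHAR({longest})"
    return f"VARCHAR({longest + headroom})"


def modify_statement(table, column_types):
    """Build one ALTER TABLE with a MODIFY clause per column."""
    clauses = ", ".join(
        f"MODIFY `{column}` {col_type} NOT NULL"
        for column, col_type in column_types.items()
    )
    return f"ALTER TABLE `{table}` {clauses};"


types = {
    "STATE": suggest_string_type(["CA", "NY", "TX"]),
    "PHONE NUMBER": suggest_string_type(["5551234567", "2125550100"]),
    "ADDRESS": suggest_string_type(["1 Main St", "221B Baker Street"], headroom=50),
}
print(types)
print(modify_statement("mytable", types))
```

Putting all the changes in one ALTER TABLE means the table is rebuilt once rather than once per column. The rebuild is still a restructure, so it will take a while on 20 million rows. Sample values only give a lower bound, which is why VARCHAR gets some headroom. If existing data is longer than the new length, MySQL in strict mode rejects the change rather than truncating.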

Solution 2

It's going to take a long time because you have only 2GB of RAM and about 6GB of data and indexes, which forces a lot of swapping between RAM and disk. There's not much you can do about that, though.

You could try running this in batches.

Create a separate, empty table that already includes the auto_increment column. Then insert your records one batch at a time (say, one state at a time). That might make it faster, since you should be able to handle each smaller batch completely in memory instead of paging to disk.
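The batching idea can be sketched in a few lines. This minimal sketch assumes Python's built-in sqlite3 as a stand-in for MySQL, and the table and function names are hypothetical. The new table is created empty with the auto-numbered id column, and rows are then copied one state at a time, with a commit after each batch.

```python
import sqlite3


def copy_in_batches(conn, src, dst):
    """Copy rows from src into dst one state at a time; return the batch count.

    Hypothetical helper. dst must already exist with an auto-numbered id column.
    """
    # Materialize the list first so the cursor isn't reused during the inserts.
    states = [row[0] for row in conn.execute(f"SELECT DISTINCT state FROM {src} ORDER BY state")]
    for state in states:
        with conn:  # one transaction per batch, committed when the block exits
            conn.execute(
                f"INSERT INTO {dst} (name, state) "
                f"SELECT name, state FROM {src} WHERE state = ?",
                (state,),
            )
    return len(states)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (name TEXT NOT NULL, state TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_business_state ON business (state)")
conn.executemany(
    "INSERT INTO business VALUES (?, ?)",
    [(f"Business {i}", ["CA", "NY", "TX"][i % 3]) for i in range(30)],
)
conn.commit()

# The new, empty table includes the auto-numbered id column from the start.
conn.execute(
    "CREATE TABLE business_new ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT NOT NULL)"
)
batches = copy_in_batches(conn, "business", "business_new")
print(batches, conn.execute("SELECT COUNT(*) FROM business_new").fetchone()[0])
```

The index on state lets each batch find its own rows. Without such an index, every batch would re-read the whole source table, which could make batching slower overall rather than faster.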

You'll probably get much better responses if you also post this on dba.stackexchange.com.

Author: Kevin

Updated on August 05, 2022

Comments

  • Kevin
    Kevin almost 2 years

    So I posted a related question yesterday and got a perfect answer, which required running this code first: ALTER TABLE mytable AUTO_INCREMENT=10000001;

    I ran it several times, but restarted WAMP each time after a couple of hours when it still hadn't finished. Even after running overnight (12 hours), the statement still hadn't completed.

    I am wondering whether my table size is past the limits of MySQL, my computer, or both.

    However, I have a sneaky suspicion that proper indexing or some other factor could greatly impact my performance. I know 20 million is a lot of rows, but is it too much?

    I don't know much about indexes, except that they are important. I attempted to add them to the name and state fields, which I believe I did successfully.

    Incidentally, I am trying to add a unique ID field, which is what my post yesterday was all about.

    So, the question is: Is 20 million rows outside the scope of MySQL? If not, am I missing an index or some other setting that would help me work better with these 20 million rows? Can I put indexes on all the columns and make it super fast?

    As always, thanks in advance...

    Here are the specs:

    My PC runs XP with WAMPSERVER, Win32 NTFS, Intel Core 2 Duo T9300 @ 2.50GHz, 1.17 GHz, 1.98 GB of RAM

    DB: 1 table, 20 million rows. The size of the table is: Data 4.4 Gigs, Indexes 1.3 Gigs, Total 5.8 Gigs

    The indexes are set up on the 'BUSINESS NAME' and 'STATE' fields

    The table fields are like this:

    `BUSINESS NAME` TEXT NOT NULL, 
    `ADDRESS` TEXT NOT NULL, 
    `CITY` TEXT NOT NULL, 
    `STATE` TEXT NOT NULL, 
    `ZIP CODE` TEXT NOT NULL, 
    `COUNTY` TEXT NOT NULL, 
    `WEB ADDRESS` TEXT NOT NULL, 
    `PHONE NUMBER` TEXT NOT NULL, 
    `FAX NUMBER` TEXT NOT NULL, 
    `CONTACT NAME` TEXT NOT NULL, 
    `TITLE` TEXT NOT NULL, 
    `GENDER` TEXT NOT NULL, 
    `EMPLOYEE` TEXT NOT NULL, 
    `SALES` TEXT NOT NULL, 
    `MAJOR DIVISION DESCRIPTION` TEXT NOT NULL, 
    `SIC 2 CODE DESCRIPTION` TEXT NOT NULL, 
    `SIC 4 CODE` TEXT NOT NULL, 
    `SIC 4 CODE DESCRIPTION` TEXT NOT NULL 
    
  • Kevin
    Kevin over 12 years
    Thanks for the comments. Actually, I've been using MySQL/PHP for many, many years; I've just never had to work with a table this big. Most of the tables I've dealt with in the past have had a million rows or fewer. That said, I certainly still have much to learn.
  • Kevin
    Kevin over 12 years
    If not the TEXT data type, what standard data type would you recommend for a phone number or address field, something that could be alphanumeric but isn't an integer?
  • Kevin
    Kevin over 12 years
    I have changed the settings in php.ini/my.ini as follows: post_max_size = 750M, upload_max_filesize = 750M, max_execution_time = 5000, max_input_time = 5000, memory_limit = 1000M, and max_allowed_packet = 200M (in my.ini). Are there other obvious changes I should make? If so, could you point me in the right direction? Lastly, I'm not having problems with ordinary queries (select * from table where field = x;); it was adding the index field (a new column) that seems to have crushed my PC. Is this the "restructuring the table" that you are referring to, which could take hours?
  • ypercubeᵀᴹ
    ypercubeᵀᴹ over 12 years
    You really don't want to store phone numbers in int fields.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ over 12 years
    It's still wrong. For a big company like Facebook, I guess it's even more difficult to change it now to a data type that makes sense, since they'd also have to change all the code that treats them as int.
  • Gustav
    Gustav over 12 years
    Every phone number can be written in digits, so there is simply no more space-efficient way of storing it.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ over 12 years
    In digits, OK. If I'm not wrong, letters have been used alongside digits in certain areas of the world in the past. Moreover, even if we assume only digits, how will we distinguish between 0035712345678, 035712345678 and 35712345678 if we use int? How do we store 001234567(internal 66)?
  • Gustav
    Gustav over 12 years
    Letters can be translated into digits. So you must store every phone number from the global root level. Internal numbers need metadata anyway, so alphabetic characters would not help. People write phone numbers in so many stupid ways that it's data suicide to allow characters in there. You are going to end up with a catalog only readable by humans, not parsable by computers. If you ever connect an auto-dialer to this system, you will have to manually correct all the badly formatted numbers.
  • ypercubeᵀᴹ
    ypercubeᵀᴹ over 12 years
    I agree, storing from the global root level is the only solution. Where we disagree is that it is a solution to a problem created by the decision to use int for something that should be stored as char. Ints are to be used when one wants to do additions, multiplications, divisions, etc. Phone "numbers" are not added or subtracted. They sometimes have prefixes added or removed, operations that are better handled with string functions.
  • KeitelDOG
    KeitelDOG over 5 years
    Your answer is almost good. But as crazyhat said, the int type is for values you calculate with, such as age, year, number of children, price, or quantity. A phone number is just a unique code that uses digits so that ordinary people can use it easily. Please EDIT your answer so that programmers vote it up and it helps others find better answers.
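ypercube's point about leading zeros can be checked in a few lines of Python. This minimal sketch uses the example numbers from the thread and converts them to integers, which is how an integer column would store them. The INT limit in the code is MySQL's documented maximum for a signed INT.

```python
numbers = ["0035712345678", "035712345678", "35712345678"]

# As strings (what a CHAR/VARCHAR column stores), the three numbers stay distinct.
as_text = set(numbers)

# As integers, the leading zeros are lost and all three collapse into one value.
as_int = {int(n) for n in numbers}
print(len(as_text), len(as_int))

# Even without leading zeros, this 11-digit value exceeds a signed MySQL INT,
# so an integer column would have to be BIGINT.
INT_MAX = 2_147_483_647
print(max(as_int) > INT_MAX)

# A number with an extension cannot be converted at all.
try:
    int("001234567(internal 66)")
except ValueError as exc:
    print("not an integer:", exc)
```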