Friday, July 19, 2013

MySQL: ALTER TABLE Gotcha When It Comes to Large Tables

Maybe you already know this, but I sure didn't, and it came around to bite me in a big way. When you use ALTER TABLE in MySQL, MySQL first copies the original table, modifies the copy, and then renames it to the original table's name. I'll repeat an important part of that sentence: it copies the original table. So, who cares? Well, if you've got a really big table, then maybe you do!

I ran into this little fun fact one night in the middle of a code deployment when trying to run a seemingly harmless database migration to add a column to our Really Big Table. This particular table had been consuming mass quantities of space in our db, threatening to consume a small town if we didn't feed it more storage. The main reason it was growing so large is that each row contained a column that stored an XML dump that was anywhere from 0 to 60MB. (Don't ask...) We'd decided that the solution was to add a column for storing a file name, move the XML out to the file system, and then store the new file's name in the newly-created column. Easy peasy, right? Wrong.

So, there I was, running the migration, and it was taking forever. I decided to do some research on the ALTER TABLE syntax (hindsight is 20/20), and I discovered this:

Why an “alter table” query takes so long time?

Oh, joy. The great irony in all of this is that the point of this change was to reduce the size of the db, but the result of attempting to add the column was to further grow our ibdata1 file so much it used up the rest of our available diskspace. We had to get the sys admin involved to find us another drive to mount and archive some files to get storage back. Very stressful.

Anyway, this article wasn't really about posing a good solution. Unfortunately, I don't really have one, if you need to keep all your existing data. Lucky for us, we really didn't need all of the rows in that particular table, as they're supposed to be purged by the app when they're no longer useful (but it simply wasn't happening). Our solution ended up being to delete the rows we didn't need (which dropped the table size from about 200GB to just over 1GB) which made things more manageable. I just  wanted to post this so that others wouldn't run into the problem at an inopportune moment like I did.

Afterthought: if you do need to keep the data, one thing you might try (and I haven't) would be to use a CREATE TABLE LIKE sort of syntax to make a new table just like the old one (with no data and a new name). Then you could alter THAT (empty) table. Now you have an empty table into which to copy the data, 1000 rows at a time, deleting the rows from the other table as you go. This might take a while, but limiting the operation to 1000 rows or less should speed it up. Unfortunately, if you're not using a data file per table, your ibdata1 file is going to continue to grow. Shrinking that file will be the subject of another post soon.

No comments:

Post a Comment