Friday, July 19, 2013

MySQL: ALTER TABLE Gotcha When It Comes to Large Tables

Maybe you already know this, but I sure didn't, and it came around to bite me in a big way. When you use ALTER TABLE in MySQL, MySQL first copies the original table, modifies the copy, and then renames it to the original table's name. I'll repeat an important part of that sentence: it copies the original table. So, who cares? Well, if you've got a really big table, then maybe you do!

I ran into this little fun fact one night in the middle of a code deployment when trying to run a seemingly harmless database migration to add a column to our Really Big Table. This particular table had been consuming mass quantities of space in our db, threatening to consume a small town if we didn't feed it more storage. The main reason it was growing so large is that each row contained a column that stored an XML dump that was anywhere from 0 to 60MB. (Don't ask...) We'd decided that the solution was to add a column for storing a file name, move the XML out to the file system, and then store the new file's name in the newly-created column. Easy peasy, right? Wrong.

So, there I was, running the migration, and it was taking forever. I decided to do some research on the ALTER TABLE syntax (hindsight is 20/20), and I discovered this:

Why an “alter table” query takes so long time?

Oh, joy. The great irony in all of this is that the point of this change was to reduce the size of the db, but the result of attempting to add the column was to further grow our ibdata1 file so much it used up the rest of our available diskspace. We had to get the sys admin involved to find us another drive to mount and archive some files to get storage back. Very stressful.

Anyway, this article wasn't really about posing a good solution. Unfortunately, I don't really have one, if you need to keep all your existing data. Lucky for us, we really didn't need all of the rows in that particular table, as they're supposed to be purged by the app when they're no longer useful (but it simply wasn't happening). Our solution ended up being to delete the rows we didn't need (which dropped the table size from about 200GB to just over 1GB) which made things more manageable. I just  wanted to post this so that others wouldn't run into the problem at an inopportune moment like I did.

Afterthought: if you do need to keep the data, one thing you might try (and I haven't) would be to use a CREATE TABLE LIKE sort of syntax to make a new table just like the old one (with no data and a new name). Then you could alter THAT (empty) table. Now you have an empty table into which to copy the data, 1000 rows at a time, deleting the rows from the other table as you go. This might take a while, but limiting the operation to 1000 rows or less should speed it up. Unfortunately, if you're not using a data file per table, your ibdata1 file is going to continue to grow. Shrinking that file will be the subject of another post soon.

Friday, November 16, 2012

PHP is_int() vs. is_numeric()

These two functions seem similar, don't they? In fact, is_int() seems like a more specific version of is_numeric(). You may, for example, want to make sure that the ID of a blog post that is being passed through the query string is not only numeric but, specifically, an integer. After all, 7.145 is numeric, but you want to make sure the incoming value hasn't been tinkered with and is simply a 7. You'd be tempted to do something like the following:

if(isset($_GET['blog_id']) && is_int($_GET['blog_id'])) {
     // do something with the blog ID
}

Unfortunately, this won't work. It turns out that is_int() will return false when the value passed to it is a string, whereas is_numeric is perfectly capable of evaluating a string and deciding whether it's something numeric. So, even if the value of blog_id as passed through the query string is a 7, this will return false.

"Wait... What?" you may be saying to yourself, right now. Yes, you read that correctly. It will be false, even if it's a 7. That's because (and here's the important bit) anything passed in the query string is considered by PHP to be a string. Even though there are no quotes around it, even though, to you and me, it's a number 7, PHP reads that as a string. This goes for all superglobals, in fact. In short, anything that is accessed through a '$_<something or other>' will return as a string: $_POST, $_SESSION, you get the idea.

One way around this might be to cast the value as an int, like so:

(int)$_GET['blog_id']

Or you could settle for is_numeric(), knowing that it will be fine with scientific notation, floats, etc. But that might be enough to make sure the value is safe, even if it doesn't match.

The two main takeaways for me, here, are that:

  1. Values stored in superglobals translate as strings in PHP.
  2. is_numeric() can reliably check to see whether strings are numeric or not, whereas is_int() will return false for strings, even if it looks like an integer.
Hope this helps someone!

Sunday, September 9, 2012

Getting the 'protected' Directory's Parent in Yii

I needed the directory just above "protected." Here's how you do it:


Yii::getPathOfAlias('webroot')

Accessing a Module from Anywhere

I recently had the need to access the configuration variables for a module from within a controller that didn't own the module. It was a little frustrating that I was inside a widget that was a component of the module. Someone new to Yii or to MVC might assume, "Hey! I'm in the modules own widget. Why can't I get to the variables?" But think of it a bit like a stream: data moves down from controllers, which do the processing bits, and flow to relatively "dumb" things, like views which are only supposed to display, you know, stuff. If things were to move the other direction, it would like the stream getting backed up or even polluted. So, we keep things flowing in the right direction, letting the smarter parts of the app pass data down to the intentionally less smart parts.

All that being said, I *could* have passed the values to the widget - which is basically a view - but this was already a relatively poorly designed 3rd party module, and I just wanted to move forward. (This is also a very useful module that provides a lot of functionality that I'd prefer not to write myself, so I'm making modifications to get it to work a little better.) This meant finding some other way to gain access to some variables that had been set in the config file for the module. This is done like so:


// other Yii confi things go here, like components, import, etc

'import' => array(

      ...

),

'components' => array(

     ...

),

// and here is where the module goes

'modules' => array(

     'moduleName' => array(

          'moduleVar1' => 'value',

          'moduleVar2' => 'otherValue',

          'moduleArray' => array(
               'key' => 'val'
         ),

     ),

// More things happen here


In this example, I wanted to do something like find out what the value of "moduleVar1" was. There's a way to get at these sorts of things by basically starting from the top (the application level) and working your way down by using Yii::app(). Here, for example, is how you can see which modules are loaded across the entire application:


print_r(Yii::app()->getModules());


To get a specific module, you do this:

Yii::app()->getModule('moduleName');


To get the value for a variable from that module, you'd do this:

Yii::app()->getModule('moduleName')->moduleVar1


And actually, I had an array of values called 'config' like 'moduleArray' above, in which case I needed to get the value of a key of that array, like so:

Yii::app()->getModule('moduleName')->moduleArray['key']


Hope this helps someone.