This fix has measurably helped our db performance:
Expunging the cache of dropped table pages used to be one of the largest causes of blocking in our db's. It used to be right up there with the blocking caused by ROLLBACKs (see my earlier post). We've now been running the Percona build of 5.1.47 for a while, and things are way better. In fact, purging the page hash (the data structure that lets the db look up where a given page resides in the cache) of deleted table page entries now causes more blocking than cleaning up the cache itself.
Thursday, July 29, 2010
ROLLBACKs are the number one bottleneck in some of our databases. ROLLBACKs are bad because they grab the global semaphore on the data dictionary while each row is deleted. First of all, this means ROLLBACKs are not scalable since only one at a time can get any work done. Second, many different paths through the Innodb code need the global data dictionary semaphore. So a ROLLBACK not only blocks other ROLLBACKs, it blocks all sorts of unrelated activity in the db.
The code that causes the problem in the Percona build of 5.1.47 is in storage/innodb_plugin/row0undo.c. The comment before the lock is grabbed says "Prevent DROP TABLE etc. while we are rolling back this row". It is very unfortunate that this is implemented by locking down the ENTIRE data dictionary when only ONE table (usually) needs to be protected from schema changes.
Sunday, July 25, 2010
I have a lot of Linux "md" RAID-1 arrays in my set-up. Every so often, one of the two drives in a RAID-1 starts behaving badly and I want to replace it. The "bad" drive hasn't stopped working outright, but it's causing I/O errors to be logged in /var/log/messages, or it fails a SMART self-test. My favorite method is to put the bad drive in "write-mostly" mode, add a third drive to the RAID-1, let it fully sync, and then remove the bad drive from the RAID-1.
Why put the drive in "write-mostly" mode? Because the one remaining "good" drive in the RAID-1 may not be so good. It might have unreadable sectors. You will not discover that until the md software tries to read the entire "good" drive in order to copy it to the new third drive. If this happens, unless one's luck is extremely bad, the md software will be able to read the unreadable sectors from the "bad" drive. So instead of taking the bad drive out of the mirror immediately, I tell md to use it as little as possible.
The details; say our RAID-1 is /dev/md9, the bad drive is /dev/sdX, and our new drive is /dev/sdN.
Step 1. echo writemostly > /sys/block/md9/md/dev-sdX/state
This tells the md software to avoid reading from the bad drive. It still gets written to as usual.
Step 2. mdadm --grow /dev/md9 -n 3
Grow the RAID-1 to be able to handle 3 drives.
Step 3. mdadm /dev/md9 -a /dev/sdN
Add the new drive to the array. Syncing starts at once.
Step 4. grep md9 -A 3 /proc/mdstat
Monitor the progress of the sync and when it's done, continue. If you are unlucky, this step will never finish. You will see I/O errors in /var/log/messages and the md software will restart the sync over from scratch. This will happen in an endless loop. This is where backups come into play as the safety net.
Step 5. mdadm /dev/md9 -f /dev/sdX
Manually "fail" the bad drive.
Step 6. mdadm /dev/md9 -r /dev/sdX
Remove the bad drive from the array.
Step 7. mdadm --zero-superblock /dev/sdX
Erase the "md" superblock that's written towards the end of /dev/sdX to avoid any confusion that /dev/sdX might be part of an md array down the road.
Step 8. mdadm --grow /dev/md9 -n 2
Back to normal.
Saturday, July 24, 2010
I've used this procedure to upgrade several db's from 5.0 to the Percona build of 5.1. I like the Percona build because it includes a lot of performance fixes for innodb not found in stock 5.1. For us, the performance fixes have made a measurable difference.
MySql advises dumping the db from 5.0 and reloading the entire thing into a new 5.1 db. That's not feasible for us. We have terabytes of data. So I employ an "in-place" upgrade strategy.
STEP 0. Try out all the steps below on an old backup copy of the db to get an idea of what you are in for before attempting them on the production db.
STEP 1. SHOW CREATE TABLE for every FEDERATED table.
Are you using the new FEDERATED storage engine? None of my 5.0 FEDERATED tables would work after the in-place upgrade to 5.1. So save the output of SHOW CREATE TABLE to be able to drop and recreate them later.
STEP 2. SET GLOBAL INNODB_FAST_SHUTDOWN = 0;
The documentation for the innodb plugin says to flush all the undo and empty the insert buffer before upgrading innodb in-place. (The Percona build is based on the innodb plugin.) Turning off fast shutdown will accomplish exactly that when the db is shut down.
STEP 3. Shutdown the instance.
This step can take a lot of time. To get some idea of how long it will take, first do a SHOW INNODB STATUS. Look for the "TRANSACTIONS" section of the output; near the top you will see a "History list length" value. The larger this number is, the more undo there will be to clear. Look for the "INSERT BUFFER AND ADAPTIVE HASH INDEX" section and the "Ibuf: size" value, which is the size of the insert buffer content in units of 16K pages. The larger that is, the more deferred index updates the db will have to complete before it shuts down.
STEP 4. Replace 5.0 binaries with 5.1 binaries.
I like to build from source. So for me this involves running "make uninstall" in my 5.0 mysql source tree and then "make install" in my 5.1 source tree.
STEP 5. Replace the my.cnf file with one appropriate for 5.1 and restart instance.
Look over the 5.1 options and decide what's right for you. One variable we had to think about was innodb_autoinc_lock_mode. I perform the rest of the steps during a maintenance outage. Since in my environment all db connections are instigated remotely, I just add "skip-networking" to the my.cnf file before restarting.
STEP 5. mysql_upgrade
This script will probably try to run REPAIR TABLE on innodb tables, but it will complain every time it tries that since REPAIR TABLE is unimplemented for innodb tables. This is harmless.
STEP 6. Run CHECK TABLE FOO FOR UPGRADE on every innodb table. For every innodb table indicated as needing repairing, run ALTER TABLE FOO ENGINE = INNODB.
I wrote a python script to handle this for me. ALTER TABLE FOO ENGINE = INNODB creates a fresh copy of the table. For large tables, this can take a long time. Hopefully only a relatively small fraction of tables will need rebuilding as has been the case with my db's; otherwise you are in mysql purgatory.
STEP 7. Drop and recreate any FEDERATED tables.