Saturday, February 21, 2009

Happy with myisamchk

We've had problems time to time with head placement issues with disk drives (let's just say I'm no longer a fan of Seagate).  What happens is that a 512-byte sector gets written out to the wrong sector, corrupting the database.  Up to now we've been using innodb exclusively for our tables.  innodb has a nice built-in mechanism that lets you detect such an occurrence --- page level checksums.  As you know, innodb "tablespace" files are divided up into 16K pages.  Each page has a header and a trailer.  The page header contains a checksum computed from the page's content, and the trailer contains a number (log sequence number) that is redundant with the same number in the header.  So when a sector gets written out to the wrong spot in an innodb file, the checksum and/or the header/trailer mismatch will flag the problem.

I wrote a little utility in C that scans over an innodb file checking all the checksums.  I just plagiarized the code from the sources that computes checksums.  I love this utility.  I run it over all our backups after they are freshly made to gain confidence in them.

Recently we created a new "warehouse" db that's using myisam exclusively.  I was wondering if myisamchk could detect the aftermath of a head placement fault.  I made a copy of the .frm, .MYI, and .MYD files for a large table, and wrote a little C program that would go in and copy 512-bytes from one place to another in a file.  I ran "myisamchk --check" over differently corrupted versions of the .MYI and .MYI files, and it successfully detected a problem every time.  I was pleased that "myisamchk --check" was able to do the job rather than having to resort to "myisamchk --extend-check", which is several times slower.  I'll be using myisamchk to verify our backups.

Saturday, February 14, 2009

MySql 4.1 -> MySql 5.0 Incompatibilities

Here's a bunch of non-backward compatible changes to 5.0.  Applications running fine under 4.1 might have some glitches under 5.0.  Trying to replicate a 4.1 master to a 5.0 slave is probably not going to work smoothly because sql that worked fine in 4.1 will do something different or get an error in 5.0.  I stumbled on most of these incompatibilities trying to run a 5.0 slave with a 4.1 master, so the material here is not merely of academic interest.

This is a sad state of affairs.  Clearly backward compatibility isn't a concern at MySql.

---------------------------------------------------

In 4.1, GREATEST(x, NULL) == x, and LEAST(x, NULL) == x.

In 5.0, both of these expressions evaluate to NULL: GREATEST(x,NULL) == NULL and LEAST(x,NULL) == NULL.

To maintain 4.1 behavior both before and after the upgrade, rewrite...

GREATEST(x,y)

as...

CASE WHEN x IS NULL THEN y WHEN y IS NULL THEN x ELSE GREATEST(x,y) END

or...

IF(x IS NULL, y, IF(y IS NULL, x, GREATEST(x,y)))

(Rewrite LEAST in the same fashion.)

If expressions containing LEAST and GREATEST are rewritten as suggested, they will evaluate in 4.1 and 5.0 to the same thing they would have in 4.1 before being rewritten. So code would work the same way before and after the upgrade.

-------------------------------

Not an incompatibility, but a change that's good to know about --- the maximum supported VARCHAR has expanded from from 255 to 65K in 5.0.

In 5.0, stored VARCHAR values have either 1 or 2 length bytes stored depending on the declared maximum. In 4.1 there was only always a single length byte.

-------------------------------

Yet another reason to avoid doing "SELECT *" or "SELECT tabname.*" in production code...

In 5.0, the number and/or order of columns for a JOIN doing a "SELECT *" or "SELECT tabname.*" can change from what it was in 4.1.

Hopefully there's no "SELECT *" JOINs in your code base, but if there are they should have the list of columns hard-coded before upgrading.

---------------------------------

In 4.1, trailing blanks are silently truncated when storing a value into a VARCHAR. In 5.0, trailing spaces are retained.

For example, in 4.1:

mysql> use test
Database changed
mysql> create table foo (x varchar(100)) engine = innodb;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into foo values ('hello ');
Query OK, 1 row affected (0.00 sec)

mysql> select concat(x, 'bye') from foo;
+------------------+
| concat(x, 'bye') |
+------------------+
| hellobye |
+------------------+
1 row in set (0.00 sec)


By contrast, in 5.0:

mysql> create table foo (x varchar(100)) engine = innodb;
Query OK, 0 rows affected (0.13 sec)

mysql> insert into foo values ('hello ');
Query OK, 1 row affected (0.00 sec)

mysql> select concat(x,'bye') from foo;
+-----------------+
| concat(x,'bye') |
+-----------------+
| hello bye |
+-----------------+
1 row in set (0.00 sec)


In 5.0 as in 4.1, trailing spaces are *not* considered when doing comparisons. In 4.1 or 5.1:

mysql> select 'aa' = 'aa ';
+----------------+
| 'aa' = 'aa ' |
+----------------+
| 1 |
+----------------+
1 row in set (0.00 sec)


This has an "interesting" side effect when a VARCHAR is a primary key. You can't store both X and CONCAT(X,' ') into a primary key because they are both "equal":

mysql> create table foo (x varchar(100), primary key (x)) engine = innodb;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into foo values ('aa ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into foo values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 1

----------------------------------------


In 4.1, the max allowed indexable value is 1024 bytes. In 5.0, the limit has apparently been reduced to 1000 bytes.

So a CREATE TABLE that works fine in 4.1 can fail in 5.0.

In 4.1:

mysql> create table foo (x int, a varchar(255), b varchar(255), c varchar(255), d varchar(255), primary key(x,a,b,c,d));
Query OK, 0 rows affected (0.07 sec)

In 5.0:

mysql> create table foo (x int, a varchar(255), b varchar(255), c varchar(255), d varchar(255), primary key(x,a,b,c,d));
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

MySql documents that a 4.1 master should work with a 5.0 slave, but the devil is in the details. "Backward compatibility" is an oxymoron at MySql apparently.  A 5.0 slave will choke if this CREATE TABLE is run on the 4.1 master.

--------------------------------


The JOIN parser was overhauled for 5.0. The intent was to make 5.0 more compliant with the ANSI standard. The result is that JOINs that worked fine in 4.1 are now rejected as bad syntax in 5.0.

Consider the following statement that works fine in 4.1:

DELETE FROM k.r AS t1
USING k.r AS t1
LEFT JOIN ks.k AS t2
ON t2.kc_id = 30738
AND t2.id = t1.id
WHERE t1.experiment_keyset_id = 2638
AND t2.kc_id IS NULL

In 5.0, this SQL gets the error:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS t1
USING k.r AS t1
LEFT JOIN k.k' at line 1

The fix is to "use k" (why? I don't know) and refer to the table by its alias in the FROM clause.  This statement works as desired in both 5.0 and 4.1:

mysql> use k;
mysql> DELETE FROM t1 USING k.r AS t1 LEFT JOIN ks.k AS t2 ON (t2.kc_id = 30738 AND t2.id = t1.id) WHERE t1.experiment_keyset_id = 2638 AND t2.kc_id IS NULL;
Query OK, 64 rows affected (0.00 sec)

On the other hand, referring to the table by its full name in the FROM clause doesn't work in 5.0:

mysql> DELETE FROM k.r USING k.r AS t1 LEFT JOIN ks.k AS t2 ON t2.kc_id = 30738 AND t2.id = t1.id WHERE t1.experiment_keyset_id = 2638 AND t2.kc_id IS NULL;
ERROR 1109 (42S02): Unknown table 'r' in MULTI DELETE

---------------------------------------------------

There are more reserved words in 5.0 than in 4.1. SQL that worked fine in 4.1 will fail in 5.0 because a column name happens to be a reserved word in 5.0. The cure is to quote the now-reserved word column name with single backquotes.

For example, this statement works fine in 4.1 but fails in 5.0:

mysql> INSERT classifications.category_type_def_sql_codes (category_type_id, match_type_id, host, sql, db)
VALUES (56, '2', 'q', 'select pattern, category, priority from c_b', 'sloal');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'sql, db)
VALUES (56, '2', 'q', 'select pattern, category, priority from c' at line 1

The fix is to backquote the column "sql":

mysql> INSERT classifications.category_type_def_sql_codes (category_type_id, match_type_id, host, `sql`, db) VALUES (56, '2', 'q', 'select pattern, category, priority from c_b', 'sloal');
Query OK, 1 row affected (0.00 sec)

-------------------------------------------------------------

Joins of more than two tables can return different results in 4.1 and 5.0 when at least one of the joins is an "outer" join.

Basically, it seems to me that the implementation of multi-way joins in 4.1 was sloppy and incorrect. Things have been cleaned up in 5.0. The consequence is that the same statement can return different results.

Consider the following example:

mysql> create table t1(a int, primary key(a));
Query OK, 0 rows affected (0.02 sec)

mysql> create table t2(a int, b int, primary key(a, b));
Query OK, 0 rows affected (0.01 sec)

mysql> create table t3(b int, primary key(b));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values (1),(2);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> insert into t2 values (1,101);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t3 values (101);
Query OK, 1 row affected (0.00 sec)


In 4.1:

mysql> select * from t1 left join (t2 left join t3 on t2.b = t3.b) on t1.a = t2.a;
+---+---+-----+------+
| a | a | b | b |
+---+---+-----+------+
| 1 | 1 | 101 | 101 |
| 2 | 1 | 101 | NULL |
+---+---+-----+------+

2 rows in set (0.00 sec)


In 5.0:

mysql> select * from t1 left join (t2 left join t3 on t2.b = t3.b) on t1.a = t2.a;
+---+------+------+------+
| a | a | b | b |
+---+------+------+------+
| 1 | 1 | 101 | 101 |
| 2 | NULL | NULL | NULL |
+---+------+------+------+

2 rows in set (0.00 sec)

------------------------------------------------------

In 5.0, when one declares a FLOAT(m,n), m (total number of digits) must be at least as large as n (number of decimal digits).

In 4.1, one can declare FLOAT(m,n) with m <>

mysql> CREATE TABLE `bud` (
-> `timestamp` varchar(255) NOT NULL,
-> `url` varchar(250) NOT NULL,
-> `last_month_vol` float(1,3) NOT NULL default '',
-> PRIMARY KEY (`timestamp`,`url`)
-> );
ERROR 1427 (42000): For float(M,D), double(M,D) or decimal(M,D), M must be >= D (column 'last_month_vol').
mysql>



------------------------------------------------


In 5.0, in a CREATE TABLE the default value specified for an INT, FLOAT, etc. must be numeric. The value can be a string, but the string must be a number.

4.1 is happy with any random string, which it implicitly converts to zero.

In 4.1:

mysql> create table foo ( x int not null default 'garbage');
Query OK, 0 rows affected (0.01 sec)

mysql> show create table foo \G
*************************** 1. row ***************************
Table: foo
Create Table: CREATE TABLE `foo` (
`x` int(11) NOT NULL default '0'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

In 5.0:

mysql> create table foo ( x int not null default 'garbage');
ERROR 1067 (42000): Invalid default value for 'x'

In particular, using the empty string ('') as a default value for numeric types used to work in 4.1, but no longer in 5.0:

mysql> CREATE TABLE `bud` ( `timestamp` varchar(255) NOT NULL, `url` varchar(250) NOT NULL, `relevance_score` float(4,4) NOT NULL default '',  PRIMARY KEY (`timestamp`,`url`) );
ERROR 1067 (42000): Invalid default value for 'relevance_score'