Tuesday, May 19, 2009

Beware: myqldump and FEDERATED tables

mysqldump in 5.1 is broken in my opinion with respect to FEDERATED tables.

mysqldump insists on dumping all the rows of a FEDERATED table.  That's bad when you go to restore from the dump.  The normal case is that the remote table our FEDERATED table points to is fine and needs nothing done to it.  The restore will try to re-INSERT all the rows into the remote table.  But they are already there.

A solution to this problem is to isolate all FEDERATED tables within a single database, say "my_federated" for example.  Don't put anything but FEDERATED tables in "my_federated".

Then dump the database as follows:

mysqldump --opt --no-data --result-file=federated.dmp --verbose --databases my_federated

mysqldump --opt --result-file=data.dmp --verbose --databases list of all databases except my_federated and information_schema

Restore the db by loading both dump files:

mysql
mysql> source federated.dmp
mysql> source data.dmp

Wednesday, May 13, 2009

Building MySql From Sources on Redhat

UPDATE 7/2010:  Caveat emptor, this information is out of date.  I've learned some new things since originally writing this post as well.  I wrote this when I was first figuring out how to build from source.

Building MySql from sources has worked very well for me, at least once I got over a few humps.  First, there are several non-default options to "configure" that you probably want to use.  For example, by default, the "FEDERATED" storage engine isn't included.  Second, there is additional software that's not installed by default with Redhat that's needed.

The guidance here works for MySql 5.1.33.

Building MySql:

1. Get the source tree from mysql.com
2. cd mysql-X.XX
3. ./configure options
4. make
5. make install

Step (5) installs the mysql binaries to /usr/local/bin.

My "configure" options:

./configure --with-charset=utf8 --with-collation=utf8_general_ci --with-plugins=innobase,partition,archive,federated,heap,myisam,myisammrg --enable-local-infile --without-debug --enable-thread-safe-client --with-client-ldflags=-all-static

I'll explain why I use these particular options later on in this post.

Missing software problems:

1. "configure" fails complaining about "no curses/termcap library found"

Cure:  yum install ncurses-devel

2. "make" fails because of missing g++

Cure:  yum install gcc-c++

3. "make" (of innodb backend) fails due to missing file zlib.h

Cure:  yum install zlib-devel

Configure options:

1. --with-charset=utf8 --with-collation=utf8_general_ci

Why: we got tired of having to jump through hoops when our data happens to contain
non-latin1 characters.  These options make all db's utf8 by default.

2. --with-plugins=innobase,partition,archive,federated,heap,myisam,myisammrg

Why: don't futz with plug-ins.  Link things directly in.  Should yield a bit of performance.

3. --enable-local-infile

Why: without this, LOAD DATA LOCAL INFILE is disabled.

4. --without-debug

Why:  should gain some performance. 

5. --enable-thread-safe-client

Why:  our clients can have multiple threads.

6. --with-client-ldflags=-all-static

Why: should gain a bit of performance



Friday, April 17, 2009

UNIQUE Index Insanity

Does this look right to you?  Oracle doesn't work this way.  But MySql insists this is the desirable behavior.





mysql> create table foobar (x int, y int, unique(x, y));
Query OK, 0 rows affected (0.10 sec)

mysql> insert into foobar values (1, null);
Query OK, 1 row affected (0.00 sec)

mysql> insert into foobar values (1, null);
Query OK, 1 row affected (0.00 sec)

mysql> select * from foobar;
+------+------+
| x | y |
+------+------+
| 1 | NULL |
| 1 | NULL |
+------+------+
2 rows in set (0.00 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.1.33 |
+-----------+
1 row in set (0.00 sec)

mysql>

Saturday, February 21, 2009

Happy with myisamchk

We've had problems time to time with head placement issues with disk drives (let's just say I'm no longer a fan of Seagate).  What happens is that a 512-byte sector gets written out to the wrong sector, corrupting the database.  Up to now we've been using innodb exclusively for our tables.  innodb has a nice built-in mechanism that lets you detect such an occurrence --- page level checksums.  As you know, innodb "tablespace" files are divided up into 16K pages.  Each page has a header and a trailer.  The page header contains a checksum computed from the page's content, and the trailer contains a number (log sequence number) that is redundant with the same number in the header.  So when a sector gets written out to the wrong spot in an innodb file, the checksum and/or the header/trailer mismatch will flag the problem.

I wrote a little utility in C that scans over an innodb file checking all the checksums.  I just plagiarized the code from the sources that computes checksums.  I love this utility.  I run it over all our backups after they are freshly made to gain confidence in them.

Recently we created a new "warehouse" db that's using myisam exclusively.  I was wondering if myisamchk could detect the aftermath of a head placement fault.  I made a copy of the .frm, .MYI, and .MYD files for a large table, and wrote a little C program that would go in and copy 512-bytes from one place to another in a file.  I ran "myisamchk --check" over differently corrupted versions of the .MYI and .MYI files, and it successfully detected a problem every time.  I was pleased that "myisamchk --check" was able to do the job rather than having to resort to "myisamchk --extend-check", which is several times slower.  I'll be using myisamchk to verify our backups.

Saturday, February 14, 2009

MySql 4.1 -> MySql 5.0 Incompatibilities

Here's a bunch of non-backward compatible changes to 5.0.  Applications running fine under 4.1 might have some glitches under 5.0.  Trying to replicate a 4.1 master to a 5.0 slave is probably not going to work smoothly because sql that worked fine in 4.1 will do something different or get an error in 5.0.  I stumbled on most of these incompatibilities trying to run a 5.0 slave with a 4.1 master, so the material here is not merely of academic interest.

This is a sad state of affairs.  Clearly backward compatibility isn't a concern at MySql.

---------------------------------------------------

In 4.1, GREATEST(x, NULL) == x, and LEAST(x, NULL) == x.

In 5.0, both of these expressions evaluate to NULL: GREATEST(x,NULL) == NULL and LEAST(x,NULL) == NULL.

To maintain 4.1 behavior both before and after the upgrade, rewrite...

GREATEST(x,y)

as...

CASE WHEN x IS NULL THEN y WHEN y IS NULL THEN x ELSE GREATEST(x,y) END

or...

IF(x IS NULL, y, IF(y IS NULL, x, GREATEST(x,y)))

(Rewrite LEAST in the same fashion.)

If expressions containing LEAST and GREATEST are rewritten as suggested, they will evaluate in 4.1 and 5.0 to the same thing they would have in 4.1 before being rewritten. So code would work the same way before and after the upgrade.

-------------------------------

Not an incompatibility, but a change that's good to know about --- the maximum supported VARCHAR has expanded from from 255 to 65K in 5.0.

In 5.0, stored VARCHAR values have either 1 or 2 length bytes stored depending on the declared maximum. In 4.1 there was only always a single length byte.

-------------------------------

Yet another reason to avoid doing "SELECT *" or "SELECT tabname.*" in production code...

In 5.0, the number and/or order of columns for a JOIN doing a "SELECT *" or "SELECT tabname.*" can change from what it was in 4.1.

Hopefully there's no "SELECT *" JOINs in your code base, but if there are they should have the list of columns hard-coded before upgrading.

---------------------------------

In 4.1, trailing blanks are silently truncated when storing a value into a VARCHAR. In 5.0, trailing spaces are retained.

For example, in 4.1:

mysql> use test
Database changed
mysql> create table foo (x varchar(100)) engine = innodb;
Query OK, 0 rows affected (0.02 sec)

mysql> insert into foo values ('hello ');
Query OK, 1 row affected (0.00 sec)

mysql> select concat(x, 'bye') from foo;
+------------------+
| concat(x, 'bye') |
+------------------+
| hellobye |
+------------------+
1 row in set (0.00 sec)


By contrast, in 5.0:

mysql> create table foo (x varchar(100)) engine = innodb;
Query OK, 0 rows affected (0.13 sec)

mysql> insert into foo values ('hello ');
Query OK, 1 row affected (0.00 sec)

mysql> select concat(x,'bye') from foo;
+-----------------+
| concat(x,'bye') |
+-----------------+
| hello bye |
+-----------------+
1 row in set (0.00 sec)


In 5.0 as in 4.1, trailing spaces are *not* considered when doing comparisons. In 4.1 or 5.1:

mysql> select 'aa' = 'aa ';
+----------------+
| 'aa' = 'aa ' |
+----------------+
| 1 |
+----------------+
1 row in set (0.00 sec)


This has an "interesting" side effect when a VARCHAR is a primary key. You can't store both X and CONCAT(X,' ') into a primary key because they are both "equal":

mysql> create table foo (x varchar(100), primary key (x)) engine = innodb;
Query OK, 0 rows affected (0.10 sec)

mysql> insert into foo values ('aa ');
Query OK, 1 row affected (0.00 sec)

mysql> insert into foo values ('aa');
ERROR 1062 (23000): Duplicate entry 'aa' for key 1

----------------------------------------


In 4.1, the max allowed indexable value is 1024 bytes. In 5.0, the limit has apparently been reduced to 1000 bytes.

So a CREATE TABLE that works fine in 4.1 can fail in 5.0.

In 4.1:

mysql> create table foo (x int, a varchar(255), b varchar(255), c varchar(255), d varchar(255), primary key(x,a,b,c,d));
Query OK, 0 rows affected (0.07 sec)

In 5.0:

mysql> create table foo (x int, a varchar(255), b varchar(255), c varchar(255), d varchar(255), primary key(x,a,b,c,d));
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

MySql documents that a 4.1 master should work with a 5.0 slave, but the devil is in the details. "Backward compatibility" is an oxymoron at MySql apparently.  A 5.0 slave will choke if this CREATE TABLE is run on the 4.1 master.

--------------------------------


The JOIN parser was overhauled for 5.0. The intent was to make 5.0 more compliant with the ANSI standard. The result is that JOINs that worked fine in 4.1 are now rejected as bad syntax in 5.0.

Consider the following statement that works fine in 4.1:

DELETE FROM k.r AS t1
USING k.r AS t1
LEFT JOIN ks.k AS t2
ON t2.kc_id = 30738
AND t2.id = t1.id
WHERE t1.experiment_keyset_id = 2638
AND t2.kc_id IS NULL

In 5.0, this SQL gets the error:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS t1
USING k.r AS t1
LEFT JOIN k.k' at line 1

The fix is to "use k" (why? I don't know) and refer to the table by its alias in the FROM clause.  This statement works as desired in both 5.0 and 4.1:

mysql> use k;
mysql> DELETE FROM t1 USING k.r AS t1 LEFT JOIN ks.k AS t2 ON (t2.kc_id = 30738 AND t2.id = t1.id) WHERE t1.experiment_keyset_id = 2638 AND t2.kc_id IS NULL;
Query OK, 64 rows affected (0.00 sec)

On the other hand, referring to the table by its full name in the FROM clause doesn't work in 5.0:

mysql> DELETE FROM k.r USING k.r AS t1 LEFT JOIN ks.k AS t2 ON t2.kc_id = 30738 AND t2.id = t1.id WHERE t1.experiment_keyset_id = 2638 AND t2.kc_id IS NULL;
ERROR 1109 (42S02): Unknown table 'r' in MULTI DELETE

---------------------------------------------------

There are more reserved words in 5.0 than in 4.1. SQL that worked fine in 4.1 will fail in 5.0 because a column name happens to be a reserved word in 5.0. The cure is to quote the now-reserved word column name with single backquotes.

For example, this statement works fine in 4.1 but fails in 5.0:

mysql> INSERT classifications.category_type_def_sql_codes (category_type_id, match_type_id, host, sql, db)
VALUES (56, '2', 'q', 'select pattern, category, priority from c_b', 'sloal');
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'sql, db)
VALUES (56, '2', 'q', 'select pattern, category, priority from c' at line 1

The fix is to backquote the column "sql":

mysql> INSERT classifications.category_type_def_sql_codes (category_type_id, match_type_id, host, `sql`, db) VALUES (56, '2', 'q', 'select pattern, category, priority from c_b', 'sloal');
Query OK, 1 row affected (0.00 sec)

-------------------------------------------------------------

Joins of more than two tables can return different results in 4.1 and 5.0 when at least one of the joins is an "outer" join.

Basically, it seems to me that the implementation of multi-way joins in 4.1 was sloppy and incorrect. Things have been cleaned up in 5.0. The consequence is that the same statement can return different results.

Consider the following example:

mysql> create table t1(a int, primary key(a));
Query OK, 0 rows affected (0.02 sec)

mysql> create table t2(a int, b int, primary key(a, b));
Query OK, 0 rows affected (0.01 sec)

mysql> create table t3(b int, primary key(b));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into t1 values (1),(2);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

mysql> insert into t2 values (1,101);
Query OK, 1 row affected (0.02 sec)

mysql> insert into t3 values (101);
Query OK, 1 row affected (0.00 sec)


In 4.1:

mysql> select * from t1 left join (t2 left join t3 on t2.b = t3.b) on t1.a = t2.a;
+---+---+-----+------+
| a | a | b | b |
+---+---+-----+------+
| 1 | 1 | 101 | 101 |
| 2 | 1 | 101 | NULL |
+---+---+-----+------+

2 rows in set (0.00 sec)


In 5.0:

mysql> select * from t1 left join (t2 left join t3 on t2.b = t3.b) on t1.a = t2.a;
+---+------+------+------+
| a | a | b | b |
+---+------+------+------+
| 1 | 1 | 101 | 101 |
| 2 | NULL | NULL | NULL |
+---+------+------+------+

2 rows in set (0.00 sec)

------------------------------------------------------

In 5.0, when one declares a FLOAT(m,n), m (total number of digits) must be at least as large as n (number of decimal digits).

In 4.1, one can declare FLOAT(m,n) with m <>

mysql> CREATE TABLE `bud` (
-> `timestamp` varchar(255) NOT NULL,
-> `url` varchar(250) NOT NULL,
-> `last_month_vol` float(1,3) NOT NULL default '',
-> PRIMARY KEY (`timestamp`,`url`)
-> );
ERROR 1427 (42000): For float(M,D), double(M,D) or decimal(M,D), M must be >= D (column 'last_month_vol').
mysql>



------------------------------------------------


In 5.0, in a CREATE TABLE the default value specified for an INT, FLOAT, etc. must be numeric. The value can be a string, but the string must be a number.

4.1 is happy with any random string, which it implicitly converts to zero.

In 4.1:

mysql> create table foo ( x int not null default 'garbage');
Query OK, 0 rows affected (0.01 sec)

mysql> show create table foo \G
*************************** 1. row ***************************
Table: foo
Create Table: CREATE TABLE `foo` (
`x` int(11) NOT NULL default '0'
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

In 5.0:

mysql> create table foo ( x int not null default 'garbage');
ERROR 1067 (42000): Invalid default value for 'x'

In particular, using the empty string ('') as a default value for numeric types used to work in 4.1, but no longer in 5.0:

mysql> CREATE TABLE `bud` ( `timestamp` varchar(255) NOT NULL, `url` varchar(250) NOT NULL, `relevance_score` float(4,4) NOT NULL default '',  PRIMARY KEY (`timestamp`,`url`) );
ERROR 1067 (42000): Invalid default value for 'relevance_score'


Friday, January 9, 2009

Upgrade is Cruel

I've got a 5.0 db I'd like to upgrade to 5.1.

This db is several terabytes in size.

Here's what the manual says about the ideal way to upgrade:

2.12.1. Upgrading from MySQL 5.0 to 5.1

[snip] MySQL recommends that you dump and reload your tables from any previous version to upgrade to 5.1.

Arghhhh!!!!!  Dumping and reloading a multi-terabyte db would take weeks with mysql!!!  That's the best option for upgrading a mysql db?  Unacceptable!!!!!!