Monday 5 May 2014

My MySQL database impressions

I have been in the data and database world for over a decade now, working with mainly Oracle and data integration projects with Salesforce.

I have also spent time working with MySQL a relational database with open source roots, now part of Oracle. In this post I want to talk about my impressions of MySQL and some tips and tricks I learned working with it.

First and for all, you will have to be ready to get your hands dirty. I couldn't find a package structure for database administration in MySQL - like DBMS libraries of packages and procedures in Oracle. That means you will have to do most of the things on your own. Nevertheless good news is he database starts showing an Oracle banner when you login from version 5.0 onwards and some features like on-line Schema changes, more cost based optimisation and partitioning are added in versions 5.6 - a sign of good things to come.

Some key points

  • Data Import/Export - You can use the native mysqldump utility to dump data with parameters, but it is slow. You can dump schemas and data. I couldn't get it to dump data fast (in parallel) though that is why I strongly recommend mydumper an open source utility written by guys in Oracle and MySQL to dump data using parallel threads and is very fast. Import can be done in parallel as well and it can give you that boost provided your hardware permits it. Don't try to disable constraints, drop indexes before imports as you will read in posts and suggestions on the net, mysqldump already does that for you.
  • Hot Backup - mylvmbackup seems like the de-facto script to take hot backups when the database in online. There are tools like XtraBackup from Percona too. It takes a snapshot of the disk where your datafiles and logfiles are. At restore it does a crash recovery using the logs and brings the database transactions forwards to the point of crash. Then if you have the logs after that, you can play them forwards and bring the database to a point in time after the backup. 
  • Parallel processing - Nada, there is none! I couldn't get it do anything in parallel. The only thing I managed to do in parallel was to export and import data with mydumper, that works! So if you have many CPUs you will be watching them being idle most of the time as one thread only will be chugging away. Unless you use mydumper for your import/export operations where you can make those CPUs sweat. 
  • DBMS packages - You fancy automating, do you need scripts to do repetitive tasks? Well there is no DBMS package library to help you administer the database in MySQL. Instead, you can use Percona Toolkit scripts, a consultancy specialising in helping MySQL DBAs to do great work with MySQL databases. They have a variety of scripts from comparing (diff), syncing databases, tables to extracting metadata and GRANTS structures.  
  • Hints, Explain Plan, Performance Tuning. I couldn't see much of Cost Based Optimisation in MySQL, the data dictionary (INFORMATION_SCHEMA) has metadata names but doesn't hold any dynamic statistics about objects, estimates of counts of rows in tables and indexes it holds can be up 50% wrong. The whole thing is based on heuristics, I suppose. The EXPLAIN PLAN is just a row where it says what the optimiser will do, there is no cost analysis or logical tree structure of execution plans yet.  I couldn't see much on Join orders either, no Nested Loops, HASH or MERGE joins yet. 

MySQL is a popular, relational database. The free version of this database is probably what a small website and a start-up needs. But having said that, many sites outgrow MySQL and still stay with it.

Oracle will probably turn it to a serious database too. Adding partitioning, multi threading to it in the recent releases, is a step forwards in becoming an Enterprise size and scale database.  I don't know much about the MySQL Cluster Version and MySQL Replication I know takes a load off from the reads. I want to see it doing more Performance Tuning science.

Top tools with MySQL that I used

MySQL Workbench - SQL IDE.
Mydumper - Fast logical backup and restore.
Mylvmbackup - Hot backup script
Pentaho Kettle - PDI is an all round data integration and middle-ware tool