My MySQL tips

AWS Aurora Benchmarking - Blast or Splash?
Written by Marco Tusa   
Sunday, 01 November 2015 00:00


In this investigation, three solutions were analyzed: MySQL with MHA (Master High Availability Manager for MySQL), MySQL with Galera Replication (synchronous data replication cross-node), and AWS RDS-Aurora (data solution from Amazon promising HA and read scalability).

These three platforms were evaluated for high availability (HA; how fast the service would recover in case of a crash) and performance in managing both incoming writes and concurrent read and write traffic.

These were the primary items evaluated, followed by an investigation into how to implement a multi-region HA solution, as requested by the customer. This evaluation will be used to assist the customer in selecting the most suitable HA solution for their applications.


HA tests


MySQL + Galera proved to be the most efficient solution; in the presence of read and write load, it performed a full failover in 15 seconds, compared to 50 seconds for AWS-Aurora and more than two minutes for MySQL with MHA.


In the performance tests, MySQL with MHA proved to be the most efficient platform. With this solution, it is possible to manage read/write operations almost twice as fast as, and more efficiently than, MySQL Galera, which places second. Aurora consistently places last.


In light of the above tests, the recommendations consider different factors to answer the question, "Which is the best tool for the job?" If HA and very low failover time are the major factors, MySQL with Galera is the right choice. If the focus is on performance and the business can afford several minutes of down time, then the choice is MySQL with MHA.


Finally, Aurora is recommended when there is a need for an environment with limited concurrent writes, the need to have significant scale in reads, and the need to scale (in/out) to cover bursts of read requests such as in a high-selling season.




Why This Investigation?

The outcomes presented in this document are the result of an investigation taken in September 2015. The research and tests were conducted as an answer to a growing number of requests for clarification and recommendations around Aurora and EC2. The main objective was to be able to focus on real numbers and scenarios seen in production environments.

Everything possible was done to execute the tests in a consistent way across the different platforms.

Errors were pruned by executing each test on each platform before collecting the data; this preliminary run also served to identify the saturation limit, which was different for each tested architecture. During the real test execution, I repeated each test several times to identify and reduce any possible deviation.


Things to Answer:

In the investigation I was looking to answer the following questions for each platform:

  • HA
    • Time to failover
    • Service interruption time
    • Lag in execution at saturation level
  • Ingest & Compliance test
    • Execution time
    • Inserts/sec
    • Selects/sec
    • Deletes/sec
    • Handlers/sec
    • Rows inserted/sec
    • Rows select/sec (compliance only)
    • Rows delete/sec (compliance only)
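Most of these per-second metrics can be derived from MySQL's own counters by sampling SHOW GLOBAL STATUS twice and dividing the delta by the sampling interval. A minimal sketch of that calculation (`Com_insert` and `Handler_write` are real MySQL status variables; the sample values here are made up):

```python
# Turn two snapshots of MySQL status counters into per-second rates.
def per_second(before, after, interval):
    return {k: (after[k] - before[k]) / interval for k in before}

# Hypothetical counter values from two SHOW GLOBAL STATUS samples 10 s apart.
before = {"Com_insert": 1000, "Handler_write": 5000}
after  = {"Com_insert": 4000, "Handler_write": 17000}
print(per_second(before, after, 10))  # {'Com_insert': 300.0, 'Handler_write': 1200.0}
```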


Brief description about MHA, Galera, Aurora


MHA is a solution that sits on top of the MySQL nodes, checking the status of each of the nodes and using custom scripts to manage the failover. An important thing to keep in mind is that MHA does not act as a “man in the middle”, and as such no connection is sent to the MHA controller. The MHA controller instead can manage the entry point with a VIP (Virtual IP), with HAProxy settings, or with whatever makes sense in the design architecture.

At the MySQL level, the MHA controller will recognize the failing master and will elect the most up-to-date one as the new master. It will also try to re-align the gap between the failed master and the new one using original binary logs if available and accessible.

Scalability is provided via standard MySQL design, having one master in read/write and several servers in read mode. Replication is asynchronous replication, and therefore it could easily lag, leaving the read nodes quite far behind the master.



MySQL with Galera Replication

MySQL + Galera works on the concept of a cluster of nodes sharing the same dataset.

What this means is that MySQL+Galera is a cluster of MySQL instances, normally three or five, that share the same dataset, and where data is synchronously distributed between nodes.

The focus is on the data, not on the nodes, given that each node shares the same data and status. Transactions are distributed across all active nodes that form the primary component. Nodes can leave and re-join the cluster, modifying the conceptual view that expresses the primary component (for more details on Galera, review my presentation).

What is of note here is that each node has the same data at the end of a transaction; given that, the application can connect to any node and read/write data in a consistent way.

As previously mentioned, replication is (virtually) synchronous; data is locally validated and certified, and conflicts are managed to keep data internally consistent. Failover is more of an external need than a Galera one, meaning that the application can be set to connect to one node only or to all the available nodes, and if one of the nodes is not available the application should be able to utilize the others.

Given that not all applications have this function out of the box, it is common practice to add another layer with HAProxy to manage the application connections, and either have HAProxy distribute the connections across nodes, or use a single node as the main point of reference and shift to the others in case of need. The shift in this case is limited to moving the connection point from Node A to Node B.

MySQL/Galera write scalability is limited by the capacity to manage the total amount of incoming writes by the single node. There is no write scaling in adding nodes. MySQL/Galera is simply a write distribution platform, and reads can be performed consistently on each node.



Aurora is based on the RDS approach, with ease of management, built-in backups, on-disk data autorecovery, etc. Amazon also states that Aurora offers a better replication mechanism (~100 ms lag).

This architecture is based on a main node acting as a read/write node and a variable number of nodes that can work as read-only nodes. Given that the replication is claimed to be within ~100ms, reading from the replica nodes should be safe and effective. A read replica can be distributed by AZ (availability zone), but must reside in the same region.

Applications connect to an entry point that will not change. In case of failover, the internal mechanism will move the entry point from the failed node to the new master, and also all the read replicas will be aligned to the new master.

Data is replicated at a low level, pushed directly from the read/write data store to the read data stores, where there should be limited delay and very limited lag (~100ms). Aurora replication is not synchronous, and given that only one node is active at a time, there is no data consistency validation or check.

In terms of scalability, Aurora does not scale writes by adding replica nodes. The only way to scale the writes is to scale up, meaning upgrading the master instance to a more powerful one. Given the nature of the cluster, it would be unwise to upgrade only the master.

Reads are scaled by adding new read replicas.



What about multi-region?

One of the increasing requests I receive is to design architectures that could manage failover across regions.

There are several issues in replicating data across regions, starting from data security down to packet size and frame dimension modification, given we must use existing networks for that.

For this investigation, it is important to note one thing: the only solution that offers internal cross-region replication is Galera, with the use of Segments. But given the possible issues, I normally do not recommend this solution when we talk of regions across continents. Galera is also the only one that could eventually help optimize asynchronous replication, using multiple nodes to replicate to another cluster.

Aurora and standard MySQL must rely on basic asynchronous replication, with all the related limitations.

Architecture for the investigation

The investigation I conducted used several components:

  • EIP = 1
  • VPC = 1
  • ELB=1
  • Subnets = 4 (1 public, 3 private)
  • HAProxy = 6
  • MHA Monitor (micro ec2) = 1
  • NAT Instance (EC2) =1 (hosting EIP)
  • DB Instances (EC2) = 6 + (2 stand by) (m4.xlarge)
  • Application Instances (EC2) = 6
  • EBS SSD 3000 PIOPS
  • Aurora RDS node = 3 (db.r3.xlarge)
  • Snapshots = (at regular intervals)

MySQL_HA_failover - POC architecture

Application nodes connect to the databases either using an Aurora entry point, or HAProxy. For MHA, the controller action was modified to act on an HAProxy configuration instead of on a VIP (Virtual IP), modifying its active node and reloading the configuration. For Galera, the shift was driven by the recognition of the node status using the HAProxy check functionality.
As indicated in the above schema, replication was distributed across several AZ. No traffic was allowed to reach the databases from outside the VPC. ELB was connected to the HAProxy instances, which have proven to be a good way to rotate across several HAProxy instances, in case an additional HA layer is needed.
For the scope of the investigation, given that each application was hosting a local HAProxy using the “by tile” approach, ELB was tested in the phase dedicated to identifying errors and saturation, but not used in the following performance and HA tests. The NAT instance was configured to allow access only to each HAProxy web interface, to review statistics and node status.




I performed 3 different types of tests:

  • High availability
  • Data ingest
  • Data compliance

High Availability

Test description:

The tests were quite simple; I ran a script that, while inserting, collected the time of the command execution, stored the SQL execution time (with NOW()), returned the value to be printed, and finally collected the error code from the MySQL command.

The result was:


2015-09-30 21:12:49   2015-09-30 21:12:49   0
        /\                    /\           /\
        |                     |            |
  date/time (sys)        NOW() (MySQL)   exit error code (bash)

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Inside the table:

  `c` datetime NOT NULL;

select a,d from ha;
+-----+---------------------+
| a   | d                   |
+-----+---------------------+
| 1   | 2015-09-30 21:12:31 |
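From a log in this format, the service-interruption window can be derived by looking at the span of non-zero exit codes. A small post-processing sketch (the log lines follow the format shown above; the helper name is mine):

```python
# Measure the service-interruption window from log lines shaped like:
#   "Log from bash : <system time> <NOW() from MySQL> <exit code>"
# The outage is taken as the span of lines with a non-zero exit code.
from datetime import datetime

def outage_seconds(lines):
    fmt = "%Y-%m-%d %H:%M:%S"
    failed = []
    for line in lines:
        payload = line.split(" : ", 1)[1]
        stamp, code = payload[:19], payload.rsplit(" ", 1)[1]
        if code != "0":
            failed.append(datetime.strptime(stamp, fmt))
    if not failed:
        return 0
    return int((failed[-1] - failed[0]).total_seconds())

log = [
    "Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0",
    "Log from bash : 2015-09-30 21:12:50  1",
    "Log from bash : 2015-09-30 21:12:58  1",
    "Log from bash : 2015-09-30 21:12:59 2015-09-30 21:12:59 0",
]
print(outage_seconds(log))  # 8
```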

For Galera, the HAProxy settings were:

server node1 check port 3311 inter 3000 rise 1 fall 2  weight 50
server node2Bk check port 3311 inter 3000 rise 1 fall  2   weight 50 backup
server node3Bk check port 3311 inter 3000 rise 1 fall  2   weight 10 backup
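These settings directly bound the failover detection time: a node is declared down only after `fall` consecutive failed checks spaced `inter` milliseconds apart, so with the values above HAProxy needs about 6 seconds before shifting traffic to a backup node.

```python
# Worst-case failure-detection time implied by the HAProxy check settings above.
inter_ms = 3000   # interval between health checks
fall = 2          # consecutive failed checks before the node is marked down
print(inter_ms * fall)  # 6000 ms before HAProxy shifts to a backup node
```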

I ran the test script on the different platforms, without heavy load, and then close to the MySQL/Aurora saturation point.


High Availability Results


I think an image is worth millions of words.

MySQL with MHA was taking around 2 minutes to perform the full failover, meaning from interruption to when the new node was able to receive data again.

Under stress, the master was so far ahead of the slaves, and replication lag was so significant, that a failover with binlog application simply took too long to be considered a valid option in comparison to the other two. This result was not a surprise, but it pushed me to analyze the MHA solution separately, given that its behaviour diverged from the other two so much that it was not comparable.

More interesting was the behavior between MySQL/Galera and Aurora. In this case, MySQL/Galera was consistently more efficient than Aurora, with or without load. It is worth mentioning that of the 8 seconds taken by MySQL/Galera to perform the failover, 6 were due to the HAProxy settings, which had a 3000 ms check interval and 2 check loops before executing the failover. Aurora was able to perform a decent failover when the load was low, while under increasing load the read nodes became less aligned with the write node, and as such less ready for failover.

Note that I was performing the tests following Amazon's indication to use the following to simulate a real crash:


As such, I was not doing anything strange or out of the ordinary.

Also, while doing the tests, I had the opportunity to observe the Aurora replica lag using CloudWatch, which reported a much higher lag value than the claimed ~100 ms.

As you can see below:


In this case, I was getting almost 18 seconds of lag in the Aurora replication, much higher than ~100 ms!

Or a value that is not clear at all:


As you can calculate by yourself, 2E16 is several decades of latency.

Another interesting metric I was collecting was the latency between the application sending the request and the moment of execution.


Once more, MySQL/Galera is able to manage the requests more efficiently. With a high load and almost at saturation level, MySQL/Galera was taking 61 seconds to serve the request, while Aurora was taking 204 seconds for the same operation.

This indicates how high the impact of load can be on response time and execution in case of saturation. This is a very important metric to keep under observation to decide when, or if, to scale up.
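Putting the two figures above side by side gives the saturation penalty as a ratio (values from the test above):

```python
# Request-serving time at (near) saturation, from the test above.
galera_s = 61    # MySQL/Galera
aurora_s = 204   # Aurora
print(round(aurora_s / galera_s, 1))  # 3.3 -- Aurora ~3.3x slower here
```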


Exclude What is Not a Full Fit

As previously mentioned, this investigation was intended to answer several questions, first of all about HA and failover time. Given that, I had to exclude the MySQL/MHA solution from the remaining analysis, because it is so drastically divergent that it would make no sense to compare it; moreover, analyzing the performance of the MHA/MySQL solution alongside the others would have flattened the other two. Details about MHA/MySQL are presented in the Appendix.

Performance tests

Ingest tests


This set of tests was done to cover how the two platforms behaved under a significant amount of inserts.

I used IIbench with a single table, and my own StressTool, which instead uses several (configurable) tables plus other configurable options like:

  • Configurable batch inserts
  • Configurable insert rate
  • Different access method to PK (simple PK or composite)
  • Multiple tables and configurable table structure.

The two benchmarking tools differ also in the table definition:


CREATE TABLE `purchases_index` (
  `transactionid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  `data` varchar(4000) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
);


CREATE TABLE `tbtest1` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `uuid` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `b` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `c` char(200) COLLATE utf8_unicode_ci NOT NULL,
  `counter` bigint(20) DEFAULT NULL,
  `partitionid` int(11) NOT NULL DEFAULT '0',
  `date` date NOT NULL,
  `strrecordtype` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`autoInc`,`date`),
  KEY `IDX_a` (`a`),
  KEY `IDX_date` (`date`),
  KEY `IDX_uuid` (`uuid`)
);

IIbench was executed using 32 threads on each application server, while the stress tool was run with 16/32/64 threads on each application node, resulting in 96, 192, and 384 total threads, each executing batch inserts with 50 inserts per batch (19,200 rows).
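The thread totals follow from the per-node counts and the 6 application instances listed earlier; a quick check of the arithmetic above:

```python
# Thread and batch arithmetic for the StressTool runs described above:
# 16/32/64 threads on each of the 6 application nodes, 50 inserts per batch.
app_nodes = 6
per_node = [16, 32, 64]
totals = [t * app_nodes for t in per_node]
print(totals)            # [96, 192, 384]
print(totals[-1] * 50)   # 19200 rows in flight per batch round at peak
```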

Ingest Test Results

Execution time:


Time to insert ~305 million rows into one single table using 192 threads.

Rows Inserted/Sec
Insert Time


The result of this test is once more quite clear, with MySQL/Galera able to manage the load in 1/3 of the time Aurora takes. MySQL/Galera was also more consistent in insert time as the number of rows inside the table grew.

It is also of interest to note that the rows-inserted rate declined faster in MySQL/Galera than in Aurora.


Java Stress Tool
Execution Time



This test focused on multiple inserts (as IIbench does) but using an increasing number of threads and multiple tables. This test is closer to what could happen in real life, given that parallelization and multiple entry points are definitely more common than single-table inserts in a relational database.

In this test Aurora performs better, and the distance is less evident than in the IIbench test.

In this test, we can see that Aurora is able to perform almost as a standard MySQL instance, but this performance does not persist as the number of concurrent threads increases. In fact, MySQL/Galera was able to manage the load of 384 threads in 1/3 of the time with respect to Aurora.

Analyzing more in depth, we can see that MySQL/Galera manages the commit phase more efficiently, which is surprising keeping in mind that MySQL/Galera uses synchronous replication and has to manage data validation and replication.

Row inserted
Com Commit


Commit Handler Calls


In conclusion, I can say that also in this case MySQL/Galera performed better than Aurora.

Compliance Tests

The compliance tests were run using tpcc-mysql with 200 warehouses, and StressTool with 15 parent tables and 15 child tables, generating Select/Insert/Delete on a basic dataset of 100K entries.

All tests were done with the buffer pool saturated.

Tests for tpcc-mysql used 32, 64, and 128 threads, while for StressTool I used 16, 32, and 64 threads (multiplied by the 3 application machines).

Compliance Tests Results

Java Stress Tool
Execution Time


In this test, we have the applications performing concurrent access, reading and writing rows on several tables. It is quite clear from the picture above that MySQL/Galera was able to process the load more efficiently than Aurora. Both platforms showed a significant increase in execution time as concurrent threads increased.

Both platforms reached saturation level in this test when using a total of 192 concurrent threads. Saturation occurred at a different moment, and hit a different resource, on each platform during the 192-thread test: Aurora was CPU-bound and showed replication lag, while for MySQL/Galera the bottlenecks were I/O and flow control.

Rows Insert and Read


In relation to the rows managed, MySQL/Galera performed better in terms of quantity of rows, but this trend went down significantly as concurrency increased. Both read and write operations were affected. Aurora managed less volume, but became more consistent as concurrency increased.

Com Select Insert Delete


Analyzing the details by type of command, it is possible to identify that MySQL/Galera was more affected in the read operations, while writes showed less variation.

Handlers Calls


In write operations Aurora inserted a significantly lower volume of rows, but it was more consistent as concurrency increased. This is probably because the load exceeded the capacity of the platform, and Aurora was acting at its limit. MySQL/Galera was instead able to manage the load with twice the performance of Aurora, even if increasing concurrency affected the trend negatively.




Tpcc-mysql emulates the CRUD activities of transaction users against N warehouses. In this test, I used 200 warehouses; each warehouse has 10 terminals, and the runs used 32, 64, and 128 threads.

Once more, MySQL/Galera was consistently better than Aurora in terms of volume of transactions. The average per-second results were also consistently almost 2.6 times better than Aurora's. On the other hand, Aurora showed less fluctuation in serving the transactions, with a more consistent trend in each execution.

Average Transactions




High Availability

With MHA excluded, the other two platforms were shown to be able to manage the failover operation in a limited time frame (below 1 minute); nevertheless, MySQL/Galera was shown to be more efficient and consistent, especially in consideration of the unexpected and sometimes not fully justified episodes of Aurora replication lag. This result is a direct consequence of the synchronous replication, which by design does not allow an active MySQL/Galera node to fall behind.

In my opinion the replication method used in Aurora is efficient, but it still allows node misalignment, which is obviously not optimal when there is the need to promote a read-only node to a read/write instance.


MySQL/Galera was able to outperform Aurora in all tests, whether by execution time, number of transactions, or volume of rows managed. Also, scaling up the Aurora instance did not have the impact I was expecting; it was still not able to match the EC2 MySQL/Galera performance, even with the latter having less memory and fewer CPUs.

Note that while I had to exclude the MHA solution because of the failover time, the performance achieved using standard MySQL was by far better than MySQL/Galera or Aurora; please see the Appendix.

General Comment on Aurora

The Aurora failover mechanism is not so different from other solutions. In case of a crash of a node, another node will be elected as the new primary on the basis of the "most up-to-date" rule.

Replication is not illustrated clearly in the documentation, but it seems to be a combination of block-device distribution and semi-sync replication, meaning that the primary is not really affected by the possible delay in the copy of a block once it is dispatched. What is also interesting is the way the data of a volume is fixed in case of issues: it happens by copying the data over from another location or volume that hosts the correct data. This resembles the HDFS mechanism, and may well be exactly that; what is relevant is that if it works like HDFS, the latency in the operation may be significant. Also relevant in this scenario is the fact that if the primary node crashes, there will be a service interruption during the election of a secondary to primary; this can be up to 2 minutes according to the documentation, as verified in the tests.

About replication, the documentation states that it can take ~100 ms, but that is not guaranteed and depends on the level of incoming write traffic on the primary server. I have already reported that this does not hold, and replication can take significantly longer.

What happened during the investigation is that the more writes there were, the greater the possible distance between the data visible on the primary and the data visible on the replica (replication lag). No matter how efficient the replication mechanism is, this is not synchronous replication, and it does not guarantee consistent data reading by transaction.

Finally, replication across regions is not even in the design of the Aurora solution; it must rely on standard asynchronous MySQL replication between servers. Aurora is nothing more, nothing less than RDS on steroids, with smarter replication.

Aurora does not perform any kind of write scaling; scaling is performed on reads. The way it scales writes is by scaling up the box, so more power and memory = more writes; nothing new, and obviously scaling up also costs more. That said, I think Aurora is a very valuable solution when there is the need for a platform requiring extensive read scaling (in/out), or for rolling out a product in phase 1.

Appendix MHA

MHA performance Graphs

As previously mentioned, MySQL/MHA performed significantly better than the other two solutions. The IIBench test completed in 281 seconds against the 4,039 seconds of Galera.
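For the IIBench numbers just quoted, the speed-up works out to roughly an order of magnitude:

```python
# IIBench speed-up of MySQL/MHA over MySQL/Galera (times from the text above).
mha_s = 281      # MySQL/MHA, seconds
galera_s = 4039  # MySQL/Galera, seconds
print(round(galera_s / mha_s, 1))  # 14.4 -- MHA ~14x faster on this test
```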

IIBench Execution Time

MySQL/MHA execution time (Ingest & Compliance)


The execution of the Ingest and Compliance tests in MySQL/MHA was three times faster than in MySQL/Galera.

MySQL/MHA Rows Inserted (Ingest & Compliance)


The number of inserted rows is consistent with the execution time, being 3 times the one achieved by the MySQL/Galera solution. MySQL/MHA was also better able to manage the increasing concurrency, whether with simple inserts or with concurrent read/write operations.

Last Updated on Monday, 02 November 2015 00:47
Percona Live Amsterdam 2015
Written by Marco Tusa   
Sunday, 20 September 2015 15:27

On Monday 21 September Percona Live will start in Amsterdam.

The program is full of interesting topics and I am sure a lot of great discussions will follow.

I wish all the best to all my colleagues, friends and customers who will attend. Have fun guys, and drink a couple of beers for me as well.


That is it, I decided not to submit speeches and not to come this year, not only to Percona Live but to most or all of the conferences.

I want to stay focused on my customers for now, and be present as much as I can for my teammates.

We have so much going on that an effort in that direction must be made, and in the little time left ... well, I have a lot of interesting non-tech-related stuff to read.


So have fun, learn, teach, listen and talk ... but on top of all share and keep the spirit high, these are hard times and events like Percona Live are important.

I will miss it ... but as said, sometimes we have to choose our priorities.


Great MySQL (still talking about MySQL right??) to everybody

Last Updated on Sunday, 20 September 2015 15:54
Why you should be careful when loading data in MySQL with Galera.
Written by Marco Tusa   
Saturday, 08 August 2015 18:01

An old story that is not yet solved.


Why this article.

Some time ago I had open a bug report to codership through Seppo.

The report was about the delay existing in executing data load with FK. (

The delay I was reporting at that time were such to scare me a lot, but I know talking with Alex and Seppo that they were aware of the need to optimize the approach an some work was on going.

After some time I had done the test again with newer version of PXC and Galera library.

This article is describing what I have found, in the hope that share information is still worth something, nothing less nothing more.

The tests

Tests were run on a VM with 8 cores, 16GB RAM, and RAID10 storage (6 spindles, 10K RPM).

I ran 4 types of tests:

  • Load from file using SOURCE and extended inserts
  • Load from SQL dump and extended inserts
  • Run multiple threads operating against employees tables with and without FK
  • Run single thread operating against employees tables with and without FK

For the tests running against the employees db, simulating external client access, I used my own stresstool.

The tests were done over a long period of time, given that I was testing different versions and had no time to stop and consolidate the article. Also, I was never fully convinced, so I ran the tests over and over to validate the results.

I have reviewed versions from:

Server version:                        5.6.21-70.1-25.8-log Percona XtraDB Cluster binary (GPL) 5.6.21-25.8, Revision 938, wsrep_25.8.r4150


Server version:                        5.6.24-72.2-25.11-log Percona XtraDB Cluster binary (GPL) 5.6.24-25.11, Revision, wsrep_25.11

With consistent behavior.


What happened

The first test was as simple as the one I did for the initial report; I was mainly loading the employees db into MySQL.

time mysql -ustress -ptool -h -P3306 < employees.sql

Surprise, surprise … I literally jumped on my chair: the load took 37m57.792s.

Yes, you are reading right: it took almost 38 minutes to execute.

I was so surprised that I did not trust the test, so I did it again, and again, and again.

Changing versions, changing machines, and so on.

No way… the time remained surprisingly high.

Running the same test without the FKs but with Galera completed in 90 seconds, while with FKs but without loading the Galera library it took 77 seconds.
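To put those three timings in proportion (values from the runs above):

```python
# Relative cost of the FK + Galera load versus the other two runs
# (times from the text above).
fk_galera_s = 37 * 60 + 57.792   # 37m57.792s with FKs and the Galera library
no_fk_galera_s = 90              # FKs removed, Galera library loaded
fk_no_galera_s = 77              # FKs kept, Galera library not loaded
print(round(fk_galera_s / no_fk_galera_s))  # ~25x slower
print(round(fk_galera_s / fk_no_galera_s))  # ~30x slower
```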

Ok something was not right. Right?

I decided to dig a bit, starting from analyzing the time taken by each test.

See image below:




Of all the tests, the only one not aligned was the data load with FK + Galera.

I also decided to see what the behavior was with multiple threads and contention.

As such I prepared a test using my StressTool and ran two classes of tests: one with 8 threads pushing data, the other single-threaded.

As usual, I also ran the tests with FK+Galera, NOFK+Galera, and FK+No Galera.

This time the results were what I expected, and the FK impact was minimal, if any; see below:




The distance between executions was minimal and in line with expectations.

It was also consistent between versions, so no surprises; I relaxed and could focus on something else.

On what?

Well, on why the impact was so significant in the case of the load from file.

The first thing I did was start digging into the calls, and into what each action was really doing inside MySQL.

To do so, I installed some tools like perf and OProfile, and started to dig into it.

The first test, with FK+Galera taking 38 minutes, constantly reported a different sequence of calls/costs from all the other tests.

57.25%  [kernel]                      [k] hypercall_page
35.71%  [.] 0x0000000000010c61
 2.73%                  [.] __strlen_sse42
 0.16%  mysqld                        [.] MYSQLparse(THD*)
 0.14%  [.] strlen@plt
 0.12%              [.] galera::KeySetOut::KeyPart::KeyPart(galera::KeySetOut::KeyParts&, galera::KeySetOut&, galera::K
 0.12%  mysqld                        [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned
 0.09%                  [.] memcpy
 0.09%                  [.] _int_malloc
 0.09%  mysqld                        [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long,
 0.08%  mysql                         [.] read_and_execute(bool)
 0.08%  mysqld                        [.] ha_innobase::wsrep_append_keys(THD*, bool, unsigned char const*, unsigned char const*)
 0.07%                  [.] _int_free
 0.07%              [.] galera::KeySetOut::append(galera::KeyData const&)
 0.06%                  [.] malloc
 0.06%  mysqld                        [.] lex_one_token(YYSTYPE*, THD*)


Comparing this with the output of the run without FK but still with Galera:

75.53%  [kernel]                      [k] hypercall_page
 1.31%  mysqld                        [.] MYSQLparse(THD*)
 0.81%  mysql                         [.] read_and_execute(bool)
 0.78%  mysqld                        [.] ha_innobase::wsrep_append_keys(THD*, bool, unsigned char const*, unsigned char const*)
 0.66%  mysqld                        [.] _Z27wsrep_store_key_val_for_rowP3THDP5TABLEjPcjPKhPm.clone.9
 0.55%  mysqld                        [.] fill_record(THD*, Field**, List<Item>&, bool, st_bitmap*)
 0.53%                  [.] _int_malloc
 0.50%                  [.] memcpy
 0.48%  mysqld                        [.] lex_one_token(YYSTYPE*, THD*)
 0.45%              [.] galera::KeySetOut::KeyPart::KeyPart(galera::KeySetOut::KeyParts&, galera::KeySetOut&, galera::K
 0.43%  mysqld                        [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long,
 0.43%  mysqld                        [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned
 0.39%  mysqld                        [.] trx_undo_report_row_operation(unsigned long, unsigned long, que_thr_t*, dict_index_t*, dtuple_t
 0.38%              [.] galera::KeySetOut::append(galera::KeyData const&)
 0.37%                  [.] _int_free
 0.37%  mysqld                        [.] str_to_datetime
 0.36%                  [.] malloc
 0.34%  mysqld                        [.] mtr_add_dirtied_pages_to_flush_list(mtr_t*)


What stands out is the significant difference in the FK processing.

The Galera function


KeySetOut::KeyPart::KeyPart (KeyParts&  added, 
                             KeySetOut&     store,
                             const KeyPart* parent,
                             const KeyData& kd,
                             int const      part_num) 



is the top consumer, before we move out to shared libraries.

After it, the server constantly calls the strlen function, as if it were evaluating each entry in the insert multiple times.

This unfortunate behavior happens ONLY when the FK exists and requires validation, and ONLY if the Galera library is loaded.

The logical conclusion is that the library is adding the overhead, probably in some iteration, and probably because of a bug.


When running the application tests, using multiple clients and threads, this delay did not show up, at least not at this order of magnitude.

During the application tests, I had been using batched inserts of up to 50 rows per SQL command; as such, I may NOT have triggered the limit that causes the issue in Galera.

As such, I am still not convinced that we are “safe” there, and adding this test is on my to-do list. In the case of a significant result I will append the information, but I felt the need to share in the meanwhile.


The other question was: WHY was the data load from the SQL dump NOT taking that long?

That part is easy: comparing the load files, we can see that in the SQL dump the FK and UK checks are disabled while loading, so the server skips the evaluation of the FKs entirely.

That’s it. Adding:


SET FOREIGN_KEY_CHECKS=0;
SET UNIQUE_CHECKS=0;


to the import, and setting them back afterwards, removes the delay, and the function calls become “standard” again.




This short article has the purpose of:

  • Alerting all of you to this issue in Galera, and letting you know it has been going on for some time and has not been fixed yet.
  • Providing you a workaround. Use SET FOREIGN_KEY_CHECKS=0, UNIQUE_CHECKS=0; when performing a data load, and remember to put them back afterwards (SET FOREIGN_KEY_CHECKS=1, UNIQUE_CHECKS=1;).
    Unfortunately, as we all know, we cannot always disable them, right? This brings us to the last point.
  • Suggesting that Codership, and eventually Percona, should dedicate some attention to this issue, because it COULD be limited to data loading, but it may not be.
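In practice, the workaround looks like the following sketch, assuming the load is performed from an interactive mysql client session (the dump file path is hypothetical):

```sql
-- Disable FK and UK validation only for this load session
SET FOREIGN_KEY_CHECKS=0;
SET UNIQUE_CHECKS=0;

SOURCE /path/to/dump.sql;  -- hypothetical dump file

-- Re-enable validation once the load is complete
SET FOREIGN_KEY_CHECKS=1;
SET UNIQUE_CHECKS=1;
```

Note that both variables are session-scoped, so other connections keep validating FKs normally while the load runs.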





I have more info and oprofile output that I am going to add to the bug report, in the hope it will be processed.


Great MySQL to everyone …

Last Updated on Saturday, 08 August 2015 18:22
Performance Schema … How to (Part 1)
Written by Marco Tusa   
Friday, 29 May 2015 18:07

Performance Schema (PS) has been the subject of many, many recent discussions, presentations, and articles.  With the enhancements arriving in MySQL 5.7, PS has become the main actor for people who want to take further steps in MySQL monitoring. At the same time, it has become clear that Oracle intends to make PS so powerful, with so many features and so much new instrumentation, that old-style monitoring will begin to look like an obsolete tool from the Stone Age.

This article will explain PS and provide guidance on what needs to be done in order to use it effectively.

What I am not going to do is dig into specific performance issues or address polemics about what PS is and what, in a Utopian vision, it should be. I have seen too many presentations, articles and comments like this; they are not productive, nor are they in line with my target, which is: keep people informed on how to do things EASILY.

For the scope of this article, I will base my code mainly on MySQL 5.7, with some digressions to MySQL 5.6 if and when it makes sense.


Basic Concepts

Before starting the real how-to, it is my opinion that we must cover a few basic concepts and principles about PS. The primary goal of the Performance Schema is to measure (instrument) the execution of the server. A good measure should not cause any change in behavior. To achieve this, the overall design of the Performance Schema complies with the following, very severe design constraints:

  • The parser is unchanged. Also, there are no new keywords or statements. This guarantees that existing applications will run the same way with or without the Performance Schema.
  • All the instrumentation points return "void", there are no error codes. Even if the performance schema fails internally, execution of the server code will proceed.
  • None of the instrumentation points allocate memory. All the memory used by the Performance Schema is pre-allocated at startup, and is considered "static" during the server life time.
  • None of the instrumentation points use any pthread_mutex, pthread_rwlock, or pthread_cond (or platform equivalents). Executing the instrumentation point should not cause thread scheduling to change in the server.

In other words, the implementation of the instrumentation points, including all the code called by the instrumentation points is:

  • Malloc free
  • Mutex free
  • Rwlock free


Currently, there is still an issue with the usage of LF_HASH, which introduces memory allocation, though a plan exists to replace it with a lock-free/malloc-free hash table.

The observer should not influence the observed. As such, the PS must be as fast and as unobtrusive as possible. In cases where there is a choice between:

  • Processing when recording the performance data in the instrumentation.
  • Processing when retrieving the performance data.

priority is given in the design to making the instrumentation faster, pushing some complexity to data retrieval.

Performance Schema was designed while keeping an eye on future developments and on facilitating its usage in new code. To make it successful, the barrier to entry for a developer should be low, so that it is easy to instrument code. This is particularly true for the instrumentation interface. The interface is available for C and C++ code, does not require parameters that the calling code cannot easily provide, and supports partial instrumentation (for example, instrumenting mutexes does not require that every mutex be instrumented). The Performance Schema instrumentation interface is designed in such a way that future improvements/additions will not require modifications, and old instrumentation remains unaffected by the changes.

The final scope for PS is to have it implemented in every plugin included in MySQL, although expecting all of them to always use the latest version would be unrealistic. Given that, the Performance Schema implementation must support multiple versions of the instrumentation interface within the same deployment, and must ensure binary compatibility with each version.

The importance of flexibility means we may have conditions like:

  • Server supporting the Performance Schema + a storage engine that is instrumented.
  • Server supporting the Performance Schema + a storage engine that is not instrumented.
  • Server not supporting the Performance Schema + a storage engine that is instrumented.



Finally, we need to take into account that the Performance Schema can be included in or excluded from the server binary, using build-time configuration options exposed in the compiling interface.

Performance Schema Interfaces

As mentioned above, PS can be excluded from code at the moment of the code compilation, thanks to the PS compile interface. This interface is one of seven that are present in PS. The full list is:

  • Instrument interface
  • Compiling interface
  • Server bootstrap interface
  • Server startup interface
  • Runtime configuration interface
  • Internal audit interface
  • Query interface

Instrument Interface:

This is the one that allows plugin implementers to add their own instruments to PS. In general, the interface is available for:

  • C implementations
  • C++ implementations
  • The core SQL layer (/sql)
  • The mysys library (/mysys)
  • MySQL plugins, including storage engines.
  • Third party plugins, including third party storage engines.


Compiling Interface:

As mentioned earlier, this is used during the build and will include or exclude PS code from the binaries.

Server Bootstrap Interface:

This is an internal, private interface, whose scope is to provide the instructions needed to create the tables for the PS itself.

Server Startup Interface:

This interface exposes the options used on the mysqld command line or in my.cnf, which are required to:

  • Enable or disable the performance schema.
  • Specify some sizing parameters.


Runtime Configuration Interface

This is one of the two most important interfaces for DBAs and SAs. It allows the configuration of the PS at runtime. Using the methods exposed by this interface, we can configure which instruments, consumers, users and more we want active. This interface uses standard SQL and is very easy to access and use. It is also the preferred method to activate or deactivate instruments. Thus, when we start the server, we should always enable the PS with all the instruments and consumers deactivated, and use this interface to enable only the ones we are interested in.
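As a minimal sketch of the runtime configuration interface, here is how a single instrument can be switched on and then off again when we are done (run against the performance_schema database):

```sql
USE performance_schema;

-- Switch one statement instrument on, with timing
UPDATE setup_instruments
   SET ENABLED='YES', TIMED='YES'
 WHERE NAME = 'statement/sql/select';

-- ... observe ...

-- Switch it back off when finished
UPDATE setup_instruments
   SET ENABLED='NO'
 WHERE NAME = 'statement/sql/select';
```

The changes take effect immediately and do not survive a server restart, which is exactly why the my.cnf defaults matter.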

Internal Audit Interface:

The internal audit interface is provided to the DBA to inspect whether the Performance Schema code itself is functioning properly. This interface is necessary because a failure caused while instrumenting code in the server should not cause failures in the MySQL server itself; in turn, the performance schema implementation never raises errors during runtime execution. To access the information, a DBA just needs to issue the SHOW ENGINE PERFORMANCE_SCHEMA STATUS; command.
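For example:

```sql
SHOW ENGINE PERFORMANCE_SCHEMA STATUS;
```

The output reports, among other things, the size and memory consumption of each internal PS buffer, which is also useful for tuning the sizing parameters.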

Query Interface:

Lastly, this interface is the one that allows us to access the collected data, and to perform data filtering, grouping, joins, etc. It also gives access to special tables, like the summary and digest tables, which will be discussed later on.

Consumers and Instruments

Another important concept in PS to understand is the difference between Instruments and Consumers.


Instruments are the ones collecting the raw data; their calls are embedded in the code, such as:


    MYSQL_TABLE_IO_WAIT(..., { result= index_prev(buf); })


In this case the code refers to the MYSQL_TABLE_IO_WAIT macro declared in the handler class (<mysql_root_code>/sql/). If enabled in the compilation phase, the above call will provide PS with the information related to the specific table io_wait.

The instrument in charge of managing that data collection is wait/io/table/sql/handler.

The naming convention for the instruments is quite simple. The first part, wait, is the name of the top-level instrument component (listed later); the second, io, is the observed condition; and table is the object.  The remaining suffix refers to more specific plugin implementations and includes names like innodb, myisam, sql, or IO_CACHE::append_buffer_lock. In the above example it refers to the Handler class in the SQL tree.
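Following that convention, the instrument names can be inspected directly in the setup table; for example, to list all table I/O instruments:

```sql
SELECT NAME, ENABLED, TIMED
  FROM performance_schema.setup_instruments
 WHERE NAME LIKE 'wait/io/table/%';
```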


Instruments are organized by top level components like:

  • Idle: An instrumented idle event. This instrument has no further components.
  • Memory: An instrumented memory event.
  • Stage: An instrumented stage event.
  • Statement: An instrumented statement event.
  • Transaction: An instrumented transaction event. This instrument has no further components.
  • Wait: An instrumented wait event.

 Each top-level component has a number of instruments:


| name        | Numb |
| idle        |    1 |
| memory      |  367 |
| stage       |  117 |
| statement   |  191 |
| transaction |    1 |
| wait        |  297 |
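A count like the one above can be reproduced with a simple aggregation on setup_instruments (the exact numbers vary by version and build options):

```sql
SELECT SUBSTRING_INDEX(NAME, '/', 1) AS name, COUNT(*) AS Numb
  FROM performance_schema.setup_instruments
 GROUP BY SUBSTRING_INDEX(NAME, '/', 1);
```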


We can and should keep in mind that it is best practice to enable only the instruments we require, for the time we need them. This can be achieved using the runtime interface (I will explain exactly how later on).

The official documentation provides more detailed information about what is available for each top-level component.


The Consumers are the destination of the data collected by the instruments. Consumers have different scopes and timelines. Also, a consumer like events_statements has many different tables:

  • Current
  • History
  • History long
  • Summaries (by different aggregation)
  • Summary Digest (like what we can find by processing the slow query log)

 Once more, it is important to define what we are looking for and enable only what we need. For instance, if we need to review/identify the most impactful SQL, we should enable only events_statements_current, events_statements_history and events_statements_summary_by_digest. All the other consumers can stay off. It is also important to keep in mind that each event may have a relation to another one. In that case, we can navigate the tree relating the events using the fields EVENT_ID and NESTING_EVENT_ID, where the latter is the EVENT_ID of the parent.
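As a sketch of navigating that relation, a statement event can be joined to its parent transaction event through NESTING_EVENT_ID (this assumes the statement and transaction consumers are enabled, as we will do below):

```sql
SELECT t.EVENT_ID AS trx_event,
       s.EVENT_ID AS stmt_event,
       s.SQL_TEXT
  FROM performance_schema.events_transactions_history t
  JOIN performance_schema.events_statements_history   s
    ON s.NESTING_EVENT_ID = t.EVENT_ID
   AND s.THREAD_ID        = t.THREAD_ID;
```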

Pre-Filtering vs. Post-filtering

We are almost there, stay with me! Another important concept to understand is the difference between post- and pre-filtering. As I mentioned, we can easily query the consumer tables with SQL, and we can create complex SQL to join tables and generate complex reports. But this can be quite heavy and resource consuming, especially if we want to dig into specific sections of our MySQL server.

In this case we can use the pre-filtering approach. Pre-filtering is basically a way to tell PS to collect information ONLY from a specific source, like a user/IP (actors), or from object(s) like tables, triggers, events, and functions. The latter can be set at a general level or down to a specific object name.

Pre-filtering, combined with the activation of the right instruments and consumers, is a powerful way to collect information without overloading the server with useless data. It is also very easy to implement, given we just need to set the objects and/or actors in the setup tables as we like.
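A minimal pre-filtering sketch, restricting table instrumentation to a single application schema (the schema name app is hypothetical):

```sql
-- Stop catching tables from everywhere...
UPDATE performance_schema.setup_objects
   SET ENABLED='NO'
 WHERE OBJECT_TYPE='TABLE' AND OBJECT_SCHEMA='%';

-- ...and catch only the tables of the 'app' schema
INSERT INTO performance_schema.setup_objects
       (OBJECT_TYPE, OBJECT_SCHEMA, OBJECT_NAME, ENABLED, TIMED)
VALUES ('TABLE', 'app', '%', 'YES', 'YES');
```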


Rolling the Ball: Set Up the PS for Observation

Now that we have covered the basic concepts we can start to work on the real implementation.

Compile the Source Code:

As mentioned earlier, we can use the compile interface to include or exclude features from the code compilation. The available options are:

  • DISABLE_PSI_COND Exclude Performance Schema condition instrumentation
  • DISABLE_PSI_FILE Exclude Performance Schema file instrumentation
  • DISABLE_PSI_IDLE Exclude Performance Schema idle instrumentation
  • DISABLE_PSI_MEMORY Exclude Performance Schema memory instrumentation
  • DISABLE_PSI_METADATA Exclude Performance Schema metadata instrumentation
  • DISABLE_PSI_MUTEX Exclude Performance Schema mutex instrumentation
  • DISABLE_PSI_RWLOCK Exclude Performance Schema rwlock instrumentation
  • DISABLE_PSI_SOCKET Exclude Performance Schema socket instrumentation
  • DISABLE_PSI_SP Exclude Performance Schema stored program instrumentation
  • DISABLE_PSI_STAGE Exclude Performance Schema stage instrumentation
  • DISABLE_PSI_STATEMENT Exclude Performance Schema statement instrumentation
  • DISABLE_PSI_STATEMENT_DIGEST Exclude Performance Schema statement_digest instrumentation
  • DISABLE_PSI_TABLE Exclude Performance Schema table instrumentation

This level of detail is so granular that we can include only the things we plan to use.

The positive aspect of doing this at the compilation level is that we can be sure no one will mess things up by adding undesired instruments. The drawback is that if we change our mind and decide we need the ones we excluded, we will have to compile the whole server again.

As a result, I would say this approach is not for someone who is just starting to use PS. Given that you are still discovering what is there, it makes sense to compile with all the features (the default).

Configure PS in my.cnf:

Setting up the PS correctly in my.cnf is quite important, so I strongly suggest disabling every instrument and consumer at start-up. They can be enabled by script later, which is much safer for a production database.

I normally recommend a section like the following:


performance_schema=ON
performance_schema_instrument='%=OFF'
performance_schema_consumer_global_instrumentation=OFF
performance_schema_consumer_thread_instrumentation=OFF
performance_schema_consumer_events_statements_current=OFF
performance_schema_consumer_statements_digest=OFF

The settings above will start the server with the PS “enabled”, but with all the instruments and consumers OFF. Well, this is not entirely true: as of this writing (MySQL 5.7.7), once the PS is enabled, the instruments related to memory/performance_schema are enabled regardless, which makes sense given they are dedicated to monitoring the memory utilization of PS itself.

A final note about the configuration: we can decide to use the counting option of the instruments instead of capturing the latency time. To do so, we just have to declare it as: performance_schema_instrument='statement/sql/%=COUNTED'

In this case I have set ALL the SQL statements to be counted.

Start Server and Set Only the Users We Need:

Once we have started our MySQL server, we are almost ready to go.

Given we started it with NO instruments enabled, we have to decide where to begin. Since we all know the most impactful factor in a database server is how we query it, we will start there, analyzing what is going on from the SQL point of view. However, I want to catch only the work coming from my application user, not from everywhere. For this, we can set the user in the actor table. This is very simple, given we will use the runtime configuration interface, which uses SQL syntax.

So, let’s say I want to trace only my application user named stress, running from machines in the 10.0.0.% range. I will need to:


UPDATE setup_actors SET ENABLED='NO' WHERE user='%'; 
INSERT INTO setup_actors VALUES('10.0.0.%','stress','%','YES');
(root@localhost) [performance_schema]>select * FROM setup_actors;
| HOST     | USER   | ROLE | ENABLED |
| %        | %      | %    | NO      |
| 10.0.0.% | stress | %    | YES     |
2 rows IN SET (0.00 sec)



Great. From now on, PS will focus only on my user stress. Now let us decide what to enable for instruments and consumers.

Once more using SQL commands, we will enable all the instruments related to SQL statements. But wait a minute: if you check the instrument table, you will see we have several variations of the statement instruments:

  • SQL
  • SP
  • Scheduler
  • Com
  • Abstract

Also not included in the list above, but relevant, is TRANSACTION. For now, we will enable only the SQL, ABSTRACT, Scheduler and Transaction instruments.

The SQL will be:


UPDATE  setup_instruments SET ENABLED='YES' WHERE ENABLED='NO' AND name LIKE 'statement/abstract/%'; 
UPDATE  setup_instruments SET ENABLED='YES' WHERE ENABLED='NO' AND name LIKE 'statement/sql/%';
UPDATE  setup_instruments SET ENABLED='YES' WHERE ENABLED='NO' AND name LIKE 'transaction';
(root@localhost) [performance_schema]>select count(*) FROM setup_instruments
 WHERE ENABLED = 'YES' AND name NOT LIKE 'memory%';
| count(*) |
|      143 |
1 row IN SET (0.01 sec)



We have 143 instruments active. Now we must set up the consumers and choose the destinations that will receive the data.

The list of consumers is the following:


(root@localhost) [performance_schema]>select * FROM setup_consumers;
| NAME                             | ENABLED |
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | NO      |
| events_statements_history        | NO      |
| events_statements_history_long   | NO      |
| events_transactions_current      | NO      |
| events_transactions_history      | NO      |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | NO      |
| thread_instrumentation           | NO      |
| statements_digest                | NO      |
15 rows IN SET (0.00 sec)



To enable ANY of them, we first have to enable the GLOBAL one, which works as a global power switch. The same applies to the thread instrumentation:


UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='global_instrumentation';
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='thread_instrumentation';


Then we need to activate at least events_statements_current to see something; I suggest also activating the history and statements_digest consumers:


UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='events_statements_current';
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='events_statements_history';
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='statements_digest';
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='events_transactions_current';
UPDATE setup_consumers SET ENABLED='YES' WHERE NAME='events_transactions_history';



As a result, we will have the following consumers activated:


(root@localhost) [performance_schema]>select * FROM setup_consumers;
| NAME                             | ENABLED |
| events_stages_current            | NO      |
| events_stages_history            | NO      |
| events_stages_history_long       | NO      |
| events_statements_current        | YES     |
| events_statements_history        | YES     |
| events_statements_history_long   | NO      |
| events_transactions_current      | YES     |
| events_transactions_history      | YES     |
| events_transactions_history_long | NO      |
| events_waits_current             | NO      |
| events_waits_history             | NO      |
| events_waits_history_long        | NO      |
| global_instrumentation           | YES     |
| thread_instrumentation           | YES     |
| statements_digest                | YES     |
15 rows IN SET (0.00 sec)



A final optimization for pre-filtering is to decide IF we want to catch all the objects or reduce them to a subset. By default, PS uses the settings below:


(root@localhost) [performance_schema]>select * FROM setup_objects;
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME | ENABLED | TIMED |
| EVENT       | mysql              | %           | NO      | NO    |
| EVENT       | performance_schema | %           | NO      | NO    |
| EVENT       | information_schema | %           | NO      | NO    |
| EVENT       | %                  | %           | YES     | YES   |
| FUNCTION    | mysql              | %           | NO      | NO    |
| FUNCTION    | performance_schema | %           | NO      | NO    |
| FUNCTION    | information_schema | %           | NO      | NO    |
| FUNCTION    | %                  | %           | YES     | YES   |
| PROCEDURE   | mysql              | %           | NO      | NO    |
| PROCEDURE   | performance_schema | %           | NO      | NO    |
| PROCEDURE   | information_schema | %           | NO      | NO    |
| PROCEDURE   | %                  | %           | YES     | YES   |
| TABLE       | mysql              | %           | NO      | NO    |
| TABLE       | performance_schema | %           | NO      | NO    |
| TABLE       | information_schema | %           | NO      | NO    |
| TABLE       | %                  | %           | YES     | YES   |
| TRIGGER     | mysql              | %           | NO      | NO    |
| TRIGGER     | performance_schema | %           | NO      | NO    |
| TRIGGER     | information_schema | %           | NO      | NO    |
| TRIGGER     | %                  | %           | YES     | YES   |
20 rows IN SET (0.00 sec)



It is easy to see that ANY object existing in the default system schemas (mysql, performance_schema, information_schema) will be ignored. In our case, for now, we will keep it as it is, but this will be our next filtering step after we have analyzed some data. This will happen in Part 2, stay tuned.


For now, you should understand what the Performance Schema is and its basic concepts, as well as which interfaces are available and for what. You should also be able to compile the source code with or without PS, or parts of it. You should be able to configure the MySQL configuration file correctly, and perform the initial configuration at runtime. Finally, you should know how to query the PS and how to dig into the information, which will also be discussed in Part 2.

Last Updated on Friday, 29 May 2015 18:50
Community dinner @ Pedro’s
Written by Marco Tusa   
Sunday, 12 April 2015 20:54

Folks, as usual Pythian is organizing the community dinner. After many years of food, (responsible) drinking and photos, this event has become an important moment for all of us to get to know each other better, discuss and have fun.

This year is also the 20th anniversary of MySQL, so … YEAAAH, let us celebrate with more food, fun and responsible drinking.

If you have not done it yet … register yourself here:

Info about the event:

When: Tuesday April 14, 2015 – 7:00 PM at Pedro’s (You are welcome to show up later, too!)
Where: Pedro’s Restaurant and Cantina – 3935 Freedom Circle, Santa Clara, CA 95054

I know, I know … we are the kind of people who decide where to go at the last minute, and every year we do the same. But if you could register, that would help us organize it better … and c’mon, the dinner is on Tuesday … so we are almost there!!!

Anyhow, hope to see all of you there, all of you!

Some reference: Menu Eventbrite Pedro

Last Updated on Sunday, 12 April 2015 20:56
