AWS Aurora Benchmark - Choose the right tool for the job
Written by Marco Tusa   
Thursday, 19 May 2016 00:00

 

Some time ago, I published the article “AWS Aurora Benchmarking - Blast or Splash?”, in which I analyzed the behavior of different solutions using synchronous replication in an AWS environment.

After I published it, I received a lot of comments and feedback from the community and from Amazon engineers.

Given that, I decided to perform another round of tests, taking into account the comments and suggestions I received.

I presented some of the results during the Percona Live conference in Santa Clara in April 2016. The following is the transposition of that presentation into an article, with more details.

 

 

Why a new test?

Very good question, with an easy answer.

Aurora is a product that is still under development and code refining: six months of development can bring major changes in performance. Moreover, the initial tests were focused on entry-level solutions, meaning I was analyzing the kind of user that is currently starting their business and looking for a flexible solution that allows them to save money and scale.

This time I put the focus on enterprise solutions, analyzing what an already well-established company would get when in need of a decent, scalable solution.

As such, these are two different scenarios.

Why so many (different) tests?

I used many different benchmarking tools, and I am still planning to run others. Why so? Why not simply rely on one of them?

Again, a simple answer: I used different tools because in some cases they provide different ways to access and use the data. Moreover, I do not trust benchmarking tools, not even the ones I developed myself. As such, I want to test the same thing using different tools and compare the results; ONLY if I see a common pattern do I consider the test valid. Personally, I tend to discard any test that is not consistent, or any analysis performed with a single benchmarking tool. In my opinion, being lazy is not an option when doing this kind of exercise.

Tests run

I ran three main kinds of tests:

  • Performance and load stress
  • High Availability failover
  • Response time (latency) from application point of view

Performance and load stress

These tests were the most extensive and demanding.

I was analyzing the capacity to serve load in different conditions, from light load up to full utilization and some degree of saturation of resources.

 

The first set of tests evaluated a simple load on a single table, causing the table to become a hotspot and showing how the platform would manage the increasing contention.

The second set of tests performed a similar load, but distributed it across multiple tables and batched the operations. Parallelization, contention, scalability and distributed hotspots were in the picture.

The two above focused on write operations only, and were done using different tools, comparing the results, given that they were complementary.

 

The third set of tests, using my own stresstool, focused on R/W-oriented usage. The tests executed against multiple tables, performing CRUD actions, using simple and batch inserts, and reads by PK, by index, by range, and with IN and exact-match conditions, along the lines of the sketch below.
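
To give a concrete idea of the access patterns, here is a minimal sketch, assuming a table shaped like the StressTool tables reported later on this page; the exact statements generated by the tool may differ:

SELECT * FROM tbtest1 WHERE autoInc = 1000;             -- read by PK (leading column)
SELECT * FROM tbtest1 WHERE a = 42;                     -- read by secondary index
SELECT * FROM tbtest1 WHERE date BETWEEN '2016-01-01' AND '2016-01-31';  -- read by range
SELECT * FROM tbtest1 WHERE a IN (1, 5, 42, 100);       -- IN condition
SELECT * FROM tbtest1 WHERE uuid = '00000000-0000-0000-0000-000000000001';  -- exact match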

The fourth set of tests was performed using a TPC-C-like load (OLTP).

The fifth set of tests used Sysbench in OLTP mode with 250 tables.

 

The scope of the last three sets of tests was to identify how the platforms would manage the load, considering the following:

  • Read and write contention on the same tables
  • High level of parallelism (from the application)
  • Possible hot-spots (TPCC district)
  • Increasing utilization (memory, threads, IO)
  • Saturation (connections)

Finally, all tests were run with a fully utilized buffer pool.
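
For reference, a minimal sketch of how to verify the buffer pool state before a run, using standard MySQL status counters (my own check, not part of the benchmark tooling):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';
-- utilization can be derived as (pages_total - pages_free) / pages_total;
-- a value close to 1 means the buffer pool is fully utilized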

 

About the tests

It was difficult to compare apples with apples here, and I think that is the main point to keep in mind.

Aurora is not a standard RDS solution, as we were used to.
Aurora looks like MySQL, smells like MySQL, but is not vanilla MySQL.

To achieve what they had to achieve, the engineers there had to change many parts, and the more you dig, the more you realize there are significant differences.

Because of that, I had to focus more on identifying what each solution can do, comparing the solutions against expectations, instead of comparing the numbers for the numbers' sake.

 

I was more interested in seeing what happens if I have a burst of connections and my application goes from 4K to 40K connections. Will it crash? Will it slow down?

How long should I wait if a node fails?

What should I avoid in my schema design in order not to have bottlenecks?

 

In this context, those are, in my opinion, the relevant questions; more relevant than discovering that one solution can write 3,000 rows/sec and the other 3,100.

Or that I may (may) have some additional page rotation (file -> memory -> flush) because the amount of memory differs.

 

That is valuable information too, for sure, but less valuable than a decent understanding of which platform will help my business grow and remain stable.

What is the right tool for the job? This is the question I was addressing.

 

The machines

Small Boxes (first round of tests)

 

EIP = 1
VPC = 1
ELB=1
Subnets = 4 (1 public, 3 private)
HAProxy = 6 
MHA Monitor (micro ec2) = 1
NAT Instance (EC2) =1 (hosting EIP)
DB Instances (EC2) = 3 (m4.xlarge) 16GB
Application Instances (EC2) = 6 (4)
EBS SSD 3000 PIOPS
Aurora RDS node = 3 (db.r3.xlarge) 30GB
 

 

 

Large Boxes (latest tests)

 

EIP = 1
VPC = 1
ELB=1
Subnets = 4 (1 public, 3 private)
HAProxy = 4
MHA Monitor (micro ec2) = 1
NAT Instance (EC2) =1 (hosting EIP)
DB Instances (EC2) = 3 (c3.8xlarge) 60GB
Application Instances (EC2) = 4
EBS SSD 5000 PIOPS
Aurora RDS node = 3 (db.r3.8xlarge) 244GB
 

 

Note

It was pointed out to me that I had deliberately chosen an EC2 solution for PXC with less memory than the one available in Aurora.

This is true, and we must keep it in consideration.
The reason for this is that the only EC2 instance matching the memory of a db.r3.8xlarge is the d2.8xlarge.

I did try it, but the level of scalability I got from the CPU point of view was less efficient than the one available with the c3.8xlarge.

Given that, I decided to prefer CPU resources to memory here, especially because I was going to test concurrency and parallelism in conjunction with the load increase.

From the results I got, I feel confident that I chose right, but I am open to comments.

 

 

 

The layout

This is how the setup looks:

hootsuite_mysql_ha_failover - poc architecture

Where you read Java, those are the application nodes running the different test applications.

Two words about Aurora first

Aurora has a few key concepts that you must have clear in mind, especially how it manages writes across replicas, and how connections are implemented.

The IO activity

To replicate the information across the different storage nodes, Aurora replicates FRM files and data coming from the IB_LOGS only. This is a quite significant advantage over other forms of replication, given the limited number of bytes that travel over the network, even if they are replicated 6 times.

Screen Shot 2016-05-12 at 9.39.17 AM

image from Amazon Aurora Deep dive

 

Another significant advantage is that Aurora does not use the doublewrite buffer, which is obviously another blast (see the recent optimization in Percona Server https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/ ).

Simplifying: writes in Aurora are organized by filling its commit queue and pushing the changes as a group commit to the storage.

Screen Shot 2016-05-12 at 1.34.10 PM

image from Amazon Aurora Deep dive

 

Now, in some presentations you may have seen that all steps are asynchronous, but it is important to underline that a commit is acknowledged by Aurora only when at least 2 AZs have received and written the incoming data related to that commit. Written here means received in the storage node's incoming queue, with a quorum of 4 out of 6 nodes.

This means that, no matter what, the data has to travel over the network, reach its final destination, and the ack signal has to come back before Aurora returns the ack for the commit operation. The network is in the same region, but it could still represent an unknown in terms of performance. No wonder we may see some latency at this stage.

As you can see, what I am reporting is also confirmed in the image below and in the observations; the point is that the impact of steps 1-2 is not clear from that slide.

Screen Shot 2016-05-12 at 9.41.55 AM

image from Amazon Aurora Deep dive

 

Thread pooling

Oh yes, Aurora uses thread pooling, a lot. That will become very clear later; and the more the work is based on parallelism, the more efficient thread pooling seems to be.

In most cases, we are used to seeing the CPUs on database servers not fully utilized, unless there is some heavy ordering operation or a bad query. That behavior is also (but not only) a direct consequence of the connection-to-thread model, which implies periods of latency and standby. In Aurora, the incoming connections do not follow the same model; instead, the pool redistributes the load of the incoming connections to a pool of threads, optimizing the latency periods and resulting in higher CPU utilization. Which is what you want from your resources: to be utilized, and not to sit there waiting for something else to do its job. A simple way to observe this from the outside is sketched after the image below.

 

Screen Shot 2016-05-12 at 9.42.13 AM

image from Amazon Aurora Deep dive
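
A simple way to observe this behavior from the outside, using standard status counters (as we will see in the connection graphs later, on Aurora Threads_running stays capped while Threads_connected keeps growing):

SHOW GLOBAL STATUS WHERE Variable_name IN ('Threads_connected', 'Threads_running');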

 

 

The results

Without additional waste of electronic ink, let's see what came out of this round of tests (not the final one, by the way). To simplify the reading, I will also report the graphs from the first set of tests, but I will focus on the latest. Small Boxes = SB, Large Boxes = LB.

First test: IIBench

As declared previously, my scope was to verify how the two platforms would react to a simple load focused on inserts into a basic single table. The buffer pool was saturated before the run.

SB

iibench_exectime_old

 

LB

iibench_exectime_new 

 

As we can see, in the presence of a hotspot the solution using PXC outperforms Aurora, in both cases. What is notable, though, is that while PXC remains approximately around the same time/performance, Aurora significantly reduces the time taken. This shows that Aurora was actually taking advantage of the more powerful platform, while PXC was not able to.

Analyzing in more detail what was happening, we can notice that Aurora actually performs atomically better. It was able to manage more writes/second, as well as more rows and pages managed. But it was inconsistent: Aurora had performance hiccups at regular intervals. As such, the final result was that it took more time to process the whole workload.

I was not able to dig much, given that some metrics are not fully available in Aurora; as such, I had to rely on the Aurora engineers, who mentioned hot-spot contention to me as a possible issue.

 

Aurora Handler calls

iibench_aurora_handlers

PXC Handlers calls

iibench_pxc_handlers

 

The execution in PXC shows fewer calls but constant performance, while Aurora has hiccups.

Aurora Page Activity Write

iibench_aurora_page

PXC Page Activity Write  

iibench_pxc_page

 

The trend shown by the handlers stays consistent in the page management and rows inserted, as logically expected.

Second test: Application ingest

As mentioned, this test sees many threads from different application servers inserting batches of 50 statements against multiple tables; a sketch of one such batch is below.
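
A hypothetical sketch of one batched insert, modeled on the StressTool table tbtest1 reported later on this page (the values are invented for illustration):

INSERT INTO tbtest1 (a, uuid, b, c, counter, partitionid, date, strrecordtype)
VALUES
  (1, '00000000-0000-0000-0000-000000000001', 'payload-1', 'filler-1', 1, 0, '2016-04-18', 'aaa'),
  (2, '00000000-0000-0000-0000-000000000002', 'payload-2', 'filler-2', 2, 0, '2016-04-18', 'aaa'),
  -- ... 47 more value sets, for a total of 50 rows per batch ...
  (50, '00000000-0000-0000-0000-000000000050', 'payload-50', 'filler-50', 50, 0, '2016-04-18', 'aaa');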

The results coming from this test are quite in favor of Aurora, as we can see starting from the time taken to complete the same workload:

LB

app_ingest_exec_time_old

 

 

SB

app_ingest_exec_time_new

While with the small boxes the situation was inverted.

But here the interesting part starts.

Aurora is able to manage a significantly higher number of rows, as the picture below shows.

app_ingest_rows_inserted_old

The results are also quite constant, and do not decrease significantly like the inserts with PXC.

But the number of Handler commits is significantly lower.

 

 

 

 

app_ingest_rows_inserted_new

 

Once more, they stay the same as the load increases, without impacting performance.

Reviewing all the Handler calls, we have a first surprise.

PXC Handlers calls

 

app_ingest_pxc_handlers

Aurora Handlers calls

 

app_ingest_aurora_handlers

The gaps/drops in the two graphs mark the different tests (with an increasing number of threads).

We have two things to notice here. The first is that PXC has a decrease in performance while processing the load, while Aurora does not. The second (you need to zoom the image) is that the number of commits floats in PXC, while it stays fixed in Aurora.

An even bigger surprise comes up when reviewing the connection graphs.

As expected, PXC has all my connections open there, and the number of threads running is quite close to the number of threads connected.

app_ingest_pxc_connections

And both of them follow the increasing number of connected threads.

But this is not the case in Aurora.

app_ingest_aurora_connections

Even if my applications are actually trying to open ~800 threads, the Aurora node sees only a part of them, and the number running is fixed at 32 threads.

Things to consider here are the following: first, my applications do not connect directly to the Aurora instance, but to a connector (MariaDB); second, Aurora, in this case, caps the number of running threads to the number of CPUs available on the instance (here 32).

Given that, I might expect worse performance, but I do not get it.

The fact that Aurora uses one thread for multiple connections seems to work quite efficiently here.

See also that the number of rows inserted is consistent with the handler calls, and performs better than PXC.

Aurora Rows inserted

app_ingest_aurora_rows

 

 

PXC Rows inserted

 

app_ingest_pxc_rows

 

 

Again we have the same trend, only this time Aurora performs definitely better than PXC.

 

Third test: OLTP application

When run on the small boxes, this test saw PXC performing many times better than Aurora: the time taken by Aurora was ~3 times the one taken by PXC.

app_oltp_exec_time_old

With the large boxes I had the inverse: Aurora outperforms PXC many times over, being from two up to almost 7 times faster than PXC.

app_oltp_exec_time_new

 

Analyzing the number of commands executed with increasing workload, we can see how PXC is able to perform better than Aurora with a workload of 128 threads, but starts to have worse performance as the load increases.

On the other hand, Aurora is able to manage the read/write load without significant performance loss, which includes being able to increase the number of commits/sec.

commands_OLTP

Reviewing the Handler calls, we see again that the Handler commit calls are significantly fewer in Aurora, as already noticed in the ingest tests.

 

app_oltp_handlers_call

The other thing to notice is that the number of calls for PXC is significantly higher and not scaling, while Aurora has a nice scaling trend.

Fourth test: TPCC-mysql

The TPC-C test is meant to test OLTP traffic, with the note that some tables, like district, may become hotspots (see the sketch below). The tests I ran were executed against 400 warehouses, using 128 threads as the maximum for the small boxes and 2048 threads for the large ones.
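
To see why district is a natural hotspot, consider the New-Order transaction in TPC-C: every new order must atomically increment the next-order-id counter of one of only warehouses x 10 district rows, so all concurrent writers serialize on a handful of row locks. A sketch of the standard TPC-C statement (values invented):

UPDATE district
   SET d_next_o_id = d_next_o_id + 1  -- one counter per district row
 WHERE d_w_id = 42                    -- warehouse id
   AND d_id = 7;                      -- district id within the warehouse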

During this test I hit one of the Aurora limitations, and I escalated it to the Aurora engineers, who are aware of the problem.

Small boxes

tpcc_old

 

In the case of the small boxes there is nothing to say: PXC is able to manage the load more efficiently, even if its trend is not optimal, having significant fluctuations. Aurora is just not able to keep up.

Large boxes

The scenario is different, and a bit more complex, in the case of the large boxes.

I would like to say that Aurora is performing better:

tpcc_new

 

And as you can see, this is true for 2 tests out of 3; up to the point where it got stuck by the internal limitation, Aurora was also performing better on the 3rd. But then its performance just collapsed.

Performing a more in-depth investigation, I noticed that under the hood Aurora was not performing as well as it seemed.

That comes out quite clearly when comparing a few graphs covering Com_ execution, open files, Handlers and InnoDB row lock time.

In all of them it is quite evident how PXC is able to keep serving the workload with consistent behavior, while Aurora fails from the second test on (512 threads), and not only on the 3rd with 2048 threads.

Aurora

 tppc_aurora_com

PXC

 

tppc_pxc_com

 

It is clear how Aurora was serving better during the test with 256 threads, going over 450K Com_select served (in a 10-second interval), compared with PXC, which was not able to go over 350K.

But in the tests after, while PXC was able to keep going, even if with decreasing performance, Aurora started to struggle, with very inconsistent behavior.

This was also confirmed by the open files graph.

 

Aurora

tppc_aurora_files

PXC

tppc_pxc_files

The graphs show the number of files opened during the run, not the ones already open.
They reflect the Open_files metric: “The number of files that are open. This count includes regular files opened by the server. It does not include other types of files such as sockets or pipes. Also, the count does not include files that storage engines open using their own internal functions rather than asking the server level to do so.” I was quite surprised by the number of files opened by Aurora.
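
For reference, these are the standard counters behind the graphs above:

SHOW GLOBAL STATUS LIKE 'Open_files';            -- files currently open
SHOW GLOBAL STATUS LIKE 'Opened_files';          -- cumulative opens since startup
SHOW GLOBAL VARIABLES LIKE 'open_files_limit';   -- the ceiling the server can reach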

The Handlers were reflecting the same behavior as well.

Aurora

 

tppc_aurora_handlers

PXC

tppc_pxc_handlers


 

Perfectly in line with the Com trend.

So what was increasing instead?

Aurora

 

tppc_aurora_locks

PXC 

tppc_pxc_locks

As you can see from the above, exactly the same workload generated an increasing row lock time: from quite low in the test with 256 threads, up to crazy high in the one with 2048 threads.

As mentioned, we know that TPC-C has a couple of tables that work as hotspots, and we had already seen with IIBench how Aurora does not work efficiently in those cases. The row-lock pressure visible here can be tracked with the standard InnoDB counters shown below.
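
SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Innodb_row_lock_time',            -- total time spent waiting for row locks (ms)
   'Innodb_row_lock_time_avg',        -- average wait per lock (ms)
   'Innodb_row_lock_waits',           -- how many times a wait occurred
   'Innodb_row_lock_current_waits');  -- waits in progress right now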

As additional information: during the tests I was getting a lot of error 188, which is an Aurora internal error. When I reported it, I was told they know about it and are planning to work on it.

I hope they will do so soon, because if this issue is solved, it is very likely that Aurora will not only be able to manage the tested workload, but go far beyond it.

I am saying this because, even with the identified issues, Aurora was able to keep going and manage a more than decent response time during test 2 with 512 threads.

 tppc_response_time

Fifth test: Sysbench

I added the sysbench tests to test scalability, and to see what happens when the system reaches the saturation point.

This test brought up some limitations existing in the Aurora solution, more related to the connector than to the Aurora engine itself.

Aurora has a limit of 16K connections; as said, I was looking to see what happens if I get to the saturation point, or close to it. It doesn't matter whether this is a crazy high number or not.
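
For reference, the configured ceiling and the high-water mark actually reached can be checked with standard variables and counters:

SHOW GLOBAL VARIABLES LIKE 'max_connections';    -- the configured ceiling (~16K here)
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark reached so far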

What happened is that I was able to have Aurora manage the traffic up to 4K connections, but the closer I got to the limit, the more issues I had, connectivity issues more than anything else.

In the end, I had to run the tests with 8K, 12K and 20K threads pointing directly to the Aurora instance, bypassing the connector, which was not able to serve the traffic.

After that, I was able to hit up to ~15,500 threads, but with a lot of inconsistent performance. Given that, I am setting the limit of a meaningful test at the previous level of 12K threads.

PXC was able to scale up to 16K with no problem.

What is also notable here is that Aurora was able to manage the workload more efficiently in terms of transaction handling, both in transactions executed and in latency.

sys_bench_high_threads_trnsactions

 


The number of transactions executed by Aurora was ~three times the number executed by PXC.

 

 

sys_bench_high_threads_latency

 

Also in terms of latency, Aurora was showing less latency than PXC.

Internally, Aurora and PXC operations were once more different in terms of how the workload was handled. The most diverging result was in the handler calls.

sys_bench_high_threads_Handlers_write

Commit calls in Aurora were a fraction of the calls in PXC, while the number of rollbacks was higher.

The read calls had an even more diverging behavior, with PXC performing a high number of read_key calls, while Aurora had a very limited number of them. Read_rnd calls are very high in PXC but totally absent in Aurora (note that in Aurora, read_rnd is reported but does not seem to really increase). On the other hand, Aurora reports a high number of read_rnd_next calls, while PXC has none.

 

sys_bench_high_threads_Handlers_read

High Availability

Fail-over time

 

Both solutions

ha_time  

In this test, regarding the fail-over time, the solution using Galera and HAProxy proved to be more efficient. That was happening with limited or mid-level load; one assumption is that, given Aurora has in any case to verify the status of the transmitted data and its consistency across the 6 data store nodes, the process is not as fast as it could be.

 

Or, another assumption: it could be that the cluster connector is not as efficient as it should be in redirecting the traffic from one node to another. It would be a very interesting exercise to replace it with some other custom solution.

 

Note that I was performing the tests following the Amazon indication to use the following to simulate a real crash:

ALTER SYSTEM CRASH [INSTANCE|NODE]

As such, I was not doing anything strange or out of the ordinary.

 

It is worth mentioning that of the 8 seconds taken by MySQL/Galera to perform the failover, 6 were due to the HAProxy settings, which had a 3000 ms check interval and 2 loops before executing the failover, as the snippet below shows.
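
For reference, this mirrors the HAProxy check settings used in these tests (the full configuration is reported in the earlier article further down this page): with inter 3000 and fall 2, a dead node is declared down only after roughly 3000 ms x 2 = 6 s.

# inter 3000 = ms between checks; fall 2 = failed checks before marking the node down
server node1   10.0.2.40:3306 check port 3311 inter 3000 rise 1 fall 2 weight 50
server node2Bk 10.0.1.41:3306 check port 3311 inter 3000 rise 1 fall 2 weight 50 backup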

Execution latency

The scope of this test was to identify the latency existing between the moment the application sends the request and the moment MySQL/Aurora takes the request in charge.

The expectation is that the busier the database gets, the longer the latency will be.
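
A minimal sketch of how such latency can be measured, assuming a hypothetical probe table (the real test applications are more elaborate): the application stores its own send timestamp, MySQL records the execution moment with NOW(), and the difference approximates how long the request waited to be taken in charge.

CREATE TABLE latency_probe (
  id     BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  app_ts DATETIME NOT NULL,  -- set by the application when it sends the request
  db_ts  DATETIME NOT NULL   -- set by MySQL when it executes the insert
);

INSERT INTO latency_probe (app_ts, db_ts) VALUES ('2016-04-18 10:00:00', NOW());
SELECT AVG(TIMESTAMPDIFF(SECOND, app_ts, db_ts)) AS avg_latency_sec FROM latency_probe;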

For this test I report both results: the one coming from the old tests with the small boxes, and the new one with the large boxes.

 

Small Boxes  

ha_latency_old

Large boxes

ha_latency_new

It is clear from the graphs that the two tests report a different scenario.

In the first Galera was able to manage the load more efficiently and serve requests with lower latency.

 

For the new tests, I utilized a higher number of threads than for the small boxes; nevertheless, in the second run, CPU utilization and the number of running threads led me to think that Aurora was finally able to utilize the resources more efficiently, and the latency just dropped.

It is worth mentioning that latency was jumping up again when the number of connections went above 12K, but that was expected given the previous test results.

Conclusions

High Availability

The two platforms were shown to be able to manage the failover operation in a limited time frame (below 1 minute).

Nevertheless, MySQL/Galera was shown to be more efficient and consistent.

This result is a direct consequence of the synchronous replication, which by design does not allow MySQL/Galera to let an active node fall behind.

In my opinion, the replication method used in Aurora is efficient, and given that data is shared across the read replicas, fail-over should happen faster.

I suffered a lot during the tests because of the connector, and I have the feeling that having another solution in place may bring some surprises; I would actually really like to test that as well.

 

Performance

In this run of tests, Aurora was able to invert the results I had in the first tests with the small boxes. In almost all cases, Aurora performed as well as, or better than, PXC. There are still cases where Aurora is penalized, and those are the ones where hotspots are present: contention in Aurora kills the performance and raises errors (188). But I hope we will see a significant evolution soon.

 

 

General comments on Aurora

The product is evolving quickly, and benchmark results may become obsolete in a very short time; this is why it is important to have repeatable and comparable tests.

From my point of view, in this set of tests Aurora clearly showed where it fits best:

critical applications that require a highly available platform and a lot of CPU power.

There is no reason to use Aurora on small-mid boxes: the platform is not going to be as efficient as a standard solution like PXC.

But if cost is not an issue, and the application really requires a lot of parallelism, Aurora on db.r3.8xlarge is a good solution.

I still see space for improvements, like in the cluster connectors, the time taken to restart a cluster after a full stop, and contention reduction.

But I am also confident that the work led by the development team will address most of my concerns (and more) soon.

As a final note, it would be nice to have the code open source, so the community could contribute, even if I understand the business reasons not to.

 

 

 

 

Last Updated on Thursday, 19 May 2016 12:35
 
About Percona conference in Santa Clara 2016
Written by Marco Tusa   
Sunday, 20 December 2015 23:10

 

As most of us know, we will have the chance to attend the MySQL conference in April (from 18 to 21).

For those like me who have been there for a long time, this is a moment of reunion with colleagues and friends. It is also a moment of confrontation and sharing.

Over the years, this conference has been the moment for the ones surfing the MySQL sea in which things can be put on the table and discussed. It matters very little whether it was called the MySQL Conference or, as it is now, Percona Live. What matters is the spirit with which people participate, and the desire to share.

One of the important aspects was, and is, being able to learn from others' experience, innovation and experimentation.

The past year has been a very difficult one for me, thankfully only work-wise, but I was also involved in some interesting exercises, which allowed me to come up with a list of proposals that I consider quite interesting, some more, some less, as usual, depending on the angle from which you perform your work.

Anyhow, given that this year Percona has invited the community to express an opinion on the submissions, I decided to share mine and explain each of them a bit.

 

Here we go:

The first element is a tutorial on Performance Schema. I know that a lot of people are talking about it, presenting its usage in various ways, and some of the presentations are really, really good. So why should I spend time preparing a tutorial, and why should anyone attend?
My answer is simple: because I am approaching it from a different angle. Most of the presentations look at it as something isolated, self-referential to the MySQL space. I am looking at it as part of a larger design and vision, connecting Performance Schema to the USE methodology (see the other proposal about it).

What I want to achieve during the tutorial is not only to provide instructions on how to access PS or what is in it, but how to contextualize the information in the context of the server(s) behavior.

Here are the tutorial title and link for you to vote:

Learn how to use Performance Schema in MySQL 5.7 the basics and not only.

https://www.percona.com/live/data-performance-conference-2016/sessions/learn-how-use-performance-schema-mysql-57-basics-and-not-only#community-voting

 


 

 

The next one is a presentation that is linked to an article I wrote about the cloud, Galera, Aurora and other solutions.

If you missed it, I suggest you read it (http://www.tusacentral.net/joomla/index.php/mysql-blogs/175-aws-aurora-benchmarking-blast-or-splash.html).

I had a lot of feedback about that article, from colleagues and from Amazon as well. Given the topic, and given the active evolution that a new product like Aurora is subject to, numbers and conclusions may change in a short time.

As such, I planned to perform additional testing during the year (2016), collecting data and presenting the results in articles and presentations.

As mentioned, I had a very productive conversation with Amazon about Aurora, and given that we all love to have a new productive platform able to perform at its best, I will be more than happy to run those tests with them, to help them identify bottlenecks and possible solutions. As usual, I will maintain my independence, transparency and impartiality, such that everyone can validate my numbers; let us see how it will turn out.

 

The presentation title and link for you to vote:

Comparing synchronous replication solutions in the cloud

https://www.percona.com/live/data-performance-conference-2016/sessions/comparing-synchronous-replication-solutions-cloud#community-voting

 


 

 

I cannot say or count how many presentations I have attended talking about performance and how to analyze, check or improve it. Most focus on this or that aspect of a specific storage engine, or on the new features delivered in MySQL release X.
But so far I have NOT attended a presentation that would help me define a methodology, an organic approach that I can use and reuse for such analysis.

As such, I decided to do two things. The first was to write an extensive document to be used by my teams at my ex-company (Pythian), something like a reference for them to follow and rely on to do what is needed to perform a good performance review.

The second was to start talking about it and presenting the approach. What is important to understand is that I am not inventing anything new, but using what has already been well defined, and applying it to the MySQL world.

In short, I will present the USE methodology (utilization, saturation, and errors), which should be used early in a performance investigation to identify systemic bottlenecks.
USE can be summarized this way: for every resource, check utilization, saturation, and errors.

My presentation will explain the USE methodology and how it can help any administrator during the analysis of performance issues. It will also extend the approach to the MySQL specifics, taking advantage of Performance Schema instruments.

Presentation title and link for you to vote below:

The USE Method and how to boost the way you perform performance tuning on your (MySQL) environment

https://www.percona.com/live/data-performance-conference-2016/sessions/use-method-and-how-boost-way-you-perform-performance-tuning-your-mysql-environment#community-voting

 


 

 

When MySQL 5.6 came out, I covered, with articles and presentations, how to make the best use of the new features related to tablespace management: covering how to use the features that allow an administrator to play with them, and what kind of issues may be encountered.

You can review them here:

http://www.tusacentral.net/joomla/index.php/mysql-blogs/index.php/mysql-blogs/136-portable-tablespace-in-innodb-i-test-it

http://www.tusacentral.net/joomla/index.php/mysql-blogs/137-portable-tablespace-in-innodb-i-test-it-part2.html

http://www.tusacentral.net/joomla/index.php/mysql-blogs/138-portable-table-space-part-iii-or-we-can-do-it-partition.html

Old presentation here:

http://www.slideshare.net/marcotusa/discard-inport-exchange-table-tablespace

My next presentation is an update to those; I will also release articles about this topic during the next year, with more instructions and details for DBAs to follow, also covering general tablespaces and compression.

Presentation title, description and link for you to vote below:

MySQL 5.7 Tablespace management and optimization

https://www.percona.com/live/data-performance-conference-2016/sessions/mysql-57-tablespace-management-and-optimization#community-voting

 


 

 

Finally, I submitted a proposal for a topic that I personally love, which is related to Java development.

Despite many nasty, and often erroneous, comments, Java is not only a very powerful programming language, but it is also used so often and in so many ways that you can easily state that every day you use several applications developed with it.

What I have often found, and what I have fought against, is the misuse of several abstraction layers that developers often use without understanding what they are doing.

Unfortunately, this is a cultural issue that has been pushed and reinforced over the years, mostly by bad developers who do not get how important it is to keep in mind a very basic concept: “Scaling scenarios and huge data are not present on your laptop”.

The basic meaning is that whenever you develop code, you need to think of the big numbers: not whether that functionality works now, but whether it will work on a deployment of 200 application servers, and how it will impact the data layer while scaling.

I will produce a series of articles about this in the future, but the first step is to explain how to use, and how to use correctly, one of the most powerful tools we have at the moment: the MySQL Java connector.

Too often I have seen applications rely on crazy solutions, or even crazier customized code, ignoring what is already available and able to provide quite an efficient out-of-the-box solution.

This presentation, which you should see as a first step, has the scope of starting a journey in which we will free good developers, able to think of and plan applications for the future, from the dumb approach often pushed by dull abstraction tools.

The title and link for you to vote:

Empower your application with sophisticate High Availability features using MySQL Connector/j

https://www.percona.com/live/data-performance-conference-2016/sessions/empower-your-application-sophisticate-high-availability-features-using-mysql-connectorj#community-voting

 


 

Conclusion

I waited a bit before asking you to vote, because I wanted to explain why I submitted what I submitted.

I was also collecting information, comments and feedback about a few topics, to be sure I could dedicate the time to my submissions.

What I would like now is for YOU to spend a few minutes and think about whether any of the topics I described above may be of interest to you.

If so, please follow the link and vote for the presentation. If not, do not worry: I am not tracing your IP, and I am NOT going to send some Italian friends over to convince you to vote for me.

Thanks in advance anyhow!!!

 

Stay tuned!

Last Updated on Sunday, 20 December 2015 23:54
 
AWS Aurora Benchmarking - Blast or Splash?
Written by Marco Tusa   
Sunday, 01 November 2015 00:00

Summary

In this investigation, three solutions were analyzed: MySQL with MHA (Master High Availability Manager for MySQL), MySQL with Galera Replication (synchronous data replication cross-node), and AWS RDS-Aurora (data solution from Amazon promising HA and read scalability).

These three platforms were evaluated for high availability (HA; how fast the service would recover in case of a crash) and performance in managing both incoming writes and concurrent read and write traffic.

These were the primary items evaluated, followed by an investigation into how to implement a multi-region HA solution, as requested by the customer. This evaluation will be used to assist the customer in selecting the most suitable HA solution for their applications.

 

HA tests

time_failover

MySQL + Galera was proven to be the most efficient solution; in the presence of read and write load, it performed a full failover in 15 seconds compared to 50 seconds for AWS-Aurora, and to more than two minutes for MySQL with MHA.

Performance

Tests indicated that MySQL with MHA is the most efficient platform. With this solution, it is possible to manage read/write operations almost twice as fast as and more efficiently than MySQL Galera, which places second. Aurora consistently places last.

Recommendations

In light of the above tests, the recommendations consider different factors to answer the question, "Which is the best tool for the job?" If HA and very low failover time are the major factors, MySQL with Galera is the right choice. If the focus is on performance and the business can afford several minutes of down time, then the choice is MySQL with MHA.

 

Finally, Aurora is recommended when there is a need for an environment with limited concurrent writes, the need to have significant scale in reads, and the need to scale (in/out) to cover bursts of read requests such as in a high-selling season.

 

 

 

Why This Investigation?

The outcomes presented in this document are the result of an investigation taken in September 2015. The research and tests were conducted as an answer to a growing number of requests for clarification and recommendations around Aurora and EC2. The main objective was to be able to focus on real numbers and scenarios seen in production environments.

Everything possible was done to execute the tests in a consistent way across the different platforms.

Errors were pruned by executing each test on each platform before collecting the data; this run also had the scope of identifying the saturation limit, which was different for each tested architecture. During the real test execution, I repeated each test several times to identify and reduce any possible deviation.

 

Things to Answer:

In the investigation, I was looking to answer the following questions, by platform:

  • HA
    • Time to failover
    • Service interruption time
    • Lag in execution at saturation level
  • Ingest & Compliance test
    • Execution time
    • Inserts/sec
    • Selects/sec
    • Deletes/sec
    • Handlers/sec
    • Rows inserted/sec
    • Rows select/sec (compliance only)
    • Rows delete/sec (compliance only)

 

Brief description of MHA, Galera, and Aurora

MHA

MHA is a solution that sits on top of the MySQL nodes, checking the status of each of the nodes, and using custom scripts to manage the failover. An important thing to keep in mind is that MHA does not act as a “man in the middle”; as such, no connection is sent to the MHA controller. The MHA controller can instead manage the entry point with a VIP (Virtual IP), with HAProxy settings, or with whatever makes sense in the design architecture.

At the MySQL level, the MHA controller will recognize the failing master and will elect the most up-to-date one as the new master. It will also try to re-align the gap between the failed master and the new one using original binary logs if available and accessible.

Scalability is provided via the standard MySQL design, having one master in read/write mode and several servers in read mode. Replication is asynchronous, and therefore it can easily lag, leaving the read nodes quite far behind the master.

 

mha

MySQL with Galera Replication

MySQL + Galera works on the concept of a cluster of nodes sharing the same dataset.

What this means is that MySQL+Galera is a cluster of MySQL instances, normally three or five, that share the same dataset, and where data is synchronously distributed between nodes.

The focus is on the data, not on the nodes, given that each node shares the same data and status. Transactions are distributed across all active nodes that represent the primary component. Nodes can leave and re-join the cluster, modifying the conceptual view that expresses the primary component (for more details on Galera, review my presentation).

What is of note here is that each node has the same data at the end of a transaction; given that, the application can connect to any node and read/write data in a consistent way.

As previously mentioned replication is (virtually) synchronous, data is locally validated and certified, and conflicts are managed to keep data internally consistent. Failover is more of an external need than a Galera one, meaning that the application can be set to connect to one node only or on all the available nodes, and if one of the nodes is not available the application should be able to utilize the others.

Given that not all applications have this function out of the box, it is common practice to add another layer with HAProxy to manage the application connection, and have HAProxy distribute the connection across nodes or to use a single node as main point of reference and then shift to the others in case of needs. The shift in this case is limited to move the connection points from Node A to Node B.

MySQL/Galera write scalability is limited by the capacity of the single node to manage the total amount of incoming writes. There is no write scaling when adding nodes. MySQL/Galera is simply a write distribution platform, and reads can be performed consistently on each node.

galera

Aurora

Aurora is based on the RDS approach with the ease of management, built-in backup, data on disk autorecovery etc. Amazon also states that Aurora offers a better replication mechanism (~100 ms lag).

This architecture is based on a main node acting as a read/write node and a variable number of nodes that can work as read-only nodes. Given that the replication is claimed to be within ~100ms, reading from the replica nodes should be safe and effective. A read replica can be distributed by AZ (availability zone), but must reside in the same region.

Applications connect to an entry point that will not change. In case of failover, the internal mechanism will move the entry point from the failed node to the new master, and also all the read replicas will be aligned to the new master.

Data is replicated at a low level, and pushed directly from the read/write data store to a read data store, and here there should be a limited delay and very limited lag (~100ms). Aurora replication is not synchronous, and given only one node is active at time, there is no data consistency validation or check.

In terms of scalability, Aurora does not scale writes by adding replica nodes. The only way to scale the writes is to scale up, meaning upgrading the master instance to a more powerful one. Given the nature of the cluster, it would be unwise to upgrade only the master.

Reads are scaled by adding new read replicas.

AuroraArch001

 

What about multi-region?

One of the increasing requests I receive is to design architectures that could manage failover across regions.

There are several issues in replicating data across regions, starting from data security down to packet size and frame dimension modification, given we must use existing networks for that.

For this investigation, it is important to note one thing: the only solution that can offer internal replication cross-region is Galera, with the use of segments. But given the possible issues, I normally do not recommend using this solution when we talk about regions across continents. Galera is also the only one that would eventually help optimize asynchronous replication, using multiple nodes to replicate to another cluster.

Aurora and standard MySQL must rely on basic asynchronous replication, with all the related limitations.

Architecture for the investigation

The investigation I conducted used several components:

  • EIP = 1
  • VPC = 1
  • ELB=1
  • Subnets = 4 (1 public, 3 private)
  • HAProxy = 6
  • MHA Monitor (micro ec2) = 1
  • NAT Instance (EC2) =1 (hosting EIP)
  • DB Instances (EC2) = 6 + (2 stand by) (m4.xlarge)
  • Application Instances (EC2) = 6
  • EBS SSD 3000 PIOPS
  • Aurora RDS node = 3 (db.r3.xlarge)
  • Snapshots = (at regular intervals)

MySQL_HA_failover - POC architecture

Application nodes connect to the databases either using an Aurora entry point or HAProxy. For MHA, the controller action was modified to act on an HAProxy configuration instead of on a VIP (Virtual IP), modifying its active node and reloading the configuration. For Galera, the shift was driven by the recognition of the node status using the HAProxy check functionality.
As indicated in the above schema, replication was distributed across several AZ. No traffic was allowed to reach the databases from outside the VPC. ELB was connected to the HAProxy instances, which have proven to be a good way to rotate across several HAProxy instances, in case an additional HA layer is needed.
For the scope of the investigation, given each application was hosting a local HAProxy, using the “by tile” approach ELB was tested in the phase dedicated to identify errors and saturation, but not used in the following performance and HA tests. The NAT instance was configured to allow access to each HAProxy web interface only, to review statistics and node status.

 

Tests

Description:

I performed 3 different types of tests:

  • High availability
  • Data ingest
  • Data compliance

High Availability

Test description:

The tests were quite simple: I ran a script that, while inserting, collected the time of the command execution, stored the SQL execution time (with NOW()), returned the value to be printed, and finally collected the error code from the MySQL command.

The result was:

Output

2015-09-30 21:12:49  2015-09-30 21:12:49  0
        /\                    /\          /\
        |                     |           |
  date/time (sys)       NOW() (MySQL)   exit error code (bash)

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Log from bash : 2015-09-30 21:12:49 2015-09-30 21:12:49 0

Inside the table:

CREATE TABLE `ha` (
  `a` int(11) NOT NULL AUTO_INCREMENT,
  `d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `c` datetime NOT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=178 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

select a,d from ha;
+-----+---------------------+
| a   | d                   |
+-----+---------------------+
|   1 | 2015-09-30 21:12:31 |
+-----+---------------------+
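
A hypothetical sketch of the statement the script loops on (the bash wrapper logs its own timestamp and the exit code, while column c records when MySQL actually executed the insert):

INSERT INTO ha (c) VALUES (NOW());
SELECT a, d, c FROM ha ORDER BY a DESC LIMIT 1;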

For Galera, the HAProxy settings were:

server node1 10.0.2.40:3306 check port 3311 inter 3000 rise 1 fall 2  weight 50
server node2Bk 10.0.1.41:3306 check port 3311 inter 3000 rise 1 fall  2   weight 50 backup
server node3Bk 10.0.1.42:3306 check port 3311 inter 3000 rise 1 fall  2   weight 10 backup

I ran the test script on the different platforms, without heavy load, and then close to the MySQL/Aurora saturation point.

 

High Availability Results

time_failover

I think an image is worth millions of words.

MySQL with MHA was taking around 2 minutes to perform the full failover, meaning from interruption to when the new node was able to receive data again.

Under stress, the master was so far ahead of the slaves, and the replication lag was so significant, that a failover with binlog application was just taking too long to be considered a valid option in comparison to the other two. This result was not a surprise, but it pushed me to analyze the MHA solution separately, given that its behavior was so totally divergent from the other two that it was not comparable.

More interesting was the behavior between MySQL/Galera and Aurora. In this case, MySQL/Galera was consistently more efficient than Aurora, with or without load. It is worth mentioning that of the 8 seconds taken by MySQL/Galera to perform the failover, 6 were due to the HAProxy settings which had 3000 ms interval and 2 loops in the settings before executing failover. Aurora was able to perform a decent failover when the load was low, while under increasing load, the read nodes become less aligned with the write node, and as such less ready for failover.

Note that I was performing the tests following the Amazon indication to use the following to simulate a real crash:

ALTER SYSTEM CRASH [INSTANCE|NODE]

As such, I was not doing anything strange or out of the ordinary.

Also, while doing the tests, I had the opportunity to observe the Aurora replica lag using CloudWatch, which reported a much higher lag value than the claimed ~100 ms.

As you can see below:

replica_lag_16th_compliance_latency

In this case, I was getting almost 18 seconds of lag in the Aurora replication, much higher than ~100 ms!

Or this reading, which is not clear at all:

dacades_of_lag_inReplica

As you can calculate by yourself, 2E16 is several decades of latency.

Another interesting metric I was collecting was the latency between the application sending the request and the moment of execution.

ha_latency

Once more, MySQL/Galera is able to manage the requests more efficiently. With a high load and almost at saturation level, MySQL/Galera was taking 61 seconds to serve the request, while Aurora was taking 204 seconds for the same operation.

This indicates how high the impact of load can be in case of saturation, on the response time and execution. This is a very important metric to keep under observation to decide when, or whether, to scale up.

 

Exclude What is Not a Full Fit

As previously mentioned, this investigation was intended to answer several questions, first of all about HA and failover time. Given that, I had to exclude the MySQL/MHA solution from the remaining analysis, because it is so drastically divergent that it would make no sense to compare it; also, analyzing the performance of the MHA/MySQL solution alongside the others would have flattened the other two. Details about MHA/MySQL are present in the Appendix.

Performance tests

Ingest tests

Description:

This set of tests was done to cover how the two platforms behaved in case of a significant amount of inserts.

I used IIBench with a single table, and my own StressTool, which instead uses several tables (configurable) plus other configurable options like:

  • Configurable batch inserts
  • Configurable insert rate
  • Different access method to PK (simple PK or composite)
  • Multiple tables and configurable table structure.

The two benchmarking tools also differ in the table definition:

IIBench

CREATE TABLE `purchases_index` (
  `transactionid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `dateandtime` datetime DEFAULT NULL,
  `cashregisterid` int(11) NOT NULL,
  `customerid` int(11) NOT NULL,
  `productid` int(11) NOT NULL,
  `price` float NOT NULL,
  `data` varchar(4000) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`transactionid`),
  KEY `marketsegment` (`price`,`customerid`),
  KEY `registersegment` (`cashregisterid`,`price`,`customerid`),
  KEY `pdc` (`price`,`dateandtime`,`customerid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

StressTool

CREATE TABLE `tbtest1` (
  `autoInc` bigint(11) NOT NULL AUTO_INCREMENT,
  `a` int(11) NOT NULL,
  `uuid` char(36) COLLATE utf8_unicode_ci NOT NULL,
  `b` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `c` char(200) COLLATE utf8_unicode_ci NOT NULL,
  `counter` bigint(20) DEFAULT NULL,
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `partitionid` int(11) NOT NULL DEFAULT '0',
  `date` date NOT NULL,
  `strrecordtype` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`autoInc`,`date`),
  KEY `IDX_a` (`a`),
  KEY `IDX_date` (`date`),
  KEY `IDX_uuid` (`uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 CHARSET=utf8 COLLATE=utf8_unicode_ci

IIBench was executed using 32 threads on each application server, while the stress tool was executed running 16/32/64 threads on each application node, resulting in 96, 192, and 384 threads, each executing batch inserts with 50 inserts per batch (19,200 rows).

Ingest Test Results

IIBench
Execution time:

iibenchexectime

Time to insert ~305 million rows into one single table using 192 threads.

Rows Inserted/Sec
iibench_row_insert
Insert Time

iibench_insert_time

The result of this test is once more quite clear, with MySQL/Galera able to manage the load in 1/3 of the time Aurora takes. MySQL/Galera was also more consistent in the insert time taken as the number of rows inside the table grew.

It is also of interest to note how the number of rows inserted decreased faster in MySQL/Galera compared to Aurora.

 

Java Stress Tool
Execution Time

stress_ingest_exec_time

 

This test focused on multiple inserts (like IIBench), but using an increasing number of threads and multiple tables. This test is closer to what happens in real life, given that parallelization and multiple entry points are definitely more common than single-table inserts in a relational database.

In this test Aurora performs better, and the distance is less evident than in the IIBench test.

In this test, we can see that Aurora is able to perform almost like a standard MySQL instance, but this performance does not persist as the number of concurrent threads increases. Actually, MySQL/Galera was able to manage the load of 384 threads in 1/3 of the time with respect to Aurora.

Analyzing more in depth, we can see that MySQL/Galera manages the commit phase more efficiently, which is surprising keeping in mind that MySQL/Galera uses synchronous replication and has to manage data validation and replication.

Row inserted
stress_ingest_rows_count
Com Commit

stress_ingest_com_count

Commit Handler Calls

stress_ingest_handlers_count

In conclusion, I can say that also in this case MySQL/Galera performed better than Aurora.

Compliance Tests

The compliance tests I ran used tpcc-mysql with 200 warehouses, and StressTool with 15 parent tables and 15 child tables, generating Select/Insert/Delete operations on a basic dataset of 100K entries.

All tests were done with the buffer pool saturated.

Tests for tpcc-mysql were using 32, 64, and 128 threads, while for StressTool I was using 16, 32, and 64 threads (multiply for the 3 application machines).

Compliance Tests Results

Java Stress Tool
Execution Time

stress_comp_exectime

In this test, we have the applications performing concurrent access and action in read/write rows on several tables. It is quite clear from the picture above that MySQL/Galera was able to process the load more efficiently than Aurora. Both platforms had a significant increase in the execution time with the increase of the concurrent threads.

Both platforms reached saturation level with this test, using a total of 192 concurrent threads. Saturation happened at different moments and hit different resources during the 192-concurrent-threads test: in the case of Aurora, it was CPU-bound and there was replication lag; for MySQL/Galera, the bottlenecks were I/O and flow control.

Rows Insert and Read

stress_comp_rows

In relation to the rows managed, MySQL/Galera performed better in terms of quantity of rows managed, but this trend went significantly down while increasing the concurrency. Both read and write operations were affected, while Aurora managed less in terms of volume but became more consistent as concurrency increased.

Com Select Insert Delete

stress_comp_com

Analyzing the details by type of command, it is possible to identify that MySQL/Galera was more affected in the read operations, while writes showed less variation.

Handlers Calls

stress_comp_handlers

In write operations, Aurora inserted a significantly lower volume of rows, but it was more consistent as concurrency increased. This is probably because the load exceeded the capacity of the platform, and Aurora was acting at its limit. MySQL/Galera was instead able to manage the load with 2 times the performance of Aurora, even if the increasing concurrency affected the trend negatively.

TPCC-mysql

Transactions

tpcc_trx

tpcc-mysql emulates the CRUD activities of transactional users against N warehouses. In this test, I used 200 warehouses; each warehouse has 10 terminals, with 32, 64, and 128 threads.

Once more, MySQL/Galera is consistently better than Aurora in terms of volume of transactions. The average per-second results were also consistently almost 2.6 times better than Aurora's. On the other hand, Aurora shows less fluctuation in serving the transactions, with a more consistent trend for each execution.

Average Transactions

tpcc_avg_insert

 

Conclusions

High Availability

With MHA excluded, the other two platforms were shown to be able to manage the failover operation in a limited time frame (below 1 minute); nevertheless, MySQL/Galera was shown to be more efficient and consistent, especially in consideration of the unexpected and sometimes not fully justified episodes of Aurora replication lag. This result is a direct consequence of the synchronous replication, which by design does not allow MySQL/Galera to let an active node fall behind.

In my opinion, the replication method used in Aurora is efficient, but it still allows node misalignments, which is obviously not optimal when there is the need to promote a read-only node to a read/write instance.

Performance

MySQL/Galera was able to outperform Aurora in all tests -- by execution time, number of transactions, and volumes of rows managed. Also, scaling up the Aurora instance did not have the impact I was expecting. Actually it was still not able to match the EC2 MySQL/Galera performance, with less memory and CPUs.

Note that while I had to exclude the MHA solution because of its failover time, the performance achieved using standard MySQL was by far better than either MySQL/Galera or Aurora; please see the MHA appendix below.

General Comment on Aurora

The Aurora failover mechanism is not so different from other solutions: in case of a node crash, another node is elected as the new primary node on the basis of the "most up-to-date" rule.
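For what it is worth, the election can be influenced: Aurora lets you assign each replica a promotion tier, and among the candidates in the lowest tier the most up-to-date one wins. A sketch using the AWS CLI (the instance identifier is illustrative):

# prefer this replica during failover (tier 0 is the highest priority)
aws rds modify-db-instance \
    --db-instance-identifier aurora-replica-1 \
    --promotion-tier 0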

Replication is not illustrated clearly in the documentation, but it seems to be a combination of block device distribution and semi-synchronous replication, meaning that the primary is not really affected by possible delays in copying a block once it is dispatched. Also interesting is the way the data of a volume is repaired in case of issues: the data is copied over from another location or volume that hosts a correct copy. This resembles the HDFS mechanism, and may well be exactly that; if it is HDFS, the latency of the repair operation may be significant. Also relevant in this scenario is the fact that if the primary node crashes, there is a service interruption during the election of a secondary to primary; this can be up to 2 minutes according to the documentation, and was verified in the tests.

About replication, the documentation states that it can take ~100 ms, but that this is not guaranteed and depends on the volume of write traffic incoming to the primary server. I have already reported that in practice this does not hold, and replication can take significantly longer.

What emerged during the investigation is that the more writes there are, the larger the possible distance between the data visible on the primary and the data visible on a replica (replication lag). However efficient the replication mechanism is, this is not synchronous replication, and it does not guarantee transactionally consistent reads.
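An easy way to observe this is a heartbeat-style probe: write a timestamp through the writer endpoint and read it back from a reader endpoint. A minimal sketch, assuming a hypothetical heartbeat table and reasonably synchronized clocks between the two connections:

-- on the writer endpoint
CREATE TABLE IF NOT EXISTS test.heartbeat (id INT PRIMARY KEY, ts TIMESTAMP(6));
REPLACE INTO test.heartbeat VALUES (1, CURRENT_TIMESTAMP(6));

-- on the reader endpoint: the difference approximates the replication lag
SELECT TIMESTAMPDIFF(MICROSECOND, ts, CURRENT_TIMESTAMP(6)) / 1000 AS lag_ms
FROM test.heartbeat WHERE id = 1;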

Finally, replication across regions is not part of the Aurora design at all; it must rely on standard asynchronous MySQL replication between servers. Aurora is nothing more, nothing less than RDS on steroids, with smarter replication.

Aurora does not perform any kind of write scaling; scaling is performed on reads. The only way it scales writes is by scaling up the box, so more power and memory = more writes; nothing new, and obviously scaling up also costs more. That said, I think Aurora is a very valuable solution when you need a platform requiring extensive read scaling (in/out), or for rolling out a product in phase 1.

Appendix MHA

MHA performance Graphs

As previously mentioned, MySQL/MHA performed significantly better than the other two solutions. The IIBench test completed in 281 seconds, against 4039 seconds for Galera.

IIBench Execution Time
MHA_exec_time

MySQL/MHA execution time (Ingest & Compliance)

MHA_APP_exec_time

The execution of the ingest and compliance tests in MySQL/MHA was three times faster than in MySQL/Galera.

MySQL/MHA Rows Inserted (Ingest & Compliance)

MHA_app_rows_inserted

The number of inserted rows is consistent with the execution time, being three times what the MySQL/Galera solution achieved. MySQL/MHA was also better able to manage the increasing concurrency, both with simple inserts and with concurrent read/write operations.

Last Updated on Monday, 02 November 2015 00:47
 
Percona Live Amsterdam 2015
Written by Marco Tusa   
Sunday, 20 September 2015 15:27

On Monday 21 September Percona Live will start in Amsterdam.

The program is full of interesting topics and I am sure a lot of great discussions will follow.

I wish all the best to all my colleagues, friends, and customers who will attend. Have fun guys, and drink a couple of beers for me as well.

 

That is it: I decided not to submit any talks and not to attend this year, not only Percona Live but most or all of the conferences.

I want to stay focused on my customers for now, and be present as much as I can for my teammates.

We have so much going on that an effort in that direction must be made, and with the little time left ... well, I have a lot of interesting non-tech stuff to read.

 

So have fun, learn, teach, listen and talk ... but above all share and keep the spirit high; these are hard times, and events like Percona Live are important.

I will miss it ... but as I said, sometimes we have to choose our priorities.

 

Great MySQL (still talking about MySQL right??) to everybody

Last Updated on Sunday, 20 September 2015 15:54
 
Why you should be careful when Loading data in MySQL with Galera.
Written by Marco Tusa   
Saturday, 08 August 2015 18:01

An old story that is not yet solved.

 

Why this article.

Some time ago I opened a bug report with Codership through Seppo.

The report was about the delay in executing data loads when foreign keys (FK) are present (https://bugs.launchpad.net/codership-mysql/+bug/1323765).

The delays I was reporting at the time were such as to scare me quite a lot, but talking with Alex and Seppo I knew they were aware of the need to optimize the approach, and some work was ongoing.

After some time I ran the tests again with newer versions of PXC and the Galera library.

This article describes what I found, in the hope that sharing information is still worth something; nothing more, nothing less.

The tests

Tests were run on a VM with 8 cores, 16GB RAM, and RAID10 storage (6 spindles, 10K RPM).

I have run 4 types of tests:

  • Load from file using SOURCE and extended inserts
  • Load from SQL dump and extended inserts
  • Run multiple threads operating against employees tables with and without FK
  • Run single thread operating against employees tables with and without FK

For the tests running against the employees database and simulating external client access, I used my own stresstool.

The tests were done over a long period of time, given that I was testing different versions and had no time to stop and consolidate the article. Also, I was never fully convinced, so I ran the tests over and over to validate the results.

I have tested versions from:

Server version:                        5.6.21-70.1-25.8-log Percona XtraDB Cluster binary (GPL) 5.6.21-25.8, Revision 938, wsrep_25.8.r4150

To

Server version:                        5.6.24-72.2-25.11-log Percona XtraDB Cluster binary (GPL) 5.6.24-25.11, Revision, wsrep_25.11

With consistent behavior.

 

What happened

The first test was as simple as the one I did for the initial report: mainly loading the employees database into MySQL.

time mysql -ustress -ptool -h 127.0.0.1 -P3306 < employees.sql

Surprise, surprise … I literally jumped out of my chair: the load took 37m57.792s.

Yes, you are reading that right: it was taking almost 38 minutes to execute.

I was so surprised that I did not trust the test, so I ran it again, and again, and again.

Changing versions, changing machines, and so on.

No way… the time remained surprisingly high.

Running the same test without the FKs but with Galera completed in 90 seconds, while with the FKs but without loading the Galera library it took 77 seconds.
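For the record, running the PXC binaries without the Galera library simply means starting the node with the wsrep provider disabled, along these lines in my.cnf:

[mysqld]
# disable the Galera replication plugin for the baseline runs
wsrep_provider=none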

Ok something was not right. Right?

I decided to dig a bit, starting from analyzing the time taken by each test.

See image below:

load

 

 

Of all the tests, the only one out of line was the data load with FK + Galera.

I also decided to see what the behavior was with multiple threads and contention.

So I prepared a test using my StressTool and ran two classes of tests: one with 8 threads pushing data, the other single-threaded.

As usual, I also ran the tests with FK+Galera, NOFK+Galera, and FK+No Galera.

This time the results were what I was expecting, and the FK impact was minimal if any; see below:

threads

 

 

The distance between executions was minimal and in line with expectations.

It was also consistent between versions, so no surprises; I relaxed there and could focus on something else.

On what?

Well, on why the impact was so significant in the case of the load from file.

The first thing I did was start digging into the calls, and into what each action was really doing inside MySQL.

To do so I installed tools like perf and OProfile, and started digging.
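For reference, the kind of profiles shown below can be collected with standard perf commands along these lines (the sampling window is illustrative):

# sample the whole system with call stacks while the load is running
perf record -a -g -- sleep 60
# summarize which functions consumed the CPU
perf report --sort comm,dso,symbol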

The first test, FK+Galera taking 38 minutes, consistently reported a different sequence of calls and costs from all the other tests.

57.25%  [kernel]                      [k] hypercall_page
35.71%  libgcc_s-4.4.7-20120601.so.1  [.] 0x0000000000010c61
 2.73%  libc-2.12.so                  [.] __strlen_sse42
 0.16%  mysqld                        [.] MYSQLparse(THD*)
 0.14%  libgcc_s-4.4.7-20120601.so.1  [.] strlen@plt
 0.12%  libgalera_smm.so              [.] galera::KeySetOut::KeyPart::KeyPart(galera::KeySetOut::KeyParts&, galera::KeySetOut&, galera::K
 0.12%  mysqld                        [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned
 0.09%  libc-2.12.so                  [.] memcpy
 0.09%  libc-2.12.so                  [.] _int_malloc
 0.09%  mysqld                        [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long,
 0.08%  mysql                         [.] read_and_execute(bool)
 0.08%  mysqld                        [.] ha_innobase::wsrep_append_keys(THD*, bool, unsigned char const*, unsigned char const*)
 0.07%  libc-2.12.so                  [.] _int_free
 0.07%  libgalera_smm.so              [.] galera::KeySetOut::append(galera::KeyData const&)
 0.06%  libc-2.12.so                  [.] malloc
 0.06%  mysqld                        [.] lex_one_token(YYSTYPE*, THD*)

 

Compare this with the output of the same action without FK, but still with Galera:

75.53%  [kernel]                      [k] hypercall_page
 1.31%  mysqld                        [.] MYSQLparse(THD*)
 0.81%  mysql                         [.] read_and_execute(bool)
 0.78%  mysqld                        [.] ha_innobase::wsrep_append_keys(THD*, bool, unsigned char const*, unsigned char const*)
 0.66%  mysqld                        [.] _Z27wsrep_store_key_val_for_rowP3THDP5TABLEjPcjPKhPm.clone.9
 0.55%  mysqld                        [.] fill_record(THD*, Field**, List<Item>&, bool, st_bitmap*)
 0.53%  libc-2.12.so                  [.] _int_malloc
 0.50%  libc-2.12.so                  [.] memcpy
 0.48%  mysqld                        [.] lex_one_token(YYSTYPE*, THD*)
 0.45%  libgalera_smm.so              [.] galera::KeySetOut::KeyPart::KeyPart(galera::KeySetOut::KeyParts&, galera::KeySetOut&, galera::K
 0.43%  mysqld                        [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, unsigned long*, unsigned long,
 0.43%  mysqld                        [.] btr_search_guess_on_hash(dict_index_t*, btr_search_t*, dtuple_t const*, unsigned long, unsigned
 0.39%  mysqld                        [.] trx_undo_report_row_operation(unsigned long, unsigned long, que_thr_t*, dict_index_t*, dtuple_t
 0.38%  libgalera_smm.so              [.] galera::KeySetOut::append(galera::KeyData const&)
 0.37%  libc-2.12.so                  [.] _int_free
 0.37%  mysqld                        [.] str_to_datetime
 0.36%  libc-2.12.so                  [.] malloc
 0.34%  mysqld                        [.] mtr_add_dirtied_pages_to_flush_list(mtr_t*)

 

What stands out is the significant difference in the FK handling.

The Galera function

KeySetOut::KeyPart::KeyPart(KeyParts&      added,
                            KeySetOut&     store,
                            const KeyPart* parent,
                            const KeyData& kd,
                            int const      part_num)

is the top consumer, before moving out to the shared libraries.

After it, the server constantly calls the strlen function, as if it were evaluating each entry in the insert multiple times.

This unfortunate behavior happens ONLY when FKs exist and require validation, and ONLY if the Galera library is loaded.

The logical conclusion is that the library is adding the overhead, probably in some iteration, and that this is probably a bug.

 

Running the application tests with multiple clients and threads, this delay does not show up, at least not at this order of magnitude.

During the application tests I was using batched inserts of up to 50 rows per SQL command, so I may simply not have triggered the threshold that causes the issue in Galera.
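To make the distinction concrete, this is the shape of the batched (extended) inserts the application tests were issuing, sketched here against a hypothetical table:

-- one statement carrying many rows, instead of one single-row statement per row
INSERT INTO employees_test (emp_no, first_name, last_name)
VALUES (1001, 'Anna', 'Rossi'),
       (1002, 'Marco', 'Bianchi'),
       (1003, 'Luca', 'Verdi');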

As such, I am still not convinced that we are “safe” there, and adding this test is on my to-do list; in case of significant results I will append the information, but I felt the need to share in the meantime.

 

The other question was: WHY was the data load from the SQL dump NOT taking so long?

That part is easy: comparing the load files, we can see that in the SQL dump the FK and UK checks are disabled while loading, so the server skips the FK evaluation entirely.

That’s it. Adding:

SET FOREIGN_KEY_CHECKS=0, UNIQUE_CHECKS=0;

to the import, and setting them back afterwards, removes the delay, and the function calls become “standard” again.
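Putting it together, the whole workaround around the employees load looks like this (the SET statements only affect the loading session):

SET FOREIGN_KEY_CHECKS=0, UNIQUE_CHECKS=0;
SOURCE employees.sql;
SET FOREIGN_KEY_CHECKS=1, UNIQUE_CHECKS=1;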

 

 

Conclusions

This short article has the purpose of:

  • Alerting all of you to this issue in Galera, and letting you know it has been going on for some time and has not been fixed yet.
  • Providing you a workaround: use SET FOREIGN_KEY_CHECKS=0, UNIQUE_CHECKS=0; when performing data loads, and remember to set them back (SET FOREIGN_KEY_CHECKS=1, UNIQUE_CHECKS=1;).
    Unfortunately, as we all know, we cannot always disable them, right? Which brings us to the last point.
  • Suggesting that Codership, and eventually Percona, should dedicate some attention to this issue, because it COULD be limited to data loading, but it may not be.

I have more information and OProfile output that I am going to add to the bug report, in the hope that it will be processed.

 

Great MySQL to everyone …

Last Updated on Saturday, 08 August 2015 18:22
 