What I think about the Percona Live conference 2013.
Written by Marco Tusa   



Needless to say, like many others I enjoyed this conference a lot.

For me it was also a personal success, because I was finally able to have my company sponsor the event and to bring new members of my team to the conference as speakers. The merit is obviously all theirs, but as the person responsible for the MySQL cluster at Pythian I was happy, knowing how much it cost them to be there.


We also had a lot of people at the community dinner; the last head count I did was over 120 people. In this regard I have to clarify some misunderstanding and confusion we had there.

Pedro’s asked us for the final head count by Tuesday morning, so I closed the count from the comments on the website and from the emails around noon. The number I passed to Pedro’s based on that was 70-80 people, but additional people registered after that time and others showed up at the last moment, so we had to manage an additional 40 people, who had to be seated in different areas. Quite obviously I did not give clear enough instructions; I personally apologize for that, and I (we) will do better next year. I hope you had a good time and good food, even if you were a little detached from the other tables.


What about the conference?

My feeling is that this event consolidates what is the “core” of the MySQL community.

We have seen many companies that provide the same service sitting close to one another and talking with a positive spirit and attitude. I personally chatted all the time with people from SkySQL, Percona and others, all of us with an open and friendly attitude.

I have seen Oracle people participate in the conference (hurray!!!). As an IOUG committee member I know very well how many times we have told Oracle to be there, and they were! This was good, period.


MySQL, where are you going?

Regarding what is happening to MySQL and where that is leading us, the conference confirmed my idea that nowadays, when we talk about MySQL, we are no longer talking about LAMP or the full stack.

What customers, companies and users expect is a more complex and articulated environment. Interaction between different blocks is no longer optional; it is a fact.

When we approach an existing environment, or when we have to build a new one, we now think in terms of hundreds of application servers, terabytes of data to store or process, many different client platforms to support and an impressive amount of data to analyze for reporting.


Feel free to fill the boxes with the name of the product that you like most, but for sure you will not limit yourself to MySQL or LAMP.

Already today I have customers using MySQL, Oracle Database, MongoDB, Hadoop and more, all in one single connected environment.


Thinking in terms of MySQL only, when we think about products, services, monitoring or design, is too limiting.


For instance, a tool that monitors MySQL but does not capture its interaction with other elements such as Hadoop is going to provide only part of the picture. That partial picture, limited to MySQL metrics, will be close to useless, because it cannot provide all the information required to perform a valid analysis and, eventually, a projection.

In other words, it will cover the basics of the single block but will not help us get the big picture. It is still useful for keeping that block in a decent state, but it will leave you blind to what is going on in the whole context.

In many respects this is also true for the products: each block, MySQL included, must become more and more flexible in exchanging data with the others. This can be achieved by developing specific interfaces, or by defining a common communication protocol/format that is shared between the different blocks.


In the MySQL (or MariaDB) universe, this also means keeping consistency and remaining as open as possible, to facilitate the creation of additional plug-ins/engines.

But what really worries me, given that it is also “my” field, is “service”. Those environments require support, design and so on. We know very well how complex a MySQL environment can be; what happens when we start to have many other actors involved? What really scares me is the level of knowledge required to cover all of them, or even just a segment.

I am convinced that we will have to deal with it, because users/customers/companies will ask us to provide support for all the elements in their architecture. Actually it is already happening, and the real risk is to have, or to become, generalists instead of high-profile experts.


If you don’t do it, if you do not diversify, the risk is to be isolated by the market: yes, you will be very smart in that specific area, but not able to understand the big picture, ergo useless.

On the other hand, trying to do too much could drive you (as a company) to disperse your resources and settle at an average level, which is good but not Excellent with a capital “E”.

One possible solution is to have a huge monster with hundreds and hundreds of people and a division per technology … and … well, I have already seen several of them start and die. They start well, with very high service quality, and then become big, heavy and so slow that customers move away.


No, that is not the solution. The solution resides in being able to balance correctly what you can do with your resources, given reasonable growth, and what you cannot.

I work in a multi-technology company, and I know very well what I am talking about when I say that balance is the key. The future will need to see two things. One is companies improving their capacity to cover more than just MySQL. The other is opening the space to collaboration: companies covering different technologies must start to interact more, and offer better service and results to users/customers in a cooperative way.

That will allow each single company to remain focused on a few things and keep a high level of expertise in the chosen areas. Working in a cooperative way is the key.

All this needs to happen, and it requires coordination.


Flexibility and coordination are the keywords for the future. The MySQL community has already shown how much energy it has, how strong it can be in difficult moments and how much we really care about our customers/users.

What I see for the future is all of us working together, gathering all the actors involved, and giving life to a new ecosystem that will help the evolution of the next generation of data and applications.


What about the Speeches

In terms of talks, I have to say I was expecting a little bit more, not only from the speakers (me included) but also from the product companies.

As I said, I think it is time to move to the next step, and I was expecting more talks about interactions between technologies.


I am not saying that we should not cover the basics; we must do it. But having more talks on how MySQL and MongoDB coexist, or on how we could help terabytes of data to be processed between A and B, well, that would have been nice.

Not only about what we have now, but also about what we are planning for the future, including new features and ideas for development.


In this regard, the only relevant talks I saw were the one by my colleague Danil Zburivsky on Hadoop/MySQL and the one about JSON by Anders Karlsson during the MariaDB/SkySQL event.

Thanks guys, you see the future, and shame on Marco, who was thinking about it and could have done it but did not … maybe at the next conference.


That said, the level of the talks was good. I talked with the people attending and most of them were satisfied, but let us wait and see what the evaluations reveal.

What I can say is that I really enjoyed the tutorial on XtraDB and Galera by Jay Janssen, which helped me feel less alone in the Galera implementation adventure, and I regret having missed “InnoDB: A journey to the core” by Davi Arnaut and Jeremy Cole. But I was presenting in the same slot, and it would not have been nice of me to say to the people there: OK, let us all move to the next room.


What about the Expo

WoooW, first time as a sponsor and first time with a booth. A lot of talking, a lot of possible new friends and a better understanding of what we need to do to be more effective next time. T-shirts first!!! Lesson learned: we brought too few … we could have covered the whole Bay Area with LOVE YOUR DATA. We missed the target this year; we will not make the same mistake next year.


I think that this year we had a well-balanced expo, with less show but more focus. I must also mention the presence, outside the expo area, of Julian Cash (http://jceventphoto.com/index.html), who took a lot of cool shots of most of us.

I know Julian was also there at other conferences, but I had never met him before. I did this year and it was a great experience. I hate taking photos, but with Julian I had so much fun that in the end I loved it.


What about the Lightning Talks?

Another well-established event at the MySQL conference, and every year we have fun. This year I enjoyed Shlomi's talk, and the performance by the Tokutek team was absolutely AWESOME.


About the Lightning Talks, and just to confirm what Dave Apgar was saying in his really good presentation: shit happens, and you never know when. I had set up my video camera and recorded the WHOLE event, and guess what… my new SD card just failed, and nothing, I mean NOTHING, was there afterwards. Next time I will come with TWO video cameras and set up redundancy!



Finally during the conference we had two very significant announcements.

The first one was the expected merger of MariaDB and SkySQL. Nothing new, but it is good to see SkySQL defining its identity with more determination. Not only that: this merger is very important because all MariaDB users now have a clear point of reference, which will help them and the community to better adopt and improve MariaDB. Way to go guys, well done!


The second one is about Tokutek (http://www.tokutek.com/), finally open source. I tested it for the first time 3 years ago. It was a very interesting technology, but hard to get implemented because customers were reluctant to go for non-open source code.

Just a note: I wrote open source, not free. Open source doesn’t mean free, and here the concept was very clear: customers were willing to “eventually” pay, but not for closed code.

The Tokutek move is smart not only because it will allow the company to get substantial help from the community in identifying issues, improving utilization and spotting new trends, but also because it removes the last philosophical barrier to adoption of the software.


From the technical point of view, the presentations have shown a significant improvement with respect to previous years, and I was very impressed by the presentation given by Gerry Narvaja during the SkySQL/MariaDB event.

One thing is sure: I have customers who could benefit hugely from Tokutek, and I will give it a try right away, starting next week.


Winner and loser

No doubt on my side; I did not even mention it before because for me it is a given.

On 12 March 2011 I wrote this article, http://www.tusacentral.net/joomla/index.php/mysql-blogs/96-a-dream-on-mysql-parallel-replication, which was my dream about replication. At the time of writing I was not aware of, or testing, ANY software able to do what I was asking for.

On 23 November 2011 I wrote another article, http://www.tusacentral.net/joomla/index.php/mysql-blogs/119-galera-on-red-hat-is-123-part-1, and that was my first approach to Galera.

On 29 September 2012 I presented the first results of a POC done in a customer environment, http://www.slideshare.net/marcotusa/scaling-with-sync-replication-2012.

Next week I must implement another MySQL cluster based on Galera replication.


The winner for me is the Galera solution (http://www.codership.com/), whatever version you may like. For my part, I have found that the Percona version is the most stable, and that using the Severalnines tool to manage the cluster (http://www.severalnines.com/clustercontrol) is also helpful.



Who is the loser then?

All those who believed MySQL was over in 2010 (the Oracle takeover).

We have MySQL from Oracle, we have MariaDB from Monty, we have companies developing their storage engines and tools, we have a more complex ecosystem that is growing day by day.

No, MySQL is not over at all.

Just one note: whoever leads the development, on any side, remember that you MUST allow the community to use and develop code on top of yours. Modifying interfaces without documenting them, or not being fully explicit about what to do, how to do it and in which direction, well, that is not fair.

Percona Live, the MySQL Conference in Santa Clara, was a great conference, run by great people. We can all do better, always, but what makes me feel good is that I know we will do better next year.


Last note…

Did you not miss someone? Did you, as I did, feel that something was not right? Was a face not missing?

I mean … yes! Him! Baron Schwartz!! Hey man, we missed you!!! Or at least I missed you and your notepad during the presentations. Come back ASAP.


Happy MySQL to everyone.











Last Updated on Sunday, 18 August 2013 17:04
Amazon EC2 - RDS quick comparison
Written by Marco Tusa   

1. What this is about

The following is a review of the real status of Amazon RDS in comparison with EC2.

The purpose is to get a better understanding of the possible limitations in using the platform, and of what is a possible fit and what is not.



2. Why

I did a first review a year ago for internal purposes, but now we are receiving the same questions over and over from different customers.

Given that, and given that a lot of things can happen in one year, I decided to repeat the review and perform the tests once again.

What needs to be underlined is that I am doing this with usage in PRODUCTION in mind, not QA or development.

So my considerations are obviously focused on more demanding scenarios.



3. About EC2 and RDS.

3.1. Machine configuration

There are different instance types we can choose from when starting our EC2 or RDS instance. They have different costs and different “virtual” physical characteristics. The lists for both are below, EC2 instance types first and RDS DB instance classes second:


Instance type | ECUs | Cores | Memory
T1 Micro (t1.micro), Free tier eligible | Up to 2 ECUs | 1 Core | 613 MiB
M1 Small (m1.small) | 1 ECU | 1 Core | 1.7 GiB
M1 Medium (m1.medium) | 2 ECUs | 1 Core | 3.7 GiB
M1 Large (m1.large) | 4 ECUs | 2 Cores | 7.5 GiB
M1 Extra Large (m1.xlarge) | 8 ECUs | 4 Cores | 15 GiB
M3 Extra Large (m3.xlarge) | 13 ECUs | 4 Cores | 15 GiB
M3 Double Extra Large (m3.2xlarge) | 26 ECUs | 8 Cores | 30 GiB
M2 High-Memory Extra Large (m2.xlarge) | 6.5 ECUs | 2 Cores | 17.1 GiB
M2 High-Memory Double Extra Large (m2.2xlarge) | 13 ECUs | 4 Cores | 34.2 GiB
M2 High-Memory Quadruple Extra Large (m2.4xlarge) | 26 ECUs | 8 Cores | 68.4 GiB
C1 High-CPU Medium (c1.medium) | 5 ECUs | 2 Cores | 1.7 GiB
C1 High-CPU Extra Large (c1.xlarge) | 20 ECUs | 8 Cores | 7 GiB
High Storage Eight Extra Large (hs1.8xlarge) | 35 ECUs | 16 Cores | 117 GiB

Micro DB Instance: 630 MB memory, Up to 2 ECU (for short periodic bursts), 64-bit platform, Low I/O Capacity
Small DB Instance: 1.7 GB memory, 1 ECU (1 virtual core with 1 ECU), 64-bit platform, Moderate I/O Capacity
Medium DB Instance: 3.75 GB memory, 2 ECU (1 virtual core with 2 ECU), 64-bit platform, Moderate I/O Capacity
Large DB Instance: 7.5 GB memory, 4 ECUs (2 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity
Extra Large DB Instance: 15 GB of memory, 8 ECUs (4 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity
High-Memory Extra Large DB Instance: 17.1 GB memory, 6.5 ECU (2 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity
High-Memory Double Extra Large DB Instance: 34 GB of memory, 13 ECUs (4 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity
High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity

3.2. Embedded features

EC2 implies that we install, set up and maintain our system and database software by ourselves. For RDS, Amazon provides a few “features” that are important to keep in mind for the later discussion.


The most relevant are:

Pre-configured Parameters – Amazon RDS DB Instances are pre-configured with a sensible set of parameters and settings appropriate for the DB Instance class we select.

Monitoring and Metrics – Amazon RDS provides Amazon CloudWatch metrics for your DB Instance deployments at no additional charge. You can use the AWS Management Console to view key operational metrics for your DB Instance deployments, including compute/memory/storage capacity utilization, I/O activity, and DB Instance connections.

Automated Backups – Turned on by default, the automated backup feature of Amazon RDS enables point-in-time recovery for your DB Instance.

DB Snapshots – DB Snapshots are user-initiated backups of your DB Instance.

Multi-Availability Zone (Multi-AZ) Deployments – Amazon RDS Multi-AZ deployments provide enhanced availability and durability for Database (DB) Instances, making them a natural fit for production database workloads.

When we provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ).

Each AZ runs on its own physically distinct, independent infrastructure, in case of an infrastructure failure (for example, instance crash, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby so that you can resume database operations as soon as the failover is complete.

Read Replicas – This replication feature makes it easy to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. Amazon RDS uses MySQL’s native replication to propagate changes made to a source DB Instance to any associated Read Replicas.


3.3. Storage

As shown before, in RDS we cannot do much regarding the storage: we can only choose between different instance classes, and whether we want provisioned IOPS.

On EC2 we obviously have both options, but we can also choose how to define and use our storage solution.


3.4. MySQL configuration

Amazon presents the pre-configured parameters as a cool “feature”, but this is just one side of the coin. The other side is that we cannot really adjust some of the critical parameters for MySQL, or that their values are not as defined in standard MySQL.

The parameters in question are:

binlog_format                  | STATEMENT
expire_logs_days               | 0
innodb_buffer_pool_size        | 3921674240 (calc)
innodb_doublewrite             | ON
innodb_file_format_max         | Antelope
innodb_locks_unsafe_for_binlog | OFF
innodb_log_file_size           | 134217728
innodb_log_files_in_group      | 2
innodb_log_group_home_dir      | /rdsdbdata/log/innodb (Max 300)
innodb_open_files              | 300
max_binlog_size                | 134217728 (max 4294967295)
max_join_size                  | 4294967295
open_files_limit               | 65535

The most concerning are the binlog format and the InnoDB log related ones.
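To make the raw byte values in the list easier to judge, here is a minimal Python sketch that converts them to MiB and computes the total redo log space. The helper names `to_mib` and `redo_log_capacity` and the `RDS_SIZES` dictionary are only illustrative; the numbers are copied from the list above.

```python
# Values copied from the parameter list above (all expressed in bytes).
RDS_SIZES = {
    "innodb_buffer_pool_size": 3921674240,
    "innodb_log_file_size": 134217728,
    "max_binlog_size": 134217728,
}
INNODB_LOG_FILES_IN_GROUP = 2  # also taken from the list above

MIB = 1024 * 1024


def to_mib(size_in_bytes: int) -> float:
    """Convert a size in bytes to mebibytes (MiB)."""
    return size_in_bytes / MIB


def redo_log_capacity(file_size: int, files_in_group: int) -> int:
    """Total InnoDB redo log space: one file size times the files in the group."""
    return file_size * files_in_group


if __name__ == "__main__":
    for name, value in RDS_SIZES.items():
        print(f"{name}: {to_mib(value):.0f} MiB")
    total = redo_log_capacity(
        RDS_SIZES["innodb_log_file_size"], INNODB_LOG_FILES_IN_GROUP
    )
    print(f"total redo log space: {to_mib(total):.0f} MiB")
```

Read this way, the buffer pool is 3740 MiB and the redo log is two files of 128 MiB each, 256 MiB in total.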


3.5. Multi-AZ implementation

From the architectural point of view, I do not have a clear picture of HOW Multi-AZ is implemented, and I am really interested in discovering how Amazon has achieved the declared synchronous replication.

I am just guessing here, but some form of replication based on DRBD Primary/Secondary seems the most probable. What could be concerning is the protocol level used for such replication, and the level of acknowledgement of block transmission, given that a full protocol C would probably be too expensive, even though it is the ONLY really safe one when using DRBD. But given that it is not clear to me whether the solution is really using it, let me just say it would be good to have better insight.


3.6. Standard replication

We cannot use standard replication in RDS; we need to rely on Read Replicas, or not use replication at all. The only alternative is an external solution such as Continuent Tungsten (http://scale-out-blog.blogspot.ca/2013/01/replicating-from-mysql-to-amazon-rds.html).

It is important to note that RDS replication between the master and a Read Replica uses the STATEMENT binlog format, and this cannot be changed. As a direct consequence, we have inefficient replication between master and replicas for all non-deterministic statements, and in the case of transactions that mix storage engines.
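As an illustration of what “non-deterministic statement” means in practice, the following is a minimal Python sketch that flags SQL statements calling functions whose result may differ between master and replica under the STATEMENT format. The function list is a deliberately short, non-exhaustive assumption, and the helper name `find_non_deterministic_calls` is hypothetical; a real check would have to follow the MySQL documentation for the version in use.

```python
import re
from typing import List

# Assumption: a short, non-exhaustive sample of MySQL functions that are
# unsafe to replicate with the STATEMENT binlog format.
NON_DETERMINISTIC_FUNCTIONS = ("UUID", "UUID_SHORT", "SYSDATE", "LOAD_FILE")

# Match a listed function name followed by an opening parenthesis, so that a
# column simply named `uuid` is not flagged.
_PATTERN = re.compile(
    r"\b(" + "|".join(NON_DETERMINISTIC_FUNCTIONS) + r")\s*\(",
    re.IGNORECASE,
)


def find_non_deterministic_calls(statement: str) -> List[str]:
    """Return the flagged function names found in a SQL statement, uppercased."""
    found = {match.group(1).upper() for match in _PATTERN.finditer(statement)}
    return sorted(found)


if __name__ == "__main__":
    sql = "INSERT INTO tbtest1 (a, uuid) VALUES (1, UUID())"
    print(find_non_deterministic_calls(sql))
```

A statement like the INSERT in the example is exactly the kind that fills the `uuid` column used in the test tables, which is why the fixed STATEMENT format matters for this type of workload.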


3.7. Tests done

The tests performed were not too intensive, and I was mainly focused on identifying the safe limit of usage/load for RDS in comparison to a properly configured EC2 instance.

As such I chose to use the Large instance for both EC2 and RDS, with 2 virtual CPUs and 7.5 GB of virtual RAM, and High I/O capacity for RDS.

For EC2 the only difference is that I performed the tests using 1 EBS volume for the data directory in one case, and a RAID5 of 4 EBS volumes in the other.

As for the MySQL configuration, I “standardized” the configuration of the different instances by using the same parameters.

The only difference was that I was not using SSL on the MySQL EC2 instance, while it cannot be turned off in RDS because Amazon security relies on it.

The tests used a variable number of concurrent threads:

Writes on 5 main tables and on 5 child tables.

Reads on the main table, joining 4 tables to it and filtering the results by an IN clause in one test, and by a RANGE in another.
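The exact queries are not listed here, so the following Python sketch only illustrates the difference between the two read filters. It assumes, hypothetically, that the child tables are joined to the main table on column `a` and that the filter is applied on `m.a`; the function names and the selected columns are also assumptions.

```python
from typing import List, Sequence, Tuple

# Assumption: an arbitrary subset of the main table columns.
SELECT_LIST = "m.a, m.uuid, m.counter"


def build_join(main_table: str, child_tables: Sequence[str]) -> str:
    """Build the FROM/JOIN part, joining each child table to the main one on `a`."""
    parts = [f"FROM {main_table} m"]
    for index, child in enumerate(child_tables, start=1):
        parts.append(f"JOIN {child} c{index} ON c{index}.a = m.a")
    return " ".join(parts)


def build_in_query(
    main_table: str, child_tables: Sequence[str], values: Sequence[int]
) -> Tuple[str, List[int]]:
    """Read filtered by an IN clause: a list of point lookups on `a`."""
    if not values:
        raise ValueError("the IN filter needs at least one value")
    placeholders = ", ".join(["%s"] * len(values))
    join = build_join(main_table, child_tables)
    sql = f"SELECT {SELECT_LIST} {join} WHERE m.a IN ({placeholders})"
    return sql, list(values)


def build_range_query(
    main_table: str, child_tables: Sequence[str], low: int, high: int
) -> Tuple[str, List[int]]:
    """Read filtered by a range: every row with `a` between low and high."""
    if low > high:
        raise ValueError("low must not be greater than high")
    join = build_join(main_table, child_tables)
    sql = f"SELECT {SELECT_LIST} {join} WHERE m.a BETWEEN %s AND %s"
    return sql, [low, high]
```

The `%s` placeholders follow the parameter style used by the common Python MySQL drivers, so the returned pair can be passed to a cursor's `execute` call. The IN variant touches a known, small set of keys, while the RANGE variant can scan an arbitrarily large slice of the table, which is why the two behave so differently on disk.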


The structures of the tables are the following:


mysql> DESCRIBE tbtest1; 
 Field         | Type         | Null | Key | Default           | Extra
 autoInc       | bigint(11)   | NO   | PRI | NULL              | AUTO_INCREMENT 
 a             | int(11)      | NO   | MUL | NULL              | 
 uuid          | char(36)     | NO   | MUL | NULL              |  
 b             | varchar(100) | NO   |     | NULL              | 
 c             | char(200)    | NO   |     | NULL              | 
 counter       | bigint(20)   | YES  |     | NULL              | 
 time          | timestamp    | NO   |     | CURRENT_TIMESTAMP | ON UPDATE 
 partitionid   | int(11)      | NO   |     | 0                 | 
 strrecordtype | char(3)      | YES  |     | NULL              | 





 mysql> DESCRIBE tbtest_child1; 
 Field        | Type         | Null | Key | Default           | Extra
 a            | int(11)      | NO   | PRI | NULL              | 
 bb           | int(11)      | NO   | PRI | NULL              | AUTO_INCREMENT 
 partitionid  | int(11)      | NO   |     | 0                 | 
 stroperation | varchar(254) | YES  |     | NULL              | 
 time         | timestamp    | NO   |     | CURRENT_TIMESTAMP | ON UPDATE



The filling factor for each table after the initial write was:

Table    tbtest1    total    1046    least    745    bytes per char: 3
Table    tbtest2    total    1046    least    745    bytes per char: 3
Table    tbtest3    total    1046    least    745    bytes per char: 3
Table    tbtest4    total    1046    least    745    bytes per char: 3
Table    tbtest5    total    1046    least    745    bytes per char: 3

Table    tbtest_child1    total    779    least    648    bytes per char: 3
Table    tbtest_child2    total    779    least    648    bytes per char: 3
Table    tbtest_child3    total    779    least    648    bytes per char: 3
Table    tbtest_child4    total    779    least    648    bytes per char: 3


Finally, the total size of the data set was 20 GB.


The inserts used a batch approach, with 50 rows per INSERT command, on all the platforms.
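The batch approach can be sketched in Python as follows. This is only an illustration and not the tool used for the tests: the column list is taken from the `tbtest1` structure shown above (leaving out the auto-increment and timestamp columns), while the function names and the `%s` placeholder style are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

# Columns of tbtest1 that the client has to provide explicitly.
COLUMNS = ("a", "uuid", "b", "c", "counter", "partitionid", "strrecordtype")
BATCH_SIZE = 50  # rows per INSERT command, as used in the tests


def chunked(rows: Sequence[Sequence[object]], size: int = BATCH_SIZE) -> Iterator[Sequence[Sequence[object]]]:
    """Split the rows into consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_batch_insert(table: str, rows: Sequence[Sequence[object]]) -> Tuple[str, List[object]]:
    """Build one multi-row INSERT statement plus its flat parameter list."""
    if not rows:
        raise ValueError("at least one row is required")
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError("every row must provide one value per column")
    column_list = ", ".join(COLUMNS)
    row_placeholder = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    all_placeholders = ", ".join([row_placeholder] * len(rows))
    sql = f"INSERT INTO {table} ({column_list}) VALUES {all_placeholders}"
    params = [value for row in rows for value in row]
    return sql, params
```

With this shape, 120 rows become three INSERT commands of 50, 50 and 20 rows, so the number of round trips to the server drops by roughly the batch size.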

Below is the summary of the tests run (workload, then the range of concurrent threads):

oltp 5 + 4 table write 4 -> 32

oltp 5 + 4 table read (IN) 4 -> 32

oltp 5 + 4 table read (RANGE) 4 -> 32

oltp 5 + 4 table write/read(IN) 4 -> 32
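A test plan like this one, where the same workload is repeated while the number of concurrent threads grows from 4 to 32, can be sketched in Python as below. This is not the tool used for these tests. The intermediate steps (4, 8, 16, 32) and the function names are assumptions, and `workload` stands for any callable that performs a single operation against the database.

```python
import threading
import time
from typing import Callable, Dict, Sequence

# Assumption: the thread count doubles at each step between 4 and 32.
THREAD_STEPS = (4, 8, 16, 32)


def run_step(workload: Callable[[int], None], threads: int, operations_per_thread: int) -> float:
    """Run the workload on `threads` concurrent threads; return elapsed seconds."""

    def worker(thread_id: int) -> None:
        for _ in range(operations_per_thread):
            workload(thread_id)

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    started = time.perf_counter()
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()  # wait until every thread has finished its operations
    return time.perf_counter() - started


def run_all(
    workload: Callable[[int], None],
    steps: Sequence[int] = THREAD_STEPS,
    operations_per_thread: int = 100,
) -> Dict[int, float]:
    """Execute every concurrency step in order and collect the execution times."""
    return {
        threads: run_step(workload, threads, operations_per_thread)
        for threads in steps
    }
```

Plotting the returned execution times against the thread count gives the kind of curve discussed in the results, where a sudden jump between two steps marks the point at which a platform stops scaling.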

4. Results

Results for write using 4 to 32 concurrent threads


4.1. Write Execution time

(High value is bad)


As the graph clearly shows, the behavior of RDS and of EC2 with one EBS is quite similar. While EC2 running a RAID of EBS maintains a good response time and scales in writes, the other two have a collapse point at 16 threads, after which performance is seriously affected.


4.2. Rows inserted

(High value is good)



Consistently, the number of rows inserted in a defined period of time again shows EC2 with RAID5 performing best in relation to the other two.

During this test the performance loss starts at 8 threads for the EC2 solutions, while for the RDS solution we see performance degrade immediately, as soon as concurrency increases.



4.3. Select Execution time with IN

(High value is bad)



Using the select with IN, given the high efficiency of the IN approach and the reduced number of reads that need to be executed on disk, all the instances maintain a good level of performance.


4.4. Rows read with IN

(High value is good)



In this case all the instances perform consistently, but the EC2 with RAID solution can serve a larger number of requests, almost 1/3 more than RDS.



4.5. Select Execution time with RANGE

(High value is bad)



In the case of range selects and heavy disk access, RDS and EC2 with 1 EBS are absolutely not able to perform at the same level as the RAID solution. This is quite obviously related to the amount of data that needs to be read from disk, and to the limitations existing in the RDS and 1 EBS solutions.



4.6. Rows read with RANGE

(High value is good)


The volume test confirms and highlights the different behavior between EC2 RAID and the others. At 32 concurrent threads the RDS solution tends to collapse, while EC2 RAID still serves the traffic successfully, even if with less efficiency.

4.7. Select Execution time with mix of SELECT and INSERT

(High value is bad)



4.8. Rows read with mix of SELECT and INSERT

(High value is good)



In the mixed workload I had unexpected results, with EC2 with 1 EBS behaving very badly when working with more than 16 threads. I attribute this to I/O contention, and to possible optimizations implemented by Amazon in RDS to prevent single EBS problems.

Apart from that, RDS and EC2 with RAID behaved as I was expecting, with EC2 able to manage a larger volume of traffic, and the inserts limiting the number of reads.



5. Conclusions

The comparison between RDS and EC2 covers several areas, from performance to High Availability.


My conviction is that RDS does not implement a solid and trustworthy HA solution, given that the way synchronous replication is implemented is not clear.

RDS does not apply best practices for replication, given the use of the STATEMENT format and the existing limitations in replication management.

Finally, RDS is not really efficient in managing a large volume of traffic, or applications with a large number of highly concurrent threads.


Nevertheless, it could be a temporary solution for very basic usage in applications that do not have demanding requirements.

RDS can probably be optimized further, but I am sure it will never be enough to consider RDS production ready.


EC2 is more flexible and allows better tuning and control of the platform, multiple HA solutions and full control of replication and of MySQL in general. All of this defines the significant difference from RDS, and draws the line for the right use of each tool.

Also, it makes no difference which MySQL distribution we implement, given that the source of the issue is in the platform.


My final advice is to use RDS for development, or as a temporary solution in a start-up, but it should not be used for critical systems or consolidated, mature applications, which require highly available, scalable database support.










Last Updated on Sunday, 18 August 2013 17:09
