My MySQL tips

ProxySQL support for MySQL caching_sha2_password

In our time, every day we use dozens if not hundreds of applications connecting to some kind of data repository. This simple step is normally executed over the network and given so, it is subject to possible sniffing with all the possible related consequences. brokenlock

Given that it is normally better to protect your connection using data encryption like SSL, or at the minimum, make the information you pass to connect less easy to be intercepted.

At the same time it is best practice to not store connection credential in clear text, not even inside a table in your database. Doing that is the equivalent of writing your password over a sticky note on your desk. Not a good idea.

The main options are instead in either transforming the passwords to be less identifiable like hashing or to store the information in an external centralized vault.

In MySQL the passwords are transformed in order to not be clear text, and several different plugins are used to authenticate the user. From version 8 MySQL uses caching_sha2_password as default authentication plugin. The caching_sha2_password and sha256_password authentication plugins provide more secure password encryption than the mysql_native_password plugin, and caching_sha2_password provides better performance than sha256_password. Due to these superior security and performance characteristics of caching_sha2_password, it is as of MySQL 8.0 the preferred authentication plugin, and is also the default authentication plugin rather than mysql_native_password.

In this regard recently I got the same question again “Can we use ProxySQL with MySQL 8 authorization mechanism?”, and I decided it was time to write this short blog post.

The short answer is “Yes you can”, however do not expect to have full caching_sha2_password support.

This is because ProxySQL does not fully support the caching_sha2_password mechanism internally and given that a “trick” must be used.

So, what should we do when using MySQL 8 and ProxySQL?

In the text below we will see what can be done to continue to use ProxySQL with MySQL and Percona server 8.

Note that I have used the Percona proxysql_admin tool to manage the users except in the last case.
Percona proxysql_admin tool is a nice tool that helps you to manage ProxySQL and in regard to user it also manage and synchronize users from your Percona or MySQL

In the following examples:

Proxysql is on 192.168.4.191

User name/password is msandbox/msandbox

Using hashing.

By default MySQL comes with caching_sha2_password as such if I create a user names msandbox I will have:

DC1-1(root@localhost) [(none)]>select user,host, authentication_string,plugin from mysql.user order by 1,2;
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+
| user                       | host               | authentication_string                                                  | plugin                |
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+
| msandbox                   | %                  | $A$005$Z[z@l'O%[Q5t^ EKJDgxjWXJjDpDEUv91oL7Hoh/0NydTeCzpV.aI06C9.      | caching_sha2_password |      <---- this user     
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+

Then I use percona_scheduler_admin to sync the users:

./percona-scheduler-admin --config-file=config.toml --syncusers 
Syncing user accounts from PXC(192.168.4.205:3306) to ProxySQL
Removing existing user from ProxySQL: msandbox
Adding user to ProxySQL: msandbox

Synced PXC users to the ProxySQL database!

mysql> select * from mysql_users ;
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+
| username   | password                                                               | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | attributes | comment                     |
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+
| msandbox   | $A$005$Z[z@l'O%[Q5t^ EKJDgxjWXJjDpDEUv91oL7Hoh/0NydTeCzpV.aI06C9       | 1      | 0       | 100               | NULL           | 0             | 1                      | 0            | 1       | 1        | 10000           |            |                             |
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+

And set the query rules:

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1048,6033,'msandbox',100,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1050,6033,'msandbox',101,1,3,'^SELECT.*$',1);

load mysql query rules to run;save mysql query rules to disk;

Now I try to connect passing by ProxySQL:

# mysql -h 192.168.4.191 -P6033  -umsandbox -pmsandbox
ERROR 1045 (28000): ProxySQL Error: Access denied for user 'msandbox'@'192.168.4.191' (using password: YES)

My account will fail to connect given failed authentication.

To fix this I need to drop the user and recreate it with a different authentication plugin in my MySQL server:

drop user msandbox@'%';
create user 'msandbox'@'%' identified with mysql_native_password  BY 'msandbox';
grant select on *.* to 'msandbox'@'%';

select user,host, authentication_string,plugin from mysql.user order by 1,2;
+----------+--------------------+-------------------------------------------+-----------------------+
| user     | host               | authentication_string                     | plugin                |
+----------+--------------------+-------------------------------------------+-----------------------+
| msandbox | %                  | *6C387FC3893DBA1E3BA155E74754DA6682D04747 | mysql_native_password |
+----------+--------------------+-------------------------------------------+-----------------------+

At this point I can re-run

./percona-scheduler-admin --config-file=config.toml --syncusers

if I try to connect again:

# mysql -h 192.168.4.191 -P6033  -umsandbox -pmsandbox
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 6708563
Server version: 8.0.28 (ProxySQL). <---------------------------- Connecting to proxysql

Copyright (c) 2000, 2022, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show global variables like 'version%';
+-------------------------+------------------------------------------------------------------------------------+
| Variable_name           | Value                                                                              |
+-------------------------+------------------------------------------------------------------------------------+
| version                 | 8.0.25-15.1         <--- Percona/MySQL version                                     |
| version_comment         | Percona XtraDB Cluster binary (GPL) 8.0.25, Revision 8638bb0, WSREP version 26.4.3 |
| version_compile_machine | x86_64                                                                             |
| version_compile_os      | Linux                                                                              |
| version_compile_zlib    | 1.2.11                                                                             |
| version_suffix          | .1                                                                                 |
+-------------------------+------------------------------------------------------------------------------------+
6 rows in set (0.02 sec)

This is the only way to keep the password hashed in MySQL and in ProxySQL.

Not using Hashing

What if you cannot use mysql_native_password for the password in your MySQL server?

There is a way to still connect, however I do not recommend it given for me is highly insecure, but for completeness I am going to illustrate it.

First of all disable password hashing in Proxysql:

update global_variables set Variable_Value='false' where Variable_name='admin-hash_passwords';

At this point instead sync the user you can locally create the user like:

insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('msandbox','msandbox',1,100,'mysql',1,'generic test for security'); 
mysql> select * from runtime_mysql_users where username ='msandbox'; 
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+ 
| username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | attributes | comment                   | 
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+ 
| msandbox | msandbox | 1      | 0       | 100               | mysql          | 0             | 1                      | 0            | 1       | 0        | 10000           |            | generic test for security | 
| msandbox | msandbox | 1      | 0       | 100               | mysql          | 0             | 1                      | 0            | 0       | 1        | 10000           |            | generic test for security | 
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+

As you can see doing that will prevent the password to be hashed and instead it will be clear text.

At this point you will be able to connect to MySQL 8 using the caching_sha2_password, but the password is visible in ProxySQL.

Let me repeat, I DO NOT recommend using it this way, because for me it is highly insecure.

Conclusion

While it is still possible to configure your user in MySQL to connect using ProxySQL, it is obvious that we have a gap in the way ProxySQL supports security.

The hope is that this gap will be filled soon by the ProxySQL development team, also if looking to the past issues this seems pending from years now.

References

https://proxysql.com/documentation/mysql-8-0/

https://github.com/sysown/proxysql/issues/2580

https://www.percona.com/blog/upgrade-your-libraries-authentication-plugin-caching_sha2_password-cannot-be-loaded/

Zero impact on index creation with Aurora 3

aurora ddl notes Last quarter of 2021 AWS released Aurora version 3. This new version aligns Aurora with the latest MySQL 8 version porting many of the advantages MySQL 8 has over previous versions.

While this brings a lot of new interesting features for Aurora, what we are going to cover here is to see how DDLs behave when using the ONLINE option. With a quick comparison with what happens in MySQL 8 standard and with Group Replication.

Tests

All tests were run on an Aurora instance r6g.large with secondary availability zone.
The test was composed by:

4 connections

- #1 to perform ddl
- #2 to perform insert data in the table I am altering
- #3 to perform insert data on a different table
- #4 checking the other node operations

In the Aurora instance, a sysbench schema with 10 tables and 5 million rows was created, just to get a bit of traffic. While the test table with 5ml rows as well was:

CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`),
  KEY `idx_1` (`uuid`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC

The executed commands:

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills3  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";mysql --defaults-file=./my.cnf -h <secondary aurora instance> -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Operations:
1) start inserts from connections
2) start commands in connections 4 - 5 on the other nodes
3) execute: DC1-1(root@localhost) [windmills_large]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;

With this, what I was looking to capture is the operation impact in doing a common action as creating an Index. My desired expectation is to have no impact when doing operations that are declared “ONLINE” such as creating an index, as well as data consistency between nodes.

Let us see what happened…

Results

While running the insert in the same table, performing the alter:

mysql>  ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (16.51 sec)
Records: 0  Duplicates: 0  Warnings: 0

Is NOT stopping the operation in the same table or any other table in the Aurora instance.

We can only identify a minimal performance impact:

[root@ip-10-0-0-11 tmp]# while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
.347
.283
.278
.297
.291
.317
.686  ← start
<Snip>
.512  ← end
.278
.284
.279

The secondary node is not affected at all, and this is because Aurora managed at storage level the data replication. Given that there is no such thing as Apply from Relaylog, as we have in standard MySQL asynchronous or data replicated with Group Replication.

The result is that in Aurora 3, we can have zero impact index (or any other ONLINE/INSTANT) operation, with this I include the data replicated in the other instances for High Availability.

If we compare this with Group replication (see blog):

                               	 	 	  GR         Aurora 3
Time on hold for insert for altering table   	~0.217 sec   ~0.523 sec
Time on hold for insert for another table   	~0.211 sec   ~0.205 sec

However, keep in mind that MySQL with Group Replication will still need to apply the data on the Secondaries. This means that if your alter was taking 10 hours to build the index, the Secondary nodes will be misaligned with the Source for approximately another 10 hours.

With Aurora 3 or with PXC, changes will be there when Source has completed the operation.

What about Percona XtraDB Cluster (PXC)? Well, with PXC we have a different scenario:

                               	 	 	 PXC(NBO)     Aurora 3
Time on hold for insert for altering table   	~120 sec      ~0.523 sec
Time on hold for insert for another table   	~25  sec      ~0.205 sec

We will have a higher impact while doing the Alter operation, but the data will be on all nodes at the same time maintaining a high level of consistency in the cluster.

Conclusion

Aurora is not for all use, and not for all budgets, however it has some very good aspects like the one we have just seen. The Difference between standard MySQL and Aurora is not in the time of holding/locking (aka operation impact), but on the HA aspects. If I have my data/structure on all my Secondary at the same time of the Source, I will feel much more comfortable, than having to wait an additional T time.

This is why PXC in that case is a better alternative if you can afford locking time, if not, well Aurora 3 is your solution, just do your math properly and be conservative with the instance resources.

A face to face with semi-synchronous replication

Last month I performed a review of the Percona Operator for MySQL Server (https://www.percona.com/doc/kubernetes-operator-for-mysql/ps/index.html) which is still Alpha. That operator is based on Percona Server and uses standard asynchronous replication, with the option to activate semi-synchronous replication to gain higher levels of data consistency between nodes.

The whole solution is composed as:

Additionally, Orchestrator (https://github.com/openark/orchestrator) is used to manage the topology and the settings to enable on the replica nodes, the semi-synchronous flag if required.
While we have not too much to say when using standard Asynchronous replication, I want to spend two words on the needs and expectations on the semi-synchronous (semi-sync) solution.

A look into semi-synchronous

Difference between Async and Semi-sync.
Asynchronous:

The above diagram represents the standard asynchronous replication. This method is expected by design, to have transactions committed on the Source that are not present on the Replicas. The Replica is supposed to catch-up when possible.

It is also important to understand that there are two steps in replication:

Data copy, which is normally very fast. The Data is copied from the binlog of the Source to the relay log on the Replica (IO_Thread).
Data apply, where the data is read from the relay log on the Replica node and written inside the database itself (SQL_Thread). This step is normally the bottleneck and while there are some parameters to tune, the efficiency to apply transactions depends on many factors including schema design.

Production deployments that utilize the Asynchronous solution are typically designed to manage the possible inconsistent scenario given data on Source is not supposed to be on Replica at commit. At the same time the level of High Availability assigned to this solution is lower than the one we normally obtain with (virtually-)synchronous replication, given we may need to wait for the Replicare to catch-up the gap accumulated in the relay-logs before performing the fail-over.

Semi-sync:

The above diagram represents the Semi-sync replication method.The introduction of semi-sync adds a checking step on the Source before it returns the acknowledgement to the client.
This step happens at the moment of the data-copy, so when the data is copied from the Binary-log on Source to the Relay-log on Replica.

This is important, there is NO mechanism to ensure a more resilient or efficient data replication, there is only an additional step, that tells the Source to wait a given amount of time for an answer from N replicas, and then return the acknowledgement or timeout and return to the client no matter what.

This mechanism is introducing a possible significant delay in the service, without giving the 100% guarantee of data consistency.

In terms of availability of the service, when in presence of high load, this method may lead the Source to stop serving the request while waiting for acknowledgements, significantly reducing the availability of the service itself.

At the same time only acceptable settings for rpl_semi_sync_source_wait_point is AFTER_SYNC (default) because: In the event of source failure, all transactions committed on the source have been replicated to the replica (saved to its relay log). An unexpected exit of the source server and failover to the replica is lossless because the replica is up to date.

All clear? No? Let me simplify the thing.

In standard replication you have two moments (I am simplifying)
- Copy data from Source to Replica
- Apply data in the Replica node
There is no certification on the data applied about its consistency with the Source
With asynchronous the Source task is to write data in the binlog and forget
With semi-sync the Source writes the data on binlog and waits T seconds to receive acknowledgement from N servers about them having received the data.

To enable semi-sync you follow these steps: https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html

In short:

Register the plugins
Enable Source rpl_semi_sync_source_enabled=1
Enable Replica rpl_semi_sync_replica_enabled = 1
- If replication is already running STOP/START REPLICA IO_THREAD

And here starts the fun, be ready for many “wait whaaat?”.

What is the T and N I have just mentioned above?

Well the T is a timeout that you can set to avoid having the source wait forever for the Replica acknowledgement. The default is 10 seconds. What happens if the Source waits for more than the timeout?
rpl_semi_sync_source_timeout controls how long the source waits on a commit for acknowledgment from a replica before timing out and reverting to asynchronous replication.

Careful of the wording here! The manual says SOURCE, so it is not that MySQL revert to asynchronous, by transaction or connection, it is for the whole server.

Now analyzing the work-log (see https://dev.mysql.com/worklog/task/?id=1720 and more in the references) the Source should revert to semi-synchronous as soon as all involved replicas are aligned again.

However, checking the code (see https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844 and following), we can see that we do not have a 100% guarantee that the Source will be able to switch back.

Also in the code:
But, it is not that easy to detect that the replica has caught up. This is caused by the fact that MySQL's replication protocol is asynchronous, meaning that if thesource does not use the semi-sync protocol, the replica would not send anything to thesource.

In all the runned tests the Source was not able to switch back. In short Source was moving out from semi-sync and that was forever, no rollback. Keep in mind that while we go ahead.

What is the N I mentioned above? It represents the number of Replicas that must provide the acknowledgement back.

If you have a cluster of 10 nodes you may need to have only 2 of them involved in the semi-sync, no need to include them all. But if you have a cluster of 3 nodes where 1 is the Source, relying on 1 Replica only, is not really secure. What I mean here is that if you choose to be semi-synchronous to ensure the data replicates, having it enabled for one single node is not enough, if that node crashes or whatever, you are doomed, as such you need at least 2 nodes with semi-sync.

Anyhow, the point is that if one of the Replica takes more than T to reply, the whole mechanism stops working, probably forever.

As we have seen above, to enable semi-sync on Source we manipulate the value of the GLOBAL variable rpl_semi_sync_source_enabled.

However if I check the value of rpl_semi_sync_source_enabled when the Source shift to simple Asynchronous replication because timeout:

select @@rpl_semi_sync_source_enabled;

select @@rpl_semi_sync_source_enabled;
+--------------------------------+
| @@rpl_semi_sync_source_enabled |
+--------------------------------+
|                              1 |
+--------------------------------+

As you can see the Global variable reports a value of 1, meaning that semi-sync is active also if not.

In the documentation it is reported that to monitor the semi-sync activity we should check for Rpl_semi_sync_source_status. Which means that you can have Rpl_semi_sync_source_status = 0 and rpl_semi_sync_source_enabled =1 at the same time.

Is this a bug? Well according to documentation:
When the source switches between asynchronous or semisynchronous replication due to commit-blocking timeout or a replica catching up, it sets the value of the Rpl_semi_sync_source_status or Rpl_semi_sync_source_status status variable appropriately. Automatic fallback from semisynchronous to asynchronous replication on the source means that it is possible for the rpl_semi_sync_source_enabled or rpl_semi_sync_source_enabled system variable to have a value of 1 on the source side even when semisynchronous replication is in fact not operational at the moment. You can monitor the Rpl_semi_sync_source_status or Rpl_semi_sync_source_status status variable to determine whether the source currently is using asynchronous or semisynchronous replication.

It is not a bug. However, because you documented it, it doesn’t change the fact this is a weird/unfriendly/counterintuitive way of doing, that opens the door to many, many possible issues. Especially given you know the Source may fail to switch semi-synch back.

Just to close this part, we can summarize as follows:

You activate semi-sync setting a global variable
Server/Source can disable it (silently) without changing that variable
Server will never restore semi-sync automatically
The way to check if semi-sync works is to use the Status variable
When Rpl_semi_sync_source_status = 0 and rpl_semi_sync_source_enabled =1 you had a Timeout and Source is now working in asynchronous replication
The way to reactivate semi-sync is to set rpl_semi_sync_source_enabled to OFF first then rpl_semi_sync_source_enabled = ON.
Replicas can be set with semi-sync ON/OFF but unless you do not STOP/START the replica_IO_THREAD the state of the variable can be inconsistent with the state of the Server.

What can go wrong?

Semi-synchronous is not seriously affecting the performance

Others had already discussed semi-sync performance in better details. However I want to add some color given the recent experience with our operator testing.
In the next graphs I will show you the behavior of writes/reads using Asynchronous replication and the same load with Semi-synchronous.
For the record the test was a simple Sysbench-tpcc test using 20 tables, 20 warehouses, 256 threads for 600 seconds.

The one above indicates a nice and consistent set of load in r/w with minimal fluctuations. This is what we like to have.

The graphs below, represent the exact same load on the exact same environment but with semi-sync activated and no timeout.

Aside from the performance loss (we went from Transaction 10k/s to 3k/s), the constant stop/go imposed by the semi-sync mechanism has a very bad effect on the application behavior when you have many concurrent threads and high loads. I challenge any serious production system to work in this way.

Of course results are inline with this yoyo game:

In the best case, when all was working as expected, and no crazy stuff happening I had something around the 60% loss. I am not oriented to see this as a minor performance drop.

But at least your data is safe

As already stated at the beginning the scope of semi-synchronous replication is to guarantee that the data in server A reaches server B before returning the OK to the application.

In short, given a period of 1 second we should have minimal transactions in flight and limited transactions in the apply queue. While for standard replication (asynchronous), we may have … thousands.

In the graphs below we can see two lines:

The yellow line represents the number of GTIDs “in flight” from Source to destination, Y2 axes. In case of Source crash, those transactions are lost and we will have data loss.
The blue line represents the number of GTIDs already copied over from Source to Replica but not applied in the database Y1 axes. In case of Source crash we must wait for the Replica to process these entries, before making the node Write active, or we will have data inconsistency.

Asynchronous replication:

As expected we can see a huge queue in applying the transactions from relay-log, and some spike of transactions in flight.

Using Semi-synchronous replication:

Yes, apparently we have reduced the queue and no spikes so no data loss.

But this happens when all goes as expected, and we know in production this is not the normal.
What if we need to enforce the semi-sync but at the same time we cannot set the Timeout to ridiculous values like 1 week?

Simple, we need to have a check that puts back the semi-sync as soon as it is silently disabled (for any reason).
However doing this without waiting for the Replicas to cover the replication gap, cause the following interesting effects:

Thousands of transactions queued and shipped with the result of having a significant increase of the possible data loss and still a huge number of data to apply from the relay-log.

So the only possible alternative is to set the Timeout to a crazy value, However this can cause a full production stop in the case a Replica hangs or for any reason it disables the semi-sync locally.

Conclusion

First of all I want to say that the tests on our operator using Asynchronous replication, shows a consistent behavior with the standard deployments in the cloud or premises. It has the same benefits, like better performance and same possible issues as longer time to failover when it needs to wait a Replica to apply the relay-log queue.

The semi-synchronous flag in the operator is disabled, and the tests I have done bring me to say “keep it like that!”. At least unless you know very well what you are doing and are able to deal with a semi-sync timeout of days.

I was happy to have the chance to perform these tests, because they gives me a way/time/need to investigate more on the semi-synchronous feature.
Personally, I was not convinced about the semi-synchronous replication when it came out, and I am not now. I never saw a less consistent and less trustable feature in MySQL as semi-sync.

If you need to have a higher level of synchronicity in your database just go for Group Replication, or Percona XtraDB Cluster and stay away from semi-sync.

Otherwise, stay on Asynchronous replication, which is not perfect but it is predictable.

References

https://www.percona.com/blog/2012/01/19/how-does-semisynchronous-mysql-replication-work/

https://www.percona.com/blog/percona-monitoring-and-management-mysql-semi-sync-summary-dashboard/

https://www.percona.com/blog/2012/06/14/comparing-percona-xtradb-cluster-with-semi-sync-replication-cross-wan/

https://datto.engineering/post/lossless-mysql-semi-sync-replication-and-automated-failover

https://planetscale.com/blog/mysql-semi-sync-replication-durability-consistency-and-split-brains

https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/

https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html

https://dev.mysql.com/worklog/task/?id=1720

https://dev.mysql.com/worklog/task/?id=6630

https://dev.mysql.com/worklog/task/?id=4398

https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844

https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L881

Online DDL with Group Replication In MySQL 8.0.27

Last April 2021 I wrote an article about Online DDL and Group Replication. At that time we were dealing with MySQL 8.0.23 and also opened a bug report which did not have the right answer to the case presented.

Anyhow, in that article I have shown how an online DDL was de facto locking the whole cluster for a very long time even when using the consistency level set to EVENTUAL.

This article is to give justice to the work done by the MySQL/Oracle engineers to correct that annoying inconvenience.

Before going ahead, let us remember how an Online DDL was propagated in a group replication cluster, and identify the differences with what happens now, all with the consistency level set to EVENTUAL (see).

In MySQL 8.0.23 we were having:

While in MySQL 8.0.27 we have:

As you can see from the images we have 3 different phases. Phase 1 is the same between version 8.0.23 and version 8.0.27.

Phase 2 and 3 instead are quite different. In MySQL 8.0.23 after the DDL is applied on the Primary it is propagated to the other nodes, but a metalock was also acquired and the control was NOT returned. The result was that not only the session executing the DDL was kept on hold, but also all the other sessions performing modifications.

Only when the operation was over on all secondaries, the DDL was pushed to Binlog and disseminated for Asynchronous replication, lock raised and operation can restart.

Instead, in MySQL 8.0.27, once the operation is over on the primary the DDL is pushed to binlog, disseminated to the secondaries and control returned. The result is that the write operations on primary have no interruption whatsoever and the DDL is distributed to secondary and Asynchronous replication at the same time.

This is a fantastic improvement, available only with consistency level EVENTUAL, but still, fantastic.

Let's see some numbers.

To test the operation, I have used the same approach used in the previous tests in the article mentioned above.

Connection 1:
    ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
    ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
    
Connection 2:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:
 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmill8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4-5:
     while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done

Modifying a table with ~5 million rows:

node1-DC1 (root@localhost) [windmills_large]>select count(*) from  windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+

The numbers below represent the time second.milliseconds taken by the operation to complete. While I was also catching the state of the ALTER on the other node I am not reporting it here given it is not relevant.

EVENTUAL (on the primary only)
-------------------
Node 1 same table:
.184
.186 <--- no locking during alter on the same node
.184
<snip>
.184
.217 <--- moment of commit
.186
.186
.186
.185

Node 1 another table :
.189
.198 <--- no locking during alter on the same node
.188
<snip>
.191
.211  <--- moment of commit
.194

As you can see there is just a very small delay at the moment of commit but other impact.

Now if we compare this with the recent tests I have done for PXC Non Blocking operation (see here) with same number of rows and same kind of table/data:

Action	Group Replication	PXC (NBO)
Time on hold for insert in altering table	~ 0.217 sec	~ 120 sec
Time on hold for insert in another table	~ 0.211 sec	~ 25 sec

However, yes there is a however, PXC was maintaining consistency between the different nodes during the DDL execution, while MySQL 8.0.27 with Group Replication was postponing consistency on the secondaries, thus Primary and Secondary were not in sync until full DDL finalization on the secondaries.

Conclusions

MySQL 8.0.27 comes with this nice fix that significantly reduces the impact of an online DDL operation on a busy server. But we can still observe a significant misalignment of the data between the nodes when a DDL is executing.

On the other hand PXC with NBO is a bit more “expensive” in time, but nodes remain aligned all the time.

At the end is what is more important for you to choose one or the other solution, consistency vs. operational impact.

Great MySQL to all.

A look into Percona XtraDB Cluster Non Blocking Operation for Online Schema Upgrade

Percona XtraDB Cluster 8.0.25 has introduced a new option to perform online schema modifications: NBO (Non Blocking Operation).

When using PXC the cluster relies on wsrep_OSU_method parameter to define the Online Schema Upgrade (OSU) method the node uses to replicate DDL statements. breaking bariers

Until now we normally have 3 options:

Use Total Isolation Order (TOI, the default)
Use Rolling Schema Upgrade (RSU)
Use Percona’s online schema change tool (TOI + PTOSC)

Each method has some positive and negative aspects. TOI will lock the whole cluster from being able to accept data modifications for the entire time it takes to perform the DDL operation. RSU will misalign the schema definition between the nodes, and in any case the node performing the DDL operation is still locked. Finally TOI+PTOSC will rely on creating triggers and copying data, so in some cases this can be very impactful.

The new Non Blocking Operation (NBO) method is to help to reduce the impact on the cluster and make it easier to perform some DDL operations.

At the moment we only support a limited set of operations with NBO like:

ALTER INDEX
CREATE INDEX
DROP INDEX

Any other command will result in an error message ER_NOT_SUPPORTED_YET.

But let us see how it works and what is the impact while we will also compare it with the default method TOI.

What we will do is working with 4 connections:

1 to perform ddl
2 to perform insert data in the table being altered
3 to perform insert data on a different table
4-5 checking the other two nodes operations

PXC must be at least Ver 8.0.25-15.1.

The table we will modify is :

DC1-1(root@localhost) [windmills_s]>show create table windmills_test\G
*************************** 1. row ***************************
       Table: windmills_test
Create Table: CREATE TABLE `windmills_test` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `millid` smallint NOT NULL,
  `kwatts_s` int NOT NULL,
  `date` date NOT NULL,
  `location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `active` tinyint NOT NULL DEFAULT '1',
  `time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_millid` (`millid`,`active`),
  KEY `IDX_active` (`id`,`active`),
  KEY `kuuid_x` (`uuid`),
  KEY `millid_x` (`millid`),
  KEY `active_x` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=8199260 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)

And contains ~5 million rows.

DC1-1(root@localhost) [windmills_s]>select count(*) from windmills_test;
+----------+
| count(*) |
+----------+
|  5002909 |
+----------+
1 row in set (0.44 sec)

The commands.
Connection 1:

  ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
  ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;

Connection 2:

while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills_test  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connection 3:

 while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills8  select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done

Connections 4-5:

while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_s)"|grep -i -v localhost;sleep 1;done

Operations:

start inserts from connections
start commands in connections 4 - 5 on the other nodes

execute:

for TOI

DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=TOI;

for NBO

DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=NBO;

For both

DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=shared;

Let us run it

Altering a table with TOI.

DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
Query OK, 0 rows affected (1 min 4.74 sec)
Records: 0  Duplicates: 0  Warnings: 0

Inserts in the altering table (connection 2):

.450
.492
64.993 <--- Alter blocks all inserts on the table we are altering
.788
.609

Inserts on the other table (connection 3):

.455
.461
64.161 <--- Alter blocks all inserts on all the other tables as well
.641
.483

On the other nodes at the same time of the ALTER we can see:

Id  User             db         Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
15	system user		windmills_s	Query	102	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	102238	0	       0    <--- time from start

So in short we have the whole cluster locked for ~64 seconds. During this period of time, all the operations to modify data or structure were on hold.

Let us now try with NBO

Inserts in the altering table:

.437
.487
120.758 <---- Execution time increase
.617
.510

Inserts on the other table:

.468
.485
25.061 <---- still a metalock, but not locking the other tables for the whole duration 
.494
.471

On the other nodes at the same time of the ALTER we can see:

Id      User         db             Command Time  State             Info                                                                            Time_ms Rows_sent Rows_examined 
110068	system user	 windmills_s	Connect	86	  altering table	ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE	 120420	 0	          0

In this case what is also interesting to note is that:

We have a moment of metalock:

110174 pmm 127.0.0.1:42728 NULL Query 2 Waiting for table metadata lock  SELECT x FROM information_schema.tables  WHERE TABLE_SCHEMA = 'windmills_s' 1486 10    0

110068  system user connecting host windmills_s Connect 111 closing tables ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 111000 0 0

The execution time is longer

Summarizing:

                               	   	 	      TOI            NBO
Time on hold for insert for altering table   	~64 sec    	~120 sec
Time on hold for insert for another table   	~64 sec      ~25 sec 
metalock                       			whole time  	 only at the end

What is happening, what are the differences and why takes longer with NBO?

Let see at very high level how the two works:

TOI: when you issue a DDL like ADD INDEX a metadata lock is taken on the table and it will be released only at the end of the operation. During this time, you cannot:
- Perform DMLs on any cluster node
- Alter another table in the cluster
NBO: the metadata lock is taken at the start and at the end for a very brief period of time. The ADD INDEX operation will then work on each node independently. The lock taken at the end is to have all the nodes agree on the operation and commit or roll back (using cluster error voting). This final phase costs a bit more in time and is what adds a few seconds to the operation execution. But during the operation:
- You can alter another table (using NBO)
- You can continue to insert data, except in the table(s) you are altering.
- On node crash the operation will continue on the other nodes, and if successful it will persist.

In short the cluster server behavior changes significantly when using NBO, offering significant flexibility compared to TOI. The cost in time should not linearly increase with the dimension of the table, but more in relation to the single node efficiency in performing the ALTER operation.

Conclusion

NBO can be significantly helpful to reduce the impact of DDL on the cluster, for now limited to the widely used creation/modification/drop of an index. But in the future … we may expand it.

The feature is still a technology preview, so do not trust in production, but test it and let us know what you think.

Final comment. Another distribution has introduced NBO, but only if you buy the enterprise version.

Percona, which is truly open source with facts not just words, has implemented NBO in standard PXC, and the code is fully open source. This is not the first one, but just another of the many features Percona is offering for free, while others ask you to buy the enterprise version.

Enjoy the product and let us have your feedback!

Great MySQL to all!

Sidebar

Main Menu Mobile

MySQL Blogs

ProxySQL support for MySQL caching_sha2_password

Using hashing.

Not using Hashing

Conclusion

References

Zero impact on index creation with Aurora 3

Tests

Results

Conclusion

A face to face with semi-synchronous replication

A look into semi-synchronous

Semi-synchronous is not seriously affecting the performance

But at least your data is safe

Conclusion

References

Online DDL with Group Replication In MySQL 8.0.27

Let's see some numbers.

Conclusions

A look into Percona XtraDB Cluster Non Blocking Operation for Online Schema Upgrade

Let us run it

Altering a table with TOI.

Let us now try with NBO

What is happening, what are the differences and why takes longer with NBO?

Conclusion

More Articles ...

Related Articles

login