Support for Percona XtraDB Cluster in ProxySQL (Part One)
In recent times I have been designing several solutions focused on High Availability and Disaster Recovery. Some of them use Percona Server for MySQL with Group Replication, others use Percona XtraDB Cluster (PXC). What many of them have in common is the use of ProxySQL for the connection layer. This is because I consider a layer 7 proxy preferable, given the advantages it provides in read/write splitting and SQL filtering.
The other positive aspect provided by ProxySQL, at least for Group Replication, is the native support which allows us to have a very quick resolution of possible node failures.
ProxySQL has Galera support as well, but in the past that support had proven to be pretty unstable, and the old method of using the scheduler was still the best way to go.
After Percona Live Online 2020 I decided to try it again and to see if at least the basics were now working fine.
What I Have Tested
I was not looking for complicated tests involving different levels of transaction isolation. I was instead interested in the simpler, more basic ones. My scenario was:
1 ProxySQL node v2.0.15 (192.168.4.191)
1 ProxySQL node v2.1.0 (192.168.4.108)
3 PXC 8.0.20 nodes (192.168.4.22/23/233) with an internal network (10.0.0.22/23/33)
ProxySQL was freshly installed.
All the commands used to modify the configuration are here. Tests were done first using ProxySQL v2.0.15, then v2.1.0. Only where the results diverge will I report the version and the differing result.
PXC Failover Scenarios
As mentioned above, I am going to focus on the failover needs, period. I will have two different scenarios:
- Controlled failover for maintenance
- Node crash
From the ProxySQL point of view I will have three scenarios, always with a single Primary:
- Writer is NOT a reader (writer_is_also_reader = 0 and 2)
- Writer is also a reader (writer_is_also_reader = 1)
The configuration of the native support will be:
Galera host groups:
- Writer: 100
- Reader: 101
- Backup_writer: 102
- Offline_hostgroup: 9101
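These host groups are wired together in ProxySQL's mysql_galera_hostgroups table. A minimal sketch of that definition, assuming the standard ProxySQL 2.x schema (the max_writers and max_transactions_behind values here are illustrative assumptions, not the exact values I used):

```sql
-- Illustrative Galera host group definition for this test layout.
INSERT INTO mysql_galera_hostgroups
    (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup,
     offline_hostgroup, active, max_writers, writer_is_also_reader,
     max_transactions_behind)
VALUES
    (100, 102, 101, 9101, 1, 1, 0, 16);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```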
Before going ahead, let us analyze the mysql_servers settings. As you can notice, I am using the weight attribute to indicate to ProxySQL which node is my preferred writer. But I also use the weight for the READ host group to indicate which servers should be used, and how.
Given that, we have:
- 192.168.4.22 is the preferred Primary
- 192.168.4.23 is the first failover
- 192.168.4.233 is the last chance
- 192.168.4.233/23 have the same weight and load should be balanced between the two of them
- 192.168.4.22, given it is the preferred writer, should NOT receive the same read load, and so has a lower weight value.
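A mysql_servers layout reflecting the intent above could look like the following sketch; the port and exact weight values are illustrative assumptions, only their relative ordering matters:

```sql
-- Writer HG 100: .22 preferred Primary, .23 first failover, .233 last chance.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) VALUES
    (100, '192.168.4.22',  3306, 1000000),
    (100, '192.168.4.23',  3306, 1000),
    (100, '192.168.4.233', 3306, 100);
-- Reader HG 101: .23 and .233 balanced, the preferred writer .22 kept low.
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight) VALUES
    (101, '192.168.4.22',  3306, 100),
    (101, '192.168.4.23',  3306, 10000),
    (101, '192.168.4.233', 3306, 10000);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```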
The first test is to see how the cluster will behave in the case of 1 Writer and 2 readers, with the option writer_is_also_reader = 0.
To achieve this, the settings for ProxySQL will be:
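In essence, the Galera definition needs writer_is_also_reader set to 0; a sketch of the change, assuming the host group definition shown earlier:

```sql
-- Set writer_is_also_reader = 0 for our Galera definition (writer HG 100).
UPDATE mysql_galera_hostgroups SET writer_is_also_reader = 0
 WHERE writer_hostgroup = 100;
LOAD MYSQL SERVERS TO RUNTIME;
```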
As soon as I load this to runtime, ProxySQL should move the nodes to the relevant host groups. But this is not happening; instead, it keeps the readers in the writer HG and SHUNs them.
This is, of course, wrong. But why does it happen?
The reason is simple: ProxySQL is expecting to see all nodes in the reader group with the READ_ONLY flag set to 1.
In ProxySQL documentation we can read:
writer_is_also_reader=0: nodes with read_only=0 will be placed either in the writer_hostgroup and in the backup_writer_hostgroup after a topology change, these will be excluded from the reader_hostgroup.
This is conceptually wrong.
A PXC cluster is a tightly coupled replication cluster with virtually synchronous replication. One of its benefits is that the nodes are “virtually” aligned with respect to the data state.
In this kind of model, the cluster is data-centric, and each node shares the same data view.
What this also means is that, if correctly configured, the nodes will be fully consistent for data reads.
The other characteristic of the cluster is that ANY node can become a writer at any time. While best practices indicate that it is better to use one writer at a time as Primary to prevent certification conflicts, this does not mean that the nodes not currently elected as Primary should be prevented from becoming a writer.
Which is exactly what the READ_ONLY flag does when activated.
Not only that: the need to have READ_ONLY set means that we must change it BEFORE the node is able to become a writer in case of failover.
This, in short, means we need either a topology manager or a script that will do that, with all the related checks and logic needed to be safe. At failover time this adds delay and complexity where neither is really needed, and it goes against the concept of the tightly coupled cluster itself.
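To make the point concrete, this is the extra step such a helper would have to perform on the promoted node before ProxySQL could route writes to it (illustrative only; the surrounding safety checks are precisely the complexity being argued against):

```sql
-- On the node being promoted to writer, a failover helper would need:
SET GLOBAL read_only = 0;
-- ...plus checks that the old Primary is really gone, that no other node
-- was promoted concurrently, and so on.
```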
Given the above, we can say that this ProxySQL method related to writer_is_also_reader = 0, as it is implemented today for Galera, is, at best, useless.
Why does it work for Group Replication? That is easy: because Group Replication internally uses a mechanism to lock/unlock the nodes when they are not the Primary, when the cluster runs in single-Primary mode. That internal mechanism was implemented as a safety guard to prevent random writes on multiple nodes, and it also manages the READ_ONLY flag.
Let us move on and test with writer_is_also_reader = 2. Again from the documentation:
writer_is_also_reader=2 : Only the nodes with read_only=0 which are placed in the backup_writer_hostgroup are also placed in the reader_hostgroup after a topology change i.e. the nodes with read_only=0 exceeding the defined max_writers.
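Switching the existing Galera definition over is a one-line change, sketched here against the host group definition used above:

```sql
-- Switch the Galera definition to writer_is_also_reader = 2.
UPDATE mysql_galera_hostgroups SET writer_is_also_reader = 2
 WHERE writer_hostgroup = 100;
LOAD MYSQL SERVERS TO RUNTIME;
```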
Given the settings as indicated above, my layout before using Galera support is:
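The resulting layout can be checked by querying the runtime table; a query along these lines shows which host group each node landed in and with which weight:

```sql
-- Inspect the node distribution ProxySQL actually applied at runtime.
SELECT hostgroup_id, hostname, status, weight
  FROM runtime_mysql_servers
 ORDER BY hostgroup_id, hostname;
```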
So the node ending with .22 (the elected Primary) is not in the reader pool. Which can be OK, I assume.
But what is not OK at all is that the READERS now have completely different weights. Nodes x.23 and x.233 are no longer balancing the load, because their weight is not the same, nor the one I defined. It is instead copied over from the WRITER settings.
Well of course this is wrong and not what I want. Anyhow, let’s test the READ failover.
I will use sysbench read-only:
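For reference, an invocation along these lines; note that I used a custom “windmills” test script (visible in the error below), so the standard oltp_read_only script, database name, and credentials here are stand-in assumptions:

```shell
# Read-only load through the ProxySQL node, illustrative parameters.
sysbench oltp_read_only \
  --mysql-host=192.168.4.191 --mysql-port=6033 \
  --mysql-user=app_test --mysql-password=test \
  --mysql-db=windmills \
  --tables=10 --table-size=100000 \
  --threads=8 --time=300 --report-interval=1 \
  run
```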
And the application failed:
FATAL: mysql_drv_query() returned error 9001 (Max connect timeout reached while reaching hostgroup 101 after 10000ms) for query 'SELECT id, millid, date,active,kwatts_s FROM windmills2 WHERE id=9364'
Now, this may be by design, but I have serious difficulty understanding the reasoning here, given that we allow the platform to stop serving while we still have a healthy server.
Last but not least, I am not allowed to decide WHICH nodes are the backup_writers; ProxySQL will choose them from my list of writer servers. So why not also include the one I have declared as Primary, at least in case of need? ¯\_(ツ)_/¯
OK, last try, with writer_is_also_reader = 1.
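Again sketched as a one-line change against the same Galera definition:

```sql
-- Switch the Galera definition to writer_is_also_reader = 1.
UPDATE mysql_galera_hostgroups SET writer_is_also_reader = 1
 WHERE writer_hostgroup = 100;
LOAD MYSQL SERVERS TO RUNTIME;
```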
And now I have:
Now, as you may have already realized, the point here is that, YES, my node x.22 (Primary) is able to get the READS as well, but the node was taking the whole read load from the beginning. This is because of the weight shift ProxySQL performed.
This happens because, while ProxySQL initially populates its internal table mysql_servers_incoming with the data from mysql_servers, after several steps that information is overwritten, using the writer's data for the readers as well, messing up the desired results.
Failover Due to Maintenance
In this case, I will set pxc_maint_mode = MAINTENANCE on the writer to fail over to another writer.
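The maintenance switch is done on the current writer itself; a minimal sketch:

```sql
-- On the current writer node: tell PXC (and ProxySQL) this node is going
-- into maintenance. PXC waits pxc_maint_transition_period (10s by default)
-- before completing the transition, so in-flight transactions can drain.
SET GLOBAL pxc_maint_mode = 'MAINTENANCE';
```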
The sysbench command used:
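This time a read/write load; as before, the standard oltp_read_write script and the connection parameters are stand-in assumptions for my custom test script:

```shell
# Read/write load through the ProxySQL node, illustrative parameters.
sysbench oltp_read_write \
  --mysql-host=192.168.4.191 --mysql-port=6033 \
  --mysql-user=app_test --mysql-password=test \
  --mysql-db=windmills \
  --tables=10 --table-size=100000 \
  --threads=8 --time=300 --report-interval=1 \
  run
```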
As we can see, ProxySQL does the failover to node x.23 as expected. But it also adds the node to HG 9101, which is supposed to host the offline servers.
So why move the Primary there?
Once maintenance is over, setting pxc_maint_mode back to DISABLED will restore the original Primary. In short, ProxySQL will fail back.
The whole process will not be impactful if the application is NOT using sticky connections; otherwise, the application will have to deal with:
- Error with the connection
- A retry cycle to re-run the dropped DML
Failover Because of a Crash
To check the next case, I will add --mysql-ignore-errors=all to sysbench, to be able to see how many errors I get, and for how long, when failover is needed. To simulate a crash, I will KILL -9 the mysqld process on the writer.
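The crash simulation itself is a hard kill of the server process on the writer node, along these lines:

```shell
# On the current writer node: simulate a crash by killing mysqld outright,
# giving it no chance to shut down cleanly.
kill -9 $(pidof mysqld)
```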
Check below the data taken right after the other node starts to take over; keep in mind here the PRIMARY was node 192.168.4.233:
In the end, only one node will remain as Primary, but for a certain amount of time both were serving, even though only ONE node was declared ONLINE.
A Problem Along the Road… (only with v2.0.15)
While I was trying to “fix” the issue with the weight for READERS…
Let’s say we have this:
And I want to shift some of the READ load away from the WRITER (currently 192.168.4.23).
If I simply do:
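Something as simple as lowering that node's weight in the reader host group only; the exact weight value here is an illustrative assumption:

```sql
-- Lower the READ weight of the current writer, in HG 101 ONLY.
UPDATE mysql_servers SET weight = 100
 WHERE hostgroup_id = 101 AND hostname = '192.168.4.23';
LOAD MYSQL SERVERS TO RUNTIME;
```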
it changes the value (randomly) also for HG 102, which then impacts the WRITER HG, causing a failover. At this point, I stopped testing. Too many things were making the failover scenario too unpredictable.
ProxySQL is built on a great concept and certainly covers a real gap in the MySQL ecosystem, optimizing and powering up the connection layer between the application layer and the data layer.
But, in regards to the Galera support, we are not there yet. The support provided is not only limited, it is flawed, and could lead to serious and/or unexpected problems. Even using the option writer_is_also_reader=1, which is the only one worth using, we still see too many issues in how the nodes are managed in case of serious events such as failover.
ProxySQL v2.1.0 seems to have fixed some instabilities, but we still have too many open issues to trust the Galera native support. My advice is to stay away from it and use the scheduler to deal with the Galera cluster. Write a robust script that covers your specific needs if you must customize the actions; a scheduler will serve you well.
If you would rather not write your own, there is a sample in Percona-Lab. This is the old script used with ProxySQL 1.4.x, modified to work with ProxySQL 2.x. I have also written one a long time ago that can help as well, here. Both come without any guarantee, and I advise you to use them as examples for your own; see Part 2 of this post for details.
Finally, let me say that ProxySQL is a great tool, but no tool can cover everything. People like me who have been around long enough have seen this happen many times, and it is no surprise.
Great MySQL to all.