ProxySQL – Percona Cluster (Galera) integration
Written by Marco Tusa   
Sunday, 11 September 2016 15:57

ProxySQL is designed not to perform any specialized operation in relation to the servers it communicates with.

Instead, it has scheduled events that can be used to extend its functionality and cover any special need.

This means that specialized products like PXC are not natively managed by ProxySQL, and require the design and implementation of good/efficient extensions.

In this article I will illustrate how PXC/Galera can be integrated with ProxySQL to get the best from both.

 

Brief digression

Before discussing the PXC integration, we need to review a couple of very important concepts in ProxySQL. ProxySQL has a very important logical component: the Hostgroup (HG).
A hostgroup is a relation of:

 

+-----------+       +------------------------+
|Host group +------>|Server (1:N)            |
+-----------+       +------------------------+

 

Not only that: in ProxySQL you can use QueryRules (QR) that map directly to an HG.

This means you can force a specific user to go ONLY to that HG; for instance, you may want user app1_user to go only to servers A-B-C.

The only thing you need to do is set a QR that says this user (app1_user) has destination hostgroup 5,

 

where HG 5 contains the servers A-B-C:

 

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',5,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',5,3306,10);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('app1_user',5,1);
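These inserts only touch the admin interface's configuration tables; a minimal sketch of the usual activation sequence (standard ProxySQL admin commands):

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;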

 

 

 

Easy, isn't it?

Another important concept in ProxySQL, also related to HGs, is the ReplicationHostgroup (RHG).

This is a special HG that ProxySQL uses to automatically manage nodes that are connected by replication and configured in Write/Read and Read_only mode.

 

What does that mean?

Let us say you have 4 nodes A-B-C-D, connected by standard asynchronous replication,

where A is the master and B-C-D are the slaves.

What you want is to have your application point to server A for all writes, and to B-C for the reads, keeping D off because it is a backup slave.

You also do not want any read to go to B-C if the replication delay is more than 2 seconds.

Using RHGs in conjunction with HGs, ProxySQL will manage all of this for you.

 

To achieve that, we only have to instruct ProxySQL to:

use an RHG

define the value of the maximum latency

Using the example above:

 

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.5',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.6',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.7',5,3306,10,2);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.8',10,3306,10,2);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('app1_user',5,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('app1_user',6,1);
INSERT INTO mysql_replication_hostgroups VALUES (5,6,'');
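For the automatic split to happen, the slaves must also report themselves as read-only; a minimal sketch, to be run on each reader node (B and C in this example):

SET GLOBAL read_only=1;

ProxySQL's monitor reads this variable and assigns those nodes to the reader HG (6), while the master, having READ_ONLY=0, resides in both HGs.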

 

From now on, ProxySQL will split the R/W traffic using the RHG and the nodes defined in HG 5.

The flexibility that the use of HGs introduces is obviously not limited to what I have mentioned here, and it will play a major part in the integration between PXC and ProxySQL that I illustrate below.

 

PXC/Galera integration

In a PXC cluster, a node can be in many different states and conditions that affect whether, and how, your application should operate on it.

The most common one is when a node becomes a DONOR.

Whoever has installed PXC, or any Galera implementation, has seen that when a node becomes a DONOR it changes its state to DESYNC, and, if the node is heavily loaded, the DONOR process may affect the node itself.

But that is just one of the possible conditions.

A node can be in state 3 (JOINED) but not synced; it can have wsrep_reject_queries or wsrep_sst_donor_rejects_queries active, or wsrep_ready off; it can be in a different segment; and, last but not least, the number of nodes per segment is relevant.

To show what can be done, and how, we will use the following setup:

5 nodes

2 segments

Applications requiring R/W split

And finally, two options:

single writer node

multiple writer nodes

 

What we will analyze here is how ProxySQL behaves when a script run by the ProxySQL scheduler manages the cluster.

The use of a script is necessary to have ProxySQL react correctly to PXC state modifications.

ProxySQL comes with two scripts for Galera; both of them are too basic and do not take into account a number of relevant conditions.

As an example, I have written a more complete script, galera_check.pl, available at https://github.com/Tusamarco/proxy_sql_tools.

 

The script is designed to manage an arbitrary number of nodes belonging to a given Hostgroup (HG).

The script works per HG, and as such it performs isolated actions/checks per HG.

It is not possible to have more than one check running on the same HG.

The check creates a lock file {proxysql_galera_check_${hg}.pid} that is used to prevent duplicate runs.

Galera_check connects to the ProxySQL node and retrieves all the information regarding the nodes and the ProxySQL configuration.

It then checks each node in parallel and retrieves its status and configuration. At the moment, galera_check analyzes and manages the following:

Node states:

read_only

wsrep_status

wsrep_reject_queries

wsrep_sst_donor_rejects_queries

wsrep_connected

wsrep_desync_count

wsrep_ready

wsrep_provider

wsrep_segment

Number of nodes per segment

Retry loop
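To give an idea of the raw data involved, the per-node checks boil down to queries of this kind (a sketch; the exact variable list galera_check reads lives in the script itself):

SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_state','wsrep_connected','wsrep_ready','wsrep_cluster_status');
SHOW GLOBAL VARIABLES WHERE Variable_name IN ('read_only','wsrep_reject_queries','wsrep_sst_donor_rejects_queries');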

 

As mentioned, the number of nodes inside a segment is relevant: if a node is the only one in a segment, the check will behave accordingly.

For example, if a node is the only one in the MAIN segment, the check will not put it in OFFLINE_SOFT when the node becomes a donor, to prevent the cluster from becoming unavailable to the applications.

In the script it is possible to declare a segment as MAIN. This is quite useful when managing prod and DR sites, because the script will manage the segment acting as main in a more conservative way.

The check can be configured to perform retries after a given interval,

where the interval is the time defined in the ProxySQL scheduler.

 

As such, if the check is set to 2 retries for UP and 3 for DOWN, it will loop that many times before doing anything (e.g., with a 2000 ms scheduler interval and retry_down=3, a node is demoted only after roughly 6 seconds of consistently failed checks). PXC/Galera performs some actions under the hood, some of them not entirely correct.

This retry feature is useful in some less well-known cases where Galera behaves oddly.

For example, whenever a node is set to READ_ONLY=1, Galera desyncs and resyncs the node.

A check not taking this into account would cause a node to be set OFFLINE and back for no reason.

Another important characteristic of this check is that it uses special HGs for maintenance, all in the 9000 range.

So if a node belongs to HG 10 and the check needs to put it in maintenance mode, the node will be moved to HG 9010.

Once all is normal again, the node will be put back in its original HG. This check does NOT modify any state on the nodes themselves,

meaning it will NOT modify any variables or settings on the original node.

It will ONLY change node states in ProxySQL.
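Concretely, the move is just an UPDATE against ProxySQL's admin tables, of the same kind you will see in the galera_check logs later in this article (an illustrative sketch; the hostname here is a placeholder):

UPDATE mysql_servers SET hostgroup_id=9010 WHERE hostgroup_id=10 AND hostname='192.168.1.5' AND port='3306';
LOAD MYSQL SERVERS TO RUNTIME;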

 

Multiwriter mode

The most common and recommended way to use Galera is in multi-writer mode,

then play with the weights to have one node act as the MAIN node, in order to prevent/reduce certification failures and brute force aborts (BFA) from PXC.

The configuration to use is:

 

DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=500 ;
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',500,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',501,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',500,3306,1000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',500,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',501,3306,1000000000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.8',501,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',500,3306,1);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.9',501,3306,1);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

 

 

 

In this test we will NOT use Replication HostGroups.

We will do that later, when testing single writer; for now... focus on multi-writer.

Segment 1 covers HG 500 and 501, while segment 2 effectively serves only 501.
The weights for the servers in HG 500 are progressive from 1 to 1 billion, to reduce the chance of random writes landing on the non-main nodes.

As such, the nodes are:

 

HG 500
S1 192.168.1.5 - 1000000000
S1 192.168.1.6 - 1000000
S1 192.168.1.7 - 100
S2 192.168.1.8 - 1
S2 192.168.1.9 - 1
 
HG 501
S1 192.168.1.5 - 100
S1 192.168.1.6 - 1000000000
S1 192.168.1.7 - 1000000000
S2 192.168.1.8 - 1
S2 192.168.1.9 - 1

 

 

 

The following command can be used to view what ProxySQL is doing:

 

watch -n 1 'mysql -h 127.0.0.1 -P 3310 -uadmin -padmin -t -e "select * from stats_mysql_connection_pool where hostgroup in (500,501,9500,9501) order by hostgroup,srv_host ;" -e " select hostgroup_id,hostname,status,weight,comment from mysql_servers where hostgroup_id in (500,501,9500,9501)  order by hostgroup_id,hostname ;"'

 

 

Download the check from git (https://github.com/Tusamarco/proxy_sql_tools) and activate it in ProxySQL.

Be sure to set the parameters to match your installation:

 

DELETE FROM scheduler WHERE id=10;
INSERT  INTO scheduler (id,active,interval_ms,filename,arg1) VALUES (10,0,2000,"/var/lib/proxysql/galera_check.pl","-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0  --log=/var/lib/proxysql/galeraLog");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

 

 

When you want to activate it:

 

UPDATE scheduler SET active=1 WHERE id=10;
LOAD SCHEDULER TO RUNTIME;

 

 

The following is the kind of scenario we have:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 413        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 420        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 227        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 230        |
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 9        | 29     | 0       | 107358   | 8629123         | 0               | 206        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 12602425 | 613371057       | 34467286486     | 413        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 6        | 7      | 0       | 12582617 | 612422028       | 34409606321     | 420        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 6        | 6      | 0       | 18580675 | 905464967       | 50824195445     | 227        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 6        | 14     | 0       | 18571127 | 905075154       | 50814832276     | 230        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 169570   | 8255821         | 462706881       | 206        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

To generate load we will use the following commands (or whatever you like, but use two different ones for read-only and read/write):

 

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=stress_RW --mysql-password=test --mysql-db=test_galera --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=50000 --max-requests=0 --max-time=9000 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run

 

 

Now, the most common thing that can happen to a cluster node is to become a donor. This is a planned activity for PXC, and it is supposed to be managed in the least harmful way.

To simulate that, we will choose a node and crash it, forcing the crashed node to elect our main node (the one with the highest weight) as its DONOR.

To do so, we need to have the parameter wsrep_sst_donor set on the node that will request the SST data transfer.

 

 

SHOW global VARIABLES LIKE 'wsrep_sst_donor';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| wsrep_sst_donor | node1 | <---
+-----------------+-------+
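A minimal sketch of the relevant configuration on the joiner side, assuming (for this example) that the main writer's wsrep_node_name is node1:

[mysqld]
wsrep_sst_donor=node1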

 

 

 

Activate the check if not already done:

 

UPDATE scheduler SET active=1 WHERE id=10;

 

 

And now run traffic.

Then check load:

 

SELECT * FROM stats_mysql_connection_pool WHERE hostgroup IN (500,501,9500,9501) ORDER BY hostgroup,srv_host ;
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 30     | 0       | 112662   | 9055479         | 0               | 120        | <--- our Donor
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 10     | 0       | 12654    | 1016975         | 0               | 111        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 115        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 316        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 329        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 10     | 0       | 257271   | 12533763        | 714473854       | 120        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 10       | 18     | 0       | 18881582 | 920200116       | 51688974309     | 111        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 6        | 9      | 0       | 18927077 | 922317772       | 51794504662     | 115        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 8      | 0       | 12595556 | 613054573       | 34447564440     | 316        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 3        | 6      | 0       | 12634435 | 614936148       | 34560620180     | 329        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

On one of the nodes:

kill mysql

remove the content of the data directory

restart the node

The node will go into SST, and our galera_check script will manage it:

+--------------+-------------+--------------+------------+--------------------------------------------------+
| hostgroup_id | hostname    | STATUS       | weight     | comment                                          |
+--------------+-------------+--------------+------------+--------------------------------------------------+
| 500          | 192.168.1.5 | OFFLINE_SOFT | 1000000000 | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <---- the donor
| 500          | 192.168.1.6 | ONLINE       | 1000000    |                                                  |
| 500          | 192.168.1.7 | ONLINE       | 100        |                                                  |
| 500          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 500          | 192.168.1.9 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.5 | OFFLINE_SOFT | 100        | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.7 | ONLINE       | 1000000000 |                                                  |
| 501          | 192.168.1.8 | ONLINE       | 1          |                                                  |
| 501          | 192.168.1.9 | ONLINE       | 1          |                                                  |
+--------------+-------------+--------------+------------+--------------------------------------------------+

 

We can also check the galera_check log and see what happened:

 

2016/09/02 16:13:27.298:[WARN] Move node:192.168.1.5;3306;500;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:13:27.303:[WARN] Move node:192.168.1.5;3306;501;3010 SQL: UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'

 

 

 

The node will remain in OFFLINE_SOFT while the other node (192.168.1.6, which has the second-highest weight) serves the writes, until the node exits the DONOR state.


All as expected: the node was set to OFFLINE_SOFT, which means existing connections were able to finish, while the node was not accepting any NEW connections.


As soon as the node finished sending data to the joiner, it was moved back and traffic restarted:

 

2016/09/02 16:14:58.239:[WARN] Move node:192.168.1.5;3306;500;1000 SQL: UPDATE mysql_servers SET STATUS='ONLINE' WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/02 16:14:58.243:[WARN] Move node:192.168.1.5;3306;501;1000 SQL: UPDATE mysql_servers SET STATUS='ONLINE' WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 37     | 0       | 153882   | 12368557        | 0               | 72         | <---
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 10     | 0       | 16008    | 1286492         | 0               | 42         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 1398     | 112371          | 0               | 96         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 24545  | 791     | 24545    | 122725          | 0               | 359        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 15108    | 1214366         | 0               | 271        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 11     | 0       | 2626808  | 128001112       | 7561278884      | 72         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 7        | 20     | 0       | 28629516 | 1394974468      | 79289633420     | 42         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 8        | 10     | 0       | 29585925 | 1441400648      | 81976494740     | 96         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 16779  | 954     | 12672983 | 616826002       | 34622768228     | 359        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 4        | 6      | 0       | 13567512 | 660472589       | 37267991677     | 271        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

This was easy, and it is more or less managed also by the standard scripts.


But what would happen if my donor were set to NOT serve queries when in donor state?


Wait, what?? Yes, PXC (Galera in general) can be set to refuse any query when the node goes into DONOR state.

If not managed, this will cause issues, because the node will simply reject queries while ProxySQL sees the node as alive.

 

Let me show you:

 

SHOW global VARIABLES LIKE 'wsrep_sst_donor_rejects_queries';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| wsrep_sst_donor_rejects_queries | ON    |
+---------------------------------+-------+
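For reference, this behaviour is driven by the variable shown above; a sketch of enabling it (it can also simply be set in my.cnf):

SET GLOBAL wsrep_sst_donor_rejects_queries=ON;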

 

 

 

For a moment, let us deactivate the check.
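A minimal sketch of the deactivation, mirroring the activation shown earlier:

UPDATE scheduler SET active=0 WHERE id=10;
LOAD SCHEDULER TO RUNTIME;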

Then do the same stop and delete of the data dir, and restart the node... SST takes place.

And sysbench will report:

 

ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
ALERT: mysql_drv_query() returned error 2013 (Lost connection to MySQL server during query) for query 'BEGIN'
FATAL: failed to execute function `event': 3
 

 

 

 

But what about ProxySQL?

 

+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 0        | 0        | 101    | 0       | 186331   | 14972717        | 0               | 118        | <-- no writes in either HG
| 500       | 192.168.1.6 | 3306     | ONLINE  | 0        | 9        | 10     | 0       | 20514    | 1648665         | 0               | 171        |  |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 0        | 1        | 3      | 0       | 5881     | 472629          | 0               | 134        |  |
| 500       | 192.168.1.8 | 3306     | ONLINE  | 0        | 0        | 205451 | 1264    | 205451   | 1027255         | 0               | 341        |  |
| 500       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 15642    | 1257277         | 0               | 459        |  -
| 501       | 192.168.1.5 | 3306     | ONLINE  | 1        | 0        | 13949  | 0       | 4903347  | 238627310       | 14089708430     | 118        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 20     | 0       | 37012174 | 1803380964      | 103269634626    | 171        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 11       | 13     | 0       | 38782923 | 1889507208      | 108288676435    | 134        |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 208452 | 1506    | 12864656 | 626156995       | 34622768228     | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 3        | 6      | 0       | 14451462 | 703534884       | 39837663734     | 459        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
mysql> SELECT * FROM mysql_server_connect_log WHERE hostname IN ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  ORDER BY time_start_us DESC LIMIT 10;
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                                                                          |
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827444621954 | 1359                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827444618883 | 0                       | Can't connect to MySQL server on '192.168.1.8' (107)                                                   |
| 192.168.1.7 | 3306 | 1472827444615819 | 433                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827444612722 | 538                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827444606560 | 473                     | NULL                                                                                                   | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827384621463 | 1286                    | NULL                                                                                                   |
| 192.168.1.8 | 3306 | 1472827384618442 | 0                       | Lost connection to MySQL server at 'handshake: reading inital communication packet', system error: 107 |
| 192.168.1.7 | 3306 | 1472827384615317 | 419                     | NULL                                                                                                   |
| 192.168.1.6 | 3306 | 1472827384612241 | 415                     | NULL                                                                                                   |
| 192.168.1.5 | 3306 | 1472827384606117 | 454                     | NULL                                                                                                   | <-- donor is seen as up
+-------------+------+------------------+-------------------------+--------------------------------------------------------------------------------------------------------+
select * from mysql_server_ping_log where hostname in ('192.168.1.5','192.168.1.6','192.168.1.7','192.168.1.8','192.168.1.9')  order by time_start_us desc limit 10;
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.9 | 3306 | 1472827475062217 | 311                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827475060617 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827475059073 | 108                  | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827475057281 | 102                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827475054188 | 74                   | NULL                                                 | <-- donor is seen as up
| 192.168.1.9 | 3306 | 1472827445061877 | 491                  | NULL                                                 |
| 192.168.1.8 | 3306 | 1472827445060254 | 0                    | Can't connect to MySQL server on '192.168.1.8' (107) |
| 192.168.1.7 | 3306 | 1472827445058688 | 53                   | NULL                                                 |
| 192.168.1.6 | 3306 | 1472827445057124 | 131                  | NULL                                                 |
| 192.168.1.5 | 3306 | 1472827445054015 | 98                   | NULL                                                 | <-- donor is seen as up
+-------------+------+------------------+----------------------+------------------------------------------------------+

 

 

 

As you can see, everything seems OK, even though it is not :)

Let us turn on galera_check and see what happens.

Run some load in read and write.

And now let me do the stop-delete-restart-SST process again:

 

kill -9 <mysqld_safe_pid> <mysqld_pid>; rm -fr data/*;rm -fr logs/*;sleep 2;./start

 

 

As soon as the node goes down, ProxySQL shuns it.

 

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE  | 7        | 3        | 34     | 0       | 21570   | 1733833         | 0               | 146        |
| 500       | 192.168.1.6 | 3306     | ONLINE  | 1        | 8        | 12     | 0       | 9294    | 747063          | 0               | 129        |
| 500       | 192.168.1.7 | 3306     | ONLINE  | 1        | 0        | 4      | 0       | 3396    | 272950          | 0               | 89         |
| 500       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 12      | 966             | 0               | 326        | <-- crashed
| 500       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 2      | 0       | 246     | 19767           | 0               | 286        |
| 501       | 192.168.1.5 | 3306     | ONLINE  | 0        | 1        | 2      | 0       | 772203  | 37617973        | 2315131214      | 146        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 9        | 3        | 12     | 0       | 3439458 | 167514166       | 10138636314     | 129        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 12       | 13     | 0       | 3183822 | 155064971       | 9394612877      | 89         |
| 501       | 192.168.1.8 | 3306     | SHUNNED | 0        | 0        | 1      | 6       | 11429   | 560352          | 35350726        | 326        | <-- crashed
| 501       | 192.168.1.9 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 312253  | 15227786        | 941110520       | 286        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

 

Immediately afterwards, galera_check identifies that the node is requesting an SST and that the DONOR is our writer. Given that it is NOT the only writer in the HG, and given that it has the variable wsrep_sst_donor_rejects_queries active, it cannot be set to OFFLINE_SOFT, and we do not want ProxySQL to consider it OFFLINE_HARD (because it is not).
As such, the script will move it to a special HG:

 

2016/09/04 16:11:22.091:[WARN] Move node:192.168.1.5;3306;500;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9500 WHERE hostgroup_id=500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:11:22.097:[WARN] Move node:192.168.1.5;3306;501;3001 SQL: UPDATE mysql_servers SET hostgroup_id=9501 WHERE hostgroup_id=501 AND hostname='192.168.1.5' AND port='3306'
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| hostgroup_id | hostname    | port | STATUS | weight     | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment                                          |
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000    | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.7 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 500          | 192.168.1.8 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 500          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              |                                                  |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; |
| 9500         | 192.168.1.5 | 3306 | ONLINE | 1000000000 | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
| 9501         | 192.168.1.5 | 3306 | ONLINE | 100        | 0           | 1000            | 0                   | 0       | 0              | 500_W_501_R_retry_up=0;500_W_501_R_retry_down=0; | <-- Special HG
+--------------+-------------+------+--------+------------+-------------+-----------------+---------------------+---------+----------------+--------------------------------------------------+

 

 

In this way the donor will continue to serve the joiner, but the applications will not see it.


What is also very important is what the applications will see.

The applications doing WRITEs will see:

 

[ 10s] threads: 10, tps: 9.50, reads: 94.50, writes: 42.00, response time: 1175.77ms (95%), errors: 0.00, reconnects: 0.00
...
[ 40s] threads: 10, tps: 2.80, reads: 26.10, writes: 11.60, response time: 3491.45ms (95%), errors: 0.00, reconnects: 0.10
[ 50s] threads: 10, tps: 4.80, reads: 50.40, writes: 22.40, response time: 10062.13ms (95%), errors: 0.80, reconnects: 351.60 <--- Main writer moved to another HG
[ 60s] threads: 10, tps: 5.90, reads: 53.10, writes: 23.60, response time: 2869.82ms (95%), errors: 0.00, reconnects: 0.00
...

 

 

 

At the moment of the shift from one node to another, the applications will have to manage the retry, but it will be a very short moment that causes limited impact on the production flow.

The application readers will see no errors:

 

[ 10s] threads: 10, tps: 0.00, reads: 13007.31, writes: 0.00, response time: 9.13ms (95%), errors: 0.00, reconnects: 0.00
[ 50s] threads: 10, tps: 0.00, reads: 9613.70, writes: 0.00, response time: 10.66ms (95%), errors: 0.00, reconnects: 0.20 <-- just a glitch in reconnect
[ 60s] threads: 10, tps: 0.00, reads: 10807.90, writes: 0.00, response time: 11.07ms (95%), errors: 0.00, reconnects: 0.20
[ 70s] threads: 10, tps: 0.00, reads: 9082.61, writes: 0.00, response time: 23.62ms (95%), errors: 0.00, reconnects: 0.00
...
[ 390s] threads: 10, tps: 0.00, reads: 13050.80, writes: 0.00, response time: 8.97ms (95%), errors: 0.00, reconnects: 0.00

 

 

 

When the donor finishes providing the SST it comes back, and the script manages it: galera_check will put it back in the right HG:

 

2016/09/04 16:12:34.266:[WARN] Move node:192.168.1.5;3306;9500;1010 SQL: UPDATE mysql_servers SET hostgroup_id=500 WHERE hostgroup_id=9500 AND hostname='192.168.1.5' AND port='3306'
2016/09/04 16:12:34.270:[WARN] Move node:192.168.1.5;3306;9501;1010 SQL: UPDATE mysql_servers SET hostgroup_id=501 WHERE hostgroup_id=9501 AND hostname='192.168.1.5' AND port='3306'

 

 

 

The crashed node will be restarted by the SST process, so the node will come back up.

But if the level of load in the cluster is mid/high, it will remain in JOINED state for some time, becoming visible to ProxySQL again while ProxySQL does not correctly recognize its state.
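The distinction is easy to see on the node itself; until it has caught up, the state comment reads Joined rather than Synced:

SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';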

 

 

2016-09-04 16:17:15 21035 [Note] WSREP: 3.2 (node4): State transfer from 1.1 (node1) complete.
2016-09-04 16:17:15 21035 [Note] WSREP: Shifting JOINER -> JOINED (TO: 254515)

 

 

To avoid issues, the script will move it to the special HG, allowing it to recover without interfering with the real load.

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 2        | 15     | 0       | 3000    | 241060          | 0               | 141        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 13     | 0       | 13128   | 1055268         | 0               | 84         |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 4      | 0       | 3756    | 301874          | 0               | 106        |
| 500       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 4080    | 327872          | 0               | 278        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 256753  | 12508935        | 772048259       | 141        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 5116844 | 249191524       | 15100617833     | 84         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 11       | 13     | 0       | 4739756 | 230863200       | 13997231724     | 106        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 496524  | 24214563        | 1496482104      | 278        |
| 9500      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
| 9501      | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 331        |<-- Joined not Sync
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
Once the node has fully recovered, galera_check puts it back in the original HG, ready to serve requests:
 
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 15     | 0       | 3444    | 276758          | 0               | 130        |
| 500       | 192.168.1.6 | 3306     | ONLINE | 0        | 9        | 13     | 0       | 13200   | 1061056         | 0               | 158        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 0        | 0        | 4      | 0       | 3828    | 307662          | 0               | 139        |
| 500       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 500       | 192.168.1.9 | 3306     | ONLINE | 0        | 0        | 2      | 0       | 4086    | 328355          | 0               | 336        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 2      | 0       | 286349  | 13951366        | 861638962       | 130        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 5239212 | 255148806       | 15460951262     | 158        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 13       | 13     | 0       | 4849970 | 236234446       | 14323937975     | 139        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<-- up again
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 507910  | 24768898        | 1530841172      | 336        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

Summarizing, the logical steps are:

 

                +---------+
                |  Crash  |
                +----+----+
                     |
                     v
            +--------+-------+
            |  ProxySQL      |
            |  shun crashed  |
            |      node      |
            +--------+-------+
                     |
                     |
                     v
   +-----------------+-----------------+
   |  Donor has one of the following?  |
   |  wsrep_sst_donor_rejects_queries  |
   |  OR                               |
   |  wsrep_reject_queries             |
   +-----------------------------------+
      |No                            |Yes
      v                              v
+-----+----------+       +-----------+----+
| Galera_check   |       | Galera_check   |
| put the donor  |       | put the donor  |
| in OFFLINE_SOFT|       | in special HG  |
+---+------------+       +-----------+----+
    |                                |
    |                                |
    v                                v
+---+--------------------------------+-----+
|            Donor SST ends                |
+---+---------------+----------------+-----+
    |               |                |
    |               |                |
+---+------------+  |    +-----------+----+
| Galera_check   |  |    | Galera_check   |
| put the donor  |  |    | put the donor  |
| ONLINE         |  |    | in Original HG |
+----------------+  |    +----------------+
                    |                
                    |                 
+------------------------------------------+
|           crashed SST ends               |
+-------------------+----------------------+
                    |
                    |
       +------------+-------------+
       |  Crashed node back but   +<------------+
       |  Not Sync?               |             |
       +--------------------------+             |
          |No                   |Yes            |
          |                     |               |
          |                     |               |
+---------+------+       +------+---------+     |
| Galera_check   |       | Galera_check   |     |
| put the node   |       | put the node   +-----+
| back orig. HG  |       | Special HG     |
+--------+-------+       +----------------+
         |
         |
         |
         |      +---------+
         +------>   END   |
                +---------+
 

 

 

As mentioned, with this galera_check integration we can manage several node states.


Another case is when we need the node to not accept ANY query.

We may need that for several reasons, including preparing the node for maintenance, or whatever else.


In PXC and other Galera implementations, we can set the value of wsrep_reject_queries to one of the following valid values:

NONE

ALL

ALL_KILL

Let's see how it works: run some load, then on the main writer node (192.168.1.5) execute:

 

SET global wsrep_reject_queries=ALL;

 

 

This will block any new queries from being executed, while the running ones complete.

Do a simple select on the node:

 

(root@localhost:pm) [test]>select * from tbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

 

 

 

The point is, as you should have understood by now, that ProxySQL does not see these conditions:

 

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005467628001 | 35                   | NULL       | <--- ping ok
| 192.168.1.5 | 3306 | 1473005437628014 | 154                  | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005467369575 | 246                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005407369441 | 353                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+

 

 

The galera_check script will instead manage it:

 

+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 8343   | 0       | 10821   | 240870          | 0               | 93         | <--- galera check put it OFFLINE
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 15     | 0       | 48012   | 3859402         | 0               | 38         | <--- writer
| 500       | 192.168.1.7 | 3306     | ONLINE       | 0        | 1        | 6      | 0       | 14712   | 1182364         | 0               | 54         |
| 500       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 2      | 0       | 1092    | 87758           | 0               | 602        |
| 500       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 4      | 0       | 5352    | 430152          | 0               | 238        |
| 501       | 192.168.1.5 | 3306     | OFFLINE_SOFT | 0        | 0        | 1410   | 0       | 197909  | 9638665         | 597013919       | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE       | 2        | 10       | 12     | 0       | 7822682 | 380980455       | 23208091727     | 38         |
| 501       | 192.168.1.7 | 3306     | ONLINE       | 0        | 13       | 13     | 0       | 7267507 | 353962618       | 21577881545     | 54         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 241641  | 11779770        | 738145270       | 602        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 1        | 0        | 1      | 0       | 756415  | 36880233        | 2290165636      | 238        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

In this case the script puts the node in OFFLINE_SOFT, given that "set global wsrep_reject_queries=ALL" means: do not accept NEW queries but complete the existing ones, which is exactly the OFFLINE_SOFT semantics.

The script also manages the case of "set global wsrep_reject_queries=ALL_KILL;",

which from ProxySQL's point of view does not exist either:

 

+-------------+------+------------------+----------------------+------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error |
+-------------+------+------------------+----------------------+------------+
| 192.168.1.5 | 3306 | 1473005827629069 | 59                   | NULL       |<--- ping ok
| 192.168.1.5 | 3306 | 1473005797628988 | 57                   | NULL       |
+-------------+------+------------------+----------------------+------------+
+-------------+------+------------------+-------------------------+---------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error |
+-------------+------+------------------+-------------------------+---------------+
| 192.168.1.5 | 3306 | 1473005827370084 | 370                     | NULL          | <--- connect ok
| 192.168.1.5 | 3306 | 1473005767369915 | 243                     | NULL          |
+-------------+------+------------------+-------------------------+---------------+
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |<--- galera check put it in special HG
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0       | 0               | 0               | 0          |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
 

 

 

The difference here is that the script moves the node to the special HG to isolate it, instead of leaving it in the original HG.
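To end the maintenance, a sketch of the reverse step; once the variable is back to NONE, galera_check will move the node back to its original HG as described above:

SET GLOBAL wsrep_reject_queries=NONE;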


As you can see, the integration between ProxySQL and PXC (Galera) in the multi-writer case works perfectly, provided you have a script like galera_check that correctly manages the different PXC/Galera states.

 

ProxySQL and PXC using Replication HostGroup

Sometimes we may need to have 100% of the writes going to only one node at a time.

As explained above, ProxySQL uses weights to redirect a percentage of the load to a specific node.

In most cases it will be enough to set the weight on the main writer to a very high value, like 10 billion, and on the next node to one thousand, to achieve an almost-single writer.

But this is not 100%: it still allows ProxySQL to send a query every now and then to the other node(s).

The solution to this, which also keeps things consistent with the ProxySQL logic, is to use replication hostgroups.

Replication HGs are special HGs that ProxySQL treats as a coupled pair for R/W operations.

ProxySQL analyzes the value of the READ_ONLY variable and assigns to the reader HG the nodes that have it enabled,

while nodes having READ_ONLY=0 reside in both HGs.

As such, the first thing we need to do is tell ProxySQL that our two HGs, 500 and 501, are replication HGs:

 

INSERT INTO mysql_replication_hostgroups VALUES (500,501,'');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK; 
SELECT * FROM mysql_replication_hostgroups ;
+------------------+------------------+---------+
| writer_hostgroup | reader_hostgroup | comment |
+------------------+------------------+---------+
| 500              | 501              |         |
+------------------+------------------+---------+

 

 

Now, whenever I set the value of READ_ONLY on a node, ProxySQL will move the node accordingly.

Let's see how.

Current:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 6        | 1        | 7      | 0       | 16386    | 1317177         | 0               | 97         |
| 500       | 192.168.1.6 | 3306     | ONLINE | 1        | 9        | 15     | 0       | 73764    | 5929366         | 0               | 181        |
| 500       | 192.168.1.7 | 3306     | ONLINE | 1        | 0        | 6      | 0       | 18012    | 1447598         | 0               | 64         |
| 500       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 2      | 0       | 1440     | 115728          | 0               | 341        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1210029  | 58927817        | 3706882671      | 97         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 1        | 11       | 12     | 0       | 16390790 | 798382865       | 49037691590     | 181        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 15357779 | 748038558       | 45950863867     | 64         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1247662  | 60752227        | 3808131279      | 341        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1766309  | 86046839        | 5374169120      | 422        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

Now set READ_ONLY=1 on the following nodes: 192.168.1.6/7/8/9:

SET GLOBAL read_only=1;

After:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 25980    | 2088346         | 0               | 93         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1787979  | 87010074        | 5473781192      | 93         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 18815907 | 916547402       | 56379724890     | 79         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 1        | 12       | 13     | 0       | 17580636 | 856336023       | 52670114510     | 131        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 15324    | 746109          | 46760779        | 822        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 16210    | 789999          | 49940867        | 679        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

In this scenario, if a reader node crashes, the application will not suffer at all, given the redundancy.

But if the writer crashes, then we have an issue, because there will be NO node available to manage the failover.

The solution is either to perform the node election manually, or to have the script elect the node with the lowest read weight in the same segment as the new writer.
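For the second option, the candidate can be identified directly from the ProxySQL admin interface. A minimal sketch, assuming the reader hostgroup is 501 as in this setup:

-- Hedged sketch: find the ONLINE reader with the lowest weight in HG 501,
-- i.e. the candidate a script following this rule would promote to writer.
SELECT hostname, port, weight
FROM runtime_mysql_servers
WHERE hostgroup_id = 501
  AND status = 'ONLINE'
ORDER BY weight ASC
LIMIT 1;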

 

The diagram below shows what happens when a node crashes (bird's-eye view):

 

                         +---------+
                         |  Crash  |
                         +----+----+
                              |
                              v
                     +--------+-------+
                     |  ProxySQL      |
                     |  shun crashed  |
                     |      node      |
                     +--------+-------+
                              |
                              |
                              v
            +-----------------+-----------------+
+----------->   HostGroup has another active    |
|           |   Node in HG writer?              |
|           +--+--------------+---------------+-+
|              |              |               |
|              |              |               |
|              |No            |               |Yes
|              |              |               |
|        +-----v----------+   |   +-----------v----+
|        |ProxySQL will   |   |   |ProxySQL will   |
|        |stop serving    |   |   |redirect load   >--------+
|        |writes          |   |   |there           |        |
|        +----------------+   |   +----------------+        |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |ProxySQL checks |                    |
|                     |READ_ONLY on    |                    |
|                     |Reader HG       |                    |
|                     |                |                    |
|                     +-------+--------+                    |
|                             |                             |
|                             v                             |
|                     +-------+--------+                    |
|                     |Any Node with   |                    |
|                     |READ_ONLY = 0 ? |                    |
|                     +----------------+                    |
|                      |No            |Yes                  |
|                      |              |                     |
|           +----------v------+    +--v--------------+      |
|           |ProxySQL will    |    |ProxySQL will    |      |
|           |continue to      |    |Move node to     |      |
+<---------<+do not serve     |    |Writer HG        |      |
|           |Writes           |    |                 |      |
|           +-----------------+    +--------v--------+      |
|                                           |               |
+-------------------------------------------+               |
                         +---------+                        |
                         |   END   <------------------------+
                         +---------+

 

 

The script should act in the step immediately after ProxySQL has SHUNNED the node, simply replacing READ_ONLY=1 with READ_ONLY=0 on the reader node with the lowest read weight.

ProxySQL will do the rest, copying the node into the WRITER HG and keeping the low weight, such that when/if the original node comes back, the new node will not compete with it for traffic.

I have included this special function in the check; it allows automatic fail-over.

This experimental feature is active only if explicitly enabled in the parameters that the scheduler passes to the script.

To activate it, just add --active_failover to the list of arguments passed to the script in the scheduler.

 

My recommendation is to have two entries in the scheduler and to activate only the one with --active_failover for the test; remember to deactivate the other one. A sketch of the two entries follows.
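This sketch reuses the same paths and arguments shown in the scheduler configuration later in this article; adjust credentials, host and paths to your own setup:

-- Entry 10 is active and includes --active_failover; entry 20 is the
-- plain check, kept inactive while testing the failover feature.
INSERT INTO scheduler (id,active,interval_ms,filename,arg1)
VALUES (10,1,2000,'/var/lib/proxysql/galera_check.pl',
 '-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog');
INSERT INTO scheduler (id,active,interval_ms,filename,arg1)
VALUES (20,0,1500,'/var/lib/proxysql/galera_check.pl',
 '-u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog');
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;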

 

Let's see the manual procedure first.

The process will be:

1. Generate some load
2. Kill the writer node
3. Manually elect a reader as the new writer
4. Recover the crashed node

 

Current load:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 30324   | 2437438         | 0               | 153        |
| 501       | 192.168.1.5 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 1519612 | 74006447        | 4734427711      | 153        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 8        | 12     | 0       | 7730857 | 376505014       | 24119645457     | 156        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 2        | 10       | 12     | 0       | 7038332 | 342888697       | 21985442619     | 178        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 612523  | 29835858        | 1903693835      | 337        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 611021  | 29769497        | 1903180139      | 366        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

Kill the main node 192.168.1.5

 

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 1565987 | 76267703        | 4879938857      | 119        |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8023216 | 390742215       | 25033271548     | 112        |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 7306838 | 355968373       | 22827016386     | 135        |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 638326  | 31096065        | 1984732176      | 410        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 636857  | 31025014        | 1982213114      | 328        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
+-------------+------+------------------+----------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | ping_success_time_us | ping_error                                           |
+-------------+------+------------------+----------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640798571 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070610798464 | 0                    | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+----------------------+------------------------------------------------------+
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| hostname    | port | time_start_us    | connect_success_time_us | connect_error                                        |
+-------------+------+------------------+-------------------------+------------------------------------------------------+
| 192.168.1.5 | 3306 | 1473070640779903 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
| 192.168.1.5 | 3306 | 1473070580779977 | 0                       | Can't connect to MySQL server on '192.168.1.5' (107) |
+-------------+------+------------------+-------------------------+------------------------------------------------------+

 

 

 

When the node is killed, ProxySQL shuns it and also reports issues with the checks (connect and ping).

During this time frame the application will experience issues and, if it is not designed to manage retries (and eventually a queue), it will crash.

Sysbench reports the errors:

 

Writes
 
[  10s] threads: 10, tps: 6.70, reads: 68.50, writes: 30.00, response time: 1950.53ms (95%), errors: 0.00, reconnects:  0.00
...
[1090s] threads: 10, tps: 4.10, reads: 36.90, writes: 16.40, response time: 2226.45ms (95%), errors: 0.00, reconnects:  1.00  <-+ killing the node
[1100s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1110s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1120s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1130s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |-- Gap waiting for a node to become
[1140s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |   READ_ONLY=0
[1150s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1160s] threads: 10, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 1.00, reconnects:  0.00         |
[1170s] threads: 10, tps: 4.70, reads: 51.30, writes: 22.80, response time: 80430.18ms (95%), errors: 0.00, reconnects:  0.00 <-+
[1180s] threads: 10, tps: 8.90, reads: 80.10, writes: 35.60, response time: 2068.39ms (95%), errors: 0.00, reconnects:  0.00
...
 [1750s] threads: 10, tps: 5.50, reads: 49.80, writes: 22.80, response time: 2266.80ms (95%), errors: 0.00, reconnects:  0.00 -- No additional errors

 

 

 

I decided to promote node 192.168.1.6, given that in this setup the weights for the readers were equal and as such there was no real difference.

 

(root@localhost:pm) [(none)]>set global read_only=0;
Query OK, 0 rows affected (0.00 sec)

 

 

 

Checking ProxySQL:

 

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 1848    | 148532          | 0               | 40         |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 72      | 1565987 | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 2        | 10       | 12     | 0       | 8843069 | 430654903       | 27597990684     | 40         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 1        | 11       | 12     | 0       | 8048826 | 392101994       | 25145582384     | 83         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 725820  | 35371512        | 2259974847      | 227        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 723582  | 35265066        | 2254824754      | 290        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

As soon as the READ_ONLY value is modified, ProxySQL moves the node to the writer hostgroup, and writes can take place again.

At this point in time, production activities are recovered.
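The promotion is picked up by ProxySQL's monitor module, which polls read_only on the backends for the replication hostgroups. A quick way to verify it saw the change (a sketch, assuming the monitor schema of ProxySQL 1.2.x):

-- Most recent read_only probes as recorded by the monitor module.
SELECT hostname, port, read_only, error
FROM monitor.mysql_server_read_only_log
ORDER BY time_start_us DESC
LIMIT 5;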

 

Reads had just a minor glitch:

 

Reads
[  10s] threads: 10, tps: 0.00, reads: 20192.15, writes: 0.00, response time: 6.96ms (95%), errors: 0.00, reconnects:  0.00
...
[ 410s] threads: 10, tps: 0.00, reads: 16489.03, writes: 0.00, response time: 9.41ms (95%), errors: 0.00, reconnects:  2.50
...
[ 710s] threads: 10, tps: 0.00, reads: 18789.40, writes: 0.00, response time: 6.61ms (95%), errors: 0.00, reconnects:  0.00

 

 

 

 

The glitch happened when node 192.168.1.6 was copied over to HG 500, but there were no interruptions or errors.

At this point, let us bring back the crashed node, which on rejoining elects Node2 (192.168.1.6) as its donor.

This was a PXC/Galera choice, and we have to accept and manage it.

Note that other, more basic scripts would simply put the node in OFFLINE_SOFT, given the node will become a DONOR.

Galera_check instead recognizes that Node2 (192.168.1.6) is the only active node in the segment for the writer HG, while it is not the only one present in the READER HG.

As such it puts the node in OFFLINE_SOFT only for the READER HG, trying to reduce the load on the node, but it keeps it active in the WRITER HG, to prevent a service interruption.
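As a side note, OFFLINE_SOFT is the "gentle" state: connections already established are allowed to finish their work, while no new connections are sent to the node. The state is visible at runtime; a sketch:

-- Check the per-hostgroup status of the donor node at runtime.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
WHERE hostname = '192.168.1.6';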

 

The node restarts and asks for a donor:

 

2016-09-05 12:21:43 8007 [Note] WSREP: Flow-control interval: [67, 67]
2016-09-05 12:21:45 8007 [Note] WSREP: Member 1.1 (node1) requested state transfer from '*any*'. Selected 0.1 (node2)(SYNCED) as donor.
2016-09-05 12:21:46 8007 [Note] WSREP: (ef248c1f, 'tcp://192.168.1.8:4567') turning message relay requesting off
2016-09-05 12:21:52 8007 [Note] WSREP: New cluster view: global state: 234bb6ed-527d-11e6-9971-e794f632b140:324329, view# 7: Primary, number of nodes: 5, my index: 3, protocol version 3

 

 

 

Galera_check sets 192.168.1.6 OFFLINE_SOFT only for the READER HG, and ProxySQL uses the other nodes to serve the reads.

 

+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS       | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE       | 10       | 0        | 10     | 0       | 7746     | 622557          | 0               | 86         |
| 501       | 192.168.1.5 | 3306     | ONLINE       | 0        | 0        | 1      | 147     | 1565987  | 76267703        | 4879938857      | 38         |
| 501       | 192.168.1.6 | 3306     | OFFLINE_SOFT | 0        | 0        | 12     | 0       | 9668944  | 470878452       | 30181474498     | 86         | <-- Node offline
| 501       | 192.168.1.7 | 3306     | ONLINE       | 9        | 3        | 12     | 0       | 10932794 | 532558667       | 34170366564     | 62         |
| 501       | 192.168.1.8 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 816599   | 39804966        | 2545765089      | 229        |
| 501       | 192.168.1.9 | 3306     | ONLINE       | 0        | 1        | 1      | 0       | 814893   | 39724481        | 2541760230      | 248        |
+-----------+-------------+----------+--------------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

When the SST donor task is over, Galera_check moves 192.168.1.6 back ONLINE, as expected.

At the same time, it moves the recovering node to a special HG, to avoid having it included in any activity until it is ready.
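Judging from the status output below, the special hostgroups used for this "parking" are the original HG id plus 9000 (9500/9501 here). A sketch of how to check what is parked there:

-- Nodes held in the special maintenance hostgroups while recovering.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
WHERE hostgroup_id IN (9500, 9501);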

 

2016-09-05 12:22:36 27352 [Note] WSREP: 1.1 (node1): State transfer FROM 0.1 (node2) complete.
2016-09-05 12:22:36 27352 [Note] WSREP: Shifting JOINER -> JOINED (TO: 325062)
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 1554     | 124909          | 0               | 35         |
| 501       | 192.168.1.6 | 3306     | ONLINE | 2        | 8        | 22     | 0       | 10341612 | 503637989       | 32286072739     | 35         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 9        | 12     | 0       | 12058701 | 587388598       | 37696717375     | 13         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 890102   | 43389051        | 2776691164      | 355        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 887994   | 43296865        | 2772702537      | 250        |
| 9500      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
| 9501      | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 57         | <-- Special HG for recover
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+

 

 

 

Once the node is finally in sync with the group, it is put back ONLINE in the READER HG and in the WRITER HG:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries  | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0        | 0               | 0               | 0          | <-- Back on line
| 500       | 192.168.1.6 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 402      | 32317           | 0               | 68         |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 6285     | 305823          | 19592814        | 312        | <-- Back on line
| 501       | 192.168.1.6 | 3306     | ONLINE | 4        | 6        | 22     | 0       | 10818694 | 526870710       | 33779586475     | 68         |
| 501       | 192.168.1.7 | 3306     | ONLINE | 0        | 12       | 12     | 0       | 12492316 | 608504039       | 39056093665     | 26         |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 942023   | 45924082        | 2940228050      | 617        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 939975   | 45834039        | 2935816783      | 309        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+----------+-----------------+-----------------+------------+
+--------------+-------------+------+--------+------------+
| hostgroup_id | hostname    | port | STATUS | weight     |
+--------------+-------------+------+--------+------------+
| 500          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 500          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.5 | 3306 | ONLINE | 100        |
| 501          | 192.168.1.6 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.7 | 3306 | ONLINE | 1000000000 |
| 501          | 192.168.1.8 | 3306 | ONLINE | 1          |
| 501          | 192.168.1.9 | 3306 | ONLINE | 1          |
+--------------+-------------+------+--------+------------+

 

 

But given that it comes back with its READER weight, it will NOT compete with the previously elected WRITER.

The recovered node will stay on "hold", waiting for a DBA to act and eventually promote it back, or to set it READ_ONLY and as such fully remove it from the WRITER HG.
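Both DBA actions are trivial. A hedged sketch: the first statement (run on the recovered node itself) demotes it to reader only; the remaining statements (run on the ProxySQL admin interface, with a hypothetical weight value mirroring the writer weights above) would instead restore it as the preferred writer:

-- On the recovered node: demote it, galera_check/ProxySQL will drop it
-- from the writer HG.
SET GLOBAL read_only=1;

-- Or, on the ProxySQL admin interface: raise its weight back so it can
-- compete for the writer role again.
UPDATE mysql_servers SET weight=1000000000
WHERE hostname='192.168.1.5' AND hostgroup_id IN (500,501);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;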

 

 

Let's see the automatic procedure now.

So far we have stuck to the MANUAL failover process; this time the script will take care of the election for us.

The process will be:

1. Generate some load
2. Kill the writer node
3. The script will do the auto-failover
4. Recover the crashed node

Check our scheduler config:

 

+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| id | active | interval_ms | filename | arg1 | arg2 | arg3 | arg4 | arg5 | comment |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+
| 10 | 1 | 2000 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --active_failover --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | | <--- Active
| 20 | 0 | 1500 | /var/lib/proxysql/galera_check.pl | -u=admin -p=admin -h=192.168.1.50 -H=500:W,501:R -P=3310 --execution_time=1 --retry_down=2 --retry_up=1 --main_segment=1 --debug=0 --log=/var/lib/proxysql/galeraLog | NULL | NULL | NULL | NULL | |
+----+--------+-------------+-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+------+------+------+---------+

 

 

The active entry is the one with auto-failover.

Start the load and check the current load:

 

+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.5 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 952     | 76461           | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 53137   | 2587784         | 165811100       | 167        |
| 501       | 192.168.1.6 | 3306     | ONLINE | 5        | 5        | 11     | 0       | 283496  | 13815077        | 891230826       | 109        |
| 501       | 192.168.1.7 | 3306     | ONLINE | 3        | 7        | 10     | 0       | 503516  | 24519457        | 1576198138      | 151        |
| 501       | 192.168.1.8 | 3306     | ONLINE | 1        | 0        | 1      | 0       | 21952   | 1068972         | 68554796        | 300        |
| 501       | 192.168.1.9 | 3306     | ONLINE | 0        | 1        | 1      | 0       | 21314   | 1038593         | 67043935        | 289        |
+-----------+-------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

Kill the main node 192.168.1.5

 

+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host    | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 500       | 192.168.1.6 | 3306     | ONLINE  | 10       | 0        | 10     | 0       | 60      | 4826            | 0               | 0          |
| 501       | 192.168.1.5 | 3306     | SHUNNED | 0        | 0        | 1      | 11      | 177099  | 8626778         | 552221651       | 30         |
| 501       | 192.168.1.6 | 3306     | ONLINE  | 3        | 7        | 11     | 0       | 956724  | 46601110        | 3002941482      | 49         |
| 501       | 192.168.1.7 | 3306     | ONLINE  | 2        | 8        | 10     | 0       | 1115685 | 54342756        | 3497575125      | 42         |
| 501       | 192.168.1.8 | 3306     | ONLINE  | 0        | 1        | 1      | 0       | 76289   | 3721419         | 240157393       | 308        |
| 501       | 192.168.1.9 | 3306     | ONLINE  | 1        | 0        | 1      | 0       | 75803   | 3686067         | 236382784       | 231        |
+-----------+-------------+----------+---------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

When the node is killed it is SHUNNED as before, but this time the script has already set the new node 192.168.1.6 ONLINE.

See the script log:

 

2016/09/08 14:04:02.494:[INFO] END EXECUTION Total Time:102.347850799561
2016/09/08 14:04:04.478:[INFO] This Node Try to become a WRITER set READ_ONLY to 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] This Node NOW HAS READ_ONLY = 0 192.168.1.6:3306:HG501
2016/09/08 14:04:04.479:[INFO] END EXECUTION Total Time:71.8140602111816

 

 

 

More importantly, let's look at the application experience:

 

Writes
 
[  10s] threads: 10, tps: 9.40, reads: 93.60, writes: 41.60, response time: 1317.41ms (95%), errors: 0.00, reconnects:  0.00
[  20s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1350.96ms (95%), errors: 0.00, reconnects:  0.00
[  30s] threads: 10, tps: 8.30, reads: 74.70, writes: 33.20, response time: 1317.81ms (95%), errors: 0.00, reconnects:  0.00
[  40s] threads: 10, tps: 7.80, reads: 70.20, writes: 31.20, response time: 1407.51ms (95%), errors: 0.00, reconnects:  0.00
[  50s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 2259.35ms (95%), errors: 0.00, reconnects:  0.00
[  60s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 3275.78ms (95%), errors: 0.00, reconnects:  0.00
[  70s] threads: 10, tps: 5.70, reads: 60.30, writes: 26.80, response time: 1492.56ms (95%), errors: 0.00, reconnects:  1.00 <-- just a reconnect experience
[  80s] threads: 10, tps: 6.70, reads: 60.30, writes: 26.80, response time: 7959.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 10, tps: 6.60, reads: 59.40, writes: 26.40, response time: 2109.03ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 10, tps: 6.40, reads: 57.60, writes: 25.60, response time: 1883.96ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 10, tps: 5.60, reads: 50.40, writes: 22.40, response time: 2167.27ms (95%), errors: 0.00, reconnects:  0.00

 

 

No errors and no huge delay; our application (which manages reconnections) had only a glitch and had to reconnect.

Reads had no errors or reconnects at all.

The connection errors were managed by ProxySQL: given it found 5 of them within 1 second, it SHUNNED the node.
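This threshold is not magic; it is driven by ProxySQL's global variables (names from ProxySQL 1.2.x; the defaults may vary between versions), which you can inspect from the admin interface:

-- Shun threshold (failures per second) and recovery window.
SELECT variable_name, variable_value
FROM global_variables
WHERE variable_name IN ('mysql-shun_on_failures','mysql-shun_recovery_time_sec');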

The galera_check script was able to promote a reader and, given this is a failover, there was no delay from a retry loop.

The whole thing is done in such a brief time that the application barely sees it.

 

Obviously an application with thousands of connections/sec will experience more impact, but the time window will be very narrow, and that was our scope.

Once the failed node is ready to come back, we can either start it with READ_ONLY=1, so that it comes back as a reader, or keep it as it is, in which case it will come back as a writer.

No matter what, the script will manage the case as it did in the previous (manual) exercise.


Conclusions

As shown, ProxySQL and galera_check working together are quite efficient in managing the cluster and its different scenarios.

When using the Single-Writer mode, solving the manual part of the failover dramatically improves the efficiency of recovering the production state, going from a few minutes to seconds or less.

The Multi-Writer mode remains the preferred and most recommended way to use ProxySQL/PXC, given it performs failover without the need for additional scripts or extensions, although a script is still required to manage the integration with ProxySQL.

In both cases, the use of a script able to identify the multiple states of PXC and the mutable node scenarios is a crucial part of the implementation, without which ProxySQL may not behave correctly.

 
Last Updated on Tuesday, 13 September 2016 11:49
 
ProxySQL and MHA integration
Written by Marco Tusa   
Sunday, 11 September 2016 15:32

ProxySQL and MHA integration
MHA (Master High Availability Manager and tools for MySQL) is almost fully integrated with the ProxySQL process.

What this means is that you can count on the standard MHA features to manage the failover, and on ProxySQL to manage the traffic and shift it from one server to another.

This is one of the main differences between using MHA with a VIP vs MHA with ProxySQL.

There is no need to move IPs or re-define DNS.

The following is an example of a configuration file for MHA, in case you use it with ProxySQL:

 

[server default]
    user=mha
    password=mha
    ssh_user=root
    repl_password=replica
    manager_log=/tmp/mha.log
    manager_workdir=/tmp
    remote_workdir=/tmp
    master_binlog_dir=/opt/mysql_instances/mha1/logs
    client_bindir=/opt/mysql_templates/mysql-57/bin
    client_libdir=/opt/mysql_templates/mysql-57/lib
    master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
    master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
    log_level=debug
 
    [server1]
    hostname=mha1r
    ip=192.168.1.104
    candidate_master=1
 
    [server2]
    hostname=mha2r
    ip=192.168.1.107
    candidate_master=1
 
    [server3]
    hostname=mha3r
    ip=192.168.1.111
    candidate_master=1
 
    [server4]
    hostname=mha4r
    ip=192.168.1.109
    no_master=1

 

The only thing you need to be sure of is to comment out the "FIX ME" lines in the sample scripts.

In the MHA installation directory, look for:

mha4mysql-manager/samples/scripts/master_ip_failover
mha4mysql-manager/samples/scripts/master_ip_online_change

 

After that, just install MHA as you are used to.
In ProxySQL, be sure to have the mha users and the servers set.

One very important thing when using ProxySQL with standard replication is that we need to grant additional privileges to the ProxySQL monitor user.
It must have REPLICATION CLIENT, or it will fail to check the slave lag.

Also, the servers MUST have a defined value for the attribute max_replication_lag, or the check will be ignored.
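A minimal sketch of the grant, assuming the default monitor account defined in mysql-monitor_username (run it on the MySQL servers, not on ProxySQL):

-- Allow the ProxySQL monitor user to read replication status (slave lag).
GRANT USAGE, REPLICATION CLIENT ON *.* TO 'monitor'@'%' IDENTIFIED BY 'monitor';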

 

As a reminder:

 

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('mha_W',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('mha_R',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_W','test',1,600,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_R','test',1,601,'test_mha');
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('mha_RW','test',1,600,'test_mha');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK
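Once max_replication_lag is set, you can verify that the monitor is actually collecting the slave lag; a sketch, assuming the monitor schema of ProxySQL 1.2.x:

-- Most recent replication lag probes per backend.
SELECT hostname, port, repl_lag, error
FROM monitor.mysql_server_replication_lag_log
ORDER BY time_start_us DESC
LIMIT 5;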

 

 

OK, all is ready, let us rock'n'roll.


Controlled fail-over

First of all, masterha_manager should not be running or you will get an error.

Then let us start some traffic.

 

Write
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run
Read only
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run
 

 

 

Let it run for a bit then check:

 

mysql> SELECT * FROM stats_mysql_connection_pool WHERE hostgroup BETWEEN 600 AND 601 ORDER BY hostgroup,srv_host DESC;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

And now let us try to perform the failover.
To do so, we will instruct MHA to do a switch and to set the OLD master as a new slave:

 

masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0

 

 

Now let us check what happened:

 

[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects:  0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects:  0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects:  1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects:  0.00

 

 

Check ProxySQL:

 

mysql> SELECT * FROM stats_mysql_connection_pool WHERE hostgroup BETWEEN 600 AND 601 ORDER BY hostgroup,srv_host DESC;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+

 

 

In this case the servers were not behind the master, and the switch happened quite fast.

We can see that the WRITE operations, which normally are an issue given the need to move a VIP around or change name resolution, had only a very limited hiccup.

Read operations were not affected at all.

Nice, eh?

Do you know how long it takes to do a switch under these conditions?

real 0m2.710s

Yes, 2.7 seconds.

This is further evidence that most of the time spent in a switch with MHA is caused by the need to redirect traffic from A to B over the network.

 

Crash fail-over

What happens if, instead of a nice switch, we have to cover a real failover?

First of all let us start masterha_manager:

 

nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1

 

 

Then let us start some load again.

Finally, go to the MySQL node that is the master, xxx.xxx.xxx.107:

ps aux|grep mysql
mysql    18755  0.0  0.0 113248  1608 pts/0    S    Aug28   0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql    21975  3.2 30.4 4398248 941748 pts/0  Sl   Aug28  93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306

 

 

 

And kill the MySQL process.

 

kill -9 21975 18755

 

 

 

As before, let us check what happened on the application side:

 

[  80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects:  0.00
[  90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects:  0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects:  0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects:  0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects:  0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects:  0.40         <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects:  0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects:  0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects:  0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects:  0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects:  0.00

 

 

So it takes ~10 seconds to perform the failover.

To better understand, let's see what happened in MHA-land:

 

Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s)..                                     <--- Finally MHA decide to do something 
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover

 

 

MHA sees the server failing at xx:47, but because of the retry and check validation, it fully acknowledges the downtime only at xx:56.

As such, ~8 seconds later.

Performing the whole failover then takes only ~2 seconds (again).

No movable IP or DNS was involved, and as such the operations were fast.

This is obviously true when the servers have the binary logs locally available; it may be a different story if MHA also has to manage and push data from the binary log to MySQL.

As you can see, ProxySQL can help a lot in reducing the timing for this scenario too, totally skipping the network-related operations which, as we know, are the ones causing the most trouble in these cases.


Last Updated on Tuesday, 13 September 2016 11:43
 
How to stop an offending query with ProxySQL
Written by Marco Tusa   
Saturday, 20 August 2016 15:40


All of us are very good at writing good queries, we know that ;) but sometimes a bad query may escape our control and hit (badly) our database.

There is also that probie who just joined the company and is writing all his code with SELECT * and no WHERE.

We have told him a million times that doing this is bad, but he does not seem to listen.

Ah, and the other day we had another code injection that the developers will take some time to fix, and it took them a while to isolate the part of the code sending the killer query to our database.

All of the above are true stories, things that happen every day in at least a few environments.

The main problem in these cases is not isolating the bad query, which is something we can do very fast, but identifying the code that is generating it, and disabling that part of the code without killing the whole application.

That part can take days.

ProxySQL allows us to act fast and stop any offending query in seconds.

 

 

I will show you how.


Let us say we have an offending query that does this:

SELECT * FROM history;

 

Where history is a 2 TB table, partitioned by year, in our DWH.

A query like that will certainly create some issues on the database; it is obviously bad design, and easy to identify.

Unfortunately, it was inserted in the ETL process, which uses a multi-threaded approach and auto-recovery.

As such, whenever you kill it, the process will restart it, and the developers will take some time to stop that.

In the meantime the real-time reporting system serving your company is very slow, or down.

 

With ProxySQL you will stop that query in 1 second:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply) VALUES (89,1,'^SELECT \* FROM history$','Query not allowed',1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

Done, your DB will never receive that query again, and the application will get a message saying that the query is not allowed.

 

But it is possible to do things even better:

INSERT INTO mysql_query_rules (rule_id, active, match_digest, flagOUT, apply) VALUES (89,1,'^SELECT \* FROM history', 100, 0);
INSERT INTO mysql_query_rules (rule_id, active, flagIN, match_digest, destination_hostgroup, apply) VALUES (1001,1, 100, 'WHERE', 502, 1);
INSERT INTO mysql_query_rules (rule_id, active, flagIN, error_msg, apply) VALUES (1002,1, 100, 'Query not allowed', 1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

In this case ProxySQL will check any query matching SELECT * FROM history.

If the query has a WHERE clause, it will be redirected to the server for execution.

If the query does not have a WHERE, it will be stopped and an error message sent to the application.
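To verify that the chain behaves as intended, the hit counters tell the story; a sketch using the rule ids above:

-- How many times each rule of the chain matched.
SELECT rule_id, hits
FROM stats.stats_mysql_query_rules
WHERE rule_id IN (89,1001,1002);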


Conclusion

The one above is a very simple, almost basic, example of an offending query.

But I am sure it makes clear how ProxySQL can help any DBA to stop offending queries quickly in case of emergency.

This gives the DBAs and the developers time to coordinate a better plan of action to permanently fix the issue.

 

References

https://github.com/sysown/proxysql
http://www.proxysql.com/2015/09/proxysql-tutorial-setup-in-mysql.html
https://github.com/sysown/proxysql/blob/v1.2.2/doc/configuration_howto.md
https://github.com/sysown/proxysql/blob/v1.2.2/INSTALL.md

Last Updated on Wednesday, 24 August 2016 08:17
 
Sharding with ProxySQL
Written by Marco Tusa   
Saturday, 20 August 2016 10:58

Recently a colleague of mine asked me to provide a simple example of how ProxySQL can perform sharding.
ProxySQL is a very powerful platform that allows us to manipulate and manage our connections and queries in a simple but effective way.
In this article I will show you how.

 

Before starting, it is better to clarify some basic concepts.

ProxySQL organizes its internal set of servers in Host Groups (HG); each HG can be associated with users and with Query Rules (QR).

Each QR can be final (apply = 1) or can let ProxySQL continue to parse other QRs.

A QR can be a rewrite action or a simple match; it can have a specific target HG or be generic; finally, QRs are defined using regular expressions.

You can see QRs as a sequence of filters and transformations that you can arrange as you like.

These simple basic rules give us enormous flexibility, and allow us to create very simple actions, like a plain query rewrite, or very complex chains that could see dozens of QRs concatenated.

Documentation can be found here

The information related to HGs and QRs is easily accessible using the ProxySQL administrator interface, in the tables mysql_servers, mysql_query_rules and stats.stats_mysql_query_rules; the last one allows us to evaluate if and how the rules are used.

 

About sharding: what can ProxySQL do for us to achieve what we need in a (relatively) easy way?

Some people/companies include the sharding logic in the application, using multiple connections to reach the different targets, or have some logic to split the load across several schemas/tables.

ProxySQL allows us to simplify the way connectivity and query distribution are supposed to work, reading data in the query or accepting HINTs.

No matter the kind of requirements, the sharding exercise can be summarized in a few different categories:

  •  Splitting the data inside the same container (like having a shard by state, where each state is a schema)
  •  Splitting by physical data location (this can be multiple MySQL servers in the same room, as well as geographically distributed)
  •  A combination of the two, where I split by state using a dedicated server, and again split by schema/table by some other criterion (say, by gender)

In the following examples I will show how to use ProxySQL to cover the three different scenarios defined above, and a bit more.


The examples below will report text from the ProxySQL admin interface and from the MySQL console. I will mark each one as follows:

  • Mc for MySQL console
  • Pa for ProxySQL Admin

Please note that the mysql console MUST use the -c flag to pass the comments in the query, because the default behaviour of the mysql console is to remove comments.

 

I am going to illustrate procedures that you can replicate on your laptop, because I want you to test the ProxySQL functionalities directly.

For the examples described below I have used ProxySQL v1.2.2, which is going to become the master branch in a few days.

You can download and build it as follows:

  1. git clone https://github.com/sysown/proxysql.git
  2. git checkout v1.2.2
  3. cd <path to proxy source code>
  4. make
  5. make install

 

If you need full instructions on how to install and configure ProxySQL, then read here and here.

Finally, you need to have the WORLD test DB loaded; the world test DB can be found here.


The first example/exercise is:

Shard inside the same MySQL server, using 3 different schemas split by continent.

Obviously you can have any number of shards, and a corresponding number of schemas.

What is relevant here is to demonstrate how traffic can be redirected to different targets (schemas) while maintaining the same structure (tables), discriminating the target on the basis of some relevant information in the data, or passed by the application.

OK, let us roll the ball.

Having:

[Mc]
+---------------+-------------+
| Continent     | count(Code) |
+---------------+-------------+
| Asia          |          51 | <--
| Europe        |          46 | <--
| North America |          37 | 
| Africa        |          58 | <-- 
| Oceania       |          28 |
| Antarctica    |           5 |
| South America |          14 |
+---------------+-------------+

 

For this exercise you can use a single host, or multiple servers in replication.

Summarizing, you will need:

  • 3 hosts: 192.168.1.[5-6-7] (only one is needed now, but the others are for future use)
  • 3 schemas: one per continent, plus the world schema
  • 1 user: user_shardRW
  • 3 hostgroups: 10, 20, 30 (for future use)

We will create the schemas Asia, Africa and Europe first.

[Mc]
CREATE schema [Asia|Europe|Africa];
CREATE TABLE Asia.City AS SELECT a.* FROM world.City a JOIN world.Country ON a.CountryCode = Country.Code WHERE Continent='Asia' ;
CREATE TABLE Europe.City AS SELECT a.* FROM world.City a JOIN world.Country ON a.CountryCode = Country.Code WHERE Continent='Europe' ;
CREATE TABLE Africa.City AS SELECT a.* FROM world.City a JOIN world.Country ON a.CountryCode = Country.Code WHERE Continent='Africa' ;
 
CREATE TABLE Asia.Country AS SELECT * FROM  world.Country WHERE Continent='Asia' ;
CREATE TABLE Europe.Country AS SELECT * FROM  world.Country WHERE Continent='Europe' ;
CREATE TABLE Africa.Country AS SELECT * FROM  world.Country  WHERE Continent='Africa' ;

 

Create the user:

[Mc]
GRANT ALL ON *.* TO user_shardRW@'%' IDENTIFIED BY 'test';

 

Now let us start to configure the ProxySQL:

[Pa]
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema) VALUES ('user_shardRW','test',1,10,'test_shard1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
 
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,100);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

 

With this we have defined the User, the servers and the Host groups.

Let us start to define the logic with the query rules:

[Pa]
DELETE FROM mysql_query_rules WHERE rule_id > 30;
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (31,1,'user_shardRW',"^SELECT\s*(.*)\s*from\s*world.(\S*)\s(.*).*Continent='(\S*)'\s*(\s*.*)$","SELECT \1 from \4.\2 WHERE 1=1 \5",1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

I am now going to query the master (or a single node), but I expect ProxySQL to redirect the query to the right shard, catching the value of the continent.

[Mc]
 SELECT name,population FROM world.City  WHERE Continent='Europe' AND CountryCode='ITA' ORDER BY population DESC LIMIT 1;
+------+------------+
| name | population |
+------+------------+
| Roma |    2643581 |
+------+------------+

 

Well, you can say... "Hey, you are querying the schema world, of course you get back the correct data".

But this is not what really happened: ProxySQL did not query the schema world, but the schema Europe.

Let's see the details:

[Pa]
SELECT * FROM stats_mysql_query_digest;
Original    :SELECT name,population FROM world.City  WHERE Continent='Europe' AND CountryCode='ITA' ORDER BY population DESC LIMIT 1;
Transformed :SELECT name,population FROM Europe.City WHERE ?=? AND CountryCode=? ORDER BY population DESC LIMIT ?

 

Let me explain what happened.
Rule 31 in ProxySQL takes all the fields we pass in the query, catches the continent in the WHERE clause, takes any condition that follows the WHERE, and reorganizes the query, all using the regex.

 

Does this work for any table in the sharded schemas?
Of course it does.
A query like:

SELECT name,population FROM world.Country WHERE Continent='Asia' ;

 

Will be transformed into: 

SELECT name,population FROM Asia.Country WHERE ?=?

 

[Mc]
+----------------------+------------+
| name                 | population |
+----------------------+------------+
| Afghanistan          |   22720000 |
| United Arab Emirates |    2441000 |
| Armenia              |    3520000 |
<snip ...>
| Vietnam              |   79832000 |
| Yemen                |   18112000 |
+----------------------+------------+

 

Another possible approach to instruct ProxySQL to shard is to pass a hint inside a comment.

Let's see how.

First, let me disable the rule I have just inserted; this is not really needed, but this way you can see how to do it :)

[Pa]
mysql> UPDATE mysql_query_rules SET active=0 WHERE rule_id=31;
Query OK, 1 row affected (0.00 sec)
mysql> LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.00 sec)

Done.

 

Now what I want is that *ANY* query that contains the comment /* continent=X */ goes to the continent X schema, on the same server.

To do so, I will instruct ProxySQL to replace any reference to the world schema inside the query I am going to submit.

[Pa]
DELETE FROM mysql_query_rules WHERE rule_id IN (31,33,34,35,36);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (31,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Asia\s*\*.*",NULL,0,23,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (32,1,'user_shardRW','world.','Asia.',0,23,23);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (33,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Europe\s*\*.*",NULL,0,25,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (34,1,'user_shardRW','world.','Europe.',0,25,25);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagOUT,FlagIN) VALUES (35,1,'user_shardRW',"\S*\s*\/\*\s*continent=.*Africa\s*\*.*",NULL,0,24,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,FlagIN,FlagOUT) VALUES (36,1,'user_shardRW','world.','Africa.',0,24,24);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

How does this work?

I have defined two concatenated rules per continent.

The first captures the incoming query that contains the desired value (like continent=Asia).

If the match is there, ProxySQL exits that action; but while doing so it reads the Apply field, and if Apply is 0 it reads the FlagOUT value. At this point it jumps to the first rule (in sequence) whose FlagIN is equal to that FlagOUT.

The second rule gets the request and replaces the value of world with the one I have defined.

In short, it replaces whatever is in the match_pattern with the value that is in the replace_pattern.

Now, what happens here is that ProxySQL implements the Google RE2 library for RegEx.

RE2 is very fast but has some limitations; for instance, it does NOT support (at the time of writing) the flag option g.

In other words, if I have a SELECT with many tables, and as such several "world." references, RE2 will replace ONLY the first instance.

 

As such a query like:

SELECT /* continent=Europe */ * FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='ITA' ;

 

Will be transformed into :

SELECT /* continent=Europe */ * FROM Europe.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='ITA' ;

And fail.

 

The other day, Rene' and I were discussing how to solve this, given the missing feature in RE2. Finally, we opted for recursive actions.

What does this mean?

It means that ProxySQL, from v1.2.2, has a new functionality that allows recursive calls to a Query Rule. The maximum number of iterations that ProxySQL can run is managed by the option (global variable) mysql-query_processor_iterations.

mysql-query_processor_iterations defines how many operations a query process can execute as a whole (from start to end).
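If you need to adjust it, it is a standard ProxySQL global variable; a minimal sketch (the value 30 here is just an arbitrary example, pick one that fits your longest rule chain):

[Pa]
-- Raise the per-query iteration cap; 30 is an arbitrary example value.
UPDATE global_variables SET variable_value=30 WHERE variable_name='mysql-query_processor_iterations';
LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;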

This new implementation allows us to have a Query Rule reference itself, so that it is executed multiple times.

If you go back, you will notice that QR 34 has FlagIN and FlagOUT pointing to the same value of 25, and Apply=0.

This brings ProxySQL to recursively call rule 34 until it has changed ALL occurrences of the word world.

 

The result is the following:

[Mc]
SELECT /* continent=Europe */ Code, City.Name, City.population  FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE City.population > 10000 GROUP BY Name ORDER BY City.Population DESC LIMIT 5;
+------+---------------+------------+
| Code | Name          | population |
+------+---------------+------------+
| RUS  | Moscow        |    8389200 |
| GBR  | London        |    7285000 |
| RUS  | St Petersburg |    4694000 |
| DEU  | Berlin        |    3386667 |
| ESP  | Madrid        |    2879052 |
+------+---------------+------------+

 

You can see ProxySQL internal information using the following queries:

[Pa]
 SELECT active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 1    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      | <--
| 1      | 4    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      | <--
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

 

And

[Pa]
SELECT * FROM stats_mysql_query_digest;
<snip AND taking only digest_text>
SELECT Code, City.Name, City.population FROM Europe.Country JOIN Europe.City ON Europe.City.CountryCode=Europe.Country.Code WHERE City.population > ? GROUP BY Name ORDER BY City.Population DESC LIMIT ?

 

As you can see, ProxySQL has nicely replaced the word world, and executed the query only on the desired schema.

 

How can I shard redirecting the queries to a host?

(Instead of to a schema.) This is even easier :)

The main point is that whatever matches the rule should go to a defined HG. No rewrite is implied, which means less work.

So how is this done? As said before, I have 3 nodes: 192.168.1.[5-6-7]. For this example I will use the world db (no continent schemas), distributed on each node, and I will retrieve the node's bind IP to be sure I am landing in the right place.

What I will do is instruct ProxySQL to send my query to a specific host by using a HINT.

I chose the hint "shard_host_HG", and I am going to inject it into the query as a comment.

 

As such the Query Rules will be:

[Pa]
DELETE FROM mysql_query_rules WHERE rule_id IN (40,41,42, 10,11,12);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (10,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Europe\s*\*.",10,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (11,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Asia\s*\*.",20,0);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,destination_hostgroup,apply) VALUES (12,1,'user_shardRW',"\/\*\s*shard_host_HG=.*Africa\s*\*.",30,0);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;

 

While the queries I am going to test are:

[Mc]
SELECT /* shard_host_HG=Europe */ City.Name, City.Population FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='ITA' LIMIT 5; SELECT * /* shard_host_HG=Europe */ FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE 'bind%';
SELECT /* shard_host_HG=Asia */ City.Name, City.Population FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='IND' LIMIT 5; SELECT * /* shard_host_HG=Asia */ FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE 'bind%';
SELECT /* shard_host_HG=Africa */ City.Name, City.Population FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='ETH' LIMIT 5; SELECT * /* shard_host_HG=Africa */ FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE 'bind%';

 

Running the query for Africa, I will get:

[Mc]
SELECT /* shard_host_HG=Africa */ City.Name, City.Population FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='ETH' LIMIT 5; SELECT * /* shard_host_HG=Africa */ FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE 'bind%';
+-------------+------------+
| Name        | Population |
+-------------+------------+
| Addis Abeba |    2495000 |
| Dire Dawa   |     164851 |
| Nazret      |     127842 |
| Gonder      |     112249 |
| Dese        |      97314 |
+-------------+------------+
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.7    |
+---------------+----------------+

 

 

Checking the rule statistics on the ProxySQL side will give me:

[Pa]
SELECT active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 10      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 0    | 11      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 2    | 12      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | 0       | <-- note the HITS (2, matching the two queries run)
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

 

In this example we have NO replace_pattern; this is a matching-and-redirecting rule only, where the destination HG is defined in the destination_hostgroup attribute at insert time.

In the case of Africa, that is HG 30.

The server in HG 30 is:

[Pa]
SELECT hostgroup_id,hostname,port,STATUS FROM mysql_servers ;
+--------------+-------------+------+--------+
| hostgroup_id | hostname    | port | STATUS |
+--------------+-------------+------+--------+
| 10           | 192.168.1.5 | 3306 | ONLINE |
| 20           | 192.168.1.6 | 3306 | ONLINE |
| 30           | 192.168.1.7 | 3306 | ONLINE | <---
+--------------+-------------+------+--------+

 

Which matches perfectly with the returned value.

You can try the other two continents on your own.

 

Using destination_hostgroup

Another way to assign the final host a query should go to is to use destination_hostgroup together with the schemaname attribute, and then use the USE schema syntax in the query,

like:

[Pa]
INSERT INTO mysql_query_rules (active,schemaname,destination_hostgroup,apply) VALUES
(1, 'shard00', 1, 1), (1, 'shard01', 1, 1), (1, 'shard02', 1, 1),
(1, 'shard03', 2, 1), (1, 'shard04', 2, 1), (1, 'shard05', 2, 1),
(1, 'shard06', 3, 1), (1, 'shard07', 3, 1), (1, 'shard08', 3, 1);

 

And then in the query do something like :

USE shard02; SELECT * FROM tablex;

 

I mention this method because it is one of the most common at the moment in large companies using SHARDING.

But it is not safe, because it relies on the assumption that the query will be executed in the desired HG, and the risk of error is high.

Just think of a query doing a join against a specific SHARD:

USE shard01; SELECT * FROM tablex JOIN shard03.tabley ON tablex.id = shard03.tabley.id;

 

This will probably generate an error, because shard03 is probably NOT present on the host containing shard01.

As such, this approach can be used ONLY when you are 100% sure about what you are doing, and when you are sure NO query will reference a schema outside its own shard.

 

Shard by host and by schema

Finally, it is obviously possible to combine the two approaches, sharding by host and keeping only a subset of schemas on each host.

To do so, let us use all 3 nodes and distribute the schemas as follows:

  • Europe on Server 192.168.1.5 -> HG 10
  • Asia on Server 192.168.1.6 -> HG 20
  • Africa on Server 192.168.1.7 -> HG 30
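This layout maps to mysql_servers as sketched below; a minimal sketch consistent with the mysql_servers output shown earlier (weights are illustrative):

[Pa]
-- One node per continent hostgroup, matching the SELECT FROM mysql_servers output above.
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.5',10,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.6',20,3306,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.7',30,3306,10);
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;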

I have already set both groups of query rules using HINTs, so what I have to do is use them BOTH to combine the operations:

[Mc]
SELECT /* shard_host_HG=Asia */ /* continent=Asia */  City.Name, City.Population FROM world.Country JOIN world.City ON world.City.CountryCode=world.Country.Code WHERE Country.code='IND' LIMIT 5; SELECT * /* shard_host_HG=Asia */ FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE 'bind%';
+--------------------+------------+
| Name               | Population |
+--------------------+------------+
| Mumbai (Bombay)    |   10500000 |
| Delhi              |    7206704 |
| Calcutta [Kolkata] |    4399819 |
| Chennai (Madras)   |    3841396 |
| Hyderabad          |    2964638 |
+--------------------+------------+
5 rows IN SET (0.00 sec)
 
+---------------+----------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+---------------+----------------+
| BIND_ADDRESS  | 192.168.1.6    |
+---------------+----------------+
1 row IN SET (0.01 sec)

 

[Pa]
mysql> SELECT digest_text FROM stats_mysql_query_digest;
+--------------------------------------------------------------------------------------------------------------------------------------------+
| digest_text                                                                                                                                |
+--------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT * FROM information_schema.GLOBAL_VARIABLES WHERE variable_name LIKE ?                                                               |
| SELECT City.Name, City.Population FROM Asia.Country JOIN Asia.City ON Asia.City.CountryCode=Asia.Country.Code WHERE Country.code=? LIMIT ? |
+--------------------------------------------------------------------------------------------------------------------------------------------+
2 rows IN SET (0.00 sec)
 
mysql> SELECT active,hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| active | hits | rule_id | match_digest        | match_pattern                          | replace_pattern | cache_ttl | apply | flagIN | flagOUT |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+
| 1      | 0    | 10      | NULL                | \/\*\s*shard_host_HG=.*Europe\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 2    | 11      | NULL                | \/\*\s*shard_host_HG=.*Asia\s*\*.      | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 12      | NULL                | \/\*\s*shard_host_HG=.*Africa\s*\*.    | NULL            | NULL      | 0     | 0      | NULL    |
| 1      | 0    | 13      | NULL                | NULL                                   | NULL            | NULL      | 0     | 0      | 0       |
| 1      | 1    | 31      | NULL                | \S*\s*\/\*\s*continent=.*Asia\s*\*.*   | NULL            | NULL      | 0     | 0      | 23      |
| 1      | 4    | 32      | NULL                | world.                                 | Asia.           | NULL      | 0     | 23     | 23      |
| 1      | 0    | 33      | NULL                | \S*\s*\/\*\s*continent=.*Europe\s*\*.* | NULL            | NULL      | 0     | 0      | 25      |
| 1      | 0    | 34      | NULL                | world.                                 | Europe.         | NULL      | 0     | 25     | 25      |
| 1      | 0    | 35      | NULL                | \S*\s*\/\*\s*continent=.*Africa\s*\*.* | NULL            | NULL      | 0     | 0      | 24      |
| 1      | 0    | 36      | NULL                | world.                                 | Africa.         | NULL      | 0     | 24     | 24      |
+--------+------+---------+---------------------+----------------------------------------+-----------------+-----------+-------+--------+---------+

 

As you can see, rule 11 has two HITS, which means my queries went to the associated HG.

But given that Apply for rule 11 is 0, ProxySQL continues to process the Query Rules.

As such it also transforms the queries as per rules 31 and 32, each one with the expected number of hits (1 for the first, and 4 for the second because of the recursion loop).


Credits

It is obvious that I need to acknowledge and applaud the work Rene' Cannao is doing to make ProxySQL a solid, fast and flexible product.

I also have to mention that I was, and am, working with him very often, probably more often than he likes, asking for fixes and discussing optimizations.

Requests that he tries to satisfy with surprising speed and efficiency.

 

Reference

https://github.com/sysown/proxysql/tree/v1.2.2/doc
https://github.com/google/re2/wiki/Syntax
http://www.proxysql.com/2015/09/proxysql-tutorial-setup-in-mysql.html
https://github.com/sysown/proxysql/blob/v1.2.2/doc/configuration_howto.md
https://github.com/sysown/proxysql/blob/v1.2.2/INSTALL.md
https://dev.mysql.com/doc/index-other.html

Last Updated on Wednesday, 24 August 2016 08:18
 
AWS Aurora Benchmark - Choose the right tool for the job PDF Print E-mail
Written by Marco Tusa   
Thursday, 19 May 2016 00:00

 

Some time ago, I published the article “AWS Aurora Benchmarking - Blast or Splash?”, in which I analyzed the behavior of different solutions using synchronous replication in an AWS environment.

After I published it, I received a lot of comments and feedback from the community and from Amazon engineers.

Given that, I decided to perform another round of tests, taking the comments and suggestions received into account.

I presented some of the results during the Percona conference in Santa Clara in April 2016. The following is the transposition of that presentation into an article, with more details.

 

 

Why new tests?

Very good question, with an easy answer.

Aurora is a product still under development and code refinement; six months of development can bring major changes in performance. Moreover, the initial tests were focused on entry-level solutions, meaning I was analyzing the kind of user that is just starting a business and looking for a flexible solution that allows them to save money and scale.

This time I put the focus on enterprise solutions, analyzing what an already well-established company would get when looking for a decent, scalable solution.

As such, two different scenarios.

Why so many (different) tests?

I used many different benchmarking tools, and I am still planning to run others. Why so? Why not simply rely on one of them?

Again a simple answer: I used different tools because in some cases they give me different ways to access and use the data. Moreover, I do not trust benchmarking tools, not even the one I developed myself; as such, I want to test the same thing using different tools and compare the results. ONLY if I see a common pattern do I consider the test valid. Personally, I tend to discard any test that is not consistent, and any analysis performed with a single benchmarking tool. In my opinion, being lazy is not an option when doing this kind of exercise.

Tests run

I ran three main kinds of tests:

  • Performance and load stress
  • High Availability failover
  • Response time (latency) from application point of view

Performance and load stress

These tests were the most extensive and demanding.

I was analyzing the capacity to serve load under different conditions, from light load up to full utilization, with some degree of saturation of resources.

 

The first set of tests evaluated a simple load on a single table, causing the table to become a hotspot and showing how the platform would manage the increasing contention.

The second set of tests performed a similar load, but distributing it across multiple tables and batching the operations. Parallelization, contention, scalability and distributed hotspots were in the picture.

The two above focused on write operations only, and were done using different tools, comparing the results given that they were complementary.

 

The third set of tests, using my own stresstool, focused on R/W usage. The tests executed against multiple tables, performing CRUD actions, using simple and batch inserts, and reads by PK, by index, by range, with IN and with exact match conditions.

The fourth set of tests was performed using a TPC-C like load (OLTP).

The fifth set of tests used Sysbench in OLTP mode with 250 tables.

 

The scope of the last three sets of tests was to identify how the platforms would manage the load, considering the following:

  • Read and write contention on the same tables
  • High level of parallelism (from the application)
  • Possible hot-spots (TPCC district)
  • Increasing utilization (memory, threads, IO)
  • Saturation (connections)

Finally, all tests were run with a fully utilized BufferPool.
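One simple way to verify the warm-up, for what it's worth, is the standard InnoDB counters; this is a sketch of the check, not how the tests were instrumented:

[Mc]
-- The buffer pool is fully warm when the number of free pages approaches zero
-- relative to the total number of pages.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_total';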

 

About the tests

It was difficult to compare apples with apples here, and I think that is the main point to keep in mind.

Aurora is not a standard RDS solution, as we were used to having.
Aurora looks like MySQL, smells like MySQL, but is not vanilla MySQL.

To achieve what they had to achieve, the engineers there had to change many parts, and the more you dig, the more you realize there are significant differences.

Because of that, I had to focus more on identifying what each solution can do, comparing the solutions against expectations, instead of comparing the numbers for the numbers' sake.

 

I was more interested to see what happens if I have a burst of connections and my application goes from 4K to 40K connections. Will it crash? Will it slow down?

How long should I wait if a node fails?

What should I avoid in my schema design in order not to have bottlenecks?

In this context, those are, in my opinion, the relevant questions; more relevant than discovering that solution A can write 3000 rows/sec and the other 3100.

Or that I may (may) have some additional page rotation, file -> memory -> flush, because the amount of memory differs.

That is valuable information too, for sure, but less valuable than a decent understanding of which platform will help my business grow and remain stable.

What is the right tool for the job? This is the question I was addressing.

 

The machines

Small Boxes (first round of tests)

 

EIP = 1
VPC = 1
ELB=1
Subnets = 4 (1 public, 3 private)
HAProxy = 6 
MHA Monitor (micro ec2) = 1
NAT Instance (EC2) =1 (hosting EIP)
DB Instances (EC2) = 3 (m4.xlarge) 16GB
Application Instances (EC2) = 6 (4)
EBS SSD 3000 PIOS
Aurora RDS node = 3 (db.r3.xlarge) 30GB
 

 

 

Large Boxes (latest tests)

 

EIP = 1
VPC = 1
ELB=1
Subnets = 4 (1 public, 3 private)
HAProxy = 4
MHA Monitor (micro ec2) = 1
NAT Instance (EC2) =1 (hosting EIP)
DB Instances (EC2) = 3 (c3.8xlarge) 60GB
Application Instances (EC2) = 4
EBS SSD 5000 PIOS
Aurora RDS node = 3 (db.r3.8xlarge) 244GB
 

 

Note

It was pointed out to me that I deliberately chose an EC2 solution for PXC with less memory than the one available in Aurora.

This is true, and we must keep it in consideration.
The reason for this is that the only EC2 instance matching the memory of a db.r3.8xlarge is the d2.8xlarge.

I did try it, but the level of scalability I got from the CPU point of view was less efficient than the one available with the c3.8xlarge.

Given that, I decided to prefer CPU resources over memory here, especially because I was going to test concurrency and parallelism in conjunction with the load increase.

From the results I got, I feel confident that I chose right, but I am open to comments.

 

 

 

The layout

This is how the setup looks:

hootsuite_mysql_ha_failover - poc architecture

Where you read Java, those are the application nodes running the different test applications.

Two words about Aurora first

Aurora has a few key concepts that you must have clear in mind, especially how it manages writes across replicas, and how connections are implemented.

The IO activity

To replicate the information across the different storage nodes, Aurora replicates FRM files and data coming from IB_LOGS only. This is a quite significant advantage over other forms of replication, given the limited number of bytes that travel over the network, even if they are replicated 6 times.

Screen Shot 2016-05-12 at 9.39.17 AM

image from Amazon Aurora Deep dive

 

Another significant advantage is that Aurora does not use the double write buffer, which is obviously another blast (see the recent optimization in Percona Server https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/).

Simplifying, writes in Aurora are organized by filling its commit queue and pushing the changes to the storage as a group commit.

Screen Shot 2016-05-12 at 1.34.10 PM

image from Amazon Aurora Deep dive

 

Now, in some presentations you may have seen that all steps are asynchronous, but it is important to underline that a commit is acknowledged by Aurora only when at least 2 AZs have received and written the incoming data related to that commit. Written here means received in the storage node's incoming queue, with a quorum of 4 out of 6 nodes.

This means that, no matter what, the data has to travel over the network, reach the final destination, and the ack signal has to come back before Aurora returns the ack to the commit operation. The network is within the same region, but it still represents an unknown for performance. No wonder we may see some latency at this stage.

As you can see, what I am reporting is also confirmed in the image below and in the observations; the point is that, from that slide, the impact of steps 1 and 2 is not clear.

Screen Shot 2016-05-12 at 9.41.55 AM

image from Amazon Aurora Deep dive

 

Thread pooling

Oh yes, Aurora uses thread pooling, a lot. That will become very clear later; and the more the workload is based on parallelism, the more efficient the thread pooling seems to be.

In most cases we are used to seeing CPUs on database servers not fully utilized, unless there is some heavy ordering operation or a bad query. That behavior is also (but not only) a direct consequence of the connection-to-thread model, which implies periods of latency and standby. In Aurora, the incoming connections do not follow the same model; instead, the pool redistributes the load of the incoming connections to a pool of threads, optimizing the latency periods and resulting in higher CPU utilization. Which is what you want from your resources: to be utilized, not to sit there waiting for something else to do its job.

 

Screen Shot 2016-05-12 at 9.42.13 AM

image from Amazon Aurora Deep dive

 

 

The results

Without additional waste of electronic ink, let's see what came out of this round of tests (not the final one, by the way). To simplify the reading, I will also report the graphs from the first set of tests, but I will focus on the latest. Small Boxes = SB, Large Boxes = LB.

First test: IIBench

As declared previously, my scope was to verify how the two platforms would react to a simple load focused on inserts into a basic single table; the bufferpool was saturated before the run.

SB

iibench_exectime_old

 

LB

iibench_exectime_new 

 

As we can see, in the presence of a hotspot the solution using PXC outperforms Aurora, in both cases. What is notable, though, is that while PXC remains approximately around the same time/performance, Aurora significantly reduces the time taken. This shows that Aurora was actually taking advantage of the more powerful platform, while PXC was not able to.

Analyzing in more detail what was happening, we can notice that Aurora is actually performing atomically better: it was able to manage more writes/second, as well as more rows and pages. But it was inconsistent: Aurora was having performance hiccups at regular intervals. As such, the final result was that it took more time to process the whole workload.

I was not able to dig much, given that some metrics are not fully available in Aurora; as such I had to rely on the Aurora engineers, who mentioned the hotspot contention to me as the possible issue.

 

Aurora Handler calls

iibench_aurora_handlers

PXC Handlers calls

iibench_pxc_handlers

 

The execution in PXC shows fewer calls but constant performance, while Aurora has hiccups.

Aurora Page Activity Write

iibench_aurora_page

PXC Page Activity Write  

iibench_pxc_page

 

The trend shown by the handlers stays consistent in the page management and rows inserted, as logically expected.

Second test: application ingest

As mentioned, this test sees many threads from different application servers inserting batches of 50 statements against multiple tables.

The results coming from this test are quite in favor of Aurora, as we can see starting from the time taken to complete the same workload:

LB

app_ingest_exec_time_old

 

 

SB

app_ingest_exec_time_new

While with the small boxes the situation was the inverse.

But here starts the interesting part.

Aurora is able to manage a significantly higher number of rows, as the picture below shows.

app_ingest_rows_inserted_old

The results are also quite constant, and do not decrease significantly as the inserts with PXC do.

But the number of Handler commits is significantly lower.

 

 

 

 

app_ingest_rows_inserted_new

 

Once more, they stay the same as the load increases, without impacting performance.

Reviewing all the Handler calls, we have a first surprise.

PXC Handlers calls

 

app_ingest_pxc_handlers

Aurora Handlers calls

 

app_ingest_aurora_handlers

The gaps/drops existing in the two graphs are the different tests (with an increasing number of threads).

We have two things to notice here. The first one is that PXC has a decrease in performance while processing the load, while Aurora does not. The second (you need to zoom the image) is that the number of commits floats in PXC, while it stays fixed in Aurora.

An even bigger surprise comes up when reviewing the connection graphs.

As expected, PXC has all my connections open there, and the number of threads running is quite close to the number of threads connected.

app_ingest_pxc_connections

And both of them follow the increasing number of connected threads.

But this is not the case in Aurora.

app_ingest_aurora_connections

Even if my applications are actually trying to open ~800 threads, the Aurora node sees only a part of them, and the number of running threads is fixed at 32.

Things to consider here are the following: first, my applications do not connect directly to the Aurora instance, but to a connector (MariaDB). Second, Aurora, in this case, caps the number of running threads to the number of CPUs available on the instance (here 32).

Given that, I might expect worse performance, but I do not get it.

The fact that Aurora uses one thread for multiple connections seems to work quite efficiently here.
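For what it's worth, this is visible from the client side with the standard MySQL status counters, which Aurora also exposes; a quick check to run during the load:

[Mc]
-- During the ingest test, Threads_connected grows with the application load,
-- while Threads_running on the Aurora node stays pinned at 32 (the vCPU count).
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Threads_running';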

See also that the number of rows inserted is consistent with the handler calls, and performs better than PXC.

Aurora Rows inserted

app_ingest_aurora_rows

 

 

PXC Rows inserted

 

app_ingest_pxc_rows

 

 

Again we have the same trend, only this time Aurora performs definitely better than PXC.

 

Third test: OLTP application

When run on the small boxes, this test saw PXC performing tons of times better than Aurora:

the time taken by Aurora was ~3 times the one taken by PXC.

app_oltp_exec_time_old

With the large boxes I had the inverse: Aurora outperforms PXC by many times, being from two up to almost 7 times faster than PXC.

app_oltp_exec_time_new

 

Analyzing the number of commands executed with increasing workload, we can see how PXC is able to perform better than Aurora with a workload of 128 threads, but starts to show worse performance as the load increases.

On the other hand, Aurora is able to manage the read/write load without significant performance loss, which includes being able to increase the number of commits/sec.

commands_OLTP

Reviewing the Handler calls, we see again that the Handler commit calls are significantly fewer in Aurora, as already noticed in the ingest tests.

 

app_oltp_handlers_call

The other thing to notice is that the number of calls for PXC is significantly higher and does not scale, while Aurora shows a nice scaling trend.

Fourth test: TPCC-mysql

The TPCC test is meant mainly to test OLTP traffic, with the note that some tables, like district, may become hotspots. The tests I ran were executed against 400 warehouses, using 128 threads as the maximum for the small boxes and 2048 threads for the large ones.

During this test I hit one of the Aurora limitations, and I escalated it to the Aurora engineers, who are aware of the problem.

Small boxes

tpcc_old

 

In the case of the small boxes there is nothing to say: PXC manages the load more efficiently, even if its trend is not optimal, having significant fluctuations. Aurora is just not able to keep up.

Large boxes

A different and a bit more complex scenario in the case of the large boxes.

I would like to say that Aurora is performing better:

tpcc_new

 

And as you can see, this is true for 2 tests out of 3; and up to the point where it got stuck by the internal limitation, Aurora was also performing better on the 3rd. But then its performance just collapsed.

Performing a more in-depth investigation, I noticed that under the hood Aurora was not performing as well as it seemed.

This comes out quite clearly when comparing a few graphs covering Com_ execution, open files, Handlers and InnoDB row lock time.

In all of them it is quite evident how PXC is able to keep serving the workload with consistent behavior, while Aurora fails from the second test on (512 threads), and not only on the 3rd with 2048 threads.

Aurora

 tppc_aurora_com

PXC

 

tppc_pxc_com

 

It is clear how Aurora was serving better during the test with 256 threads, going over 450K Com_select served (in a 10-second interval), compared with PXC, which was not able to go over 350K.

But in the tests after, while PXC was able to keep going, albeit with decreasing performance, Aurora started to struggle, with very inconsistent behavior.

This was also confirmed by the open files graph

 

Aurora

tppc_aurora_files

PXC

tppc_pxc_files

The graphs show the number of files opened during the run, not the ones already open.
They reflect the Open_files metric: “The number of files that are open. This count includes regular files opened by the server. It does not include other types of files such as sockets or pipes. Also, the count does not include files that storage engines open using their own internal functions rather than asking the server level to do so.” I was quite surprised by the number of files opened by Aurora.
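For reference, these are the standard server counters behind the graphs; nothing Aurora-specific is assumed here:

[Mc]
-- Open_files is the current number of regular files the server holds open;
-- Opened_files is the cumulative count of file-open operations since startup.
SHOW GLOBAL STATUS LIKE 'Open_files';
SHOW GLOBAL STATUS LIKE 'Opened_files';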

Handlers as well were reflecting the same behavior

Aurora

 

tppc_aurora_handlers

PXC

tppc_pxc_handlers


 

Perfectly in line with the Com trend.

So what was increasing instead?

Aurora

 

tppc_aurora_locks

PXC 

tppc_pxc_locks

As you can see from the above, exactly the same workload generated an increasing row lock time: from quite low in the test with 256 threads, up to crazy high in the one with 2048 threads.
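For reference, the row lock time graphed here comes from the standard InnoDB counters, which both platforms expose; a quick way to sample them:

[Mc]
-- Innodb_row_lock_time values are reported in milliseconds;
-- Innodb_row_lock_waits counts how many times a row lock had to be waited for.
SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';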

As mentioned, we know that TPCC has a couple of tables that work as hotspots, and we had already seen with IIbench how Aurora does not work efficiently in those cases.

As additional information: during the tests I was getting a lot of 188 errors; this is an Aurora internal error. When I reported it, I was told they know about it and are planning to work on it.

I hope they do it soon, because if this issue is solved it is very likely that Aurora will not only be able to manage the tested workload, but go well beyond it.

I am saying this because, even with the identified issues, Aurora was able to keep going and manage a more than decent response time during test 2 with 512 threads.

 tppc_response_time

Fifth test: Sysbench

I added the sysbench tests to check scalability, and to see what happens when the system reaches the saturation point.

These tests brought up some limitations existing in the Aurora solution, more related to the connector than to the Aurora engine itself.

Aurora has a limit of 16k connections; as said, I was looking to see what happens at the saturation point or close to it. It doesn't matter whether this is a crazy high number or not.

What happened is that I was able to have Aurora managing traffic up to 4K connections, but the closer I got to the limit, the more issues I had, in connectivity more than anything else.

In the end I had to run the tests with 8K, 12K and 20K threads pointing directly at the Aurora instance, bypassing the connector, which was not able to serve the traffic.

After that I was able to hit up to ~15500 threads, but with a lot of inconsistent performance. Given that, I am defining the limit of a meaningful test as the previous level of 12K threads.

PXC was able to scale up to 16K with no problem.

What is also notable here is that Aurora was able to manage the workload more efficiently in terms of transaction handling, both in transactions executed and in latency.

sys_bench_high_threads_trnsactions

 


The number of transactions executed by Aurora was ~three times the number executed by PXC.

 

 

sys_bench_high_threads_latency

 

Also in terms of latency, Aurora was showing lower latency than PXC.

Internally, Aurora and PXC were once more different in how the workload was handled. The most diverging result was in the handler calls.

sys_bench_high_threads_Handlers_write

Commit calls in Aurora were a fraction of the calls in PXC, while the number of rollbacks was higher.

The read calls had an even more diverging behavior, with PXC performing a high number of read_key calls, while Aurora had a very limited number of them. Read_rnd calls are very high in PXC but totally absent in Aurora (note that in Aurora, read_rnd is reported but does not really seem to increase). On the other hand, Aurora reports a high number of read_rnd_next calls, while PXC has none.

 

sys_bench_high_threads_Handlers_read

HA availability

Fail-over time

 

Both solutions

ha_time  

In this test, looking at the fail-over time, the solution using Galera and HAProxy was more efficient. That was happening with limited or mid-level load; one assumption is that, given Aurora has in any case to verify the status of the transmitted data and its consistency across the 6 data store nodes, the process is not as fast as it could be.

 

Or, another assumption: it could be that the cluster connector is not as efficient as it should be in redirecting the traffic from one node to another. It would be a very interesting exercise to replace it with some other custom solution.

 

Note that I was performing the tests following the Amazon indication to use the following to simulate a real crash:

ALTER SYSTEM CRASH [INSTANCE|NODE]

As such, I was not doing anything strange or out of the ordinary.

 

It is worth mentioning that, of the 8 seconds taken by MySQL/Galera to perform the failover, 6 were due to the HAProxy settings, which had a 3000 ms check interval and 2 loops before executing the failover (3000 ms x 2 = 6 seconds).

Execution latency

The scope of this test was to identify the latency between the moment the application sends a request and the moment MySQL/Aurora takes the request in charge.

The expectation is that the busier the database gets, the higher the latency will be.

For this test I report both results: the ones coming from the old tests with the small boxes, and the new ones with the large boxes.

 

Small Boxes  

ha_latency_old

Large boxes

ha_latency_new

It is clear from the graphs that the two tests report two different scenarios.

In the first, Galera was able to manage the load more efficiently and serve requests with lower latency.

 

For the new tests, I utilized a higher number of threads than the ones for the small boxes; nevertheless, in the second round, the CPU utilization and the number of running threads lead me to think that Aurora was finally able to utilize the resources more efficiently, and the latency just dropped.

Worth mentioning: latency jumped up again when the number of connections went above 12K, but that was expected given the previous test results.

Conclusions

High Availability

The two platforms were shown to be able to manage the failover operation in a limited time frame (below 1 minute).

Nevertheless, MySQL/Galera was shown to be more efficient and consistent.

This result is a direct consequence of synchronous replication, which by design does not allow an active node in MySQL/Galera to fall behind.

In my opinion, the replication method used in Aurora is efficient, and given that the data is shared across the read replicas, fail-over should happen faster.

I suffered a lot during the tests because of the connector, and I have the feeling that putting another solution in its place may bring some surprises; I would actually really like to test that as well.

 

Performance

In this run of tests, Aurora was able to invert the results I had in the first tests with the small boxes. In almost all cases Aurora performed as well as, or better than, PXC. There are still cases where Aurora is penalized: those are the ones where hotspots are present, and there the contention in Aurora kills performance and raises errors (188). But I hope we will see a significant evolution soon.

 

 

General comments on Aurora

The product is evolving quickly, and benchmark results may become obsolete in a very short time; this is why it is important to have repeatable and comparable tests.

From my point of view, in this set of tests Aurora clearly showed where it fits best:

critical applications that require a highly available platform and a lot of CPU power.

There is no reason to use Aurora on small-mid boxes: the platform is not going to be as efficient as a standard solution like PXC.

But if cost is not an issue, and the application really requires a lot of parallelism, Aurora on db.r3.8xlarge is a good solution.

I still see space for improvements, like the cluster connectors, the time taken to restart a cluster after a full stop, or contention reduction.

But I am also confident that the work led by the development team will address most of my concerns (and more) soon.

Final note: it would be nice to have the code open source, so that the community could contribute, even if I understand the business reasons for not doing so.

 

 

 

 

Last Updated on Thursday, 19 May 2016 12:35
 