23 Mar 2014
An effective way to check network connection performance when using geographically distributed replication
Written by Marco Tusa

Why this article

The more I interact with customers asking about MySQL/Galera, the more I have to answer, over and over, the same question about what kind of network conditions Galera can manage efficiently.

One of the most frequent myths I have to address at the start of any conversation that involves the network is PING.

Most customers use PING to validate generic network conditions, and as a direct consequence they apply the same approach when they need information about more complex and heavier usage, like Galera replication.

To understand why I consider the use of PING not wrong, but inefficient, let us review some basic networking concepts.

 

Frame

At the bottom sits the physical layer. I am going to skip it, otherwise this article would be too long; the only thing I want to say is that unless you can afford a leased line to connect your distributed sites, you are subject to packet switching, and therefore affected by throughput, bandwidth, latency, congestion, and dropped-packet issues.

Given the physical layer, and given we are talking about Ethernet connections, the basic transport unit, the vector that encapsulates all the others, is the Ethernet frame. A frame can be up to 1518 bytes and nothing less than 64.

A frame is composed of:

  • Preamble, 7 bytes
  • Start-of-frame delimiter, 1 byte
  • Destination MAC address, 6 bytes
  • Source MAC address, 6 bytes
  • Optional fields (IEEE 802.1Q / IEEE 802.1p), 4 bytes
  • Ethernet type or length (if > 1500 it represents the type; if < 1501 it represents the length of the payload), 2 bytes
  • Payload, up to 1500 bytes
  • Frame CRC, 4 bytes
  • Inter-packet gap, the space that is added between two frames

Adding up the fixed fields, a full frame is 6 + 6 + 2 + 1500 + 4 = 1518 bytes on the wire, with the preamble, delimiter and inter-packet gap transmitted on top of that.

[Figure: Ethernet frame layout and field sizes]

 

 

A frame can encapsulate many different protocols like:

  • IPv4
  • IPv6
  • ARP
  • AppleTalk
  • IPX
  • ... Many more

The maximum size available for the datagram to be transmitted is 1500 bytes, also known as the MTU. That is, the MTU or Maximum Transmission Unit is the number of bytes that a frame can transport from source to destination; we will see later that this is not guaranteed and fragmentation can happen.

The only exception is when Jumbo Frames are supported, in which case a frame can carry a payload of up to 9000 bytes. Jumbo frames can be quite bad for latency, especially when the transmission happens between geographically distributed data centres.
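If you want a quick look at what MTU is configured locally and what MTU actually survives along the path to the remote site, something like the following helps (interface name and destination IP are only placeholders for your own values):

ip link show eth0 | grep -o 'mtu [0-9]*'
tracepath 10.5.31.10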

 

 

IP (internet protocol)

For the sake of this article we will focus on IPv4 only. IPv4 (Internet Protocol version 4) is based on connectionless, best-effort packet delivery: the data is split into N IP datagrams and each is sent independently towards the destination, with no guarantee that it will be delivered or that the datagrams will arrive in the same order they were sent.

Each IP datagram has a header section and a data section.

The IPv4 packet header consists of 14 fields, of which 13 are required.

The 14th field is optional and aptly named: Options.

 

IPv4 Header Format (each row is 32 bits / 4 octets wide):

Octet 0  (bit 0)     Version | IHL | DSCP | ECN | Total Length
Octet 4  (bit 32)    Identification | Flags | Fragment Offset
Octet 8  (bit 64)    Time To Live | Protocol | Header Checksum
Octet 12 (bit 96)    Source IP Address
Octet 16 (bit 128)   Destination IP Address
Octet 20 (bit 160)   Options (if IHL > 5)

 

 

An IP datagram is encapsulated in the frame and then sent.

[Figure: IP datagram encapsulated in an Ethernet frame]

For performance, the larger the datagram (up to the 1500-byte MTU) the better, but only in theory, because this rule works fine when a datagram is sent over a local network, or over a network that can guarantee a 1500-byte MTU end to end.

What happens in real life is that a frame sent over the Internet has to cross many different networks, and there is no guarantee that all of them support the same MTU; actually the normal condition is that they don't.

So when a frame passes through a gateway, the gateway knows the MTU of the two links (in/out), and if they do not match it will process the frame by fragmenting it.

See below:

[Figure: a datagram crossing networks with different MTUs]

 

Assuming Host A is sending an IP datagram of 1400 bytes to Host B, when Gw1 receives it, it will have to fragment it into smaller pieces so that the frame can be transported:

[Figure: the original datagram split into fragments]

 

 

Once a datagram is fragmented to match the largest transportable frame, it is reassembled only at the destination. If one of the fragments of the datagram is lost, for any reason, the whole datagram gets discarded and the transmission fails.
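On Linux hosts you can get a rough idea of how much fragmentation and reassembly is going on (and failing) from the kernel IP statistics; the exact wording of the counters varies between distributions:

netstat -s | grep -i -E 'fragment|reassembl'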

 

 

ICMP

The IP specification mandates the implementation of a special protocol dedicated to IP status checks and diagnostics: ICMP (Internet Control Message Protocol).

Any communication done by ICMP is embedded inside an IP datagram, and as such follows the same rules:

[Figure: ICMP message encapsulated in an IP datagram]

 

 

ICMP has many "tools" in its pocketknife; one of them is PING.

A ping is composed of an echo request datagram and an echo reply datagram:

ICMP Echo Request/Reply format (each row is 32 bits wide):

Type = 8 (request) or 0 (reply) | Code = 0 | Header Checksum
Identifier | Sequence Number
Data ...

 

Request and reply have a similar definition, with the note that the reply MUST contain the full set of data sent by the echo request.

By default ping sends 56 bytes of data, plus the 8 bytes of the ICMP header, so 64 bytes in total.

Ping can be adjusted to behave in different ways, including changing the data size and setting the don't-fragment bit (DF). The latter is quite important, because it marks the datagram as not eligible for fragmentation.

As such, if you set a large data size and also set DF, the datagram will be sent with the request not to be fragmented; if the frame then reaches a gateway that would require fragmentation, the whole datagram is dropped and we get an error.

 

 

For example:

ping -M do -s 1472 -c 3 192.168.0.34
PING 192.168.0.34 (192.168.0.34) 1472(1500) bytes of data.
1480 bytes from 192.168.0.34: icmp_req=1 ttl=128 time=0.909 ms
1480 bytes from 192.168.0.34: icmp_req=2 ttl=128 time=0.873 ms
1480 bytes from 192.168.0.34: icmp_req=3 ttl=128 time=1.59 ms

As you can see I CAN send it from my home wireless, even asking not to fragment, because 1472 bytes of data plus the 28 bytes of headers stays within the 1500-byte MTU my wireless link uses.

But if I raise the data size by just ONE byte:

ping -M do -s 1473 -c 3 192.168.0.34
PING 192.168.0.34 (192.168.0.34) 1473(1501) bytes of data.
From 192.168.0.35 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 192.168.0.35 icmp_seq=1 Frag needed and DF set (mtu = 1500)
From 192.168.0.35 icmp_seq=1 Frag needed and DF set (mtu = 1500)

Removing the -M option (so that fragmentation can take place):

root@tusacentral01:~# ping  -s 1473 -c 3 192.168.0.34
PING 192.168.0.34 (192.168.0.34) 1473(1501) bytes of data.
1481 bytes from 192.168.0.34: icmp_req=1 ttl=128 time=1.20 ms
1481 bytes from 192.168.0.34: icmp_req=2 ttl=128 time=1.02 ms
1481 bytes from 192.168.0.34: icmp_req=3 ttl=128 time=0.996 ms

 

PING is widely used to test connectivity between two network points, which is good, and from the example above you may think that it is also a good tool to test the network in more complex situations, for instance when we need to be sure that the virtually synchronous Galera replication will NOT be affected.

But you would be wrong: the PING tool, as part of ICMP, still uses the basic IP protocol, which is based on best-effort delivery of datagrams, while any data transmitted between the nodes of a MySQL/Galera cluster uses the TCP protocol. The two are related (TCP runs on top of IP), but the behaviour of a TCP implementation is quite different.

To understand that, let us review at a very high level (one billion kilometres high) the TCP characteristics.

 

 

TCP

As mentioned previously, IP and the related ICMP work on the basis of best-effort delivery of datagrams. In short this means that whatever I send is sent without any guarantee that it will reach the destination, and without any retry if it fails; moreover, no real check is performed between source and destination regarding the data transmission.

TCP stands for Transmission Control Protocol and, as the name says, it is designed to control the data transmission happening between source and destination.

TCP implementations offer a reliable, ordered, error-checked delivery of a stream of octets between source and destination.

TCP is a transmission protocol and as such it works at the transport level of the OSI model:

OSI Model (layer, data unit, function):

Host layers:
  7. Application    Data             Network process to application
  6. Presentation   Data             Data representation, encryption and decryption, converting machine-dependent data to machine-independent data
  5. Session        Data             Interhost communication, managing sessions between applications
  4. Transport      Segments         Reliable delivery of packets between points on a network

Media layers:
  3. Network        Packet/Datagram  Addressing, routing and (not necessarily reliable) delivery of datagrams between points on a network
  2. Data link      Bit/Frame        A reliable direct point-to-point data connection
  1. Physical       Bit              A (not necessarily reliable) direct point-to-point data connection

 

 

 

TCP implementations use the IP protocol encapsulation for the transmission of the data:

[Figure: TCP segment encapsulated in an IP datagram]

 

A TCP implementation has several characteristics that are worth summarizing:

  • It is stream oriented: when two applications open a TCP connection, they see it as a stream of bits that will be delivered to the destination application in exactly the same order and with the same consistency they had at the source.
  • It establishes a connection: host1 and host2 must perform a handshake before starting to send data, which allows each to know the other's state. The connection uses a three-way handshake:

[Figure: TCP three-way handshake (SYN, SYN-ACK, ACK)]

Once the connection is established, the two parties keep monitoring the data transfer; in case of a crash or interruption of the service, both hosts notify the application involved in the stream transfer.

  • Buffered transmission: the applications can write data in whatever size they choose, even a single byte at a time. The transport layer buffers the data in order to match the best datagram size for that specific moment, and then sends it. The segment size is adaptive, which means it can be adjusted in relation to the network transfer conditions.
  • Unstructured stream: the stream does not have a predefined format/dimension for the data inside it, so the applications must know how to read from the stream.
  • Full-duplex connection: data can be transmitted simultaneously in both directions.
  • Segments: a TCP implementation sees the stream as a sequence of octets (bytes), which it splits into segments; the segment size is normally as large as a datagram.

 

TCP header definition:

TCP Header (each row is 32 bits / 4 octets wide):

Octet 0  (bit 0)         Source port | Destination port
Octet 4  (bit 32)        Sequence number
Octet 8  (bit 64)        Acknowledgment number (if ACK set)
Octet 12 (bit 96)        Data offset | Reserved (0 0 0) | NS | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN | Window Size
Octet 16 (bit 128)       Checksum | Urgent pointer (if URG set)
Octet 20... (bit 160...) Options (if data offset > 5; padded at the end with "0" bytes if necessary)

 

 

As said, TCP implementations are reliable and can retransmit missed packets; let us see how this works:

[Figures: TCP send/acknowledge sequence, without and with packet loss and retransmission]

 

This mechanism is in place for every TCP transmission, and ensures the correct send/receive of each segment.

It is easy to understand that, although effective, having this mechanism work on a single segment at a time is far from efficient: considering a set of packets numbered from 1 to 100, before sending packet 2 the dispatch of packet 1 must be acknowledged.

To solve this issue, TCP uses the concept of a dynamic sliding window, with which a TCP implementation manages to dispatch multiple packets at the same time:

[Figure: initial state of the TCP sliding window]

 

 

A sliding window in TCP implementations works at the octet level, not by segment or by packet: each octet in the stream has a sequential number, and the dispatcher manages three pointers associated with each connection.

The first pointer indicates the start of the sliding window, separating the octets that have been dispatched and acknowledged from the ones that are in progress or still to be dispatched.

The second pointer indicates the highest octet that can be dispatched before getting the acknowledgments for the octets already dispatched.

The third pointer indicates the window limit, after which no dispatch can be done until the window slides further.

 

[Figure: the three pointers of the sliding window]

 

 

In TCP implementations the sliding window has, as previously mentioned, a dynamic size: each acknowledgement received also contains a declaration of how many more octets the receiver can accept.

The whole discussion would now become much more complex and not pertinent to the scope of this article, but it is worth mentioning that this is where the whole concept of flow/congestion control takes place.

For the sake of this article, and simplifying a lot, let us say that under optimal conditions on a local LAN the size of a segment will coincide with the maximum MTU available.

When sending traffic over the Internet instead, the TCP implementation has to manage not only the optimal initial size, but also the possible and probable issues arising while the transmission takes place, shrinking and enlarging the sliding window as needed, and in some cases putting it on hold.
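To give a concrete feeling of why latency matters so much here: the throughput of a single TCP connection is bounded by the window size divided by the round-trip time. With a typical 64KB window and the ~50ms of RTT we will measure below, that is 65536 bytes / 0.05s ≈ 1.3MB/s (roughly 10Mbit/s), no matter how much raw bandwidth the link has; at 230ms of RTT the same window yields less than 300KB/s.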

If you are really interested in how congestion control works in the TCP/IP world, I suggest this book (http://www.amazon.com/Adaptive-Congestion-Networks-Automation-Engineering-ebook/dp/B008NF05DQ).

 

 

Back to the topic

So, we have seen how PING works: it sends a simple echo request, using a very basic datagram, with no dispatch or connection error handling.

On the other hand we have seen how complex a TCP implementation is, and how important the concepts of adaptive transmission (which is also seriously affected by the available bandwidth) and congestion control are.

Keeping that in mind, there is no way we can compare a TCP/IP data transmission between two points with a simple ICMP/IP echo, no matter how large we make the datagram or whether we declare it not to fragment: a ping executes and follows a different protocol, and it should never be used to validate the data transmission in a TCP/IP scenario.

Back to square one: how can I evaluate my connection between two geographically distributed sites? More precisely, how can I be sure the transfer will be efficient enough to support Galera replication?

When implementing Galera, some additional status metrics are added to MySQL; two of them are quite relevant to the purpose of this article (see the queries right after this list):

  • wsrep_replicated_bytes
  • wsrep_received_bytes
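These two counters are exposed by Galera as MySQL status variables, so sampling them is straightforward; both are cumulative, so the per-second rate is the delta between two samples divided by the sampling interval:

(root@localhost) [(none)]>show global status like 'wsrep_replicated_bytes';
(root@localhost) [(none)]>show global status like 'wsrep_received_bytes';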

If you are smart, and I am sure you are, you already have some historical monitoring in place, and before even thinking of going to geographically distributed replication you have tested and implemented MySQL/Galera as a local cluster.

As such you will have collected data for the two metrics above, and it will be easy to identify the MAX value; be careful here, do not use the average.

Why? Because the whole exercise is to see whether the network can efficiently manage the traffic of a (virtually) synchronous connection, without the Galera flow control kicking in.

So let us say we have 150KB/s of replicated bytes and 300KB/s of received bytes, and see what happens if we use PING and then another, more TCP-oriented, tool.

In the following scenario I am going to first check whether the two locations are connected by a link that allows a 1500-byte MTU, and then check the connection state/latency using the maximum size of 300KB.

 

 Test 1

[root@Machine1~]# ping -M  do -s 1432 -c 3 10.5.31.10
PING 10.5.31.10 (10.5.31.10) 1432(1460) bytes of data.
From 192.168.10.30 icmp_seq=1 Frag needed and DF set (mtu = 1398)
From 192.168.10.30 icmp_seq=2 Frag needed and DF set (mtu = 1398)
From 192.168.10.30 icmp_seq=2 Frag needed and DF set (mtu = 1398)

 

 Test 2

[root@Machine1~]# ping -M  do -s 1371 -c 3 10.5.31.10
PING 10.5.31.10 (10.5.31.10) 1371(1399) bytes of data.
From 192.168.10.30 icmp_seq=1 Frag needed and DF set (mtu = 1398)
From 192.168.10.30 icmp_seq=1 Frag needed and DF set (mtu = 1398)
From 192.168.10.30 icmp_seq=1 Frag needed and DF set (mtu = 1398)

Test 3
[root@Machine1~]# ping -M  do -s 1370 -c 3 10.5.31.10
PING 10.5.31.10 (10.5.31.10) 1370(1398) bytes of data.
1378 bytes from 10.5.31.10: icmp_seq=1 ttl=63 time=50.4 ms
1378 bytes from 10.5.31.10: icmp_seq=2 ttl=63 time=47.5 ms
1378 bytes from 10.5.31.10: icmp_seq=3 ttl=63 time=48.8 ms

 

As you can see the link is not that bad: it supports a 1370-byte MTU, so in theory we should be able to have a decent connection.

Let us see what happens with PING:

 

[root@Machine1~]# ping   -c 3 10.5.31.10
PING 10.5.31.10 (10.5.31.10) 56(84) bytes of data.
64 bytes from 10.5.31.10: icmp_seq=1 ttl=63 time=49.6 ms
64 bytes from 10.5.31.10: icmp_seq=2 ttl=63 time=46.1 ms
64 bytes from 10.5.31.10: icmp_seq=3 ttl=63 time=49.7 ms
--- 10.5.31.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2052ms
rtt min/avg/max/mdev = 46.189/48.523/49.733/1.650 ms

[root@Machine1~]# ping -M  do -s 300 -c 3 10.5.31.10
PING 10.5.31.10 (10.5.31.10) 300(328) bytes of data.
308 bytes from 10.5.31.10: icmp_seq=1 ttl=63 time=50.5 ms
308 bytes from 10.5.31.10: icmp_seq=2 ttl=63 time=48.5 ms
308 bytes from 10.5.31.10: icmp_seq=3 ttl=63 time=49.6 ms
--- 10.5.31.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2053ms
rtt min/avg/max/mdev = 48.509/49.600/50.598/0.855 ms

From these tests it seems we have more or less 49-50ms of latency, which is not really great but manageable; we could play a little with flow control in Galera and have the sites communicating. With that I am NOT recommending to set up a Galera cluster with 50ms of latency; what I am saying is that, if desperate, something can be done, period.
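For reference, the flow-control thresholds I am referring to live in the Galera provider options, for example gcs.fc_limit and gcs.fc_factor, which can be changed at runtime through wsrep_provider_options; the values below are purely illustrative, not a recommendation:

(root@localhost) [(none)]>set global wsrep_provider_options='gcs.fc_limit=256; gcs.fc_factor=0.9';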

But wait a minute: let us do the other test, and this time let us use a tool that is designed to check the real condition of a network connection using TCP/IP.

For that I normally use NetPerf (there are other tools, but I like this one).

About NetPerf:

"Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency. The environments currently measureable by netperf include:

TCP and UDP via BSD Sockets for both IPv4 and IPv6

DLPI

Unix Domain Sockets

SCTP for both IPv4 and IPv6"

(http://www.netperf.org/netperf/)

 

NetPerf uses a two-point connection, instantiating a server daemon on one machine and using a client application that simulates the client-server scenario.

I strongly suggest you read the documentation, to better understand what it can do, how it does it, and how to read the results.
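As a minimal sketch of the setup (host names, IP and port here are only examples, 12865 being NetPerf's default control port): start the netserver daemon on the remote site, then point the netperf client at it from the local site.

[root@Machine2 ~]# netserver -p 12865
[root@Machine1 ~]# netperf -H 10.5.31.10 -t TCP_RR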

 

Done? Ready? Let us go...

Given that NetPerf allows me to define the size of what I send and what I receive back, this time I can set the test up properly and ask what the real effort will be:

[root@Machine1~]# netperf -H 10.5.31.10 -t TCP_RR -v 2 -- -b 6 -r 156K,300k

MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.5.31.10 () port 0 AF_INET : first burst 6
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
65536  87380  159744   300000  10.00      30.20
65536  87380
Alignment      Offset         RoundTrip  Trans    Throughput
Local  Remote  Local  Remote  Latency    Rate     10^6bits/s
Send   Recv    Send   Recv    usec/Tran  per sec  Outbound   Inbound
8         0          0        0       231800.259 30.198   38.592     72.476

 

The tool uses TCP round trips, so it simulates a real TCP connection, with stream, sliding window and traffic/congestion control.

The resulting latency is 231800.259 microseconds; the other metrics are interesting too, but for consistency with ping let us stay with latency.

So we have 231800.259 microseconds, which means 231.8 milliseconds of real latency.

This result is quite different from the PING results; the reason, as explained, is a totally different transport mechanism, which implies a different cost.

Given the figure of 231.8ms, a site-to-site synchronous replication is out of the question, and I can focus my time on exploring different and more appropriate solutions.

 

 

Conclusion(s)

The conclusion is quite simple: whenever you need to validate basic connectivity, PING is fine.

Whenever you need to measure the capacity of your network, PING is just dangerous, and you should really avoid using it for that.

Good MySQL to all.

Last updated Sunday 23 March 2014 13:27
 
01 Nov 2013
Server ID misconfiguration can affect you more than you think
Written by Marco Tusa

Subtitle: when you do stupid things you get stupid things back, but sometimes it is good to discover odd behaviour.

 

The other day I was having fun playing with the latest toy, mysql-cluster-gpl-7.3.2.

I was setting up a new cluster of 6 data nodes and 3 MySQL nodes, and to do so I was using a simple bash script that generates what I need in my environment.

While doing that I had a very stupid idea: have the script derive the SERVER IDs from the machine IPs and ports I had set.

I know, I know, this is not a good way to define a SERVER ID; anyhow I did it, and the result was that in my my.cnf the server_id looked like:

server-id = 1921680355510

 

Some of you can already see what kind of stupid thing this is, but let me not anticipate it.

As usual I started the MySQL node first to generate a brand new database; to do so I first disabled ndbcluster, and all went fine.

Then I started my management nodes and my data nodes.

I was quite happy, and finally I attached my MySQL node to the rest of the cluster.

But the node failed to start. I was quite surprised, so I opened the log, and this is what I saw:

 

2013-10-28 10:20:58 2692 [Note] NDB[3]: NodeID: 77, all storage nodes connected
2013-10-28 10:20:58 2692 [ERROR] NDB: server id provided is too large to be represented in opt_server_id_bits or is reserved
mysqld: /usr/src/mysql-cluster-gpl-7.3.2/sql/ndb_component.cc:139: virtual int Ndb_component::deinit(): Assertion 'm_thread_state == TS_STOPPED' failed.
14:20:58 UTC - mysqld got signal 6 ;
 

 

 

What??? My fantastic server_id of 13(!!!) digits is too long?

Hey, wait a minute... how many digits??

At that point I just hit the keyboard with my head, realizing what I had done.


The server id is an unsigned 32-bit integer; its maximum value is 4294967295 (2^32 - 1), which is 10 digits.

Once more my friend NDB was right, and I was trying to do a stupid thing.

 

But when I realized it, I also wondered: why do I get the error when ndbcluster is active, and not when MySQL runs without it?

This kind of check should be done at a common level at MySQL start-up, so why do I get it only with MySQL and NDB active?

I dug a little and found the place where NDB performs the check.

The if condition where the check is executed is quite simple and clear.

 

....
sql::ha_ndbcluster_binlog.cc
 
int ndbcluster_binlog_start()
{
  DBUG_ENTER("ndbcluster_binlog_start");
 
  if (::server_id == 0)
  {
    sql_print_warning("NDB: server id set to zero - changes logged to "
                      "bin log with server id zero will be logged with "
                      "another server id by slave mysqlds");
  }
 
  /* 
     Check that ServerId is not using the reserved bit or bits reserved
     for application use
  */
  if ((::server_id & 0x1 << 31) ||                             // Reserved bit <--------------------- Check
      !ndbcluster_anyvalue_is_serverid_in_range(::server_id))  // server_id_bits
  {
    sql_print_error("NDB: server id provided is too large to be represented in "
                    "opt_server_id_bits or is reserved");
    DBUG_RETURN(-1);
  }

 

 

But again, why is the error not reported by the MySQL core?

Because MySQL is silently converting my server id to its maximum value; my stupid server id of 1921680355510 becomes 4294967295.
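You can verify what the server actually ended up using by checking the variable at runtime; after the silent adjustment it reports 4294967295 and not the value written in my.cnf:

(root@localhost) [(none)]>show global variables like 'server_id';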

 

Oh my...!

An identifier is managed as a quantity!

 

I did a very stupid thing setting the server_id that way, but honestly I don't like this behaviour either.

 

As we know, MySQL silently adjusts its behaviour in relation to the SQL_MODE, but the way it adjusts parameters should be a different matter.

I agree that variables like max_connections CAN be adjusted, but others like server_id should not: I think it is wrong, given those are identifiers and not quantities.

 

Server ID is an ID; as such it must be validated, and if it does not pass validation an ERROR must be returned and the server should not start at all.

Instead, MySQL currently just raises a WARNING and continues to run, not caring whether there will be another server with the same id because of this "adjustment".

 

131028 14:00:58 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-28 14:00:58 0 [Warning] option 'server_id': unsigned value 192168075510 adjusted to 4294967295 <----------------------------------------------

I was really "not happy" about this, so I went into the code to check where the server ID is evaluated and modified.
This is the call chain:

 

mysqld::mysqld_main
	=>mysqld::init_common_variables
		=> my_getopt::init_variables(longopts, init_one_value);
			=> my_getopt::init_one_value
				=> my_getopt::getopt_ll_limit_value

 

So what exactly is happening here?

MySQL initializes by reading the options in my_getopt and, based on the data type, passes each one to getopt_ll_limit_value.

There, whatever numeric argument is passed, if it exceeds the maximum value it is reset to its maximum. At this stage there is no logic to handle values like Server ID, which is an identifier and not a quantity.

This is obviously something that could be managed better, and the fact that NDB prevents the server from going ahead is by far more correct, because the Server Id is used internally to identify the server.

 

I did a quick check, and I saw that at least the following functions use the server_id variable in the new replication mechanism.

 

rpl_master.cc
int register_slave(THD* thd, uchar* packet, uint packet_length)
void unregister_slave(THD* thd, bool only_mine, bool need_lock_slave_list)
bool show_slave_hosts(THD* thd)
static int fake_rotate_event(NET* net, String* packet, char* log_file_name,ulonglong position, const char** errmsg,uint8 checksum_alg_arg)
static int send_heartbeat_event(NET* net, String* packet,const struct event_coordinates *coord,uint8 checksum_alg_arg)
bool com_binlog_dump(THD *thd, char *packet, uint packet_length)
bool com_binlog_dump_gtid(THD *thd, char *packet, uint packet_length)
void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos,const Gtid_set* slave_gtid_executed)

 

I have not investigated further what happens when two different servers end up with the same ID, because in any case it is conceptually wrong.

Just for fun I created a horrible patch to manage the mismatch, adding a check on the parameter name.

Obviously the real fix should be something different, like an attribute of the my_option object that indicates whether the option is a quantity, an identifier, or whatever else, and then decide what to do on that basis.

Anyhow, even a stupid and horrible thing like a check on the name is better than not having it at all:

 

ulonglong getopt_ull_limit_value(ulonglong num, const struct my_option *optp,
                                 my_bool *fix)
{
  my_bool adjusted= FALSE;
  ulonglong old= num;
  char buf1[255], buf2[255];
  const ulonglong max_of_type=
    max_of_int_range(optp->var_type & GET_TYPE_MASK);
 
  /* horrible name-based check: is this option the server_id? */
  int result= strncmp(optp->name, "server_id", 255);
 
  if ((ulonglong) num > (ulonglong) optp->max_value &&
      optp->max_value) /* if max value is not set -> no upper limit */
  {
    num= (ulonglong) optp->max_value;
    adjusted= TRUE;
    if (result == 0)
    {
      /* server_id is an identifier, not a quantity: refuse to adjust it */
      my_getopt_error_reporter(ERROR_LEVEL,
          "option '%s': unsigned value %s CANNOT BE adjusted to %s please review your settings, and start MySQL again",
          optp->name, llstr(old, buf1), llstr(num, buf2));
      exit(1);
    }
  }
  if (num > max_of_type)
  {
    num= max_of_type;
    adjusted= TRUE;
  }
 

 

Running the server with the fix, I finally got consistent behaviour, with my favourite MySQL server doing the right thing:

 

131029 13:38:03 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended
131029 13:41:11 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-29 13:41:11 0 [ERROR] 'option server_id' unsigned value 192168075510 CANNOT BE adjusted to 4294967295 please review your settings, and start MySQL again
131029 13:41:12 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended

 

 

 

 

Conclusion

From a stupid action something interesting can come out, at least for me.

I would like to classify this as a bug, but given it is a conceptual thing, maybe it is more of a... what? Feature request? Logic fix?

Finally, my conclusion is that MySQL is a great product, but it is growing fast and sometimes things are left behind; this is a small but possibly dangerous one.

Last updated Friday 01 November 2013 11:11
 
15 Sep 2013
Understand what happens in MySQL when using UTF strings with Latin1 encoding...
Written by Marco Tusa

... and how to get out of it.

 

Why this article?

The scope of this article is to shed light on what happens behind the scenes when an application pushes UTF data to a MySQL server using Latin1 encoding.

I will show how the characters are stored, what exactly happens if you mix UTF8 and latin1, what is transformed into what, and why.

Then I will show the safest way to convert a table, or simply a table field, without exporting the full dataset.

Finally, I will show a good way to do it in a master-slave scenario, with minimal impact on the live dataset.

What I am not going to discuss here, and what I assume you are already familiar with, is:

  • How to assign a character set or a collation to a table or to fields;
  • How collation works in sorting;
  • How to set the default character set in the server and in the connecting clients.

If any of these points is not clear to you, please read here first (http://dev.mysql.com/doc/refman/5.6/en/charset.html).
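Just as a reminder of where the server-side defaults come from, they are normally set in my.cnf; the values below are only an example:

[mysqld]
character-set-server = latin1
collation-server     = latin1_swedish_ci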

For my exercise I will use two tables, two MySQL server instances, and Arabic text.

Please note that I copied this text from an official United Nations page, so I trust it does not contain anything offensive.

Step one: check the status of our servers.

Master

(root@localhost) [(none)]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, for linux2.6 (x86_64) using readline 5.1
Connection id:		2
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfm/mysql.sock
Uptime:			2 min 34 sec
Threads: 3  Questions: 16  Slow queries: 0  Opens: 34  Flush tables: 1  Open tables: 26  Queries per second avg: 0.103

Slave

------------

(root@localhost) [(none)]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, for linux2.6 (x86_64) using readline 5.1
Connection id:		4
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfs/mysql.sock
Uptime:			10 min 43 sec
Threads: 2  Questions: 8  Slow queries: 0  Opens: 34  Flush tables: 1  Open tables: 26  Queries per second avg: 0.012


As you can see, I have defined the MASTER to use latin1 as the default, also at the server level, while for the SLAVE I chose UTF8.

In this exercise I am going to use only UTF8 for the examples, never UTF8mb4.

 

First create the table.

CREATE TABLE `utf_test` (
  `ID` integer AUTO_INCREMENT PRIMARY KEY,
  `notutf` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL,
  `yesutf` varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `binutf` varbinary(250) DEFAULT NULL,
  `different` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 

 

Now let us start inserting data and see what happens:

insert into utf_test values(null,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا',

' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same');

 

 

 Master
 -------------
(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', ' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same');
*************************** 1. row ***************************
       ID: 1
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same
1 row IN SET (0.00 sec)
 
 
Slave
---------------
(root@localhost) [test]>set names latin1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same
1 row IN SET (0.00 sec)
 

 

 

 

All good, given that the values are passed consistently.

Now let us add some Arabic text also to the "different" field and see IF we can still handle it correctly:

 

(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', ' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same? لصدأ القمح إلى كينيا');
Master
--------------
(root@localhost) [test]>select * FROM utf_test\G
       ID: 3
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same? لصدأ القمح إلى كينيا
2 rows IN SET (0.00 sec)
 
Slave
----------------------------
       ID: 3
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same? لصدأ القمح إلى كينيا
 

 

 

 

OK, once more all good.
Is this real UTF8, or, as we have sometimes seen, fake UTF pushed in through Latin1?
Let us see, inserting another row where the UTF field "yesutf" and the latin1 field "different" contain the same apparent data:

insert into utf_test values(null,'simple insert same text everywhere', 'ح إلى كينيا', 'ح إلى كينيا', 'ح إلى كينيا');

 

 

Master
-----------
(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert same text everywhere', 'ح إلى كينيا', 'ح إلى كينيا', 'ح إلى كينيا');
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
Slave
------------------
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 

 

 


Checking the hex:

 

Master
-----
(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
 
Slave 
-------------
(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

As expected, the hex code differs between the field defined as UTF8 and the ones defined as Latin1/binary, and I see them all correctly displayed because... ?

Look at what happens if I change the connection character set to UTF8:

 

(root@localhost) [test]>set names utf8;
 
(root@localhost) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

Now only binutf (varbinary) is shown correctly.

So, just to recap what happens if I use latin1:

Using latin1, I see everything fine:

 

(root@localhost) [test]>set names latin1;
(root@localhost) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

Even though the hex values do not match, as shown here:

 

(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

And if I do a reverse conversion, using UNHEX:

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,unhex(hex(yesutf)) AS yesutf_hex,unhex(hex(binutf)) AS binutf_hex,unhex(hex(different)) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: ح إلى كينيا
   binutf_hex: ح إلى كينيا
different_hex: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

 

Again I see something different. Is it confusing?

No.

It is quite clear that the varbinary and the latin1 varchar store the string using the same hexadecimal values, while yesutf is different because it was expecting a UTF string.

Let us analyze JUST the first character of the Arabic string: ح

This letter corresponds to:

  Unicode code point: U+062D
  Character: ح
  UTF8 characters: Ø
  UTF8 hex: d8 ad
  Meaning: ARABIC LETTER HAH

 

What happens is that I am inserting the correct codes, but the UTF8 field reads them as LATIN1, one character per byte, and as such it transforms the value into Ø.

The value of that character in the UTF-8 encoding table and Unicode characters is:

  Unicode code point: U+00D8
  Character: Ø
  UTF8 characters: Ø
  UTF8 hex: c3 98
  Meaning: LATIN CAPITAL LETTER O WITH STROKE

 

Comparing the values above with the HEX values we got before, you can see that they match perfectly:

  yesutf_hex:    C398...
  binutf_hex:    D8AD...
  different_hex: D8AD...
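You can reproduce this double encoding directly in SQL, taking the raw bytes D8AD of the letter ح, labelling them as latin1 and letting MySQL convert them to UTF8 (a small sketch; the nested CONVERT is only there to force the charset interpretation):

(root@localhost) [test]>select hex(convert(convert(unhex('D8AD') using latin1) using utf8));

The result is C398C2AD, exactly the bytes we found stored in the yesutf column.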

 

So far I hope everything is clear.

The fields using LATIN1 and UTF8 are both shown correctly when using NAMES latin1: the latin1-encoded fields because the stored UTF8 bytes are handed back to the client as they are, and the UTF8-encoded field because the server converts the double-encoded values back on the way out. That conversion is NOT executed when doing UNHEX(HEX(...)), and in that case what is shown is the REAL value of the stored codes.

Again, the only SAFE one during these operations is the varbinary, which stores the codes without binding them to any encoding.

So, for instance, if you have your whole database in LATIN1 and you want to convert it to UTF8, you cannot just do:

(root@localhost) [test]>alter table utf_corrupt modify different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;

 

Because the text in the table will be converted once more and become unusable for UTF8 clients, see below.

Connection encoding set to LATIN1:

Client characterset: latin1
Conn.  characterset: latin1

Now I do the conversion:

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
SET NAMES UTF8;
 
	Client characterset:	utf8
	Conn.  characterset:	utf8
 
 
(root@localhost) [test]>select * FROM utf_corrupt\G
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
 

 

The text is a mess again, because it has been transformed:

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7 <---------
1 row IN SET (0.00 sec)
 

 

 

 

Starting again from the beginning, the table is:

 

(root@localhost) [test]>show CREATE TABLE utf_corrupt\G
*************************** 1. row ***************************
       TABLE: utf_corrupt
CREATE TABLE: CREATE TABLE `utf_corrupt` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `notutf` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL,
  `yesutf` varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `binutf` varbinary(250) DEFAULT NULL,
  `different` varbinary(250) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
1 row IN SET (0.00 sec)
 

 

 

 

data is:

 

*************************** 3. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec) 

 

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

And the connection encoding is: Client characterset: latin1, Conn. characterset: latin1.

To do this properly you have to do a 3-step conversion: varchar -> varbinary -> varchar, as follows:

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varbinary(250) DEFAULT NULL;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7 <-------------- 
1 row IN SET (0.00 sec)
 

 

 

Now all is fine and the string was not transformed. Next, change the encoding with NAMES:

SET NAMES utf8;

Then convert back to varchar and UTF8:

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0
 

 

 

And now, yes, the text is stored correctly.

 

       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

All good! My data has been correctly moved from a varchar LATIN1 field to the new varchar UTF8 field. The most important aspect is to use the right encoding at every step, and to never let the server convert your text to the wrong hex values.

In the case of a significant migration, if you have a master-slave setup it is possible to play with the two servers to minimize the impact of the migration, following these steps:

  1. start with everything as latin1
  2. convert the columns to varbinary on the slave
  3. keep inserting data from the master to the slave, with everything still latin1
  4. swap the servers (the slave becomes the master)
  5. convert the ex-master columns to varbinary
  6. when done, change all application connections to utf8 on the master host
  7. convert the tables back to varchar utf8 on the slave
  8. swap master and slave again
  9. convert the remaining slave tables to UTF8

 

It seems cumbersome, but if you cannot afford to export the data, change the table encoding and reload everything, and you cannot stop writes for a long time, this is the safest way. The negative aspect is that varbinary uses a different approach for sorting: it sorts on the stored binary values, and you cannot use a collation to change that.

Let's go through the process:

 

Step 1) Check that all is latin1

 

Master
------------
(root@127.0.0.1:5510) [test]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, FOR linux2.6 (x86_64) USING readline 5.1
 
Connection id:		11
Current DATABASE:	test
Current user:		root@localhost
SSL:			NOT IN USE
Current pager:		stdout
USING OUTFILE:		''
USING delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfm/mysql.sock
Uptime:			2 days 3 hours 3 min 49 sec
 
Threads: 3  Questions: 198  Slow queries: 0  Opens: 89  FLUSH TABLES: 1  Open TABLES: 27  Queries per second avg: 0.001
--------------
 
Slave
---------------------
(root@127.0.0.1:5511) [test]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, FOR linux2.6 (x86_64) USING readline 5.1
 
Connection id:		9
Current DATABASE:	test
Current user:		root@localhost
SSL:			NOT IN USE
Current pager:		stdout
USING OUTFILE:		''
USING delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfs/mysql.sock
Uptime:			2 days 3 hours 3 min 45 sec
 
Threads: 2  Questions: 177  Slow queries: 0  Opens: 87  FLUSH TABLES: 1  Open TABLES: 27  Queries per second avg: 0.000
 

 

 

Insert some data on the master, just to have a couple of rows:

 

(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 1', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 

 

 

Check the values:

 

Master
----------------------
(root@127.0.0.1:5510) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
2 rows IN SET (0.00 sec)
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 
  
Slave
-------------------------
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
2 rows IN SET (0.00 sec)
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

Step 2) Convert the column to varbinary on the slave, keeping the same encoding (NAMES latin1):

 

 

(root@127.0.0.1:5511) [test]>alter TABLE utf_test MODIFY different varbinary(250);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7 <------------------
1 row IN SET (0.00 sec)
 

 

 

The codes remain consistent.

Step 3) Production continues to work as usual, no impact.

Insert some more records on the master:

 

(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 3. row ***************************
       ID: 5
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 4. row ***************************
       ID: 7
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 2. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 3. row ***************************
           id: 5
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 4. row ***************************
           id: 7
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
4 rows IN SET (0.00 sec)
 
ON The Slave 
----------------------
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 3. row ***************************
       ID: 5
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 4. row ***************************
       ID: 7
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 2. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 3. row ***************************
           id: 5
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 4. row ***************************
           id: 7
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
4 rows IN SET (0.00 sec)
 

 

 

Step 4) Swap the servers

5511 becomes the master and 5510 the slave (but stop the replication from 5511->5510 and also stop the previous replication from 5510 to 5511); production will continue to work in write, ORDER BY will be affected for the moment. A minimal sketch of the stop commands follows.
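This is only a sketch, assuming classic asynchronous replication and the example ports used in this article (credentials are placeholders):

mysql -uroot -p -h127.0.0.1 -P5511 -e "STOP SLAVE;"   # 5511 stops applying events coming from 5510
mysql -uroot -p -h127.0.0.1 -P5510 -e "STOP SLAVE;"   # make sure 5510 does not replicate back for now

From this moment the application writes only to 5511.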

 

On new Master 5511:

 

*************************** 6. row ***************************
       ID: 11 <----------------------------------------------
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
6 rows IN SET (0.00 sec)
 
ON new Slave 5510:
*************************** 4. row ***************************
       ID: 7 <--------------------------------
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 

 

Step 5) Modify the table to varbinary on the new Slave:

 

(root@127.0.0.1:5510) [test]>alter TABLE utf_test MODIFY different varbinary(250);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0
 

 

 

 

And now restart replication on 5510, so it can replicate from the master 5511.

 

(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
 
1 row IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 11
    noutf_hex: 53544550343A20696E73657274696E6720646174612066726F6D2045582D534C4156452061667465722073776170207265706C69636174696F6E20697320646F776E20
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

All good.

Step 6) Change all application connections to UTF8;

in our example I will change the NAMES on the new master 5511 and on the slave 5510:

SET NAMES UTF8;

Server characterset: utf8

Db     characterset: utf8

Client characterset: utf8

Conn.  characterset: utf8

 

Master
--------------------------
(root@127.0.0.1:5511) [test]>set names utf8;
Query OK, 0 rows affected (0.00 sec)
 
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
...
*************************** 6. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
6 rows IN SET (0.00 sec)
 

 

 

 

Note how the values in yesutf were NOT correctly converted, given the initial double translation.

 

Slave
---------------
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

 

Step 7) Convert the table on the SLAVE back to varchar, but using utf8 now

 

(root@127.0.0.1:5510) [test]>alter TABLE utf_test MODIFY different varchar(250)  CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 15
   notutf: STEP7: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 15
    noutf_hex: 53544550343A20696E73657274696E6720646174612066726F6D2045582D534C4156452061667465722073776170207265706C69636174696F6E20697320646F776E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

Step 8) Given that all is good, put 5510 back as master.

Now that 5510 has the field set as varchar with UTF8 encoding/collation, the ORDER BY will work fine again.

At this point data will continue to be passed from master to slave.

 

ON Master
---------------------
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP8: inserting data from master 5510 after swap server again ', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 17
   notutf: STEP8: inserting DATA FROM master 5510 after swap server again 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 17
    noutf_hex: 53544550383A20696E73657274696E6720646174612066726F6D206D6173746572203535313020616674657220737761702073657276657220616761696E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 
ON Slave
-------------------------
*************************** 8. row ***************************
       ID: 17
   notutf: STEP8: inserting DATA FROM master 5510 after swap server again 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
8 rows IN SET (0.00 sec)
 
*************************** 8. row ***************************
           id: 17
    noutf_hex: 53544550383A20696E73657274696E6720646174612066726F6D206D6173746572203535313020616674657220737761702073657276657220616761696E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
 

 

 

 

Step 9) Finally convert the table on the slave as well to varchar UTF8:

 

 

Master
---------------
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP9: final Slave conversions', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
 
Slave
---------------
(root@127.0.0.1:5511) [test]>alter TABLE utf_test MODIFY different varchar(250)  CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
*************************** 9. row ***************************
       ID: 19
   notutf: STEP9: final Slave conversions
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
9 rows IN SET (0.00 sec)
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
...
           id: 19
    noutf_hex: 53544550393A2066696E616C20536C61766520636F6E76657273696F6E73
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
9 rows IN SET (0.00 sec)
 

 

 

 

 

 

Done.
When in doubt about the character set or the collation you are using, always check them.
Good commands to run from the application:
(root@127.0.0.1:5511) [test]>SELECT USER(), CHARSET(USER()), COLLATION(USER())\G
*************************** 1. row ***************************
           USER(): root@127.0.0.1
  CHARSET(USER()): utf8
COLLATION(USER()): utf8_general_ci
1 row IN SET (0.00 sec)
 
AND
(root@127.0.0.1:5511) [test]>show VARIABLES LIKE '%server%'\G
*************************** 1. row ***************************
Variable_name: character_set_server
        Value: utf8
*************************** 2. row ***************************
Variable_name: collation_server
        Value: utf8_unicode_ci
*************************** 3. row ***************************
Variable_name: server_id
        Value: 5511
3 rows IN SET (0.00 sec)
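Two more standard checks worth running from the application side, to see what the session and the server really negotiated (shown here through the mysql client; credentials and port are the ones used in this example):

mysql -uroot -p -h127.0.0.1 -P5511 -e "SHOW VARIABLES LIKE 'character_set%'; SHOW VARIABLES LIKE 'collation%';"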
 

 

 

Conclusion

You must pay a lot of attention to how you connect to the server, in relation to the encoding you are going to use.
It is advisable to set it explicitly, to be really sure, and to check that the server and the application are getting it right.
I cannot find a good reason to have a mix of UTF8 and Latin1 in the same table, as in the example I have used, but you never know.
What I mean is that it is normal to have fields using Latin1 and defined as Latin1 for the encoding, but mixing them with UTF8 while using Latin1 encoding in the application connection, and trying to write UTF8, will only generate a mess.
If you have a table with 4 fields, 2 UTF8 and 2 Latin1, please use UTF8 encoding in the connection/server and be sure to push the right text; if you do so, the server will alert you if you try to do something wrong, like pushing UTF8 values into Latin1 fields.
For those specific cases where you have no idea what you are going to write, and you MUST use Latin1 for the application connection, use varbinary and your life will be easier; of course the ordering will be done by binary value instead of by character, but you need to keep your data safe first.


 

Thanks 

Finally a huge thanks to Derek Downey, because he raised the issue (again) to me in relation to a customer migration.

Thanks for his dedication, professionalism and will to learn and have fun together.

Last updated Sunday 15 September 2013 23:16
 
24
Aug
2013
The Galera installation guide for dummies.
Written by Marco Tusa


The Galera series

 

Overview and some history

Some years ago (2011), I presented in an article what at the time I called "A dream on MySQL parallel replication" (http://www.tusacentral.net/joomla/index.php/mysql-blogs/96-a-dream-on-mysql-parallel-replication).

At that time I was dreaming about having a replication mechanism working on the commit operations, and I was also trying to explain how useless it is to have parallel replication by schema. I have to say that I got a lot of feedback about that article, directly and indirectly, most of it telling me that I was crazy and that what I was looking for could not be done.

 

The only crazy guy following me in my wandering was Henrik Ingo, who mentioned Codership and Galera to me.

After a few months a customer came to me describing a scenario that, in my imagination, would have been perfect for the cloud and for an extension of MySQL called Galera.

 

At that time I was not really aware of all the different aspects of Galera, but I gave it a try. In November 2011 I did the first POC and started to collect valuable information about how to play with Galera.

It was my intention to write an article presenting it, but I was really too busy and in the end the article remained unfinished (http://www.tusacentral.net/joomla/index.php/mysql-blogs/119-galera-on-red-hat-is-123-part-1).

Instead I submitted to MySQL Connect 2012 the first version of a presentation about this POC, a POC that was only the first of a much longer list (http://www.slideshare.net/marcotusa/scaling-with-sync-replication-2012).

 

From September 2012 to April 2013 we have seen Galera, and the work coming from Codership, becoming more and more known in the community, also thanks to Percona and people like Jay Janssen, to the support coming from FromDual, and to the tools from Severalnines.

 

In April 2013 in Santa Clara I attended Jay's tutorial, also to see how others were dealing with what for me had become, in the last 2 years, a must-use.

If you have missed it, here is the link; please review the material, Jay is a great professional and his tutorial was awesome (http://www.percona.com/live/mysql-conference-2013/sessions/percona-xtradb-cluster-galera-practice-part-1).

At the same conference I presented the updates of the POC, with some additional numbers, statistics, and different binaries; in fact I moved from MySQL/Oracle InnoDB to XtraDB.

 

Keeping in mind that we are still talking about 5.5, because Galera is not yet 5.6 production ready, the difference was significant. A lot of the inconsistent behaviour in thread handling that I had suffered with standard InnoDB was not present in XtraDB.

As of today, more than two years after that initial article, we have many Galera installations around, some of them used in very critical systems.

 

This is thanks to the continuous work of the Codership people, and to the others who believed in them, people like Henrik Ingo, Oli Sennhauser (FromDual), Johan Andersson (Severalnines), Jay Janssen (Percona) and myself.

Galera is becoming more and more a solid and trustworthy product; it still has some issues here and there, but the Codership team is quite efficient in fixing them when they are tracked down.

 

ANYHOW, I have to make a note here, and I am talking to all the companies taking advantage of Galera. Please remember that open source doesn't mean free: people still have to pay bills, and the Codership developers cannot provide and sustain Galera if you don't support them.

Buying Codership support is a very efficient way to get quality service from the software producer, and at the same time to guarantee the future development of a product that allows you to do business.

 

Architecture

Now let's start to talk about it.

After many installations and different combinations of blocks, the following is for me the most flexible, scalable and solid solution to implement a MySQL cluster using Galera.

Normally, if a customer asks me for advice, the following is my answer:


 

"The solution sees the HAProxy instance hosted directly on the application machine, the HAProxy then connect the application to the data layer rotating on them.

Given Codership had provide a check for HAProxy to recognize the status of the node in relation to his role, HAProxy is able to skip the MySQL nodes when those are in recover or Donor mode.

 

Our tests have shown that this approach is the most scalable in relation to the application/data connection, and at the same time it is the one that reduces the impact to a minimum, given that each application tile is independent.

 

Regarding MySQL, this solution allows the data layer to scale both reads and writes. The scalability is anyhow limited by the number of writes, and the related bytes, which the data layer must share between nodes.

That is, each MySQL node has a full copy of the dataset, and as such each write or delete must be applied to all nodes.

 

The Galera replication layer manages this effort efficiently, but the efficiency still depends on the data storage and network layers.

It is possible to get a better understanding of whether this solution fulfils the requirements by calculating the real amount of traffic generated and performing a projection.

Also, a proof of concept is always suggested to validate the application functions against the MySQL/Galera cluster.

 

Pro

  • The solution scales writes and reads; it also allows no service interruption when using a proxy solution like HAProxy, which redirects the traffic in less than 5 seconds.
  • MySQL nodes can be accessed at all times for read and write operations.
  • The application can access the database directly if needed, or can be configured as a tile with HAProxy for better scalability.
  • A specific check is provided to identify the internal status of a MySQL/Galera node.
  • The solution uses InnoDB as the storage engine, so it behaves in a well-known way when responding to read operations.
  • The solution can scale out and scale in quite easily, given that to scale out we just need to add an even number of MySQL servers to an odd-sized cluster.
  • To scale in it is just a matter of removing the nodes from HAProxy and then turning MySQL off.

Cons

  • Data is not partitioned across MySQL nodes; it is fully replicated on all of them, so a lot of disk space will be used: (Data x number of nodes) + (Binary log size x number of nodes).
  • When a node is recovering it will require a donor node; this reduces the capacity of the cluster by the failed node plus the donor. In the case of a 3-node MySQL cluster, only one node will remain active, which is why the recommended minimum number of nodes in a busy environment is five.
  • The solution has been tested on physical machines, on Amazon EC2, and across different zones, but it will require a dedicated cross-zone network to prevent delays.

 

Minimum requirements

The minimum number of MySQL nodes for a cluster is 3, but if the application is critical, a cluster of 5 is strongly suggested to reduce possible issues when recovering a node.

Note that for quorum reasons the number of servers must be odd.

The network between the nodes must be reliable and have low latency.

 

Best usage

Applications that require write distribution and some scalability, with a medium load of writes per second and constant growth of the dataset size.

 

Uptime in nines

99.995%, which corresponds to about 26 minutes of downtime per year.
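As a quick sanity check on the arithmetic, 99.995% availability leaves 0.005% of a year as allowed downtime:

# minutes in a year * allowed downtime fraction
echo "scale=1; 365*24*60 * (100 - 99.995) / 100" | bc    # ~26 minutes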

 

Solution Availability

MySQL with Galera is a GA solution, so there is no licensing cost in implementing it.

It is good practice to have a support contract with Codership as the software provider, especially to get a better reaction time in case of bugs or feature requests."

 

Amen!

 

 

Implementation

Once you have identified your requirements and dimensioned the machines (or cloud instances) that will host the MySQL Galera implementation, be sure to have one network channel to serve the MySQL-application traffic, a different one to serve the Galera replication layer, and a very efficient data storage layer.

To get help and advice on that you can contact me any time.

 

 

Basic requirements

Packages:

Xinetd

rsyslog

openssl-devel.x86_64

sysstat

iotop

netcat

htop

oprofile

Perl

Perl DBI

Perl DBD::mysql

Perl Time::HiRes

accepting network traffic from/to ports 3306 3307 3308 3311

 

HAPROXY for:

RH5: http://rpm.pbone.net/index.php3/stat/4/idpl/17082875/dir/redhat_el_5/com/haproxy-1.3.26-1.el5.x86_64.rpm.html

RH6: http://rpm.pbone.net/index.php3?stat=26&dist=74&size=440708&name=haproxy-1.4.8-1.el6.x86_64.rpm

 

MySQL Galera From Percona:


 

MariaDB Galera implementation


 

Codership (Oracle) Galera implementation


 

Configuration

The initial configuration is normally based on 5 or 7 MySQL Galera nodes listening to each other on the internal network.

 

HAProxy will check MySQL using the check delivered by Codership, which recognizes the Galera state.

This check will be installed ONLY on the MySQL/Galera nodes, using port 3311, so no extra configuration is needed on the HAProxy node.

 

The application will connect to the MySQL frontend using HAProxy; each application node will have its own HAProxy installation.

Each application will then connect locally to HAProxy (127.0.0.1:3307), and HAProxy will redirect the connection to the final target.

 

The connections are distributed using RR (Round Robin) and are non-persistent.

 

That is, once the TCP/IP connection is closed, the same application node has no guarantee of accessing the same server.

PORTS:

  • HAProxy will listen on port 3307
  • HAProxy will show its status to HTTP clients on port 3308
  • MySQL will be listening on port 3306
  • The MySQL check for HAProxy will operate on port 3311
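A quick way to verify this wiring from an application node, once everything is up, is to query through the local HAProxy and check which backend answered (the user is a placeholder, not part of the original setup):

mysql -h127.0.0.1 -P3307 -u<user> -p -e "SHOW VARIABLES WHERE Variable_name IN ('server_id','wsrep_node_name');"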

MySQL:

Adjust the standard parameters to fit the hosting environment.


NOTE!! For innodb_io_capacity, run FIO or IOZONE to discover the real IOPS available and adjust the parameter to that value; just guessing is not what you want in production.
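For example, a possible fio run to measure random-write IOPS before picking a value (the target directory is just an example; use any directory on the same volume as the MySQL datadir):

mkdir -p /var/lib/mysql-fio-test    # must live on the same volume as the datadir
fio --name=iops-check --directory=/var/lib/mysql-fio-test --rw=randwrite --bs=16k \
    --size=1G --numjobs=4 --direct=1 --ioengine=libaio --runtime=60 --group_reporting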

 

Galera:

In principle the WSREP settings are tuned during the POC, but some fine-tuning may be required when moving to production.

I advise you to spend some time checking and tuning the following.

 

wsrep_max_ws_rows= 131072
wsrep_max_ws_size= 2147483648
wsrep_slave_threads= 48
wsrep_provider_options="gcache.size=10240M;
			evs.send_window=512;
			evs.user_send_window=512"
 

 

Parameters to keep an eye on are the send/receive queues and gcache.size.

About the latter there is something that must be clarified, namely why it is very important to set it large enough.

 

First of all you should understand that when a node becomes a DONOR the server will not be accessible for write operations, and as such it will be removed by HAProxy from the pool until the node has finished feeding the JOINER.


Galera has two ways of synchronizing a starting or recovering node.

IST and SST.

IST

When performing a synchronization with IST, Galera will send to the restarting node ONLY the information present in the GCache; this can be seen as an INCREMENTAL update.

For instance, if you have a cluster of 5 nodes and for maintenance reasons you need to take them down in rotation, the node that stays down will miss a set of operations during the maintenance window.

When you start it back up, Galera reads the last position the node has registered locally and requests the donor to start from there. If the DONOR still has that position in its GCache, it will send the restarting node the data from that point.

This operation is normally much faster and also has very limited impact on the DONOR.
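A rough way to verify, after a restart, whether the node was fed by IST or fell back to a full SST is to look at the MySQL error log on the joiner (the log path is only an example, adjust it to your configuration):

grep -iwE "ist|sst" /var/log/mysqld.log | tail -n 20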

SST

This is a rebuild from scratch; normally it applies when the node is started for the first time, and/or when it crashes.

The operation can be very time consuming when talking about datasets of considerable size.

There are several methods that can be used for SST, from mysqldump to Xtrabackup. I have almost always chosen Xtrabackup, which is very well integrated into the Percona Galera distribution and guarantees performance and data safety.
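For reference, a minimal sketch of the my.cnf entries involved when using xtrabackup for SST; the account matches the sstuser created later in this guide, and the values shown are only the example ones:

[mysqld]
wsrep_sst_method = xtrabackup
wsrep_sst_auth   = sstuser:mysqlsst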

But obviously when you need to back up several hundred gigabytes, the process will take some time. Or if you have a very high level of inserts and, say, one or two hundred gigabytes, again the combination of time and data size will be fatal.

The main point is that in these cases the time Galera keeps the DONOR out of service, in order to back up and transmit the data to the JOINER, will be too long for the DONOR node to recover from its own GCache once the operation is finished, transforming the DONOR into an additional JOINER.

 

I have observed this cascade effect on several clusters not configured correctly in relation to their traffic and data size.

Clusters of 7 or more nodes just going on hold because the nodes were not able to store enough information in the GCache. It is true that when Galera is left with one node which is in DONOR mode, it stops accepting writes, allowing the cluster to heal itself. But it is also true that this can be a very long operation, and in production it is quite disruptive.

 

So what to do? Easy: calculate in advance what the worst scenario is for you, then dimension the GCache to be at least 1/3 bigger than that, if not more. Try to be safe and stay on IST; this is very important if you have a very high level of writes.

What I do is make sure the GCache is large enough to hold the modification traffic for double the time needed to take a full backup.


E.g.

With five nodes, and binary logs of 20GB per day.

If a full backup with XTRABACKUP takes 4 hrs, the GCache size should be:

GCache = ((BS x (Tb / hrsDat)) x Mn) x 2

GCache = ((20 x (4/24)) x 5) x 2 = ~33.3GB

BS - size of the binary logs produced per day (GB)

Tb - time for the backup, in hours

Mn - number of MySQL/Galera nodes

hrsDat - hours in a day (24)

 

This should be sufficient to have a decent amount of time and space to be safe.
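The same calculation as a tiny script, using the numbers from the example above (adjust the variables to your own environment):

BS=20       # GB of binary logs produced per day
TB=4        # hours a full xtrabackup run takes
MN=5        # number of MySQL/Galera nodes
HRSDAT=24   # hours in a day
echo "scale=3; (($BS * $TB / $HRSDAT) * $MN) * 2" | bc   # -> 33.330 (GB)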

 


Finally remember that Galera with Xtrabackup REQUIRES Perl with DBI and DBD::mysql in place, or synchronization will fail!

 

 

Main steps

1) configure the environment

- install xinetd (if not present)

- create user

- create directory layout

- download software for MySQL/Galera

- Install HAProxy

2) Deploy the first MySQL Galera node

- create the basic mysql database

- create basic grants

- test and cleanup the other accounts

3) Deploy all other nodes

4) Deploy HAProxy

- review configuration and change in relation to the network

- start HAProxy

5) Test connection from client to HAProxy

6) Perform test to validate the installation

7) Load data set 

 

Step by step

configure the environment

1) Install basic tools, if the customer agrees:

yum -y install htop sysstat screen xinetd haproxy iotop nc

rpm -Uvh http://mirrors.kernel.org/fedora-epel/6/i386/epel-release-6-8.noarch.rpm

or

rpm -Uvh http://mirrors.kernel.org/fedora-epel/5/i386/epel-release-5-4.noarch.rpm

 

2) If you like, run inside a screen session, so if needed you can detach without interfering with the installation process:

 

screen -h 50000 -S MySQLINSTALL

 

 

 

3) Check if ANY MySQL daemon is present (RPM) and running; if so, remove it:

 

ps aux | grep mysql
rpm -qa | grep -i mysql
rpm -e --nodeps <package>        # or: rpm -ev --allmatches <package>
rpm -qa | grep -i mysql          # verify nothing is left
# Remove OLD binaries and libraries from a previous tar.gz install
for bin in $(ls /usr/local/mysql/bin/); do rm -f /usr/bin/$bin; done
for lib in $(ls /usr/local/mysql/lib/libmysqlclient*); do rm -f /usr/lib64/mysql/$(basename $lib); done
for bin in $(ls /usr/local/xtrabackup/bin/); do rm -f /usr/bin/$bin; done

 

 

4) Create the user and the directory structure

 

userdel mysql
rm -fr /home/mysql/
groupadd mysql
useradd -m -g mysql -s /bin/bash -d /home/mysql mysql
passwd mysql
mkdir -p /opt/mysql_templates/
# /usr/local/mysql will be a symbolic link into /opt/mysql_templates/ (created in step 6)

 

In case of a binary (tar.gz) install:

5) Download the mysql version

 

wget http://www.percona.com/redir/downloads/Percona-XtraDB-Cluster/LATEST/binary/linux/x86_64/Percona-XtraDB-Cluster-5.5.30-23.7.4.405.Linux.x86_64.tar.gz
tar -xzf Percona-XtraDB-Cluster-5.5.30-23.7.4.405.Linux.x86_64.tar.gz
wget http://www.percona.com/redir/downloads/XtraBackup/LATEST/binary/Linux/x86_64/percona-xtrabackup-2.1.3-608.tar.gz
tar -xzf percona-xtrabackup-2.1.3-608.tar.gz
 

 

6) Create symbolic links to /usr/local

 

ln -s /opt/mysql_templates/Percona-XtraDB-Cluster-5.5.30-23.7.4.405.Linux.x86_64 /usr/local/mysql
ln -s /opt/mysql_templates/percona-xtrabackup-2.1.3 /usr/local/xtrabackup
 

 

7) Create symbolic links to /usr/bin

 

# Install new
for bin in $(ls /usr/local/mysql/bin/); do ln -s /usr/local/mysql/bin/$bin /usr/bin/$bin; done
for bin in $(ls /usr/local/xtrabackup/bin/); do ln -s /usr/local/xtrabackup/bin/$bin /usr/bin/$bin; done
 

 

 

# Set security
for bin in $(ls /usr/local/mysql/bin/); do chmod +x /usr/bin/$bin; done
for bin in $(ls /usr/local/xtrabackup/bin/); do chmod +x /usr/local/xtrabackup/bin/$bin /usr/bin/$bin; done
 

 

8) Move the service script from the original directory

 

mv /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld    # or another name of your choice under /etc/init.d/

 

Edit the file, filling in the basedir and datadir variables; this is not always needed.

9) Edit my.cnf to match the path and set initial values 

10) Reset security

 

chown -R mysql:mysql /opt/mysql_templates
chmod +x /usr/local/mysql

 

Configure HAProxy check

Require:

xinetd

/usr/bin/clustercheck

The clustercheck file comes directly with the Percona distribution; you just need to be sure it is in the execution path.

1) Set up the HAProxy check

Prepare the file (mysqlchk):

 

vi /etc/xinetd.d/mysqlchk

# default: on
# description: mysqlchk
service mysqlchk
{
# this is a config for xinetd, place it in /etc/xinetd.d/
disable = no
flags = REUSE
socket_type = stream
port = 3311
wait = no
user = nobody
server = /usr/bin/clustercheck
log_on_failure += USERID
only_from = 0.0.0.0
# recommended to put the IPs that need
# to connect exclusively (security purposes)
per_source = UNLIMITED
}
 

 

2) check for free port

 

cat /etc/services |grep 3311

 

Add the mysqlchk service to /etc/services:

 

echo "mysqlchk 3311/tcp # mysqlchk" >> /etc/services

 

3) Add the mysqlchk service configuration to /etc/xinetd.d/ (as prepared above) and restart xinetd.
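On RH5/RH6 the restart is simply:

/etc/init.d/xinetd restart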

 

4) Check it

 

telnet 127.0.0.1 3311
	Trying 127.0.0.1...
	Connected to localhost.localdomain (127.0.0.1).
	Escape character is '^]'.
	HTTP/1.1 200 OK
	Content-Type: text/plain
	Connection: close
	Content-Length: 40
	Percona XtraDB Cluster Node is synced.
	Connection closed by foreign host.

 

 

 

Perl setup

You should do this the way you are most comfortable with; anyhow, be careful not to do a double installation between yum/apt-get and cpan. These two by default install libraries in different places, and will give you a nightmare of cleanup and library conflicts.

Be sure to have DBI and DBD::mysql installed, where DBD::mysql should be version perl-DBD-MySQL-4.019 or newer.
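A quick way to verify that the needed Perl modules are there and recent enough (these one-liners just print the installed versions):

perl -MDBI -e 'print "DBI $DBI::VERSION\n"'
perl -MDBD::mysql -e 'print "DBD::mysql $DBD::mysql::VERSION\n"'   # should be >= 4.019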

 

Deploy the first MySQL Galera node

1) Create initial MySQL database FOR BINARY INSTALL ONLY:

 

su - mysql
cd /usr/local/mysql/
./scripts/mysql_install_db --defaults-file=/etc/my.cnf

 

 

Carefully check the output: you should see OK twice; if not, check the error log.

 

2) Start mysql

 

/etc/init.d/mysqld start --wsrep_cluster_address=gcomm://

 

 

Check the error log for possible errors

 

tail -fn 200 

 

 

 

3) connect for the first time and change security

 

mysql -uroot
set PASSWORD for root@'localhost'=PASSWORD('secret');
 

 

Grant access for xtrabackup

 

CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'mysqlsst';
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'192.168.1.%';

mysql> FLUSH PRIVILEGES;
 

 

Grant access to haproxy checks

 

CREATE USER 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!';
GRANT PROCESS ON *.* TO 'clustercheckuser'@'127.0.0.1';
GRANT PROCESS ON *.* TO 'clustercheckuser'@'192.168.1.%';
Flush privileges;
 

 

Remove generic users:

 

DROP user ''@'localhost';
DROP user ''@'
Grant all on *.* to dbaadmin@'localhost' identified by 'secret'
 

 

 

4) Collect statistics and information:

 

SHOW GLOBAL STATUS;
SHOW GLOBAL VARIABLES;
STATUS;
 

 

 

5) Stop server

6) restart server

 

/etc/init.d/mysqld start --wsrep_cluster_address=gcomm://

 

Deploy all other nodes

 

On each node:

1) modify the server identification in the my.cnf

 

wsrep_node_name=pchgny1 <------------
server-id=1 <----------------
 

 

 

2) start the node checking the mysql log

/etc/init.d/mysqld start

 

Deploy HAProxy

Connect to the application servers and perform the HAProxy installation.

 

wget the HAProxy package related to the host OS
rpm -iUvh haproxy-1.4.22-4.el6_4.x86_64.rpm

 

 

1) Set the configuration file on each HAProxy node

NOTE: In principle HAProxy is quite good at monitoring and reporting the status of the nodes on its HTML interface when using the HTTP protocol; this is not true when using TCP.
Given that, I used the trick of exposing the HTTP protocol on a different port, just for reporting purposes.

 

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local1.*                       /var/log/haproxy.log
    #
   log 127.0.0.1   local1 notice
 
    maxconn     40096
    user        haproxy
    group       haproxy
    daemon
 
    # turn on stats unix socket
    #stats socket /var/lib/haproxy/stats
 
 
 
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  tcplog
    option                  dontlognull
    option                  redispatch
    retries                 3
    maxconn 4096
    contimeout 160000
    clitimeout 240000
    srvtimeout 240000
 
 
 
 
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
 
listen stats 0.0.0.0:3308
    mode http
    stats enable
#    stats hide-version
    stats scope .
    stats realm Haproxy\ Statistics
    stats uri /haproxy?stats
    stats refresh 5s
#    stats auth xxxxx:xxxxx
    option contstats
    option httpchk
#        option mysql-check user test
 
    server node1 10.5.1.6:3306 check port 3311 inter 2000 rise 3 fall 3
    server node3 10.5.1.8:3306 check port 3311 inter 2000 rise 3 fall 3
    server node4 10.5.1.9:3306 check port 3311 inter 2000 rise 3 fall 3
    server node5 10.5.1.10:3306 check port 3311 inter 2000 rise 3 fall 3

 
 
listen galera_cluster 0.0.0.0:3307
#Logging
	log global
	option dontlognull
#mode
    mode tcp
# balancer
    balance roundrobin
 
#options
#	option abortonclose
#	option forceclose
	option clitcpka 
	option tcpka
    option httpchk
#        option mysql-check user test
 
    server node1 10.5.1.6:3306 check port 3311 inter 2000 rise 3 fall 3
    server node3 10.5.1.8:3306 check port 3311 inter 2000 rise 3 fall 3
    server node4 10.5.1.9:3306 check port 3311 inter 2000 rise 3 fall 3
    server node5 10.5.1.10:3306 check port 3311 inter 2000 rise 3 fall 3

 

 

 

2) add logging

to add logging using rsyslog

vim /etc/rsyslog.conf

 

Modify enabling, the following:

 

 

# Provides UDP syslog reception
$ModLoad imudp.so
$UDPServerRun 514
 

 

and add

 

#HAProxy log
local1.* /var/log/haproxy.log
 

 

Finally restart rsyslog

/etc/init.d/rsyslog restart

 

 

2) start HAProxy

/etc/init.d/haproxy start

 

3) Check the server status using the web interface

Using a web browser check from:

http://<haproxy-host>:3308/haproxy?stats

You will see, or you SHOULD see, the HTML page reporting the status of your nodes.
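If no browser is at hand, the same stats can be pulled from the command line; appending ";csv" to the stats URI is standard HAProxy behaviour (column positions can vary between versions, here field 18 is the status):

curl -s "http://127.0.0.1:3308/haproxy?stats;csv" | cut -d, -f1,2,18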

 

Quick check for the connection

Connect to MySQL using the mysql client and a simple watch to cycle through the servers.

 

watch -n1 -d 'mysql -udbaadmin -p -h<node_ip> -e "SHOW GLOBAL STATUS" | grep -E "wsrep_ready|wsrep_last_committed|wsrep_replicated|wsrep_received|wsrep_local_commits|wsrep_local_cert_failures|wsrep_local_bf_aborts|wsrep_local_send_queue|wsrep_local_recv_queue|wsrep_local_state_comment"'


 

To see how HAProxy redirect the connections from the APPLICATION NODE:

 

watch -n1 -d 'mysql -h127.0.0.1 -P3307 -u<user> -p -e "SHOW GLOBAL VARIABLES" | grep -i -E "server_id|wsrep_node_name"'

 

 

You will see the values changing at each request.

Try to take down one node and see what happens on the HAProxy web interface and in the running command.

If all is fine, it will be quite fun to see how easily and quickly it manages the node being shut down.

 

POC steps

Finally, this is just an example of what we cover when doing the POC; it obviously varies from customer to customer.

 

POC Tests

Functional tests:

1) light data load/read on all nodes
- perform loading on all nodes

- perform data read on all nodes

 

2) Query/inserting while one node is failing
- perform selects on all nodes
- perform inserts on all nodes
Expectations:
No service interruption

No difference in the result sets between nodes.

 

3) Query/inserting while node is recovering
- perform selects on all data nodes
- perform inserts on all data nodes
- identify which node become the donor
Expectations:
Minimal service degradation, because the donor node will not be available
No service interruption
No difference in the result sets
Recovery performed by IST if inside the boundary of the cache

Recovery performed by SST if bigger than the cache

 

Performance/capacity tests (including the difference between using ONE single node, three to seven nodes, and full capacity):

1) Execute increasing write-only load from the application nodes (using simple light inserts like a real-time application; more complex inserts like heavier records, batch inserts, multiple linked tables)
- perform increasing inserts and deletes using from 4 to 96 threads per application block (2 blocks for each application)
Expectations:
- validate the maximum load limit
- identify the lag in the Galera replication, if any, when under heavy stress
Measure:
- IO
- memory usage
- Internal MySQL/Galera metrics

- threads contention

 

2) Execute increasing mix load read/write
- perform increasing selects/inserts and deletes using from 4 to 96 threads per application block
Expectations:
- validate the maximum load limit
- identify the lag in the Galera replication, if any, when under heavy mix read and write stress
Measure:
- IO
- memory usage
- Internal MySQL/Galera metrics

- threads contention

 

3) Perform node data crash and recovery while heavy load running

Same test as 1.2 and 1.3

 

4) Optimizations:
1) Implement partitioning on table and execute historical archiving
- Test impact optimization in accessing data
- Test impact in managing the partitions (optimizing/defragmenting/drop)

 

 

Additional article about galera

There are a few forthcoming articles I am writing in the same series:

Galera: understanding what to monitor and how

Galera tests and numbers: what I have proven is possible to achieve, in numbers and graphs.


Reference

Jay (Percona)

http://www.mysqlperformanceblog.com/2012/11/20/understanding-multi-node-writing-conflict-metrics-in-percona-xtradb-cluster-and-galera/

http://www.mysqlperformanceblog.com/2013/05/02/galera-flow-control-in-percona-xtradb-cluster-for-mysql/

https://www.percona.com/live/mysql-conference-2013/users/jay-janssen

 

Oli (FromDual)

http://www.fromdual.com/mysql-and-galera-load-balancer

http://www.fromdual.com/galera-load-balancer-documentation

http://www.fromdual.com/unbreakable-mysql-cluster-with-galera-and-lvs

http://www.fromdual.com/switching-from-mysql-myisam-to-galera-cluster

http://www.fromdual.com/galera-cluster-nagios-plugin-en

 

Codership 

http://codership.com/

http://www.codership.com/wiki/doku.php?id=mysql_options_0.8

http://www.codership.com/wiki/doku.php?id=galera_parameters

http://www.codership.com/wiki/doku.php?id=galera_status_0.8

http://www.codership.com/wiki/doku.php?id=flow_control

http://www.codership.com/wiki/doku.php?id=galera_arbitrator

http://www.codership.com/wiki/doku.php?id=sst_mysql

http://www.codership.com/wiki/doku.php?id=ist

 

Several Nines

http://www.severalnines.com/clustercontrol-mysql-galera-tutorial

http://www.severalnines.com/blog/migrating-mysql-galera-cluster-new-data-center-without-downtime

http://www.severalnines.com/galera-configurator/

Last updated Wednesday 30 October 2013 17:33
 
28
Apr
2013
What I think about the Percona Live conference 2013.
Written by Marco Tusa

 


Needless to say, like many others I enjoyed this conference a lot.

For me it was also a personal success, because I was finally able to have my company sponsor the event and bring new members of my team to the conference as speakers; obviously the merit is all theirs, but as the person responsible for the MySQL team at Pythian I was happy, knowing how much it cost them to be there.

 

We also had a lot of people at the community dinner; the last head count I did was over 120 people. In this regard I have to clarify some misunderstanding and confusion we had there.

Pedro asked us to have the last head count by Tuesday morning, so I closed the count from the comments on the website and from the emails around noon. The number I passed to Pedro based on that was 70-80 people, but we had additional people registering after that time and also last-moment shows, so we had to manage an additional 40 people who had to be seated in different areas. Quite obviously I did not give clear enough instructions; I personally apologize for that, and I (we) will do better next year. I hope you had a good time and good food, even if a little detached from the other tables.

 

What about the conference?

My feeling is that this event consolidates the "core" of the MySQL community.

We have seen many companies providing the same service sitting close to each other, talking with a positive spirit and attitude. I was personally chatting all the time with people from SkySQL, Percona and others, all of us with an open and friendly attitude.

I have seen Oracle people participating in the conference (hurray!!!); as an IOUG committee member I know very well how many times we have asked Oracle to be there, and they were! This was good, period.

 

MySQL where are you going?

In relation to what is happening to MySQL, and where that is leading us, I have confirmed my idea that nowadays we are no longer talking about LAMP or the full stack when we talk about MySQL.

What customers, companies and users are expecting is a more complex and articulated environment. Interaction between different blocks is now not optional but a fact.

When we approach an existing environment, or when we have to build a new one, we now think in terms of hundreds of application servers, terabytes of data to store or process, many different client platforms to support, and an impressive amount of data to analyze for reporting.

 

Feel free to fill the boxes with the name of the product you like most, but for sure you will not limit yourself to MySQL or LAMP.

Already today I have customers using MySQL, Oracle Database, MongoDB, Hadoop and more, all in one single connected environment.

 

Thinking in terms of MySQL only, when we think about products, services, monitoring or design, is too limiting.

 

For instance, a tool that monitors MySQL but does not catch its interaction with other elements like Hadoop is going to provide only part of the picture. That partial picture, referring only to MySQL metrics, will be close to useless, because it will not be able to provide all the information required to perform a valid analysis and eventually a projection.

In other words, it will cover the basics of the single block and will not help us get the big picture. That is, still useful to keep the block in a decent state, but it will leave you blind to what is going on in the whole context.

This is in many respects also true for the products: each block, MySQL included, must become more and more flexible in exchanging data with the others. This can be achieved by developing specific interfaces, or by defining a common communication protocol/format shared between the different blocks.

 

In the MySQL (or MariaDB) universe, this also means keeping consistency and remaining as open as possible to facilitate the creation of additional plug-ins/engines.

But what really worries me, given it is also "my" field, is "service". Those environments require support, design and so on. We know very well how complex a MySQL environment can be; what about when we start to have many other actors involved? What really scares me is the level of knowledge required to cover all of them, or even just a segment.

I am convinced that we will have to work around it, because users/customers/companies will ask us to provide support for all the elements in their architecture; actually it is already happening, and the real risk is to have, or become, generalists instead of high-profile experts.

 

If you don’t do it, if you do not differentiate, the risk is to be isolate by the market, and yes be very smart on that specific area, but not able to understand the big picture, ergo useless.

On the other hand, trying to do too much could drives you (as company) to disperse the resource and have an average level, that is good but not Excellent with capital “E”.

The possible solution is to have a huge monster with hundred and hundred of people, and division per technology … and … well I have seen already several of them starting and die. Starting good with very high service quality and then become big, heavy and so slow that customers moves out.

 

No, that is not the solution; the solution resides in being able to balance correctly what you can do with your resources and reasonable growth, and what you cannot.

I am working in a multi-technology company, and I know very well what I am talking about when I say that balance is the key. The future will need to see two things: one is companies improving their capacity to cover more than just MySQL; the other is opening up space for collaboration. Companies covering different technologies must start to interact more, and offer better service and results to users/customers in a cooperative way.

That will allow a single company to remain focused on a few things and keep a high level of expertise in its chosen areas. Working in a cooperative way is the key.

All this needs to happen, and it requires coordination.

 

Flexibility and coordination are the keywords for the future. The MySQL community has already shown how much energy it has, how strong it can be in difficult moments, and how much we really care about our customers/users.

What I see for the future is all of us working together, gathering all the actors involved, and giving life to a new ecosystem which will help facilitate the evolution of the next generation of data and applications.

 

What about the Speeches

In terms of talks I have to say I was expecting a little bit more, not only from the speakers (me included), but also from the product companies.

As said, I think it is time to move to the next step, and I was expecting more talks about interactions between technologies.

 

I am not saying that we should not cover the basics; we must do it, but having more talks on how MySQL and MongoDB coexist, or how we could help terabytes of data be processed between A and B; well, that would have been nice.

Not only about what we have now, but also about what we are planning for the future, including new features and ideas for development.

 

In this regard, the only relevant speeches I saw were the one done by my colleague Danil Zburivsky on Hadoop/MySQL, and the one about JSON by Anders Karlsson during the MariaDB/SkySQL event.

Thanks guys, you see the future; and shame on Marco, who was thinking about it and could have done it but did not... maybe for the next conference.

 

That said, the level of the speeches was good. I have been talking with the people attending, most of them satisfied, but let us wait and see what the evaluations will reveal.

What I can say is that I really enjoyed the tutorial on XtraDB and Galera done by Jay Janssen, which helped me feel less alone in the Galera implementation adventure; and I regret having missed "InnoDB: A journey to the core" by Davi Arnaut and Jeremy Cole. But I was presenting in the same slot, and it would not have been nice of me to say to the people there, OK, let us all move to the next room.

 

What about the Expo

WoooW, first time as a sponsor and first time with a booth. A lot of talk, a lot of possible new friends and a better understanding of what we need to do to be more effective next time. T-shirts first!!! Lesson learned: we brought too few... we could have covered the whole Bay Area with LOVE YOUR DATA; we missed the target this year, we will not do the same next time.

 

I think that this year we had a well-balanced expo, with less show but more focus. I must also mention the presence outside the expo area of Julian Cash (http://jceventphoto.com/index.html), who took a lot of cool shots of most of us.

I know Julian was there during other conferences too, but I had never met him before. I did this year and it was a great experience; I hate having my photo taken, but with Julian I was having so much fun that in the end I loved it.

 

What about the Lightning Talks?

Another well-established event at the MySQL conference, and every year we have fun. This year I enjoyed Shlomi's one, and absolutely AWESOME was the performance from the Tokutek team.

 

About the Lightning talks, and just to confirm what Dave Apgar was saying in his really good presentation, shit happens and you never know when. I had placed my video camera and recorded the WHOLE event, and guess what... my new SD card just failed, and nothing, I mean NOTHING, was there afterwards. Next time I will come with TWO video cameras and will set up redundancy!

 

Announcements

Finally during the conference we had two very significant announcements.

The first one was the expected merger of MariaDB and SkySQL; nothing new, but it is good to see that SkySQL is defining its identity with more determination. Not only that, this merger is very important because all MariaDB users now have a clear reference point, which will help them and the community to better adopt and improve MariaDB. Way to go guys, well done!

 

The second one is about Tokutek (http://www.tokutek.com/), finally open source. I tested it for the first time 3 years ago, and it was a very interesting technology, but hard to get implemented because customers were reluctant to go for non-open-source code.

Just a note: I wrote open source, not free. Open source doesn't mean free, and here the concept was very clear: customers were willing to "eventually" pay, but not for closed code.

Tokutek's move is smart not only because it will allow the company to have substantial help from the community in identifying issues, improving utilization and identifying new trends, but also because it removes the last philosophical barrier to the software's adoption.

 

From the technical point of view, the presentations showed a significant improvement with respect to previous years, and I was very impressed by the presentation done by Gerry Narvaja during the SkySQL/MariaDB event.

One thing is sure: I have customers that could benefit hugely from Tokutek, and I will give it a try right away, starting next week.

 

Winner and loser

No doubt from my side; I did not even mention it before because for me it is a given.

On the 12th of March 2011 I wrote this article http://www.tusacentral.net/joomla/index.php/mysql-blogs/96-a-dream-on-mysql-parallel-replication, which was my dream about replication. At the time of writing I was not aware of, or testing, ANY software able to do what I was asking for.

On the 23rd of November 2011 I wrote another article http://www.tusacentral.net/joomla/index.php/mysql-blogs/119-galera-on-red-hat-is-123-part-1, and that was my first approach to Galera.

On the 29th of September 2012 I presented the first results of a POC done on a customer environment http://www.slideshare.net/marcotusa/scaling-with-sync-replication-2012.

Next week I must implement another MySQL cluster, based on Galera replication.

 

The winner for me is the Galera solution (http://www.codership.com/), whichever version you may like; from my side I have found that the Percona version is the most stable, and using the Severalnines tool to manage the cluster (http://www.severalnines.com/clustercontrol) is also helpful.

 

 

Who is the loser then?

All the ones that believed MySQL was over in 2010 (the Oracle takeover).

We have MySQL from Oracle, we have MariaDB from Monty, we have companies developing their storage engines and tools, we have a more complex ecosystem that is growing day by day.

No, MySQL is not over at all.

One note only: whoever leads the development on any side, remember that you MUST allow the community to use and develop code on top of yours; modifying interfaces without documenting them, or not being fully explicit on what to do, how to do it, and in which direction, well, it is not fair.

Percona Live, the MySQL Conference in Santa Clara, was a great conference, done by great people. We can all do better, always, but what makes me feel good is that I know we will do better next year.

 

Last note…

Didn't you miss someone? Did you, as I did, feel that something was not right? Wasn't a face missing?

I mean... yes! Him! Baron Schwartz!! Hey man, we miss you!!! Or at least I miss you and your notebook during the presentations; come back ASAP.

 

Happy MySQL to everyone.

 

References

http://www.codership.com/

http://www.severalnines.com/

http://jceventphoto.com/index.html

https://www.percona.com/live/mysql-conference-2013/

http://www.tokutek.com/

http://www.skysql.com/

http://www.pythian.com/

https://vividcortex.com

And more ...

 


Last updated Sunday 18 August 2013 17:04
 