google
 
 
Home
Server ID misconfiguration can affect you more then you think.
Scritto da Marco Tusa   
Venerdì 01 Novembre 2013 10:23

Sub title: When you do stupid things,  you get stupid things back, but sometime is good to discover odd behaviour.

 

The other day, I was having fun playing with the latest toy, mysql-cluster-gpl-7.3.2.

 

I was setting up a new cluster set, of 6 data nodes and 3 Mysql node, to do so I was using a simple bash file that helps me to generate what I need on my environment.

During that I had a very stupid idea, to have the script to recognize the machines IPs and PORT I have set, then use them as SERVER IDs.

 

I know I know is not a good way to define SERVER ID, anyhow I did it, so the result was that in my my.cnf my server_id looks like:

server-id = 1921680355510

 

Some of you already see what kind of stupid thing this is, but let me not anticipate it.

 

So as usual I start the mysql node first to generate a brand new database, to do so I have first disable the ndbcluster and all went fine.

 

Then I start my management nodes and my data nodes.

 

I was quite happy and I finally attach my MySQL node to the rest of the cluster.

But Nodes failed to starts, I was quite surprise so I open the log and is saw:

 

2013-10-28 10:20:58 2692 [Note] NDB[3]: NodeID: 77, all storage nodes connected
2013-10-28 10:20:58 2692 [ERROR] NDB: server id provided is too large to be represented in opt_server_id_bits or is reserved
mysqld: /usr/src/mysql-cluster-gpl-7.3.2/sql/ndb_component.cc:139: virtual int Ndb_component::deinit(): Assertion 'm_thread_state == TS_STOPPED' failed.
14:20:58 UTC - mysqld got signal 6 ;
 

 

 

What??? My fantastic server_id of 13!!!!! digit is too long?

 

Hey wait a minute?! how many digit??

 

At that point I just hit the keyboard with my head.. realizing what I have done.

headdesk

Server id is an unsigned integer, his max value is 4294967295 with 10 digit.

Once more my friend NDB was right and I was trying to do a stupid thing.

 

But when I realize it, I also wonder, why when activating ndbcluster I get the error and when I have MySQL running without it NO?

 

This kind of checking should be done at common level at MySQL start-up, so why I get it only with MySQL and NDB active?

I dig a little bit and I found the place where NDB is doing the check.

The if condition where the check is executed is quite simple and clear.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
....
sql::ha_ndbcluster_binlog.cc
 
int ndbcluster_binlog_start()
{
  DBUG_ENTER("ndbcluster_binlog_start");
 
  if (::server_id == 0)
  {
    sql_print_warning("NDB: server id set to zero - changes logged to "
                      "bin log with server id zero will be logged with "
                      "another server id by slave mysqlds");
  }
 
  /* 
     Check that ServerId is not using the reserved bit or bits reserved
     for application use
  */
  if ((::server_id & 0x1 << 31) ||                             // Reserved bit <--------------------- Check
      !ndbcluster_anyvalue_is_serverid_in_range(::server_id))  // server_id_bits
  {
    sql_print_error("NDB: server id provided is too large to be represented in "
                    "opt_server_id_bits or is reserved");
    DBUG_RETURN(-1);
  }

 

 

But again why the error is not reported in MySQL core?

 

Because MySQL is silently converting my server id to his maximum value, as such my stupid server id of 1921680355510 become 4294967295.

 

Oh my...!

An identifier is managed as a quantity!

 

I have done a very stupid thing setting the server_id to , but honestly I don't like this as well.

 

As we know MySQL silently adjust his behaviour in relation to the SQL_MODE, but the way it adjust the parameters should be something different.

I agree that variables like max_connections CAN be adjusted, but others like server_id, well no, I think is wrong given those are identifier(s) and not quantity.

 

Server ID is an ID, as such it must be validated and if not matching the validation an ERROR must be return and the server should not start at all.

 

Instead MySQL currently is just raising a WARNING and continue to run, not caring if there will be another server with the same id because this "adjustment".

 

131028 14:00:58 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-28 14:00:58 0 [Warning] option 'server_id': unsigned value 192168075510 adjusted to 4294967295 <----------------------------------------------

I was really "not happy" about this, so I went into the code and check where the server ID is evaluated and modified.
So this is the action calls:

 

mysqld::mysqld_main
	=>mysqld::init_common_variables
		=> my_getopt::init_variables(longopts, init_one_value);
			=> my_getopt::init_one_value
				=> my_getopt::getopt_ll_limit_value

 

So what exactly is happening here?

 

MySQL initialize reading the options in my_getopt and on the base of the data type it pass it to getopt_ll_limit_value.

There whatever numeric argument  is pass, if it exceed the max value is reset to his Max value. At this stage there is no logic to manage values like Server ID, that is an identifier and not quantity.

This is obviously something that can be managed better, and actually the fact that NDB prevent the server to go ahead is by far more correct, this because Server Id is it used internally to recognize the server.

 

I have done a quick check, and I have see that at least the following functions are using the server_id variable, in the new replication mechanism.

 

rpl_master.cc
int register_slave(THD* thd, uchar* packet, uint packet_length)
void unregister_slave(THD* thd, bool only_mine, bool need_lock_slave_list)
bool show_slave_hosts(THD* thd)
static int fake_rotate_event(NET* net, String* packet, char* log_file_name,ulonglong position, const char** errmsg,uint8 checksum_alg_arg)
static int send_heartbeat_event(NET* net, String* packet,const struct event_coordinates *coord,uint8 checksum_alg_arg)
bool com_binlog_dump(THD *thd, char *packet, uint packet_length)
bool com_binlog_dump_gtid(THD *thd, char *packet, uint packet_length)
void mysql_binlog_send(THD* thd, char* log_ident, my_off_t pos,const Gtid_set* slave_gtid_executed)

 

I have not investigate more, to see what will happen if there are two different servers having the same ID, I have not done it, because in any case it is conceptually wrong.

 

Just for fun I have create an horrible patch to manage the mismatch, adding a control on the parameter name.

 

Obviously that should be something different like an attribute of the my_option object that indicate if the option is a quantity an identifier or whatever else, then on that base decide what to do.

Any how also a stupid and horrible things like the check on the name is better then not having it at all:

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
ulonglong getopt_ull_limit_value(ulonglong num, const struct my_option *optp,
                                 my_bool *fix)
{
  my_bool adjusted= FALSE;
  ulonglong old= num;
  char buf1[255], buf2[255];
  const ulonglong max_of_type=
    max_of_int_range(optp->var_type & GET_TYPE_MASK);
 
	char string1[255], string2[255];
	int result;
	string1=optp->name;
	string2="server_id";
	scanf("%s",string1[10]);
	scanf("%s",string2[10]);
	int chlimit = 255;
	result = strncmp(string1,string2,chlimit);
 
 
 
  if ((ulonglong) num > (ulonglong) optp->max_value &&
      optp->max_value) /* if max value is not set -> no upper limit */
  {
    num= (ulonglong) optp->max_value;
    adjusted= TRUE;
    if(result == 0){
           my_getopt_error_reporter(ERROR_LEVEL,
           "option '%s': unsigned value %s CANNOT BE adjusted to %s please review your settings, and start MySQL again",
           optp->name, llstr(old, buf1), llstr(num, buf2));
           exit(1);
 
    }
  }
  if (num > max_of_type)
  {
    num= max_of_type;
    adjusted= TRUE;
  }
 

 

Running the server with the fix, I will finally had a consisten behaviour, and my favorite MySQL server doing the right thing:

 

131029 13:38:03 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended
131029 13:41:11 mysqld_safe Starting mysqld daemon with databases from /home/mysql/instances/csite3/data
2013-10-29 13:41:11 0 [ERROR] 'option server_id' unsigned value 192168075510 CANNOT BE adjusted to 4294967295 please review your settings, and start MySQL again
131029 13:41:12 mysqld_safe mysqld from pid file /home/mysql/instances/csite3/mysql.pid ended

 

 

 

 

Conclusion

From a stupid action it could come out something interesting, at least for me.

I would like to classify this as bug, but given it is a conceptual thing may be is more a what? Feature request? Logic fix?

 

Finally my conclusion is that MySQL is a great product, but is growing fast and sometimes things are left behind, this is a small but possible dangerous one.

Ultimo aggiornamento Venerdì 01 Novembre 2013 11:11
 
Understand what happens in MySQL when using UTF String with Latin1 encoding...
Scritto da Marco Tusa   
Domenica 15 Settembre 2013 21:40

... And how to go out from it.

 

Why this article?

The scope of the article is to light up what happens behind the scene when an application push UTF data on a MySQL server using Latin1 encoding.

I will show how the characters are store, what exactly happen if you mix UTF8 and latin1, what is transformed to what and why.

Finally I will show the safer way to use to convert a table or simply a table field, without exporting the full dataset.

 

I will also show what is a good way to do it when using master-slave scenario, with minimal impact on the live dataset.

What I am not going to discuss here, and I assume you already have familiarity with is:

  • How to assign a character set or a collation to a table or fields;
  • How collation works in in sorting;
  • How to set the default character set in server and connecting clients.

If any of this point is not clear to you, please read first here (http://dev.mysql.com/doc/refman/5.6/en/charset.html)

For my exercise I will use two tables, two MySQL server instances, and Arabic text.

 

Please note that I have copy this text from Official UN nation page, as such I hope they do not contain any offensive text.

 

Step one check the status of our severs:

Master

(root@localhost) [(none)]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, for linux2.6 (x86_64) using readline 5.1
Connection id:		2
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfm/mysql.sock
Uptime:			2 min 34 sec
Threads: 3  Questions: 16  Slow queries: 0  Opens: 34  Flush tables: 1  Open tables: 26  Queries per second avg: 0.103

Slave

------------

(root@localhost) [(none)]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, for linux2.6 (x86_64) using readline 5.1
Connection id:		4
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfs/mysql.sock
Uptime:			10 min 43 sec
Threads: 2  Questions: 8  Slow queries: 0  Opens: 34  Flush tables: 1  Open tables: 26  Queries per second avg: 0.012


As you can see, I have define the MASTER to use latin1 as default also for the server, while for the SLAVE I choose to use UTF8.

In this exercise I am going to use only UTF8 for the examples never UTF8mb4.

 

First create the table.

CREATE TABLE `utf_test` (
  `ID` integer AUTO_INCREMENT PRIMARY KEY,
  `notutf` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL,
  `yesutf` varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `binutf` varbinary(250) DEFAULT NULL,
  `different` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 

 

Now let start insert data again and see what happens:

insert into utf_test values(null,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا',

' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same');

 

 

 Master
 -------------
(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', ' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same');
*************************** 1. row ***************************
       ID: 1
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same
1 row IN SET (0.00 sec)
 
 
Slave
---------------
(root@localhost) [test]>set names latin1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same
1 row IN SET (0.00 sec)
 

 

 

 

All good given the value are passed consistently

Now let add some text to the "different" fields and see IF we can still handle it correctly:

 

(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', ' this is a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا', 'Am I the same? لصدأ القمح إلى كينيا');
Master
--------------
(root@localhost) [test]>select * FROM utf_test\G
       ID: 3
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same? لصدأ القمح إلى كينيا
2 rows IN SET (0.00 sec)
 
Slave
----------------------------
       ID: 3
   notutf: simple INSERT NOT utf IN non utf FIELDS, but USING Master ALL Latin1 AND Slave WITH server DEFAULT UTF8
   yesutf: إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
   binutf:  this IS a mix of it 3 إستقدام أصناف محصولية مقاومة لصدأ القمح إلى كينيا
different: Am I the same? لصدأ القمح إلى كينيا
 

 

 

 

Ok once more all good.
Is this real UTF8 or as we have seen sometimes fake UTF thanks to Latin1 push?
let see inserting another row with UTF field "yesutf" and "different" containing the same apparent data.

insert into utf_test values(null,'simple insert not utf in non utf fields, but using Master all Latin1 and Slave with server default UTF8', 'ح إلى كينيا', 'ح إلى كينيا', 'ح إلى كينيا');

 

 

Master
-----------
(root@localhost) [test]>insert INTO utf_test VALUES(NULL,'simple insert same text everywhere', 'ح إلى كينيا', 'ح إلى كينيا', 'ح إلى كينيا');
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
Slave
------------------
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 

 

 


Checking the hex:

 

Master
-----
(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
 
Slave 
-------------
(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

As expected the hex code is different between the fields define as UTF8 and the one as Latin1,and I see them correctly because... ?

Look what happens if I change the Names to UTF8...:

 

(root@localhost) [test]>set names utf8;
 
(root@localhost) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

Now only the binutf (varbinary) is shown correctly.

So just to recap what happen if I use latin1....
Using latin1

I see all fine

 

(root@localhost) [test]>set names latin1;
(root@localhost) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

The hex values are not matching as:

 

(root@localhost) [test]>select hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

If I do a reverse conversion, also using UNHEX

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,unhex(hex(yesutf)) AS yesutf_hex,unhex(hex(binutf)) AS binutf_hex,unhex(hex(different)) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: ح إلى كينيا
   binutf_hex: ح إلى كينيا
different_hex: ح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

 

Again I see something different, is it confusing?

No.

 

It is quite clear that varbinary and varchar store the string using the same hexadecimal values, while the yesutf is different because it was expecting a UTF string.

 

Let us analyze JUST the first character coming from the arabic string: ح

This letter correspond to:

Unicode code point U+062D

character: ح

UTF8 characters: Ø

UTF8 hex: d8 ad

Meaning: ARABIC LETTER HAH

 

What happen is that I am inserting the correct codes but the UTF8 field read it as LATIN1 char per byte, as such it transform the value to Ø.

The value of that character in the UTF-8 encoding table and Unicode characters is:

Unicode code point U+00D8

character: Ø

UTF8 characters: Ø

UTF8 hex: c8 98

Meaning: LATIN CAPITAL LETTER O WITH STROKE

 

Comparing the values above with the HEX value we had before, you can see that they match perfectly

yesutf_hex: C398

binutf_hex: D8AD

different_hex: D8AD

 

So far I hope everything is clear.

 

The fields using LATIN1 and UT8 are correctly shown when using NAMES latin1 because the latin1 interpretation of the UTF8 hex values in the case of the fields with latin1 encoding.

 

While in the case of fields with UTF8 encoding, the server perform a double conversion of the codes, conversion that IS NOT executed when doing UNHEX, and in that case what it is shown is the REAL value of the stored codes.

 

Again the only SAFE one during these operation is the varbinary which store the code not bounded to any encoding.

 

So for instance if you have all your database using LATIN1 and you want to convert to UTF8 you cannot just do :

(root@localhost) [test]>alter table utf_corrupt modify different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;

 

Because your text in the tables will be converted and become unusable if using UTF8 clients, see below:

Encoding set as LATIN1:

Client characterset: latin1

Conn. characterset: latin1

I am going to do the conversion:

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
SET NAMES UTF8;
 
	Client characterset:	utf8
	Conn.  characterset:	utf8
 
 
(root@localhost) [test]>select * FROM utf_corrupt\G
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
 

 

The text is a mess again, this because the text is transformed:

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7 <---------
1 row IN SET (0.00 sec)
 

 

 

 

Starting from the beginning so the table is again:

 

(root@localhost) [test]>show CREATE TABLE utf_corrupt\G
*************************** 1. row ***************************
       TABLE: utf_corrupt
CREATE TABLE: CREATE TABLE `utf_corrupt` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `notutf` varchar(250) CHARACTER SET latin1 COLLATE latin1_general_cs DEFAULT NULL,
  `yesutf` varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
  `binutf` varbinary(250) DEFAULT NULL,
  `different` varbinary(250) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
1 row IN SET (0.00 sec)
 

 

 

 

data is:

 

*************************** 3. row ***************************
       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec) 

 

 

(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

And encoding: Client characterset: latin1 Conn.  characterset: latin1

 

To do properly you have to do a 3 steps conversion, mainly varchar -> varbinary -> varchar.As follow:

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varbinary(250) DEFAULT NULL;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7 <-------------- 
1 row IN SET (0.00 sec)
 

 

 

Now all is fine and the string was not transformed.Now change the encoding with NAMES:

Set NAMES utf8;

Convert back to varchar and UTF ...

 

(root@localhost) [test]>alter TABLE utf_corrupt MODIFY different varchar(250) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0
 

 

 

And now yes you have the text stored correctly.

 

       ID: 5
   notutf: simple INSERT same text everywhere
   yesutf: ح إلى كينيا
   binutf: ح إلى كينيا
different: ح إلى كينيا
3 rows IN SET (0.00 sec)
 
(root@localhost) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_corrupt WHERE id=(SELECT max(id) FROM utf_corrupt)\G
*************************** 1. row ***************************
           id: 5
    noutf_hex: 73696D706C6520696E736572742073616D6520746578742065766572797768657265
   yesutf_hex: C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

All good! My data is correctly moved from a fields base on varchar LATIN1 to the new varchar UTF8.The most important aspect is to ensure to use the right encoding and to do not allow the server to convert you text to wrong hex values.

In case of significant migration if you have a master slave setup,it is possible to play with them, to minimize the impact of the migration, following these steps:

  1. all latin1 
  2. convert slave to varbinary
  3. insert data from master to slave keeping all as latin1
  4. swap the server
  5. convert ex-master to varbinary 
  6. when done change all application connection to utf8 master_host
  7. convert tables back to varchar utf8 on slave 
  8. swap master slave again
  9. convert slave table to UTF8

 

It seems cumbersome, but if you cannot afford to export the data change the table encoding and reload the data, and you cannot stop the write for long time, this is the safer way.The negative aspect of it is ... that varbinary use a different approach for sorting, it is based on the value of the binaries stored and you cannot use collation to modify it.

Let's go through the process:

 

Step 1) Check that all is latin1

 

Master
------------
(root@127.0.0.1:5510) [test]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, FOR linux2.6 (x86_64) USING readline 5.1
 
Connection id:		11
Current DATABASE:	test
Current user:		root@localhost
SSL:			NOT IN USE
Current pager:		stdout
USING OUTFILE:		''
USING delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfm/mysql.sock
Uptime:			2 days 3 hours 3 min 49 sec
 
Threads: 3  Questions: 198  Slow queries: 0  Opens: 89  FLUSH TABLES: 1  Open TABLES: 27  Queries per second avg: 0.001
--------------
 
Slave
---------------------
(root@127.0.0.1:5511) [test]>\s
--------------
/opt/mysql_templates/mysql-55o/bin/mysql  Ver 14.14 Distrib 5.5.27, FOR linux2.6 (x86_64) USING readline 5.1
 
Connection id:		9
Current DATABASE:	test
Current user:		root@localhost
SSL:			NOT IN USE
Current pager:		stdout
USING OUTFILE:		''
USING delimiter:	;
Server version:		5.5.27-log MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/opt/mysql_instances/instances/derekutfs/mysql.sock
Uptime:			2 days 3 hours 3 min 45 sec
 
Threads: 2  Questions: 177  Slow queries: 0  Opens: 87  FLUSH TABLES: 1  Open TABLES: 27  Queries per second avg: 0.000
 

 

 

Insert some data on the master ... just to have a couples of rows.

 

(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 1', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 

 

 

Check the values:

 

Master
----------------------
(root@127.0.0.1:5510) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
2 rows IN SET (0.00 sec)
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 
  
Slave
-------------------------
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
2 rows IN SET (0.00 sec)
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

Step 2) Convert to varbinary the slave, keep same encoding (NAMES latin1)

 

 

(root@127.0.0.1:5511) [test]>alter TABLE utf_test MODIFY different varbinary(250);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0
 
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7 <------------------
1 row IN SET (0.00 sec)
 

 

 

Code remain consistent.

Step 3) Production continue to work as usual, no impact.

Insert some other records on the master:

 

(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP1: insert data 2', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 3. row ***************************
       ID: 5
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 4. row ***************************
       ID: 7
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 2. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 3. row ***************************
           id: 5
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 4. row ***************************
           id: 7
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
4 rows IN SET (0.00 sec)
 
ON The Slave 
----------------------
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 2. row ***************************
       ID: 3
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 3. row ***************************
       ID: 5
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
*************************** 4. row ***************************
       ID: 7
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 2. row ***************************
           id: 3
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 3. row ***************************
           id: 5
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
*************************** 4. row ***************************
           id: 7
    noutf_hex: 53544550313A20696E7365727420646174612032
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
4 rows IN SET (0.00 sec)
 

 

 

Step 4) swap the server

5511 master and 5510 slave (but stop the replication from 5511->5510 and stop also the previous replication from 5510 to 511), production will continue to work in write, order by will be affected for the moment.

 

On new Master 5511:

 

*************************** 6. row ***************************
       ID: 11 <----------------------------------------------
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
6 rows IN SET (0.00 sec)
 
ON new Slave 5510:
*************************** 4. row ***************************
       ID: 7 <--------------------------------
   notutf: STEP1: INSERT DATA 2
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
4 rows IN SET (0.00 sec)
 

 

Step 5) modify to varbinary the tables on the new Slave:

 

(root@127.0.0.1:5510) [test]>alter TABLE utf_test MODIFY different varbinary(250);
Query OK, 4 rows affected (0.02 sec)
Records: 4  Duplicates: 0  Warnings: 0
 

 

 

 

And now restart replication on 5510, so it can replicate from the master 5511.

 

(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
 
1 row IN SET (0.00 sec)(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 11
    noutf_hex: 53544550343A20696E73657274696E6720646174612066726F6D2045582D534C4156452061667465722073776170207265706C69636174696F6E20697320646F776E20
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

All good.

Step 6) change all Application connections to UTF8,

in our example I will change the NAMES on new Master 5511 and on slave 5510

SET NAMES UTF8;

Server characterset: utf8

Db     characterset: utf8

Client characterset: utf8

Conn.  characterset: utf8

 

Master
--------------------------
(root@127.0.0.1:5511) [test]>set names utf8;
Query OK, 0 rows affected (0.00 sec)
 
(root@127.0.0.1:5511) [test]>select * FROM utf_test\G
*************************** 1. row ***************************
       ID: 1
   notutf: STEP1: INSERT DATA 1
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
...
*************************** 6. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
6 rows IN SET (0.00 sec)
 

 

 

 

Note how the values in the yesutf was NOT correctly converted given the initial double translation.

 

Slave
---------------
Server characterset:	utf8
Db     characterset:	utf8
Client characterset:	utf8
Conn.  characterset:	utf8
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 11
   notutf: STEP4: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 

 

 

 

Step 7) Convert table on SLAVE back to varchar but using utf8 now

 

(root@127.0.0.1:5510) [test]>alter TABLE utf_test MODIFY different varchar(250)  CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 15
   notutf: STEP7: inserting DATA FROM EX-SLAVE after swap replication IS down 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 15
    noutf_hex: 53544550343A20696E73657274696E6720646174612066726F6D2045582D534C4156452061667465722073776170207265706C69636174696F6E20697320646F776E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 

 

 

 

Step 8) given all good put now the 5510 back as master,

given NOW the 5510 has the fields set as varchar and encoding/collation using UTF also the order by will work fine again.

At this point data will continue to be pass from master to slave.

 

ON Master
---------------------
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP8: inserting data from master 5510 after swap server again ', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
Query OK, 1 row affected (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select * FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
       ID: 17
   notutf: STEP8: inserting DATA FROM master 5510 after swap server again 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
1 row IN SET (0.00 sec)
 
(root@127.0.0.1:5510) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test WHERE id=(SELECT max(id) FROM utf_test)\G
*************************** 1. row ***************************
           id: 17
    noutf_hex: 53544550383A20696E73657274696E6720646174612066726F6D206D6173746572203535313020616674657220737761702073657276657220616761696E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
1 row IN SET (0.00 sec)
 
ON Slave
-------------------------
*************************** 8. row ***************************
       ID: 17
   notutf: STEP8: inserting DATA FROM master 5510 after swap server again 
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
8 rows IN SET (0.00 sec)
 
*************************** 8. row ***************************
           id: 17
    noutf_hex: 53544550383A20696E73657274696E6720646174612066726F6D206D6173746572203535313020616674657220737761702073657276657220616761696E20
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
 

 

 

 

Step 9) Finally convert all tables also on slave to varchar UTF8:

 

 

Master
---------------
(root@127.0.0.1:5510) [test]>insert INTO utf_test VALUES(NULL,'STEP9: final Slave conversions', 'القمح إلى كينيا', 'القمح إلى كينيا', 'القمح إلى كينيا');
 
Slave
---------------
(root@127.0.0.1:5511) [test]>alter TABLE utf_test MODIFY different varchar(250)  CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL;
*************************** 9. row ***************************
       ID: 19
   notutf: STEP9: final Slave conversions
   yesutf: القمح إلى كينيا
   binutf: القمح إلى كينيا
different: القمح إلى كينيا
9 rows IN SET (0.00 sec)
(root@127.0.0.1:5511) [test]>select id,hex(notutf) AS noutf_hex,hex(yesutf) AS yesutf_hex,hex(binutf) AS binutf_hex,hex(different) AS different_hex FROM utf_test\G
*************************** 1. row ***************************
           id: 1
    noutf_hex: 53544550313A20696E7365727420646174612031
   yesutf_hex: C398C2A7C399E2809EC399E2809AC399E280A6C398C2AD20C398C2A5C399E2809EC399E280B020C399C692C399C5A0C399E280A0C399C5A0C398C2A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
...
           id: 19
    noutf_hex: 53544550393A2066696E616C20536C61766520636F6E76657273696F6E73
   yesutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
   binutf_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
different_hex: D8A7D984D982D985D8AD20D8A5D984D98920D983D98AD986D98AD8A7
9 rows IN SET (0.00 sec)
 

 

 

 

 

 

Done.
When in doubt about the character set you are using or the collation always check them.
good commands from application:
SELECT USER(), CHARSET(USER()), COLLATION(USER())\G(root@127.0.0.1:5511) [test]>SELECT USER(), CHARSET(USER()), COLLATION(USER())\G
*************************** 1. row ***************************
           USER(): root@127.0.0.1
  CHARSET(USER()): utf8
COLLATION(USER()): utf8_general_ci
1 row IN SET (0.00 sec)
 
AND
(root@127.0.0.1:5511) [test]>show VARIABLES LIKE '%server%'\G
*************************** 1. row ***************************
Variable_name: character_set_server
        Value: utf8
*************************** 2. row ***************************
Variable_name: collation_server
        Value: utf8_unicode_ci
*************************** 3. row ***************************
Variable_name: server_id
        Value: 5511
3 rows IN SET (0.00 sec)
 

 

 

Conclusion

You must put a lot of attention on how you connect to the server, in relation to the encoding you are going to use.
It is advisable to set it explicitly, to be really sure and to check that the server/application are getting it right.
I cannot find a good reasons for which you should have a mix of UTF8 and Latin1 in the same table as for the example I have use, but you never know.
What I mean is that it is normal to have Fields using Latin1 and be define as Latin1 for the encoding, but mixing the UTF8 using Latin1 encoding in the application connection, and try to write UTF will only generate a mess.
If you Have a table with 4 fields 2 UTF8 and 2 LATIN1, please use UTF8 encoding in the connection/server and be sure to push the right text, actually if you do so, the server will alert you if you try to do something stupid, like pushing UTF8 values in latin1 fields.
For that specific cases when you have no idea of what you are going to write, and you MUST use Latin1 for the application connection, well use varbinary and your life will be easier, of course the ordering will be done by binary value instead character, but you need to keep your data safe first.

Reference

 

Thanks 

Finally a huge thanks to Derek Downey, because he raise the issue (again) to me in relation to a customer migration.

Thanks for his dedication, professionality and will to learn and have fun together.

Ultimo aggiornamento Domenica 15 Settembre 2013 23:16
 
«InizioPrec.12345678910Succ.Fine»

JPAGE_CURRENT_OF_TOTAL
 

Who's Online

 17 visitatori online