mysql - Convert all the data of table (which have both encoding data) into single encoding either utf8 or latin1 -
we have table consists of 2 different encoding data (utf8 , latin1) has been inserted 2 different use cases of application. getting broken strings issue other languages text if data in 1 encoding.we need convert total table data single encoding.
eg: table x
id name data encoded
1 ébarber - utf8
2 à gogo - latin1
if use "latin1" connection charset, issue "ébarber"(broken strings). if use "utf8" connection charset, issue "à gogo"(broken strings).
how can convert table data single encoding either utf8 or latin1?
please share thoughts fix issue.
it possible. painful.
- convert
binary
- identify rows contain encoding. can automated, may not 100% correct.
- do special
update
against rows. - convert
utf8
.
details:
step 1: convert binary:
alter table tbl modify column col varbinary(...) ...; -- suitable matching other stuff, or alter table tbl modify column col blob ...; -- if text.
step 2: find latin1 lines (negate following):
select hex(col) ...
extracts hex use regexp below
this regexp checks utf8 (actually utf8mb4):
/^( [\xc2-\xdf] [\x80-\xbf] | \xe0[\xa0-\xbf][\x80-\xbf] | [\xe1-\xec\xee\xef][\x80-\xbf]{2} | \xed[\x80-\x9f][\x80-\xbf] | \xf0[\x90-\xbf][\x80-\xbf]{2} | [\xf1-\xf3][\x80-\xbf]{3} | \xf4[\x80-\x8f][\x80-\xbf]{2} )+/ox
combining should tell if row probably utf8.
step 3: converts column (col
) latin1 encoding utf8:
convert(convert(col using latin1) using utf8)
step 4: utf8:
alter table tbl modify column col varchar(...) ... character set utf8 ...; -- or alter table tbl modify column col text ... character set utf8 ...;
caveat! while individual parts of have told known work, cobbled-together combination has not been tested. suspect there form of hiccups.
Comments
Post a Comment