PostgreSQL: Convert Windows-1252 ISO-8859-1-superset characters to UTF-8
From FVue
Contents |
Problem
Characterset Windows-1252 has - compared to ISO-8859-1 - special displayable characters rather than control characters in the 0x80 to 0x9F range. When a Windows-1252 text is converted to Unicode with the source characterset specified as "ISO-8859-1", these special characters will not be converted to Unicode.
How can I convert these special characters to their Unicode equivalent? If I try this code underneath, I get an error:
database# UPDATE table SET text = REPLACE(text, '\x80', '€');
WARNING: nonstandard use of escape in a string literal
LINE 2: text = REPLACE(text, '\x91', '€')
^
HINT: Use the escape string syntax for escapes, e.g., E'\r\n'.
ERROR: invalid byte sequence for encoding "UTF8": 0x91
HINT: This error can also happen if the byte sequence does not match the encoding
expected by the server, which is controlled by "client_encoding".
Solution
Do not escape the special characters, but insert them directly into the sql. Here's an UPDATE statement for the entire 0x80 to 0x9F range, replace mytable and myfield with yours:
UPDATE mytable
SET myfield =
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE(myfield,
E'\\u0080', '€'), E'\\u0081', '' ), E'\\u0082', '‚'), E'\\u0083', 'ƒ'),
E'\\u0084', '„'), E'\\u0085', '…'), E'\\u0086', '†'), E'\\u0087', '‡'),
E'\\u0088', 'ˆ'), E'\\u0089', '‰'), E'\\u008a', 'Š'), E'\\u008b', '‹'),
E'\\u008c', 'Œ'), E'\\u008d', '' ), E'\\u008e', 'Ž'), E'\\u008f', '' ),
E'\\u0090', '' ), E'\\u0091', '‘'), E'\\u0092', '’'), E'\\u0093', '“'),
E'\\u0094', '”'), E'\\u0095', '•'), E'\\u0096', '–'), E'\\u0097', '—'),
E'\\u0098', '˜'), E'\\u0099', '™'), E'\\u009a', 'š'), E'\\u009b', '›'),
E'\\u009c', 'œ'), E'\\u009d', '' ), E'\\u009e', 'ž'), E'\\u009f', 'Ÿ');
See also
http://en.wikipedia.org/wiki/Windows-1252
Advertisement