What is Unicode?
Unicode is a character encoding standard that lets computers represent and display text and symbols from writing systems around the world. It is coordinated by the Unicode Consortium. There are several Unicode encodings: the most popular is UTF-8; other examples are UTF-7 and UTF-16. UTF-8 is a variable-length encoding that uses one to four bytes per character, and its codes for all basic Latin characters are identical to ASCII. The Unicode website gives the following definition: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."
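The variable-length behaviour of UTF-8 and its ASCII compatibility can be checked directly. The following is a minimal Python 3 sketch; the helper name describe() and the sample characters are illustrative choices, not part of any library.

```python
def describe(ch: str) -> str:
    """Summarise one character: code point plus its UTF-8 and UTF-16 bytes."""
    code_point = ord(ch)             # the unique number Unicode assigns
    utf8 = ch.encode("utf-8")        # 1 to 4 bytes depending on the character
    utf16 = ch.encode("utf-16-be")   # 2 or 4 bytes, big-endian, no BOM
    return (f"U+{code_point:04X} "
            f"utf-8={utf8.hex()} ({len(utf8)} bytes) "
            f"utf-16={utf16.hex()} ({len(utf16)} bytes)")


# "A", e-acute, the euro sign and an emoji need 1, 2, 3 and 4 UTF-8 bytes.
for sample in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(describe(sample))

# Basic Latin text is byte-for-byte identical in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```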
Converting from Latin to UTF-8 in your code
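In PHP use:
utf8_decode($data) to convert from UTF-8 to ISO-8859-1
utf8_encode($data) to convert from ISO-8859-1 to UTF-8
Both functions are deprecated as of PHP 8.2; mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1') is the usual replacement for utf8_encode().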
Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always function correctly with UTF-8 strings. Possible solutions: convert to Latin-1 (ISO-8859-1) first or add the following line to your code:
Make sure not to save your PHP files with a UTF-8 BOM (byte order mark); otherwise your browser might show the BOM characters between PHP pages on your site.
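Both pitfalls are not specific to PHP and can be reproduced in a few lines. The following Python 3 sketch is an illustration only: it shows that byte-wise case conversion leaves multi-byte UTF-8 characters untouched, and how a leading UTF-8 BOM (the bytes EF BB BF) can be detected and removed. The helper name strip_utf8_bom() is hypothetical.

```python
import codecs

# "ÉCOLE" as UTF-8 bytes; the accented letter occupies two bytes (C3 89).
utf8_bytes = "\u00c9COLE".encode("utf-8")

# Byte-wise lowercasing only changes ASCII letters, so the accented
# capital survives unchanged.
bytewise = utf8_bytes.lower().decode("utf-8")

# Character-aware lowercasing works on decoded text and handles it.
charwise = utf8_bytes.decode("utf-8").lower()

print(bytewise)   # accented letter still upper case
print(charwise)   # fully lower case


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(strip_utf8_bom(codecs.BOM_UTF8 + b"<?php echo 'hi';"))
```

When reading whole files, Python's "utf-8-sig" codec does the same BOM removal during decoding.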
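In Perl, load the Encode module:
use Encode qw( from_to is_utf8 );
You can use:
from_to($data, "iso-8859-1", "utf8"); to convert $data in place from ISO-8859-1 to UTF-8
is_utf8($data) to check whether Perl's internal UTF8 flag is set on a string; is_utf8($data, 1) also checks that the string is well-formed UTF-8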
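A byte-level validity check, as opposed to a check of an internal flag, can be sketched in Python 3 by attempting a strict decode. The function name is_valid_utf8() is an illustrative choice; note that pure ASCII input also counts as valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")   # error handling is strict by default
    except UnicodeDecodeError:
        return False
    return True


name = "Andr\u00e9e"
print(is_valid_utf8(name.encode("utf-8")))        # UTF-8 bytes
print(is_valid_utf8(name.encode("iso-8859-1")))   # Latin-1 bytes
```

The Latin-1 bytes fail because the single byte E9 looks like the start of a multi-byte UTF-8 sequence that is never completed.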
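In Python 3, to encode in UTF-8:
source_encoding = "iso-8859-1"
string = "Names with international characters like 'Andrée'"
latin1_bytes = string.encode(source_encoding)  # sample input: bytes in the source encoding
utf8_bytes = latin1_bytes.decode(source_encoding).encode("utf-8")  # decode to str, then re-encode as UTF-8
To decode back to the original character set:
latin1_bytes = utf8_bytes.decode("utf-8").encode(source_encoding)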
In C# use System.Text:
using System.Text;
byte[] asciiBytes = Encoding.ASCII.GetBytes("ASCII to UTF8");
byte[] utf8Bytes = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiBytes);
string utf8Converted = Encoding.UTF8.GetString(utf8Bytes);
MySQL uses character sets at every level: there are connection settings such as character_set_connection and collation_connection, and you can specify a character set at the database, table and field level.
To convert a character set inside a MySQL query, use CONVERT:
SELECT CONVERT(latin1field USING utf8)
If you experience speed issues with table joins after converting the character sets of tables or fields, make sure that all ID fields use the same COLLATE setting, because a collation mismatch between joined fields can prevent MySQL from using their indexes.
In HTML you can specify your preferred character set using the content-type meta tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
To avoid problems with various character sets, it is sometimes easier to convert your special characters to (plain ASCII) HTML code. HTML-encoded special characters are also readable by old browsers, which may ignore the content-type meta tag. You can use a special-character-to-HTML-code converter for this.
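Converting special characters to plain ASCII HTML code can be sketched in Python 3 with the standard library. This is one possible approach, not the converter referred to above: it escapes markup characters first and then replaces every non-ASCII character with a numeric character reference. The function name to_ascii_html() is hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return text as pure ASCII HTML using numeric character references."""
    # Escape &, < and > first so that the result round-trips correctly.
    escaped = html.escape(text, quote=False)
    # Replace each remaining non-ASCII character with &#NNN;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_ascii_html("Andr\u00e9e & Zo\u00eb")
print(encoded)                  # Andr&#233;e &amp; Zo&#235;
print(html.unescape(encoded))   # back to the original text
```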
Use the iconv character set conversion tool:
iconv -f ISO-8859-1 -t UTF-8 filename.txt
The converted text is written to standard output. More information on GNU.org.
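Where iconv is not available, the same file conversion can be approximated in Python 3. This is a minimal sketch under the assumption that the whole file fits in memory; the function name convert_file() and the file names in the usage lines are placeholders. It works on raw bytes so that line endings are left untouched.

```python
from pathlib import Path


def convert_file(src, dst, from_enc="iso-8859-1", to_enc="utf-8"):
    """Re-encode the file at src from from_enc to to_enc and write it to dst."""
    data = Path(src).read_bytes()
    # Decode with the source encoding, then encode with the target encoding.
    # Bytes that are invalid in from_enc raise UnicodeDecodeError.
    Path(dst).write_bytes(data.decode(from_enc).encode(to_enc))


if __name__ == "__main__":
    sample = Path("filename.txt")             # placeholder input file
    if sample.exists():
        convert_file(sample, "filename-utf8.txt")
```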
Most good text editors offer Unicode support, such as UltraEdit (File → Conversions → 'ASCII to UTF-8' or 'ASCII to Unicode (16-Bit)').
Thanks to software developers who sent me corrections and updates!