What is Unicode?
Unicode is a standard that lets computers represent and display text and symbols from all writing systems around the world. Unicode is coordinated by the Unicode Consortium. Unicode assigns a number (a code point) to each character, and an encoding defines how those numbers are stored as bytes. There are several Unicode encodings: the most popular is UTF-8; other examples are UTF-7 and UTF-16.
UTF-8 is a variable-length encoding that uses one to four bytes per character, and all basic Latin character codes are identical to ASCII. On the Unicode website you can read the following definition for Unicode:
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
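As a small illustration of the two ideas above (one number per character, and a variable-length UTF-8 encoding), the following Python 3 sketch prints the code point and the UTF-8 byte count of a few sample characters. The helper name describe and the choice of sample characters are illustrative assumptions, not part of any standard API.

```python
# Illustrative sketch (Python 3): code points versus UTF-8 byte lengths.

def describe(ch):
    """Return (code point as 'U+XXXX', number of UTF-8 bytes) for one character."""
    return "U+{:04X}".format(ord(ch)), len(ch.encode("utf-8"))

# Sample characters written as escapes so this file itself stays plain ASCII:
# capital A, e with acute accent, euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    code_point, n_bytes = describe(ch)
    print(code_point, "->", n_bytes, "byte(s) in UTF-8")

# Basic Latin characters encode to the same bytes in ASCII and in UTF-8.
print("A".encode("utf-8") == "A".encode("ascii"))
```

Characters outside basic Latin take two or more bytes, which is why the byte length and the character count of a UTF-8 string can differ.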
Converting from Latin to UTF-8 and back in your code
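PHP: Use utf8_decode($data) (convert from UTF-8 to ISO-8859-1) and utf8_encode($data) (convert from ISO-8859-1 to UTF-8). Both functions are deprecated as of PHP 8.2; where the mbstring extension is available, mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1') is an alternative.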
Some native PHP functions such as strtolower(), strtoupper() and ucfirst() do not always function correctly with UTF-8 strings. Possible solutions: convert to Latin first, use the multibyte equivalents mb_strtolower() and mb_strtoupper() if the mbstring extension is available, or add the following line to your code:
setlocale(LC_CTYPE, 'C');
Make sure not to save your PHP files with a UTF-8 BOM (Byte Order Mark). The BOM bytes are sent as output ahead of your code, so your browser might show these BOM characters between PHP pages on your site.
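To check a file for a BOM outside of an editor, it is enough to look at its first three bytes. The following is a minimal Python 3 sketch of that check; the function names has_utf8_bom and strip_utf8_bom are hypothetical, and codecs.BOM_UTF8 is the standard-library constant for the bytes EF BB BF.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; other data is returned unchanged."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Example: the start of a PHP file saved with and without a BOM.
with_bom = codecs.BOM_UTF8 + b"<?php echo 'hi';"
print(has_utf8_bom(with_bom))
print(strip_utf8_bom(with_bom))
```

To clean a real file, read it in binary mode, pass the bytes through strip_utf8_bom, and write the result back in binary mode.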
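Perl:
use Encode qw(from_to is_utf8); from_to($data, "iso-8859-1", "utf8"); Note that from_to() converts $data in place. You can use
is_utf8($data) to check whether Perl has flagged a string as UTF-8 internally; pass a true second argument, is_utf8($data, 1), to also check that the string is well-formed UTF-8.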
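Checking whether raw bytes are valid UTF-8 usually comes down to attempting a strict decode and seeing whether it fails. The following is a minimal sketch of that idea in Python 3 rather than Perl; the function name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

# "cafe" with an accented e, once as UTF-8 bytes and once as Latin-1 bytes.
print(is_valid_utf8(b"caf\xc3\xa9"))
print(is_valid_utf8(b"caf\xe9"))
```

A check like this only tells you the bytes are well-formed UTF-8. It cannot prove the text was intended as UTF-8, since plain ASCII passes as well.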
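Python: To decode UTF-8 bytes into a Unicode string (Python 2):
text = unicode(data, "utf-8")
To encode the Unicode string back to UTF-8 bytes:
utf8bytes = text.encode("utf-8")
In Python 3, where str is already Unicode, use data.decode("utf-8") instead of unicode(). To target another character set such as Latin-1, pass its name to encode(), for example text.encode("iso-8859-1").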
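Putting the two steps together, a conversion between Latin-1 and UTF-8 is a decode followed by an encode. The following is a minimal Python 3 sketch; the function names latin1_to_utf8 and utf8_to_latin1 are hypothetical, and the errors parameter is simply passed through to str.encode.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8. Every Latin-1 byte is decodable."""
    return data.decode("iso-8859-1").encode("utf-8")

def utf8_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Characters that do not exist in Latin-1 raise UnicodeEncodeError by default;
    pass errors="replace" to substitute a question mark instead.
    """
    return data.decode("utf-8").encode("iso-8859-1", errors=errors)

latin1 = b"caf\xe9"                  # "cafe" with an accented e, in Latin-1
utf8 = latin1_to_utf8(latin1)
print(utf8)
print(utf8_to_latin1(utf8) == latin1)
```

The conversion to UTF-8 is always lossless, but the way back is not: UTF-8 text can contain characters, such as the euro sign, that ISO-8859-1 cannot represent.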
MySQL: MySQL uses character sets at all levels. There are settings such as character_set_connection and collation_connection, and you can specify a character set at the database level, the table level and the field level.
To convert a character set inside a MySQL query, use CONVERT:
SELECT CONVERT(latin1field USING utf8);
Note that MySQL's utf8 character set stores at most three bytes per character; use utf8mb4 if you need the full Unicode range.
If you are experiencing speed issues with table joins after converting the character sets of tables or fields, make sure that all ID fields use the same COLLATE setting.
HTML: You can specify your preferred character set using the content-type meta tag (example:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">).
To avoid problems with various character sets, it is sometimes easier to convert your special characters to (plain ASCII) HTML code. HTML-encoded special characters are also readable by old browsers, which may not understand the content-type meta tag. You can use this special character to HTML code converter for this.
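If you would rather do this conversion in your own code, Python 3's built-in xmlcharrefreplace error handler turns every non-ASCII character into a numeric HTML character reference. The following is a minimal sketch; the function name to_html_entities is hypothetical.

```python
def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric reference such as &#233;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# "cafe" with an accented e; the escape keeps this source file plain ASCII.
print(to_html_entities("caf\u00e9"))
```

This leaves ASCII characters, including &, < and >, untouched. If the text is not already HTML-safe, html.escape() from the standard library can be applied first.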
Unix systems: Use the character set conversion tool iconv, which writes the converted text to standard output:
iconv -f ISO-8859-1 -t UTF-8 filename.txt
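Where iconv is not available, the same file conversion can be sketched in a few lines of Python 3. The function name convert_file and the file names in the demo are placeholders; the bytes are read and written directly so that line endings are left untouched.

```python
import tempfile
from pathlib import Path

def convert_file(src, dst, from_enc="iso-8859-1", to_enc="utf-8"):
    """Re-encode the file at src from from_enc to to_enc and write it to dst."""
    text = Path(src).read_bytes().decode(from_enc)
    Path(dst).write_bytes(text.encode(to_enc))

if __name__ == "__main__":
    # Demo in a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "filename.txt"
        dst = Path(tmp) / "filename-utf8.txt"
        src.write_bytes(b"caf\xe9\n")   # Latin-1 input
        convert_file(src, dst)
        print(dst.read_bytes())
```

Writing to a separate destination file keeps the original intact in case the source encoding was guessed wrongly.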
Windows systems: Most good text editors offer Unicode support, such as
UltraEdit (File → Conversions → ASCII to UTF-8 or ASCII to Unicode (16-Bit)).
Convert UTF-8 to Latin or Latin to UTF-8
Copy your text below. This page is Latin encoded.