One of the major new features in MySQL 4.1 is strong Unicode support, along with support for specifying character sets at many different levels. This makes it much simpler to handle content in a wide range of languages in your applications, as well as making it possible to handle content in multi-byte character encodings that were not supported in earlier versions of MySQL.
Character Encodings and Unicode
A character encoding is a way of mapping a character (the letter 'A') to an integer in a character set (the number 65 in the US-ASCII character set). With something as limited as the US-ASCII character set (the twenty-six letters of the English alphabet in both lowercase and uppercase, the digits 0 to 9, and some punctuation), fitting every character into a single byte is not a problem. But once you try to create one character set that covers several languages, such as German, Swedish, Hungarian, and Japanese, or even a single language like Japanese on its own, you hit the limits of the 8-bit byte.
So throughout the history of computing, a number of different character encodings have been specified for mapping different characters to integers. For character sets that wouldn't fit in a single byte, double-byte character sets were created, as were multi-byte character sets that use a special character to signal a shift between single-byte and double-byte encoding.
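As a small illustration of the character-to-integer mapping described above, here is a sketch in Python (the language choice and the helper name `code_points` are assumptions made for this example, not part of MySQL):

```python
def code_points(text):
    """Return the integer assigned to each character in text."""
    return [ord(ch) for ch in text]


def fits_in_ascii(text):
    """Report whether every character has a value in the US-ASCII range 0-127."""
    return all(ord(ch) < 128 for ch in text)


print(code_points("A"))          # [65]
print(chr(65))                   # A
print(fits_in_ascii("MySQL"))    # True
# "\u00e4" is the letter a with diaeresis, common in German and Swedish.
print(fits_in_ascii("\u00e4"))   # False
```

The last line shows the limit the paragraph describes: a character used in German and Swedish already falls outside the values US-ASCII defines.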
The Unicode Consortium came together to create a specification for a character encoding that would be able to encompass the characters in all written languages (although contrary to what you may have heard, that does not yet include Klingon). The result was the Unicode character set, along with several encodings of it. The two most common (and the two that MySQL 4.1 supports) are UCS-2, which encodes every character as two bytes, and UTF-8, which uses a multi-byte encoding scheme that extends US-ASCII, so plain US-ASCII text is unchanged when stored as UTF-8.
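The difference between the two encodings can be sketched in Python. Python has no codec named "ucs-2", so this example assumes big-endian UTF-16 as a stand-in; for characters that fit in two bytes it produces the same bytes. The helper name `encoded_sizes` is made up for this illustration.

```python
def encoded_sizes(text):
    """Return (two-byte encoding size, UTF-8 size) in bytes.

    Assumes text contains only characters that fit in two bytes, where
    big-endian UTF-16 matches the fixed two-byte UCS-2 layout.
    """
    two_byte = text.encode("utf-16-be")
    utf8 = text.encode("utf-8")
    return len(two_byte), len(utf8)


print(encoded_sizes("MySQL"))                            # (10, 5)
# US-ASCII text is byte-for-byte identical in UTF-8.
print("MySQL".encode("utf-8") == "MySQL".encode("ascii"))  # True
```

For plain English text the fixed two-byte form is twice the size of UTF-8, which is one reason UTF-8 is attractive when most content is ASCII.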
ISO-8859-1 is the most common character set used for Western European languages, and it is extended by the Windows-1252 character set to include some other characters, such as the euro (€) and trademark symbol (™). Because Windows-1252 is a superset of ISO-8859-1, MySQL uses Windows-1252 for the character set it calls latin1, and there is no distinct ISO-8859-1 character set. This matches the common behavior in web applications, which often treat the two interchangeably.
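The relationship between the two character sets can be checked with Python's own codecs, as a rough sketch. The codec names `cp1252` and `iso-8859-1` are Python's names, not MySQL's, and `can_encode` is a helper invented for this example.

```python
def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


euro = "\u20ac"        # the euro sign
trademark = "\u2122"   # the trademark symbol

print(can_encode(euro, "cp1252"))       # True
print(can_encode(euro, "iso-8859-1"))   # False
print(trademark.encode("cp1252"))       # b'\x99'

# The accented-letter range 0xA0-0xFF decodes identically in both.
same = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("iso-8859-1")
    for b in range(0xA0, 0x100)
)
print(same)                             # True
```

The extra Windows-1252 characters sit in the 0x80-0x9F range, which ISO-8859-1 reserves for control codes, so text containing only the shared characters is unaffected by which of the two is assumed.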
So why not just use UCS-2 or UTF-8 for everything? Well, if you're already working with a lot of data in a particular encoding, like Big-5 (often used for Traditional Chinese), you can avoid the processing overhead of converting into and out of UTF-8 by just storing the data in Big-5 encoding. UTF-8 text also tends to be larger (byte-wise) than the same text in a more specific encoding, because characters outside of the ASCII range take at least two bytes. The string "déja vù" is only seven bytes in ISO-8859-1, but nine in UTF-8. Most characters in scripts such as Chinese, Japanese, and Korean are each three bytes in UTF-8, but can be represented as two bytes in more specific encodings such as Big-5.
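The byte counts above can be reproduced with a short Python sketch. The helper name `byte_length` and the choice of sample Chinese character are assumptions made for this example; the codec names are Python's.

```python
def byte_length(text, codec):
    """Return how many bytes text occupies in the given encoding."""
    return len(text.encode(codec))


# The sample string from the text, written with escapes so the accented
# letters are unambiguous: e-acute is \u00e9, u-grave is \u00f9.
sample = "d\u00e9ja v\u00f9"
print(byte_length(sample, "iso-8859-1"))  # 7
print(byte_length(sample, "utf-8"))       # 9

# \u4e2d is a common Chinese character meaning "middle".
han = "\u4e2d"
print(byte_length(han, "big5"))           # 2
print(byte_length(han, "utf-8"))          # 3
```

Each of the two accented letters costs one extra byte in UTF-8, which accounts for the difference between seven and nine.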