One of the major new features in MySQL 4.1 is strong Unicode support, along with support for specifying character sets at many different levels. This makes it much simpler to handle content in a wide range of languages in your applications, as well as making it possible to handle content in multi-byte character encodings that were not supported in earlier versions of MySQL.
Character Encodings and Unicode
A character encoding is a way of mapping a character (the letter 'A') to an integer in a character set (the number 65 in the US-ASCII character set). With a set as small as US-ASCII (the twenty-six letters of the English alphabet in lowercase and uppercase, the digits 0 to 9, and some punctuation), fitting every character into a single byte is not a problem. But once you try to create a character set for languages like German, Swedish, Hungarian, and Japanese, you soon hit the limits of the 8-bit byte. This happens whether you are trying to fit the letters of several of these languages into one set, or just the thousands of characters used by a single language like Japanese.
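The mapping described above can be checked directly in Python. `ord()` returns a character's integer code point, and encoding to ASCII shows that the character fits in one byte. This is a minimal sketch, and the helper name `ascii_byte` is hypothetical:

```python
def ascii_byte(ch: str) -> int:
    """Return the single US-ASCII byte value for ch (hypothetical helper)."""
    encoded = ch.encode("ascii")  # raises UnicodeEncodeError outside US-ASCII
    return encoded[0]             # every US-ASCII character is exactly one byte

print(ascii_byte("A"))  # 65
print(ord("A"))         # 65: Unicode keeps the US-ASCII value as the code point

# A German letter has no US-ASCII mapping at all.
try:
    ascii_byte("\u00df")  # 'ß'
except UnicodeEncodeError:
    print("'\u00df' is not in US-ASCII")
```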
So throughout the history of computing, many different character encodings have been defined for mapping characters to integers. For character sets that would not fit in a single byte, double-byte character sets were created. So were multi-byte character sets, which use a special character to signal a shift between single-byte and double-byte encoding.
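Python ships codecs for both kinds of legacy Japanese encoding, so you can see the difference in the raw bytes. In this sketch, Shift_JIS stands in for a double-byte set, and ISO-2022-JP stands in for an encoding that switches modes with escape sequences. Both examples are our own choice and do not come from the article:

```python
text = "A\u3042"  # 'A' followed by HIRAGANA LETTER A

# Shift_JIS: the ASCII letter stays one byte, the Japanese character takes two.
sjis = text.encode("shift_jis")
print(len(sjis))  # 3

# ISO-2022-JP: escape sequences shift between ASCII and double-byte mode.
jis = text.encode("iso2022_jp")
print(jis)
print(b"\x1b$B" in jis)         # True: "ESC $ B" shifts into double-byte mode
print(jis.endswith(b"\x1b(B"))  # True: "ESC ( B" shifts back to ASCII
```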
The Unicode Consortium came together to create a specification for a character encoding that could encompass the characters of all written languages (although, contrary to what you may have heard, that does not yet include Klingon). The result was the Unicode character set and several encodings for it. The two most common encodings, and the two that MySQL 4.1 supports, are UCS-2 and UTF-8. UCS-2 encodes every character as two bytes. UTF-8 is a variable-length, multi-byte scheme that extends US-ASCII, so ASCII characters keep their single-byte values.
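You can measure the size difference between the two encodings in Python. Python has no codec named UCS-2, so this sketch uses UTF-16-BE as a stand-in. For characters in the Basic Multilingual Plane, which covers everything UCS-2 can represent, the two encodings produce identical bytes. The helper `sizes` is hypothetical:

```python
def sizes(ch: str) -> tuple[int, int]:
    """Return (UCS-2 size, UTF-8 size) in bytes for one BMP character."""
    ucs2 = ch.encode("utf-16-be")  # assumption: identical to UCS-2 inside the BMP
    utf8 = ch.encode("utf-8")
    return len(ucs2), len(utf8)

for ch in ["A", "\u00e9", "\u65e5"]:  # 'A', 'é', '日'
    print(f"U+{ord(ch):04X}", sizes(ch))
# U+0041 (2, 1)
# U+00E9 (2, 2)
# U+65E5 (2, 3)

# UTF-8 extends US-ASCII: an ASCII character encodes to the same single byte.
print("A".encode("utf-8") == b"A")  # True
```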
ISO-8859-1 is the most common character set for Western European languages. The Windows-1252 character set extends it with some other characters, such as the euro sign (€) and the trademark symbol (™). Because Windows-1252 is a superset of ISO-8859-1, MySQL knows it as latin1 and has no distinct ISO-8859-1 character set. This matches common behavior in web applications, which often treat the two interchangeably.
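The difference between the two character sets shows up in the byte range 0x80 to 0x9F. ISO-8859-1 leaves that range to control codes, while Windows-1252 fills it with printable characters. Python's codecs for the two are named `cp1252` and `latin-1`. Be aware that Python's `latin-1` is strict ISO-8859-1, which is not the same as MySQL's `latin1`. Here is a short sketch:

```python
# In Windows-1252, byte 0x80 is the euro sign and 0x99 is the trademark sign.
print("\u20ac".encode("cp1252"))  # b'\x80'
print("\u2122".encode("cp1252"))  # b'\x99'

# Strict ISO-8859-1 has neither character, so encoding fails.
for ch in ("\u20ac", "\u2122"):
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        print(f"U+{ord(ch):04X} is not in ISO-8859-1")

# For ordinary accented letters the two agree: 'é' is byte 0xE9 in both.
print("\u00e9".encode("cp1252") == "\u00e9".encode("latin-1"))  # True
```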
So why not just use UCS-2 or UTF-8 for everything? If you already work with a lot of data in a particular encoding, like Big-5 (commonly used for Traditional Chinese), storing it in Big-5 avoids the processing overhead of converting it into and out of UTF-8. UTF-8 text also tends to be larger, byte for byte, than text in a more specific encoding, because every character outside the ASCII range takes at least two bytes. The string "déjà vu" is only seven bytes in ISO-8859-1, but nine in UTF-8. Characters in scripts such as Chinese, Japanese, and Korean typically take three bytes each in UTF-8, but can be represented in two bytes in more specific encodings such as Big-5.
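You can reproduce the byte counts from this paragraph with Python's built-in codecs. The sketch also shows the round trip between Big-5 and UTF-8, which is the conversion the text suggests avoiding. The sample strings are our own choice:

```python
phrase = "d\u00e9j\u00e0 vu"          # "déjà vu", 7 characters
print(len(phrase.encode("latin-1")))  # 7: one byte per character
print(len(phrase.encode("utf-8")))    # 9: 'é' and 'à' take two bytes each

chinese = "\u4e2d\u6587"              # "中文" ("Chinese")
print(len(chinese.encode("big5")))    # 4: two bytes per character
print(len(chinese.encode("utf-8")))   # 6: three bytes per character

# The conversion overhead: Big-5 bytes -> text -> UTF-8 bytes, and back again.
big5_bytes = chinese.encode("big5")
utf8_bytes = big5_bytes.decode("big5").encode("utf-8")
print(utf8_bytes.decode("utf-8").encode("big5") == big5_bytes)  # True
```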