WeberDev.com PHP and MySQL Code

Title : Unicode and Other Funny Characters
Categories : Databases, MySQL, Unicode
MySQL.com
Date : 2004-09-30

One of the major new features in MySQL 4.1 is strong Unicode support, along with support for specifying character sets at many different levels. This makes it much simpler to handle content in a wide range of languages in your applications, as well as making it possible to handle content in multi-byte character encodings that were not supported in earlier versions of MySQL.

Character Encodings and Unicode
A character encoding is a way of mapping a character (the letter 'A') to an integer in a character set (the number 65 in the US-ASCII character set). With something as limited as US-ASCII (the twenty-six letters of the English alphabet in both lowercase and uppercase, the digits 0 to 9, and some punctuation), fitting every character into a single byte is not a problem. But once you try to create one character set that covers languages such as German, Swedish, Hungarian, and Japanese, you hit the limit of the 8-bit byte: 256 values are not enough to represent some combinations of these languages, or even a single language like Japanese on its own.
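As a minimal sketch of this mapping, the Python snippet below looks up the integer behind a few characters with the built-in `ord` and `chr` functions. The helper name `code_points` and the sample strings are illustrative assumptions, not part of the original article.

```python
# Look up the integer (code point) behind each character.
# code_points() is a hypothetical helper written for this illustration.

def code_points(text):
    """Return the list of integers that the characters of text map to."""
    return [ord(ch) for ch in text]

ascii_a = ord("A")        # 65 in US-ASCII (and in Unicode)
round_trip = chr(65)      # back to the letter "A"

# Plain English text stays below 128, so it fits in 7-bit US-ASCII.
fits_in_ascii = all(cp < 128 for cp in code_points("Hello"))

# A German u-umlaut (U+00FC) is above 127 but still fits in one 8-bit byte.
u_umlaut = ord("\u00fc")

# Hiragana "a" (U+3042) is far beyond anything a single byte can hold.
hiragana_a = ord("\u3042")
fits_in_one_byte = hiragana_a < 256
```

Here `u_umlaut` is 252 and `hiragana_a` is 12354, which is why a single-byte character set cannot cover Japanese.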

So throughout the history of computing, a number of different character encodings have been specified for mapping different characters to integers. For character sets that would not fit in a single byte, double-byte character sets were created, as were multi-byte character sets that use a special character to signal a shift between single-byte and double-byte encoding.

The Unicode Consortium came together to create a specification for a character encoding that would be able to encompass the characters in all written languages (although contrary to what you may have heard, that does not yet include Klingon). The result was the Unicode character set, along with several encodings of it. The two most common (and the two that MySQL 4.1 supports) are UCS-2, which encodes every character as two bytes, and UTF-8, which uses a multi-byte encoding scheme that extends US-ASCII.
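The difference between the two encodings can be sketched in Python. One assumption to note: Python has no codec named "ucs-2", so the snippet uses `utf-16-be` as a stand-in, which produces the same two-byte values for characters in the Basic Multilingual Plane. The helper name `encode_both` is hypothetical.

```python
# Encode the same text as UTF-8 and as two-byte UCS-2.
# Assumption: "utf-16-be" stands in for UCS-2, since the two agree for
# every character in the Basic Multilingual Plane.

def encode_both(text):
    """Return (utf8_bytes, ucs2_bytes) for the given text."""
    return text.encode("utf-8"), text.encode("utf-16-be")

utf8_a, ucs2_a = encode_both("A")
utf8_euro, ucs2_euro = encode_both("\u20ac")  # the euro sign

# UTF-8 leaves US-ASCII text byte-for-byte unchanged.
ascii_unchanged = utf8_a == "A".encode("ascii")

# UCS-2 spends two bytes on every character, even plain ASCII ones.
print(len(utf8_a), len(ucs2_a))        # 1 2
print(len(utf8_euro), len(ucs2_euro))  # 3 2
```

For mostly-ASCII text UTF-8 is the more compact of the two, while for characters such as the euro sign UCS-2 comes out smaller.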

ISO-8859-1 is the most common character set used for Western languages, and it is extended by the Windows-1252 character set to include some other characters, such as the euro sign (€) and trademark symbol (™). Because Windows-1252 is a superset of ISO-8859-1, MySQL uses the name latin1 for Windows-1252 and has no distinct ISO-8859-1 character set. This matches the common behavior in web applications, which often treat the two interchangeably.
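The relationship between the two character sets can be checked with Python's codecs. Note the naming difference, which is an assumption worth stating: Python's `latin-1` codec is strict ISO-8859-1, whereas MySQL's latin1 corresponds to what Python calls `cp1252`. The helper `can_encode` is hypothetical.

```python
# Compare strict ISO-8859-1 with Windows-1252 using Python's codecs.
# Assumption: Python's "latin-1" is ISO-8859-1 and "cp1252" is Windows-1252.

def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"     # e with acute accent, present in both sets
euro = "\u20ac"        # euro sign, only in Windows-1252
trademark = "\u2122"   # trademark symbol, only in Windows-1252

# Characters shared by both sets encode to the same byte.
same_byte = e_acute.encode("latin-1") == e_acute.encode("cp1252")

euro_in_iso = can_encode(euro, "latin-1")
euro_in_win = can_encode(euro, "cp1252")
euro_byte = euro.encode("cp1252")
trademark_byte = trademark.encode("cp1252")
```

In Windows-1252 the euro sign is the single byte 0x80 and the trademark symbol is 0x99, positions that strict ISO-8859-1 reserves for control codes.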

So why not just use UCS-2 or UTF-8 for everything? Well, if you're already working with a lot of data in a particular encoding, like Big-5 (often used for Chinese), you can avoid the processing overhead of converting into and out of UTF-8 by just storing the data in Big-5 encoding. UTF-8 text also tends to be larger (byte-wise) than the same text in a more specific encoding, because characters outside of the normal ASCII range take at least two bytes. The string "déjà vu" is only seven bytes in ISO-8859-1, but nine in UTF-8. The characters in scripts such as Chinese, Japanese, and Korean are typically three bytes each in UTF-8, but can be represented as two bytes in more specific encodings such as Big-5.
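These size differences are easy to reproduce. The sketch below measures the phrase "déjà vu" (seven characters, two of them accented) and a two-character Chinese word in different encodings, using Python's `latin-1`, `utf-8`, and `big5` codecs. The helper name `byte_length` and the Chinese sample are illustrative assumptions.

```python
# Measure how many bytes the same text needs in different encodings.
# byte_length() is a hypothetical helper written for this illustration.

def byte_length(text, encoding):
    """Return the number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))

phrase = "d\u00e9j\u00e0 vu"   # "deja vu" with its two accented letters
chinese = "\u4e2d\u6587"       # a two-character word meaning "Chinese"

phrase_latin1 = byte_length(phrase, "latin-1")
phrase_utf8 = byte_length(phrase, "utf-8")

chinese_big5 = byte_length(chinese, "big5")
chinese_utf8 = byte_length(chinese, "utf-8")

print(phrase_latin1, phrase_utf8)   # 7 9
print(chinese_big5, chinese_utf8)   # 4 6
```

Each accented letter costs one extra byte in UTF-8, and each Chinese character costs three bytes instead of two, so the overhead grows with the share of non-ASCII text.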
