WeberDev.com PHP and MySQL Code

Title : Unicode and Other Funny Characters
Categories : Databases, MySQL, Unicode
Author : MySQL.com
Date : 2004-09-30




One of the major new features in MySQL 4.1 is strong Unicode support, along with support for specifying character sets at many different levels. This makes it much simpler to handle content in a wide range of languages in your applications, as well as making it possible to handle content in multi-byte character encodings that were not supported in earlier versions of MySQL.

Character Encodings and Unicode
A character encoding is a way of mapping a character (the letter 'A') to an integer in a character set (the number 65 in the US-ASCII character set). With something as limited as US-ASCII (the twenty-six letters of the English alphabet in both lowercase and uppercase, the digits 0 to 9, and some punctuation), fitting every character into a single byte is not a problem. But once you try to create a character set for languages like German, Swedish, Hungarian, and Japanese, you hit the limits of the 8-bit byte, either when one character set has to represent two of those languages or when it has to represent a single language like Japanese.
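The character-to-integer mapping described above can be observed in most programming languages. The following is a minimal sketch in Python, chosen here only for illustration (the article itself contains no code):

```python
# Map the letter 'A' to its integer in US-ASCII and back again.
code_point = ord("A")            # the integer assigned to 'A'
encoded = "A".encode("ascii")    # the single byte that stores it

print(code_point)                # 65
print(list(encoded))             # [65]
print(chr(code_point))           # A
```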

So throughout the history of computing, a number of different character encodings have been specified for mapping different characters to integers. For character sets that wouldn't fit in a single byte, double-byte character sets were created, as were multi-byte character sets that use a special character to signal a shift between single-byte and double-byte encoding.
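As a rough illustration of the shifting idea, the sketch below uses ISO-2022-JP, which switches modes with escape sequences. The choice of ISO-2022-JP is an assumption made for this example; the article does not name a specific shifting encoding.

```python
# ISO-2022-JP starts in single-byte ASCII mode and uses escape
# sequences to shift into and out of a double-byte Japanese set.
text = "Aあ"
encoded = text.encode("iso2022_jp")

to_double = b"\x1b$B"   # ESC $ B : shift to double-byte JIS X 0208
to_single = b"\x1b(B"   # ESC ( B : shift back to single-byte ASCII

print(encoded)
print(to_double in encoded, to_single in encoded)
```

The letter 'A' is stored as one byte, while the Japanese character is stored as two bytes wrapped between the two shift sequences.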

The Unicode Consortium came together to create a specification for a character encoding that would be able to encompass the characters in all written languages (although contrary to what you may have heard, that does not yet include Klingon). The result was the Unicode character set, along with several encodings of it. The two most common (and the two that MySQL 4.1 supports) are UCS-2, which encodes every character as two bytes, and UTF-8, which uses a multi-byte encoding scheme that extends US-ASCII.
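The difference between the two encodings can be sketched by counting bytes per character. Python has no codec named UCS-2, so this example assumes that big-endian UTF-16 is an acceptable stand-in, which holds for characters that fit in two bytes; the helper name `ucs2_bytes` is hypothetical.

```python
def ucs2_bytes(text):
    # Assumption: UTF-16 big-endian yields the same bytes as UCS-2
    # for characters that fit in two bytes.
    return text.encode("utf-16-be")

# Byte counts per character: (UCS-2, UTF-8)
sizes = {ch: (len(ucs2_bytes(ch)), len(ch.encode("utf-8")))
         for ch in ["A", "é", "中"]}

for ch, (ucs2_len, utf8_len) in sizes.items():
    print(ch, ucs2_len, utf8_len)

# UTF-8 extends US-ASCII: ASCII text encodes to identical bytes.
print("A".encode("utf-8") == "A".encode("ascii"))
```

UCS-2 spends two bytes on every character, while UTF-8 spends one byte on ASCII characters and more on everything else.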

ISO-8859-1 is the most common character set used for Western languages, and it is extended by the Windows-1252 character set to include some other characters, such as the euro (€) and trademark (™) symbols. Because Windows-1252 is a superset of ISO-8859-1, MySQL uses Windows-1252 as its latin1 character set, and there is no distinct ISO-8859-1 character set. This matches the common behavior in web applications, which often treat the two interchangeably.
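The extra characters in Windows-1252 can be checked with a short sketch. Note that Python's own `latin-1` codec is strict ISO-8859-1, so it behaves differently from the character set that MySQL calls latin1; `cp1252` is the Python name for Windows-1252.

```python
euro = "€"
trademark = "™"

# Windows-1252 assigns single bytes to both symbols.
euro_byte = euro.encode("cp1252")
trademark_byte = trademark.encode("cp1252")
print(euro_byte, trademark_byte)

# Strict ISO-8859-1 has no euro sign, so encoding fails.
try:
    euro.encode("latin-1")
    euro_in_iso_8859_1 = True
except UnicodeEncodeError:
    euro_in_iso_8859_1 = False
print(euro_in_iso_8859_1)

# Characters shared by both sets get the same byte value.
print("é".encode("cp1252") == "é".encode("latin-1"))
```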

So why not just use UCS-2 or UTF-8 for everything? Well, if you're already working with a lot of data in a particular encoding, like Big-5 (often used for Chinese), you can avoid the processing overhead of converting into and out of UTF-8 by just storing the data in Big-5 encoding. UTF-8 encoding also tends to be larger (byte-wise) than more specific encodings, because characters outside of the ASCII range take at least two bytes. The string "déjà vu" is only seven bytes in ISO-8859-1, but nine in UTF-8. The characters in scripts such as Chinese, Japanese, and Korean typically take three bytes each in UTF-8, but can be represented as two bytes in more specific encodings such as Big-5.
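The size comparison above can be reproduced with a small sketch. It uses the phrase "déjà vu" (seven characters, two of them accented) and one sample Chinese character picked for this example.

```python
phrase = "déjà vu"
latin1_len = len(phrase.encode("latin-1"))   # one byte per character
utf8_len = len(phrase.encode("utf-8"))       # accented letters take two bytes
print(latin1_len, utf8_len)                  # 7 9

han = "中"  # sample character chosen for illustration
big5_len = len(han.encode("big5"))
han_utf8_len = len(han.encode("utf-8"))
print(big5_len, han_utf8_len)                # 2 3
```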










Creating an IE-Only Database Driven Menu System With PHP, MySQL and DHTML
Categories : PHP, MySQL, Databases, DHTML
Case Study: Handling MySQL Growth With a PHP Class
Categories : Databases, MySQL, PHP
MySQL Access Control System - Grant Tables
Categories : Databases, MySQL, Security
PHP, MySQL and Authentication 101
Categories : PHP, Databases, MySQL, Authentication
Building A Persistent Shopping Cart With PHP and MySQL
Categories : PHP, MySQL, Databases, Ecommerce
Practical Date and Time examples with PHP and MySQL
Categories : Databases, MySQL, PHP, Date/time
Speaking SQL part 2
Categories : General SQL, Databases, MySQL
How Logs Work On MySQL With InnoDB Tables
Categories : Databases, MySQL, InnoDB
Managing Foreign Key Relationships In MySQL Using SQLyog
Categories : Databases, MySQL, SQLyog
Creating Users and Setting Permissions in MySQL
Categories : Databases, MySQL
Miles To Go Before I Sleep...
Categories : PHP, Calendar, Databases, MySQL
Simple Connection to MySQL with PHP
Categories : PHP, MySQL, Databases
Start Using MySQL
Categories : MySQL, Databases, To MySQL, Beginner Guides
Using Transactions In MySQL Part 1
Categories : Databases, MySQL, Transactions
Date Arithmetic With MySQL
Categories : PHP, Databases, MySQL, Date Time