I ran into trouble creating an RSS feed for http://www.weberforums.com.
Each time someone entered a post containing high-bit characters, such as Russian
or other non-English text, the RSS feed
would fail to validate.
I was looking for a way to check the text for such characters and skip those posts in
the RSS feed. I found two ways:
This loop checks each character in the text. Valid characters are only those with codes between 32 and 126.
Two other valid characters I added were 10 (linefeed) and 13 (carriage return).
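<?php
// Scan $post_text byte by byte and report the first byte outside the allowed set.
$length = strlen($post_text);
for ($i = 0; $i < $length; $i++) {
    $ord = ord($post_text[$i]);
    // Allow printable ASCII (32-126) plus linefeed (10) and carriage return (13).
    if (($ord < 32 || $ord > 126) && $ord != 13 && $ord != 10) {
        echo "BAD CHAR is : " . $ord;
        break;
    }
}
?>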
After sending this topic to the PHP General list, I got a response from
Abdullah Ramazanoglu with a different solution:
<?php
# Matches any byte from 0x80 to 0xFF. Control characters below 32 are not flagged here.
if (preg_match("/[\x80-\xff]/", $string)) {
    # high-bit char found
} else {
    # no high-bit char
}
?>
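To actually skip the offending posts when building the feed, either check can be wrapped in a small helper and used as a filter. The following is only a minimal sketch: the function name `is_feed_safe`, the `$posts` array, and the sample strings are hypothetical and not part of the forum code. It uses a negated character class that mirrors the rule of the first loop (codes 32 to 126, plus linefeed and carriage return).

```php
<?php
// Hypothetical helper (not from the original forum code).
// Returns true when $text contains only printable ASCII (32-126),
// linefeed (10), or carriage return (13).
function is_feed_safe(string $text): bool
{
    // The negated class matches any byte outside the allowed set;
    // preg_match() returns 0 when no such byte is found.
    return preg_match('/[^\x20-\x7E\r\n]/', $text) === 0;
}

// Hypothetical sample posts, for illustration only.
$posts = [
    "Hello world",
    "Caf\xC3\xA9 au lait",      // contains the high-bit bytes 0xC3 0xA9
    "Line one\r\nLine two",
];

// Keep only the posts that are safe to put in the feed.
$feed_posts = array_values(array_filter($posts, 'is_feed_safe'));

print_r($feed_posts);
```

With the sample data above, the second post is dropped and the other two are kept. Like the original loop, this also rejects tabs and other control characters, which may or may not be what you want.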