HTML special characters after parsing XLS file

After parsing an XLS file, any Armenian characters in it are converted into HTML numeric character references by this function:

function uc2html($str) {
        $ret = '';
        for( $i=0; $i<strlen($str)/2; $i++ ) {
                $charcode = ord($str[$i*2])+256*ord($str[$i*2+1]);
                $ret .= '&#'.$charcode;
        }
        return $ret;
}

This would be fine if the data from the file went straight to HTML rather than into the database.
But instead of the actual words, the database ends up with strings like 44&#44&#1392&#1396&#1397&#1414&#1379&#1397. How can I avoid this?


Answer 1, authority 100%

First, replace

$ret .= '&#'.$charcode;

with

$ret .= '&#'.$charcode.';';

Second, before writing to the database, run:

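// Convert a single numeric entity such as &#1392; into its UTF-8 character.
function mytestcallback($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}
// Replace every decimal numeric entity in $text.
$text = preg_replace_callback("/(&#[0-9]+;)/", "mytestcallback", $text);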

This will turn all numeric HTML entities into UTF-8 characters.
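
The same step can also be written with html_entity_decode, which avoids mbstring's "HTML-ENTITIES" pseudo-encoding (deprecated as of PHP 8.2). This is only a minimal sketch: it assumes the entities already carry the trailing semicolon, and the function name entities_to_utf8 is made up for illustration.

```php
<?php
// Hypothetical helper: decodes only decimal numeric entities (e.g. &#1392;)
// to UTF-8 and leaves the rest of the string, including &amp;, untouched.
function entities_to_utf8(string $text): string {
    return preg_replace_callback(
        '/&#[0-9]+;/',
        function (array $m): string {
            // $m[0] is the whole matched entity.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $text
    );
}

// Sample input: four Armenian letters written as numeric entities.
$text = '&#1392;&#1396;&#1397;&#1414;';
echo entities_to_utf8($text), "\n";
```

Limiting the pattern to numeric entities keeps named entities such as &amp; as they are, which is probably what you want if the text is later escaped again for HTML output.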


Answer 2

What do you want to do with this data? You could leave it like that in the database. When output to HTML, the characters will display as they should…

If you want to write the text to the database as is, then don’t convert it to entities.
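
One way to read this: uc2html() treats its input as byte pairs with the low byte first, which matches UTF-16LE, so the raw string could probably be converted straight to UTF-8 and stored without any entities. A sketch under that assumption; uc2utf8 is a hypothetical name, and the database column and connection are assumed to use UTF-8.

```php
<?php
// Hypothetical replacement for uc2html(): assumes $str holds UTF-16LE bytes,
// as the ord(low) + 256 * ord(high) arithmetic in the question suggests.
function uc2utf8(string $str): string {
    return mb_convert_encoding($str, 'UTF-8', 'UTF-16LE');
}

// "\x70\x05" is U+0570 (decimal 1392) in little-endian byte order.
$raw = "\x70\x05\x74\x05";
echo uc2utf8($raw), "\n";
```

If the parser actually returns a different encoding, the third argument has to be changed accordingly.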

Besides, what is your database encoding?