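Loading individual Chinese words into an array

Time: 2014-03-21 17:18:04

Tags: php utf-8 ucs2

I have a file containing several lines of Chinese words, separated by commas (,), like this: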

你,我,他,好,但,中,国,龙
好,把,是,的,啊,人,吖,哦
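
For reference, a minimal sketch of reading this format into a flat list of unique words, without any encoding conversion. The two sample lines are copied from above and hard-coded as an assumption; a real script would get them from file(). The variable names ($lines, $words) are illustrative only.

```php
<?php
// Sample data copied from the question (assumed to be UTF-8);
// a real script would obtain these lines from file($Dictionary).
$lines = ["你,我,他,好,但,中,国,龙\n", "好,把,是,的,啊,人,吖,哦\n"];

$seen = [];
foreach ($lines as $line) {
    // rtrim() drops the trailing newline that file() keeps by default.
    foreach (explode(',', rtrim($line, "\r\n")) as $word) {
        if ($word !== '') {
            $seen[$word] = true; // using the word as a key removes duplicates
        }
    }
}
$words = array_keys($seen);
print_r($words);
```

Splitting on the ASCII comma is safe in UTF-8 because the byte 0x2C never occurs inside a multi-byte character. Note that 好 appears on both lines, so the list ends up with 15 entries rather than 16.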

I want to load them into an array with the code below; later I will use this array to find the Chinese words that an article contains:

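// $Dictionary holds the path to the word file shown above.
// FILE_IGNORE_NEW_LINES keeps the newline out of the last word of each line.
$ds = file($Dictionary, FILE_IGNORE_NEW_LINES);
$_SP_ = chr(0xFF).chr(0xFE); // two raw bytes: FF FE
$array = array();
foreach($ds as $d)
{
    $spstr = $_SP_;
    $spstr = iconv('ucs-2be', 'utf-8', $spstr); // separator re-encoded as UTF-8
    $ws = explode(',', $d);//array of single Chinese word
    $wall = iconv('utf-8', 'ucs-2be', join($spstr, $ws));//what is $wall used for?
    $ws = explode($_SP_, $wall); // split the UCS-2BE string on the raw separator
    foreach($ws as $estr)
    {
        $array[$estr] = strlen($estr); // key: UCS-2BE bytes, value: length in bytes
    }
}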
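
To make the loop easier to follow, here is a hedged trace of the same steps on one short line, printing the results as hex. The sample line "你,我" and the names $sp, $spUtf8 and $table are made up for illustration, and the sketch assumes the iconv extension accepts the code unit 0xFFFE, which can vary between iconv implementations.

```php
<?php
// Hypothetical two-word sample line (UTF-8).
$line = "你,我";

$sp     = chr(0xFF) . chr(0xFE);          // raw separator bytes FF FE
$spUtf8 = iconv('UCS-2BE', 'UTF-8', $sp); // same code unit, re-encoded as UTF-8

// Glue the words with the UTF-8 separator, convert the whole string once,
// then split the UCS-2BE result on the raw two-byte separator.
$wall  = iconv('UTF-8', 'UCS-2BE', implode($spUtf8, explode(',', $line)));
$parts = explode($sp, $wall);

$table = [];
foreach ($parts as $p) {
    // Hex is used only to make the bytes printable; strlen() counts bytes.
    $table[] = [bin2hex($p), strlen($p)];
}
print_r($table);
```

Read this way, the join/convert/split sequence looks like a way to convert a whole line with one iconv call instead of one call per word, but that reading is an inference from the code, not something the source states.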

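My questions:

  1. What does $_SP_ = chr(0xFF).chr(0xFE) mean? As far as I can tell, chr(0xFF).chr(0xFE) is a string built from the last two characters of the ASCII table. What is the combination of the two used for?

  2. Why should I convert $_SP_ from ucs-2be to utf-8?

  3. Why is $ws joined back into a string, but with the utf-8 form of chr(0xFF).chr(0xFE) as the separator?

  4. Why does it need the length of each word?

  5. Why is $spstr treated as ucs-2be? Is it only because it is the combination chr(0xFF).chr(0xFE)?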
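
As background for questions 1, 4 and 5, a small sketch that only inspects bytes. It assumes the script itself is saved as UTF-8 and that the iconv extension is available; the variable names are illustrative. It does not settle why the original author chose these bytes.

```php
<?php
$sp = chr(0xFF) . chr(0xFE);
echo bin2hex($sp), "\n";   // the two raw bytes, shown as hex

$utf8 = '你';                                 // U+4F60
$ucs2 = iconv('UTF-8', 'UCS-2BE', $utf8);

echo strlen($utf8), "\n";  // byte length in UTF-8
echo strlen($ucs2), "\n";  // byte length in UCS-2BE
echo bin2hex($ucs2), "\n"; // big-endian code unit
```

Two points follow from this. The bytes 0xFF and 0xFE lie outside ASCII, which ends at 0x7F. And strlen() counts bytes, so in UCS-2BE, where every character takes two bytes, the stored length is twice the number of characters in the word.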

0 answers:

No answers