Calculating document frequency in a multidimensional array in PHP

Asked: 2017-12-07 19:56:13

Tags: php arrays multidimensional-array

I have an array like this:

 Array ( 
        [0] => Array ( [id_doc] => 1 [term] => curi ) 
        [1] => Array ( [id_doc] => 1 [term] => tidur ) 
        [2] => Array ( [id_doc] => 1 [term] => kamar ) 
        [3] => Array ( [id_doc] => 2 [term] => curi ) 
        [4] => Array ( [id_doc] => 2 [term] => cela ) 
        [5] => Array ( [id_doc] => 2 [term] => hukum ) 
        [6] => Array ( [id_doc] => 3 [term] => nyanyi ) 
        [7] => Array ( [id_doc] => 3 [term] => dangdut ) 
        [8] => Array ( [id_doc] => 3 [term] => curi )   
    ) 

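How can I get the document frequency of each term, that is, the number of documents the term appears in? I would like the output to look like this: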

Array ( 
        [0] => Array ( [id_doc] => 1 [term] => curi [doc_frequency] => 3 ) 
        [1] => Array ( [id_doc] => 1 [term] => tidur [doc_frequency] => 1 ) 
        [2] => Array ( [id_doc] => 1 [term] => kamar [doc_frequency] => 1 ) 
        [3] => Array ( [id_doc] => 2 [term] => curi [doc_frequency] => 3 ) 
        [4] => Array ( [id_doc] => 2 [term] => cela [doc_frequency] => 1 ) 
        [5] => Array ( [id_doc] => 2 [term] => hukum [doc_frequency] => 1 ) 
        [6] => Array ( [id_doc] => 3 [term] => nyanyi [doc_frequency] => 1 ) 
        [7] => Array ( [id_doc] => 3 [term] => dangdut [doc_frequency] => 1 ) 
        [8] => Array ( [id_doc] => 3 [term] => curi [doc_frequency] => 3 )  
    ) 

So the term 'curi' has a document frequency of 3, because it appears in 3 documents. I tried this:

$count_df = array_count_values(array_map(function($item) {
   return $item['term'];
}, $dokumen_frek));
print_r($count_df);

But the result is:

Array ( 
[curi] => 3 
[tidur] => 1 
[kamar] => 1 
[cela] => 1 
[hukum] => 1 
[nyanyi] => 1 
[dangdut] => 1 
)

1 Answer:

Answer 0: (score: 1)

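Use the array_count_values function to count how often each term occurs, then add that count to every row. Here $arr is the input array ($dokumen_frek in the question):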

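// Count how many rows contain each term: term => count.
$terms = array_count_values(array_column($arr, 'term'));

// Add the count to every row; $x is a reference, so $arr is modified in place.
foreach ($arr as &$x) {
    $x['doc_frequency'] = $terms[$x['term']];
}
unset($x); // break the reference left over after the loop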
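
One caveat: array_count_values counts rows, so it only equals the document frequency when each term is listed at most once per document, as in the question's data. If a term can repeat within a document, a variant along the following lines may be closer to what is wanted. This is only a sketch: the helper name addDocFrequency and the sample rows (including the duplicated 'curi' in document 1) are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): adds a doc_frequency
// key to every row, counting each term at most once per document.
function addDocFrequency(array $rows): array
{
    // Collect the set of documents each term appears in: term => [id_doc => true].
    $docsPerTerm = [];
    foreach ($rows as $row) {
        $docsPerTerm[$row['term']][$row['id_doc']] = true;
    }

    // Attach the number of distinct documents to each row.
    foreach ($rows as $i => $row) {
        $rows[$i]['doc_frequency'] = count($docsPerTerm[$row['term']]);
    }

    return $rows;
}

// Illustrative rows; the second 'curi' in document 1 is an added duplicate.
$rows = [
    ['id_doc' => 1, 'term' => 'curi'],
    ['id_doc' => 1, 'term' => 'curi'],
    ['id_doc' => 1, 'term' => 'tidur'],
    ['id_doc' => 2, 'term' => 'curi'],
    ['id_doc' => 3, 'term' => 'curi'],
];

$result = addDocFrequency($rows);
print_r($result);
```

With these rows, 'curi' gets a doc_frequency of 3 (documents 1, 2 and 3), whereas counting rows with array_count_values would report 4. The function returns a new array instead of modifying its argument, so no reference cleanup is needed.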
