Question

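I'm trying to create a DataFrame in which the first column ("Value") holds a multi-word string in each row, and all the other columns are labeled with the unique words found across all the strings in "Value". I want to fill this DataFrame with the word frequencies of each string (one row), checked against all the unique words (the columns). In a sense, I'm building a simple TDM (term-document matrix).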

import itertools
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']
col_list = [word.lower().split(" ") for word in rows]
set_col = set(list(itertools.chain.from_iterable(col_list)))

columns = set_col
ncols = len(set_col)

testDF = pd.DataFrame(columns = set_col)
testDF.insert(0, "Value", " ")

testDF["Value"] = rows
testDF.fillna(0, inplace=True)

irow = 0

for tweet in testDF["Value"]:

    for word in tweet.split(" "):
        for col in xrange(1, ncols):

            if word == testDF.columns[col]: testDF[irow, col] += 1

    irow += 1

testDF.head()

However, I get an error:

KeyError                                  Traceback (most recent call last)
<ipython-input-64-9a991295ccd9> in <module>()
     23         for col in xrange(1, ncols):
     24 
---> 25             if word == testDF.columns[col]: testDF[irow, col] += 1
     26 
     27     irow += 1

C:\Users\Tony\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
   1795             return self._getitem_multilevel(key)
   1796         else:
-> 1797             return self._getitem_column(key)
   1798 
   1799     def _getitem_column(self, key):

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item   (pandas\hashtable.c:12280)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)()

KeyError: (0, 9)

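I'm not sure what the problem is here; thanks for your help. Also, if there is a cleaner way to do this (other than a text-mining package, which I had trouble installing), I'd love to learn about it!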

Answer 1

I'm not 100% sure what your complete program is trying to do, but if by the following -

testDF[irow, col]

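you mean to index a single cell of the DataFrame, with irow as the row and col as the column, you cannot use plain subscripting for that. testDF[irow, col] is treated as a lookup of a column labeled (irow, col), which is why you get KeyError: (0, 9). You should use .iloc or similar. Example -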

 if word == testDF.columns[col]: testDF.iloc[irow, col] += 1
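To make the difference concrete, here is a minimal sketch on a tiny, made-up DataFrame (the name `df` and its contents are assumptions for illustration, not the asker's data). It shows that plain subscripting with a tuple is looked up as a column label, while `.iloc` addresses a cell by integer position.

```python
import pandas as pd

# Hypothetical one-row frame: a text column plus one word-count column.
df = pd.DataFrame({"Value": ["we went home"], "home": [0]})

# Plain subscripting treats the tuple (0, 1) as a single column label,
# so pandas looks for a column named (0, 1) and raises KeyError.
try:
    df[0, 1]
    raised = False
except KeyError:
    raised = True

# .iloc addresses the cell by integer row and column position instead.
df.iloc[0, 1] += 1

print(raised)         # whether the plain subscript raised KeyError
print(df.iloc[0, 1])  # the incremented count
```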

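Use .iloc if you intend irow to be the 0-based position of the row; if irow is the exact index label of the DataFrame, you can use .loc instead of .iloc. Note that .loc is label-based for both axes, so the column would then have to be given by its label as well (for example testDF.columns[col]).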
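Putting this together, the loop from the question might look roughly like the sketch below. It assumes Python 3 and a recent pandas, so `xrange` becomes `range` and the frame is built from a sorted list rather than a set; the names `words` and `test_df` are introduced here for illustration. It also loops up to `ncols + 1`, because "Value" sits at position 0 and the word columns occupy positions 1 through `ncols`, so `range(1, ncols)` would skip the last word column.

```python
import itertools
import pandas as pd

rows = ['you want peace', 'we went home', 'our home is nice', 'we want peace at home']

# Unique words across all rows, sorted so the column order is deterministic.
words = sorted(set(itertools.chain.from_iterable(r.lower().split() for r in rows)))
ncols = len(words)

# One zero-filled integer column per unique word, with the text column in front.
test_df = pd.DataFrame(0, index=range(len(rows)), columns=words)
test_df.insert(0, "Value", rows)

for irow, tweet in enumerate(test_df["Value"]):
    for word in tweet.lower().split():
        # Word columns sit at positions 1..ncols ("Value" is position 0).
        for col in range(1, ncols + 1):
            if word == test_df.columns[col]:
                test_df.iloc[irow, col] += 1

# Label-based access to the same kind of cell: row label 3, column label "home".
home_count = test_df.loc[3, "home"]
print(test_df.head())
```

With the default integer index used here, the row position and the row label coincide, so `.iloc[irow, col]` and `.loc[irow, test_df.columns[col]]` refer to the same cell.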
