How do I get a specific column from a data frame?

Asked: 2016-12-08 02:29:36

Tags: pandas jupyter

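Hello, I am working with a data frame that has three columns: comment_id, class, and comment_message. I need to store all three columns, but I get an error when I try to store the column named class. My complete code is as follows: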

from sklearn import svm
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

df1=pd.read_csv("C:/Users/acamagon/Downloads/dataSet",sep=',')
#print(df1)

comment_id = df1['comment_id']



comment_message = df1['comment_message']
print(comment_message)

The problem occurs here:

#Here is the problem
classification = df1['class']

The file looks like this:

comment_id,comment_message,class    
10154395643583692_10154397346673692,quisiera saber el precio y las caracteristicas del selulae samsung s5 xfavoor,1 
10154395643583692_10154397434578692,"buenos dias, necesito que le den seguimiento a un telefono que deje en garantia desde octubre en el cac urban center de xalapa, veracruz. ya van 4 veces y me dicen que el telefono no esta y ya va para 3 meses que lo deje. espero me den una respuesta pronto. me comunico al *111 y solo me dicen que el folio sigue en pendiente.",1  
10154395643583692_10154397511368692,no sirve su aplicacion de mi telcel... [[PHOTO]],0  
10154395643583692_10154397598508692,"buenas tardes, gracias por su atencion brindada... pude resolver mi duda y asi sabre que es lo mejor para mi. saludos.",1  
10154394898978692_10154397173938692,q precio tiene el plan????,2    
10154394898978692_10154397265133692,para solicitarlo?,1 

I would appreciate any suggestions for overcoming this problem. Thanks for your support.

The error is as follows:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Program Files\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()

KeyError: 'class'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-54-f52e2494564b> in <module>()
     15 
     16 
---> 17 classification = df1['class']
     18 
     19 

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Program Files\Anaconda3\lib\site-packages\pandas\indexes\base.py in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()

KeyError: 'class'

1 Answer:

Answer 0 (score: 3):

Try this:

# Strip leading and trailing whitespace from every column name
df1.columns = [c.strip() for c in df1.columns]
print(df1["class"])

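The problem is that your class header contains trailing whitespace: in the file, the header line ends with "class" followed by spaces, so the column's real name is not exactly 'class'. Stripping that whitespace with .strip() lets pandas find the column, which avoids the KeyError.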
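To check whether this is what is happening in your case, it can help to print the column names with repr() so that trailing spaces become visible. Below is a minimal sketch, not your exact setup: the two-row CSV text is a made-up sample that only mirrors the header of the file in the question, and it is read from an in-memory string instead of your path. It also uses the vectorized df.columns.str.strip(), which should be equivalent to the list comprehension above.

```python
import io

import pandas as pd

# Hypothetical sample: the last header field is "class" plus trailing spaces,
# mirroring the header line of the file shown in the question.
csv_text = (
    "comment_id,comment_message,class    \n"
    "1,para solicitarlo?,1 \n"
    "2,q precio tiene el plan????,2    \n"
)

df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Keep a copy of the names as pandas read them, and show them with repr()
# so that any trailing whitespace is visible.
raw_columns = list(df.columns)
print([repr(c) for c in raw_columns])

# Strip leading and trailing whitespace from every column name.
df.columns = df.columns.str.strip()

# The lookup that raised KeyError now works.
classification = df["class"]
print(classification)
```

If the printed names already look clean on your real file, then the cause is probably something else (for example a different delimiter), and this fix would not apply.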