Convert a column into multiple columns based on its values

Time: 2017-08-14 23:03:44

Tags: python pandas dataframe

In Python, I would like to know whether there is a way to convert a single-column DataFrame like this:

[image]

into this:

[image]

2 answers:

Answer 0: (score: 5)

Source DF:

In [204]: df
Out[204]:
     Country
0      Italy
1  Indonesia
2     Canada
3      Italy

We can use pd.get_dummies():

In [205]: pd.get_dummies(df.Country)
Out[205]:
   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
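
On recent pandas releases (2.0 and later, as far as I know) `pd.get_dummies` returns boolean columns by default, so the result prints `True`/`False` instead of `0`/`1`. A minimal sketch, assuming the same `df` as above, that requests integer indicators through the `dtype` parameter:

```python
import pandas as pd

# Same single-column frame as in the answer above.
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# dtype=int asks for 0/1 integers instead of the boolean default
# used by newer pandas versions.
dummies = pd.get_dummies(df["Country"], dtype=int)
print(dummies)
```

If the indicator columns are needed next to the original data, `df.join(dummies)` should attach them by index.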

Alternatively, using sklearn.feature_extraction.text.CountVectorizer (note that it lowercases the values by default):

In [211]: from sklearn.feature_extraction.text import CountVectorizer

In [212]: cv = CountVectorizer()

In [213]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(df.Country),
                                                columns=cv.get_feature_names_out(),
                                                index=df.index)

In [214]: r
Out[214]:
   canada  indonesia  italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1

Answer 1: (score: 3)

A few other options:


pd.Series.str.get_dummies

df.Country.str.get_dummies()

   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1

pd.DataFrame.groupby + value_counts

df.groupby(level=0).Country.value_counts().unstack(fill_value=0)

Country  Canada  Indonesia  Italy
0             0          0      1
1             0          1      0
2             1          0      0
3             0          0      1

pd.factorize + np.bincount

f, u = pd.factorize(df.Country.values)
pd.DataFrame(
    np.bincount(
        f + np.arange(f.size) * u.size,
        minlength=u.size * f.size
    ).reshape(f.size, u.size),
    df.index, u
)

   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0

pd.factorize + np.eye

f, u = pd.factorize(df.Country.values)
pd.DataFrame(np.eye(u.size, dtype=int)[f], df.index, u)

   Italy  Indonesia  Canada
0      1          0       0
1      0          1       0
2      0          0       1
3      1          0       0

pd.factorize + array slice assignment
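
No code for the `pd.factorize` plus array slice assignment option survives in this copy of the answer. A minimal sketch of what that combination could look like, assuming the same `df` as above (the names `a` and `result` are illustrative, not the original author's):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# f: integer code per row; u: unique values in order of first appearance.
f, u = pd.factorize(df["Country"].values)

# Start from an all-zero matrix: one row per record, one column per unique value.
a = np.zeros((f.size, u.size), dtype=int)

# Set a single 1 in each row, in the column given by that row's code.
a[np.arange(f.size), f] = 1

result = pd.DataFrame(a, index=df.index, columns=u)
print(result)
```

As with the other `pd.factorize` variants, the columns come out in order of first appearance rather than sorted alphabetically.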