What is the difference between One Hot Encoding and pandas.categorical.code

Time: 2021-01-10 20:27:27

Tags: python pandas scikit-learn categorical-data one-hot-encoding

I'm working on a problem and have the following questions:

The dataset has a text column with the following unique values:

array(['1 bath', 'na', '1 shared bath', '1.5 baths', '1 private bath',
       '2 baths', '1.5 shared baths', '3 baths', 'Half-bath',
       '2 shared baths', '2.5 baths', '0 shared baths', '0 baths',
       '5 baths', 'Private half-bath', 'Shared half-bath', '4.5 baths',
       '5.5 baths', '2.5 shared baths', '3.5 baths', '15.5 baths',
       '6 baths', '4 baths', '3 shared baths', '4 shared baths',
       '3.5 shared baths', '6 shared baths', '6.5 shared baths',
       '6.5 baths', '4.5 shared baths', '7.5 baths', '5.5 shared baths',
       '7 baths', '8 shared baths', '5 shared baths', '8 baths',
       '10 baths', '7 shared baths'], dtype=object)

If I use CountVectorizer to convert them into a one-hot encoding, I get the following error:


AttributeError: 'float' object has no attribute 'lower'


Please explain what is causing this error.

This is the code I'm using:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
vectorizer.fit(X_train[colname].values)
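A likely cause, sketched with hypothetical toy data in place of X_train[colname]: pandas represents missing values as a float NaN, and CountVectorizer's default preprocessing (lowercase=True) calls .lower() on every document. Any NaN in the column therefore fails with this error. The snippet reproduces the error and shows one possible fix, which is to fill missing values with a string before fitting. The "na" fill value is an assumption. Depending on the scikit-learn version, the np.nan object itself may be caught earlier and reported as a ValueError instead.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for X_train[colname]: the missing entry is a float NaN,
# not a string.
docs = ["1 bath", float("nan"), "2 shared baths"]

error = None
try:
    CountVectorizer().fit(docs)
except (AttributeError, ValueError) as exc:
    # With the default lowercase=True, .lower() is called on each document,
    # and a float has no .lower(). Some scikit-learn versions detect the
    # np.nan object itself and raise a ValueError instead.
    error = exc
print(type(error).__name__, error)

# One possible fix: replace missing values with a string before fitting.
clean = pd.Series(docs).fillna("na")
vectorizer = CountVectorizer()
vectorizer.fit(clean.values)
print(sorted(vectorizer.vocabulary_))  # ['bath', 'baths', 'na', 'shared']
```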

What is the difference between one-hot encoding and pd.categorical.code?
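The contrast can be sketched on a small hypothetical sample of the column. Categorical codes (Series.cat.codes after astype("category")) produce a single integer column. Categories are sorted, each gets an integer, and missing values become -1. One-hot encoding (pd.get_dummies) produces one 0/1 column per category instead.

```python
import pandas as pd

# Hypothetical sample of the column, including one missing value.
s = pd.Series(["1 bath", "2 baths", "1 bath", "Half-bath", None])

# Categorical codes: one integer per row; missing values get -1.
cat = s.astype("category")
codes = cat.cat.codes
print(list(cat.cat.categories))  # ['1 bath', '2 baths', 'Half-bath']
print(codes.tolist())            # [0, 1, 0, 2, -1]

# One-hot encoding: one 0/1 column per category; the missing value becomes
# a row of zeros (dummy_na=True would add a separate NaN column instead).
dummies = pd.get_dummies(s, dtype=int)
print(dummies)
```

The practical difference is that codes imply an order and magnitude (here "Half-bath" = 2 > "1 bath" = 0), which a linear or distance-based model will treat as numeric. One-hot columns carry no such order but add one column per distinct value.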

Thanks, Amit Modi

1 answer:

Answer 0 (score: 1)

If you want one-hot encoding with pandas, you can do this:

import pandas

pandas.get_dummies(X_train[colname])  # one 0/1 column per distinct value
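CountVectorizer is also a bag-of-words tool rather than a per-category one-hot encoder. It splits each value into word tokens, and the default token_pattern keeps only tokens of two or more word characters, so "1" and "5" are dropped. A small sketch on a few of the values from the question:

```python
from sklearn.feature_extraction.text import CountVectorizer

values = ["1 bath", "1 shared bath", "1.5 baths", "Half-bath"]

# Each string is lowercased and tokenized; features are words, not categories.
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(values)
print(sorted(vectorizer.vocabulary_))  # ['bath', 'baths', 'half', 'shared']
print(matrix.toarray())
```

So "1 bath" and "1 shared bath" share the "bath" column, and most of the numeric part of each value is lost. pandas.get_dummies, by contrast, keeps each distinct string as its own column.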