Creating a PySpark DataFrame from a Python dictionary with special characters

Time: 2020-08-10 21:47:20

Tags: python apache-spark pyspark

I have a list of Python dictionaries like this:

data = [{"cust_decision": "buy", "cust_details": "Easy to use"}, {"cust_decision": "buy", "cust_details": "econoimical"}, {"cust_decision":"no buy", "cust_details": "Didn’t like Product"}]

I am creating a PySpark DataFrame and a temp view from this data:

from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([Row(**i) for i in data]).createOrReplaceTempView("cust")

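Now, when I look at the data in this temp view, the special character ’ (not an ASCII single quote ', but the typographic apostrophe from the data above) has turned into a different character, â. Here is the result: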

spark.table("cust").show(10,False)
+-------------+---------------------+                                           
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didnât like Product  |
+-------------+---------------------+ 
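
One way to see why an â can appear is to inspect the character outside Spark. The following is a minimal sketch, not part of the original question. It assumes the garbling comes from UTF-8 bytes being read with a single-byte codec, and Latin-1 is used here only as an illustrative guess at that codec.

```python
import unicodedata

# \u2019 is the ’ character from the question's data
text = "Didn\u2019t like Product"

apostrophe = text[4]
char_name = unicodedata.name(apostrophe)  # official Unicode name of the character
utf8_bytes = apostrophe.encode("utf-8")   # the character occupies three bytes in UTF-8

# Assumption: the bytes are misread as Latin-1. Each of the three bytes then
# becomes its own character: â followed by two invisible control characters.
garbled = text.encode("utf-8").decode("latin-1")

print(char_name)
print(utf8_bytes)
print(repr(garbled))
```

If the garbled string printed here matches what the view shows, the data is being encoded as UTF-8 somewhere and decoded with a different codec later.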

But I want each value to keep its original character. How can I achieve this? The expected result is:

+-------------+---------------------+                                           
|cust_decision|cust_details         |
+-------------+---------------------+
|buy          |Easy to use          |
|buy          |econoimical          |
|no buy       |Didn’t like Product  |
+-------------+---------------------+ 

Thanks.

1 answer:

Answer 0 (score: 1)

Try decoding the values in your data dictionary as utf-8.
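
The suggestion above can be sketched as follows. This is an illustration under stated assumptions rather than the answerer's exact code: it assumes the dictionary values arrive as raw UTF-8 bytes (for example Python 2 byte strings, or bytes read from a file), and `decode_value` and `decode_rows` are hypothetical helper names. In Python 3, `str` values are already Unicode and have no `decode` method, so the helper passes them through unchanged.

```python
# \u2019 is the ’ character from the question's data
data = [
    {"cust_decision": "buy", "cust_details": "Easy to use"},
    {"cust_decision": "buy", "cust_details": "econoimical"},
    {"cust_decision": "no buy", "cust_details": "Didn\u2019t like Product"},
]


def decode_value(value, encoding="utf-8"):
    """Decode bytes with the given encoding; return any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_rows(rows, encoding="utf-8"):
    """Apply decode_value to every value of every dictionary in rows."""
    return [
        {key: decode_value(value, encoding) for key, value in row.items()}
        for row in rows
    ]


# Simulate the assumed situation: every value arrives as UTF-8 encoded bytes.
raw_rows = [
    {key: value.encode("utf-8") for key, value in row.items()} for row in data
]

clean_rows = decode_rows(raw_rows)
print(clean_rows[2]["cust_details"])

# clean_rows can then be passed to Spark in the same way as in the question:
# spark.createDataFrame([Row(**i) for i in clean_rows])
```

If the values are already `str` and the output is still garbled, the cause is more likely the encoding of the console or of the file the data was read from, which this sketch does not address.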