Data type mismatch when joining on a nested list

Time: 2019-07-12 21:43:51

Tags: python-3.x pyspark-sql

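I am trying to solve a problem that asks me to find the people whose favorite snack matches a snack name in the goods table. The issue is that the goods table is nested, and the join fails with an error that makes no sense to me (array<string> does not match string).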

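I searched online for how to work with nested columns and ran into two problems. 1. Most of what made sense was written in Scala, and when I tried to do the same thing in Python I got syntax errors. 2. For some reason I cannot find explode in my PySpark.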

peopleDF is:

root
|-- email: string (nullable = true)
|-- fave_snack: string (nullable = true)
|-- first_name: string (nullable = true)
|-- gender: string (nullable = true)
|-- id: long (nullable = true)
|-- ip_address: string (nullable = true)
|-- last_name: string (nullable = true)

goodsDF is:

root
|-- products: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- body_html: string (nullable = true)
|    |    |-- created_at: string (nullable = true)
|    |    |-- handle: string (nullable = true)
|    |    |-- id: long (nullable = true)
|    |    |-- images: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- created_at: string (nullable = true)
|    |    |    |    |-- height: long (nullable = true)
|    |    |    |    |-- id: long (nullable = true)
|    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |-- product_id: long (nullable = true)
|    |    |    |    |-- src: string (nullable = true)
|    |    |    |    |-- updated_at: string (nullable = true)
|    |    |    |    |-- variant_ids: array (nullable = true)
|    |    |    |    |    |-- element: long (containsNull = true)
|    |    |    |    |-- width: long (nullable = true)
|    |    |-- options: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- name: string (nullable = true)
|    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |-- values: array (nullable = true)
|    |    |    |    |    |-- element: string (containsNull = true)
|    |    |-- product_type: string (nullable = true)
|    |    |-- published_at: string (nullable = true)
|    |    |-- tags: array (nullable = true)
|    |    |    |-- element: string (containsNull = true)
|    |    |-- title: string (nullable = true) ##this is the title I'm using
|    |    |-- updated_at: string (nullable = true)
|    |    |-- variants: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- available: boolean (nullable = true)
|    |    |    |    |-- compare_at_price: string (nullable = true)
|    |    |    |    |-- created_at: string (nullable = true)
|    |    |    |    |-- featured_image: struct (nullable = true)
|    |    |    |    |    |-- alt: string (nullable = true)
|    |    |    |    |    |-- created_at: string (nullable = true)
|    |    |    |    |    |-- height: long (nullable = true)
|    |    |    |    |    |-- id: long (nullable = true)
|    |    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |    |-- product_id: long (nullable = true)
|    |    |    |    |    |-- src: string (nullable = true)
|    |    |    |    |    |-- updated_at: string (nullable = true)
|    |    |    |    |    |-- variant_ids: array (nullable = true)
|    |    |    |    |    |    |-- element: long (containsNull = true)
|    |    |    |    |    |-- width: long (nullable = true)
|    |    |    |    |-- grams: long (nullable = true)
|    |    |    |    |-- id: long (nullable = true)
|    |    |    |    |-- option1: string (nullable = true)
|    |    |    |    |-- option2: string (nullable = true)
|    |    |    |    |-- option3: string (nullable = true)
|    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |-- price: string (nullable = true)
|    |    |    |    |-- product_id: long (nullable = true)
|    |    |    |    |-- requires_shipping: boolean (nullable = true)
|    |    |    |    |-- sku: string (nullable = true)
|    |    |    |    |-- taxable: boolean (nullable = true)
|    |    |    |    |-- title: string (nullable = true)
|    |    |    |    |-- updated_at: string (nullable = true)
|    |    |-- vendor: string (nullable = true)

The code I used to try to join them is:

peopleDF.join(goodsDF, peopleDF.fave_snack == goodsDF.products.title,"leftouter").show()

I expected to get the rows of peopleDF whose fave_snack column matches products.title. Instead, the actual result is this error message:

pyspark.sql.utils.AnalysisException: "cannot resolve '(`fave_snack` = `products`.`title`)' due to data type mismatch: differing types in '(`fave_snack` = `products`.`title`)' (string and array<string>).;;\n'Join LeftOuter, (fave_snack#1 = products#14.title)\n:- Relation[email#0,fave_snack#1,first_name#2,gender#3,id#4L,ip_address#5,last_name#6] json\n+- Relation[products#14] json\n"

Any insight would be helpful, thanks.

0 answers:

No answers