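How do I work with a column of this type?

Asked: 2019-04-12 10:13:25

Tags: sql apache-spark databricks

I can't figure out how to get the relevant information out of this SQL column type: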

array<
 struct<
  day_of_week:string,
  start:bigint,
  duration:bigint,
  enabled:boolean,
  created_at:timestamp,
  deleted_at:timestamp
  >
>

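This column holds the daily opening hours of the restaurants in our database. Some restaurants have changed their daily opening hours, so I do not actually need some of the entries in the SQL table. All I need is the current opening hours of every restaurant.

Here is an example of the column I am trying to get the information from: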

[
  {
    "day_of_week": "4",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-02-23T10:47:15.033+0000",
    "deleted_at": "2018-10-22T18:27:40.403+0000"
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "5",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "6",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:40.397+0000",
    "deleted_at": "2018-10-22T18:27:42.074+0000"
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:40.397+0000",
    "deleted_at": "2018-10-22T18:27:42.074+0000"
  },
  {
    "day_of_week": "1",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:42.069+0000",
    "deleted_at": "2018-10-22T18:29:11.035+0000"
  },
  {
    "day_of_week": "6",
    "start": 64800000,
    "duration": 359,
    "enabled": true,
    "created_at": "2018-10-22T18:29:11.030+0000",
    "deleted_at": null
  },
  {
    "day_of_week": "7",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-10-22T18:27:42.069+0000",
    "deleted_at": "2018-10-22T18:29:11.035+0000"
  },
  {
    "day_of_week": "2",
    "start": 64800000,
    "duration": 359,
    "enabled": false,
    "created_at": "2018-02-23T10:47:15.033+0000",
    "deleted_at": "2018-10-22T18:27:40.403+0000"
  },

I am not interested in this entry, because it was deleted on 2018-10-22:

[{"day_of_week":"4","start":64800000,"duration":359,"enabled":false,
"created_at":"2018-02-23T10:47:15.033+0000","deleted_at":"2018-10-22T18:27:40.403+0000"}

But I am interested in every part of this column that looks like the following, because it shows the current opening hours for day_of_week: 7.

"day_of_week":"7","start":64800000,"duration":359,"enabled":true,
"created_at":"2018-10-22T18:29:11.030+0000","deleted_at":null

I have tried to get all the elements of the column, but this only returns the first matching element of the cell and nothing more:

LATERAL VIEW explode(shifts.`day_of_week`) exploded_table as day_of_week
LATERAL VIEW explode(shifts.`start`) exploded_table as start
LATERAL VIEW explode(shifts.`enabled`) exploded_table as enabled
LATERAL VIEW explode(shifts.`duration`) exploded_table as duration

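Can someone please help me?

Also, I think "start":64800000 refers to the opening time and "duration":359 to how long the restaurant stays open, but I do not know how to interpret these numbers. Does "start":64800000 mean 7 am, 8 am, or 9 am? And is "duration":359 seven hours, or nine?

Sorry for the long post, but I am new to SQL, and this site is the only real resource I have for figuring out the things I don't know.

Thanks in advance for any help.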

1 answer:

Answer 0 (score: 0)

TLDR:

For a dataframe df with schema:

key:integer
data:array
  element:struct
    day_of_week:string
    start:decimal(38,0)
    duration:decimal(38,0)
    enabled:boolean
    created_at:string
    deleted_at:string

which is registered as the global temp view test (hence global_temp.test in the query below), the array can be exploded with:

select key, a.ed.day_of_week,
  a.ed.start, a.ed.duration,
  a.ed.enabled, a.ed.created_at, a.ed.deleted_at
from (select key, explode(data) as ed from global_temp.test) a
where a.ed.deleted_at is null

See: https://imgur.com/a/bFcoSz3
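
For comparison with the LATERAL VIEW attempt in the question, here is a rough sketch of the same idea written with a single LATERAL VIEW. It explodes the array once into whole structs, so the fields of each entry stay together instead of being exploded one field at a time. It reuses the names from the query above (global_temp.test, key, data); the aliases t, s, and exploded_table are only illustrative. The two computed columns rest on assumptions the question does not confirm: that start is milliseconds since midnight and duration is minutes. Under that reading, 64800000 ms is 18 hours (18:00) and 359 minutes is 5 h 59 min, which would put closing time at 23:59, but this should be checked against the system that writes the data.

```sql
-- Sketch only: table and column names follow the answer's example schema.
SELECT t.key,
       s.day_of_week,
       s.`start` / 3600000 AS start_hour,      -- ASSUMPTION: start is milliseconds since midnight
       s.duration / 60     AS duration_hours   -- ASSUMPTION: duration is minutes
FROM global_temp.test t
LATERAL VIEW explode(t.data) exploded_table AS s  -- one output row per struct in the array
WHERE s.deleted_at IS NULL                        -- keep only entries that were never deleted
  AND s.enabled                                   -- optional: also require enabled = true
```

The enabled filter is optional. In the sample data every entry with a null deleted_at also has enabled set to true, so it may be redundant there.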