pyspark.sql: understanding the syntax and the difference between the explode and split functions

Time: 2016-07-10 16:41:26

Tags: split apache-spark-sql

I am taking a MOOC.

It has a DataFrame, shakespeareDF, containing the text below:

+-------------------------------------------------+
|word                                             |
+-------------------------------------------------+
|1609                                             |
|                                                 |
|the sonnets                                      |
|                                                 |
|by william shakespeare                           |
|                                                 |
|                                                 |
|                                                 |
|1                                                |
|from fairest creatures we desire increase        |
|that thereby beautys rose might never die        |
|but as the riper should by time decease          |
|his tender heir might bear his memory            |
|but thou contracted to thine own bright eyes     |
|feedst thy lights flame with selfsubstantial fuel|
+-------------------------------------------------+

On this DataFrame, they run the code below:

from pyspark.sql.functions import split, explode
# Split each line on runs of whitespace, then emit one row per resulting word.
shakeWordsDF = shakespeareDF.select(explode(split(shakespeareDF[0], r"\s+")))

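I would like to understand:

  1. What is the difference between explode and split, and why do we have to use both? I tried looking at the online documentation and could not understand it.
  2. Why do we have to use shakespeareDF[0] and not just shakespeareDF?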

1 answer:

Answer 0 (score: 0)

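Q.1 See here

Q.2 shakespeareDF[0] selects the first column of the DataFrame. split operates on a column, not on a whole DataFrame, so a column has to be passed to it.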
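
To make the distinction in Q.1 concrete, here is a rough plain-Python analogy, not Spark code. In PySpark, split(column, pattern) turns each string into an array of pieces while keeping one row per input row, and explode(column) then produces one output row per array element. The sketch below mimics those two steps on an ordinary list; the names rows, split_column and explode_column are hypothetical and exist only for illustration, and Python's regex engine may differ from Spark's in edge cases.

```python
import re

# Hypothetical stand-in for a one-column DataFrame: a list of 1-tuples.
rows = [("the sonnets",), ("",), ("by william shakespeare",)]


def split_column(rows, index, pattern):
    """Analogy for split(): still one result per input row, each a list."""
    return [re.split(pattern, row[index]) for row in rows]


def explode_column(arrays):
    """Analogy for explode(): one result per element of every list."""
    return [item for array in arrays for item in array]


# Index 0 picks the first (and only) column, like shakespeareDF[0].
arrays = split_column(rows, 0, r"\s+")
words = explode_column(arrays)

print(arrays)
print(words)
```

In this sketch the three input rows are still three rows after the split step and become six after the explode step. The blank line contributes an empty string, which suggests why word-count code like this is often followed by a filter that drops empty words.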