Reading a text file in PySpark 2

Time: 2018-09-17 20:50:20

Tags: pyspark apache-spark-2.2

I am trying to read a text file in Spark 2.3 using Python, but I get the error shown below. This is the format of the text file:

name marks
amar 100
babul 70
ram 98
krish 45

Code:

df=spark.read.option("header","true")\
    .option("delimiter"," ")\
    .option("inferSchema","true")\
    .schema(
        StructType(
            [
                StructField("Name",StringType()),
                StructField("marks",IntegerType())
            ]
        )
    )\
    .text("file:/home/maria_dev/prac.txt") 

Error:

java.lang.AssertionError: assertion failed: Text data source only
produces a single data column named "value"

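When I try to read the text file into an RDD instead, each line is collected as a single column.

Should I change the data file, or should I change the code?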

1 answer:

Answer 0: (score: 3)

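Use .csv instead of .text to load the file into a DataFrame. The text source only ever produces a single column named "value", so it cannot accept a two-field schema. Note that the types must be imported before the schema is built:

>>> from pyspark.sql.types import *
>>> df=spark.read.option("header","true")\
...     .option("delimiter"," ")\
...     .option("inferSchema","true")\
...     .schema(
...         StructType(
...             [
...                 StructField("Name",StringType()),
...                 StructField("marks",IntegerType())
...             ]
...         )
...     )\
...     .csv('file:///home/maria_dev/prac.txt')
>>> df
DataFrame[Name: string, marks: int]
>>> df.show(10,False)
+-----+-----+
|Name |marks|
+-----+-----+
|amar |100  |
|babul|70   |
|ram  |98   |
|krish|45   |
+-----+-----+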
