How to extract simple strings from text in PySpark using natural language processing

Time: 2019-10-09 11:55:44

Tags: apache-spark pyspark nlp

I have a PySpark DataFrame with 4 columns. One column contains text (the data is unstructured). Below is sample data for this column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row is a one-element tuple holding the raw text
data = [
    ('Ambitioni dedisse scripsisse iudicaretur',),
    ('Cras mattisiudicium',),
    ('purus sit amet fermentum',),
    ('Donec sed odio operae- NORMAL',),
    ('eu vulputate felis - A300B4-61 - MP 13219',),
    ('Praeterea iter est - quasdam res - MP 28180',),
    ('quas ex communi - ',),
    ('At nos hinc posthat CONTROL - FADEC',),
    ('sitientis piros Afros. Petierunt',),
    ('uti sibi concilium totius Galliae-2 - GENERATION',),
    ('in dim - V105X ',),
    ('Cras mattis iudicium',),
]

df = spark.createDataFrame(data, ["text"])

Example of expected output:

    Interest Column == Example data                  | new_column
    -------------------------------------------------|-----------
    Cras mattis iudicium -INTRODCE A NEW STANDARD    |
    Praeterea iter est                               |
    Cras mattis iudicium purus sit amet fermentum.   |
    class to truncate the text                       |
    Ambitioni dedisse -                              |
    For left, right,                                 |
    TCAS II - Praeterea iter est                     |
    Donec sed odio operae                            |
    Ambitioni dedisse                                |


My question: 
Thank you



0 answers:

No answers