Python: getting one related software name per product

Asked: 2017-01-27 13:06:03

Tags: python nltk

I have an Excel sheet containing many software names, such as Visual Studio 2012, Visual Studio 2013, Visual Studio 2017, Adobe Reader English, Adobe Reader Deutsche, Power shell 4.0, Power shell 2.0 and Power Shell 5.0.

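I want to keep only one entry for each related product. For example, for the list above I would like the output to be Visual Studio 2013, Power shell 4.0 and Adobe Reader English, with the rest dropped. I am using Python NLP tooling (NLTK). I have removed all junk characters and version numbers, but I am not sure how to proceed from there.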
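For readers following along, the "remove the version number" step described above might look like the minimal sketch below. It uses only the standard library and assumes the version always sits at the end of the title as dot-separated digits. The helper name `split_title` and the regular expression are illustrative assumptions and not part of the original post.

```python
import re

# Assumption: a version is a trailing run of dot-separated digits,
# e.g. "2013" or "4.0". Titles without one are returned unchanged.
_VERSION_AT_END = re.compile(r"^(?P<name>.*?)\s+(?P<version>\d+(?:\.\d+)*)$")


def split_title(title):
    """Split a title into (product name, version string or None)."""
    cleaned = " ".join(title.split())  # collapse repeated whitespace
    match = _VERSION_AT_END.match(cleaned)
    if match:
        return match.group("name"), match.group("version")
    return cleaned, None


print(split_title("Visual Studio 2013"))    # ('Visual Studio', '2013')
print(split_title("Power shell 4.0"))       # ('Power shell', '4.0')
print(split_title("Adobe Reader English"))  # ('Adobe Reader English', None)
```

Keeping the version as a separate value, instead of discarding it, leaves room to choose which entry to keep later (for instance the newest one).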

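Any ideas on how to build on this? Once I had two software names stripped of digits and junk characters, I tried sequence matching on them, but the results were neither accurate nor efficient.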

import pandas as pd
from nltk.tokenize import wordpunct_tokenize

df = pd.read_csv('C:\\Users\\533471\\Desktop\\Book2.csv', encoding='Windows-1252')
saved_column = df.RowLabels[:]
second_column = df.RowLabels[:]

print(saved_column)

for eachcol in saved_column:
    eachword = eachcol.split()
    print(eachword)

    for secondcol in second_column:
        sentence = None
        wordo = None
        punct = None

        x = []
        copy = []
        secondword = secondcol.split()[:]

        # Proceed only if the first word of the outer name occurs in the
        # first word of the inner name (a substring test, not strict equality).
        if eachword[0] in secondword[0]:
            print("true")
            sentence = eachword[:]
            sentence += secondword

            # Split each token on punctuation.
            for token in sentence:
                word = wordpunct_tokenize(token)

                if wordo is None:
                    wordo = word
                else:
                    wordo += word

            # Keep only purely alphabetic tokens; this drops punctuation
            # and numeric tokens such as version numbers.
            punct = [item for item in wordo if item.isalpha()]
            t = punct[:]
            t.reverse()

            for p in punct:
                print(p)
                if len(x) > 0:
                    print(x, "Appended")
                    x += [p]
                    if p == x[0]:
                        break
                else:
                    print("list is empty")

                    x += [p]

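            # Guard against an empty list: pop() raises IndexError when no
            # alphabetic tokens were collected.
            if x:
                x.pop()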
            for z in t:
                print(z)
                if len(copy) > 0:
                    print(copy, "appended")

                    copy += [z]
                    if z == punct[0]:
                        break
                else:
                    print("list is empty")
                    copy += [z]

                print(copy)

        else:
            print("false")
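One possible way to avoid comparing every name against every other name is to reduce each title to a comparison key and keep one title per key. The sketch below is only an illustration under stated assumptions: `name_key`, `one_per_product`, the `IGNORED_WORDS` list and the `0.85` cutoff are all hypothetical choices and not part of the original post. It uses `difflib.get_close_matches` from the standard library so that near-identical keys are also treated as the same product.

```python
import re
from difflib import get_close_matches

# Hypothetical ignore list: language/edition words that should not
# distinguish one product from another. Extend it for your own data.
IGNORED_WORDS = {"english", "deutsche"}


def name_key(title):
    """Lowercase alphabetic words only; drops digits, punctuation, ignored words."""
    words = re.findall(r"[a-z]+", title.lower())
    return " ".join(w for w in words if w not in IGNORED_WORDS)


def one_per_product(titles, cutoff=0.85):
    """Keep the first title seen for each product key.

    A key whose similarity ratio to an already kept key is at least
    `cutoff` is treated as the same product (assumed threshold).
    """
    chosen = {}  # key -> first title seen; dicts keep insertion order
    for title in titles:
        key = name_key(title)
        if not key:
            continue  # nothing alphabetic left to compare
        if get_close_matches(key, list(chosen), n=1, cutoff=cutoff):
            continue  # same or near-identical product already kept
        chosen[key] = title
    return list(chosen.values())


titles = [
    "Visual Studio 2012", "Visual Studio 2013", "Visual Studio 2017",
    "Adobe Reader English", "Adobe Reader Deutsche",
    "Power shell 4.0", "Power shell 2.0", "Power Shell 5.0",
]
print(one_per_product(titles))
# ['Visual Studio 2012', 'Adobe Reader English', 'Power shell 4.0']
```

This keeps the first entry seen per product. The question does not say which version should win, so that rule is an arbitrary choice. Without the ignore list, the English and Deutsche editions of Adobe Reader would stay separate at this cutoff, because their keys are not similar enough.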

0 answers:

No answers yet