Removing stopwords from a Python list

Time: 2020-04-06 21:27:40

Tags: python list

I have a list of sentences like this:

pylist=['This is an apple', 'This is an orange', 'The pineapple is yellow','A grape is red']

If I define a list of stopwords, for example

stopwords=['This', 'is', 'an', 'The']

is there a way to apply it to the whole list so that my output is

pylist=['apple','orange','pineapple is yellow','A grape is red']

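PS: I tried using apply with a function that removes the stopwords, something like [removewords(x) for x in pylist], but it didn't work (and I'm not sure this is the most efficient approach anyway). Thanks!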

2 answers:

Answer 0 (score: 2)

I don't think the output you show is what you actually want: the stopword "is" is still in it.
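For comparison, the output shown in the question is exactly what you get if only the leading run of stopwords is stripped from each sentence ("The pineapple is yellow" keeps its inner "is", and "A grape is red" is untouched because "A" is not in the list). A minimal sketch of that interpretation, assuming that is really the goal; `strip_leading_stopwords` is a hypothetical name, and `itertools.dropwhile` is from the standard library:

```python
from itertools import dropwhile


def strip_leading_stopwords(sentence, stopwords):
    """Drop stopwords only from the start of the sentence; keep the rest."""
    words = sentence.split()
    # dropwhile skips words while they are stopwords, then yields everything after
    return " ".join(dropwhile(lambda w: w in stopwords, words))


pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow', 'A grape is red']
stopwords = {'This', 'is', 'an', 'The'}

result = [strip_leading_stopwords(x, stopwords) for x in pylist]
print(result)
# ['apple', 'orange', 'pineapple is yellow', 'A grape is red']
```

Note that this comparison is case-sensitive, matching the question's list as written.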

My attempt looks like this:

pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow', 'A grape is red']
stopwords = ['This', 'is', 'an', 'The']

stopwords = set(w.lower() for w in stopwords)


def remove_words(s, stopwords):
    s_split = s.split()
    s_filtered = [w for w in s_split if w.lower() not in stopwords]
    return " ".join(s_filtered)


result = [remove_words(x, stopwords) for x in pylist]

result

['apple', 'orange', 'pineapple yellow', 'A grape red']

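For reasonably efficient lookups (looking up an element in a set takes constant time on average), I store the lowercase forms of the stopwords in a set. In general, stopword removal should be case-insensitive.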
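To make the case-sensitivity point concrete, here is a small sketch comparing a plain set of the original stopwords with a set of their lowercase forms; `filter_words` and its `fold_case` flag are hypothetical names used only for this illustration:

```python
def filter_words(sentence, stop, fold_case):
    """Remove words found in `stop`; optionally lowercase each word before the lookup."""
    kept = []
    for w in sentence.split():
        key = w.lower() if fold_case else w
        if key not in stop:  # average O(1) membership test on a set
            kept.append(w)
    return " ".join(kept)


raw = ['This', 'is', 'an', 'The']
case_sensitive = set(raw)                     # {'This', 'is', 'an', 'The'}
case_insensitive = {w.lower() for w in raw}   # {'this', 'is', 'an', 'the'}

s = "this is THE apple"
print(filter_words(s, case_sensitive, fold_case=False))   # this THE apple
print(filter_words(s, case_insensitive, fold_case=True))  # apple
```

With exact-case matching, "this" and "THE" slip through because only "This" and "The" are in the set; lowercasing both sides catches them.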

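Side note: removing stopwords is often helpful, or even necessary. Be aware, however, that in some cases removing stopwords is not advisable: https://towardsdatascience.com/why-you-should-avoid-removing-stopwords-aa7a353d2a52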

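Update: If you are sure you really need to get rid of every possible stopword, make sure you don't miss any; as yatu suggests, take a look at nltk. That is especially true if next year you might have to add Spanish palabras de parada, French mot d'arrêt, and German Stopp-Wörter.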
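If you do end up supporting several languages, one approach is to merge all the lists into a single lowercase set, so the per-word lookup stays a single set membership test. The sketch below uses tiny hypothetical word lists for illustration only; with nltk the inputs could instead come from calls such as `stopwords.words('spanish')` or `stopwords.words('german')`. The helper name `build_stopword_set` is hypothetical:

```python
def build_stopword_set(*word_lists):
    """Merge several stopword lists into one lowercase set for case-insensitive lookups."""
    merged = set()
    for words in word_lists:
        merged.update(w.lower() for w in words)
    return merged


def remove_words(s, stopwords):
    # Same idea as above: keep words whose lowercase form is not a stopword
    return " ".join(w for w in s.split() if w.lower() not in stopwords)


# Tiny hypothetical lists for illustration only; real lists are much longer
english = ['This', 'is', 'an', 'The']
spanish = ['el', 'es', 'un']
german = ['der', 'ist', 'ein']

stop = build_stopword_set(english, spanish, german)
print(remove_words('El gato es negro', stop))   # gato negro
print(remove_words('Der Apfel ist rot', stop))  # Apfel rot
```

Merging into one set assumes a word that is a stopword in one language is acceptable to drop in all of them; if that is not the case, keep one set per language and pick it based on the text's language.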

Answer 1 (score: 1)

You can use a nested list comprehension, and define stopwords as a set to reduce the lookup complexity to O(1) (on average):

pylist = ['This is an apple', 'This is an orange', 'The pineapple is yellow', 'A grape is red']
stopwords = set(['This', 'is', 'an', 'The'])
[' '.join([w for w in s.split() if w not in stopwords]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'A grape red']

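Note, however, that for a more general approach you can use the stopwords from nltk's English corpus:

from nltk.corpus import stopwords
# the corpus may need to be downloaded once first: import nltk; nltk.download('stopwords')
stop_w = set(stopwords.words('english'))
[' '.join([w for w in s.split() if w.lower() not in stop_w]) for s in pylist]
# ['apple', 'orange', 'pineapple yellow', 'grape red']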
