What is the best way to extract keywords from different strings in Python?

Time: 2018-01-26 12:31:46

Tags: python text nlp text-processing data-extraction

I want to extract the important keywords from a set of text snippets, which are SMS messages received after a transaction. Here is a sample dataset:

{"message": "*boi star sandesh* rs 20 has been debited to your account xx2136 from pos-paytm.com on 08-11-2014.available balance 275.00.", "number": "boiind"}
{"message": "your a/c xxxxx388847 debited inr 7,500.00 on 12/08/16 -transferred to mr. rajendra kurmi . a/c balance inr 1,314.45", "number": "amcbssbi"}
{"message": "an amount of rs.10,000.00 has been debited from your account  number xxxx1152 for an online payment txn done using hdfc bank netbanking.", "number": "dmhdfcbk"}
{"message": "your a/c no. xxxxxxxx1152 is debited for rs. 10,000.00 on 11-08-16 and a/c xxxxxxx847 credited (imps ref no 622421331357)", "number": "vkhdfcmp"}
{"message": "one time password for netbanking transaction is 785516. please use the password to complete the transaction. pls do not share this with anyone. ref no- xxxx4763", "number": "imhdfcbk"}
{"message": "your a/c no. xxxxxxxx3962 is debited for rs.20000.00 on 11-08-16 and a/c of unregistered has been credited (imps ref no 622421342625).", "number": "dmaxisbk"}

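From each message I need to extract the transaction amount, the remaining balance, the date, and the transaction type.

What approach should I take, and which module is best suited to this?

FYI: messages from the same number share the same format, but I have to handle a large number of formats, so writing code for each number would be repetitive and time-consuming.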

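1 answer:

Answer 0: (score: 1)

Use regular expressions, available in Python through the re module.

For example, to find the date in each string, we can use the regular expression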

r" on (\d\d[-\/]\d\d[-\/]\d{2,4})"
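
As a minimal sketch of how this pattern could be applied, the snippet below runs it over message text with re.search. The amount pattern (AMOUNT_RE) and the helper name extract_fields are assumptions added for illustration and are not part of the answer; the amount pattern only covers the "rs" / "inr" prefixes seen in the sample data, so other senders may need additional patterns.

```python
import re

# Date pattern taken from the answer: two digits, a '-' or '/' separator,
# two digits, a separator, then a 2-4 digit year, preceded by " on ".
DATE_RE = re.compile(r" on (\d\d[-\/]\d\d[-\/]\d{2,4})")

# Hypothetical amount pattern (an assumption, not from the answer):
# "rs" or "inr", an optional dot, optional spaces, then digits with
# optional thousands separators and an optional decimal part.
AMOUNT_RE = re.compile(r"\b(?:rs|inr)\.?\s*(\d[\d,]*(?:\.\d+)?)")


def extract_fields(message):
    """Return the first date and all currency-prefixed amounts in a message."""
    date_match = DATE_RE.search(message)
    amounts = AMOUNT_RE.findall(message)
    return {
        "date": date_match.group(1) if date_match else None,
        # Drop thousands separators so the values can be passed to float().
        "amounts": [amount.replace(",", "") for amount in amounts],
    }


sample = (
    "your a/c xxxxx388847 debited inr 7,500.00 on 12/08/16 "
    "-transferred to mr. rajendra kurmi . a/c balance inr 1,314.45"
)
print(extract_fields(sample))
```

Messages that carry no date, such as the one-time-password notification, yield None for the date. A balance written without a currency prefix (as in the first sample message) is not picked up by this amount pattern and would need a pattern of its own.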
