Question

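I have a large text file containing about 200K lines (records).

I only need to extract the lines that start with CLM. For example, if 100K lines in the file start with CLM, I want to print just those 100K lines.

Can anyone help me do this with a Python script?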

Answer 1

Try:

with open('file.txt') as f:
    for line in f:
        if line.startswith('CLM'):
            print(line.rstrip())
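
If the matching lines should go to their own file rather than the console, the same loop can write them out as it reads. A minimal sketch, assuming a helper name `extract_lines` and an output file name that are not part of the original answer:

```python
def extract_lines(src_path, dst_path, prefix="CLM"):
    """Copy lines starting with `prefix` from src_path to dst_path; return the count."""
    count = 0
    # Iterating over the file object reads one line at a time,
    # so memory use stays flat even for very large files.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(prefix):
                dst.write(line)  # each line already ends with its newline
                count += 1
    return count
```

For example, `extract_lines("file.txt", "clm_lines.txt")` would write the CLM lines to `clm_lines.txt` (a hypothetical name) and return how many were found.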

Answer 2

There are several ways to do this.

You can simply iterate over the lines and search for the pattern with the re library.

Solution 1

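# Note: for a fixed prefix, str.startswith is usually at least as fast as a
# regex; a regex is mainly useful when the pattern is more complex.
import re
pattern = re.compile("CLM")

with open("sample.txt") as f:
    for line in f:
        # match() only matches at the start of the line
        if pattern.match(line):
            print(line.rstrip())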
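
Whether a regex or a plain string check is faster for a fixed prefix depends on the Python version and the data, so it is worth measuring rather than assuming. A rough sketch, using made-up sample lines and hypothetical function names:

```python
import re
import timeit

# Made-up sample data: two of every four lines start with CLM.
lines = ["CLM*1*100\n", "NM1*85*2\n", "CLM*2*250\n", "REF*CLM*9\n"] * 1000
pattern = re.compile("CLM")

def by_startswith():
    return [ln for ln in lines if ln.startswith("CLM")]

def by_regex():
    # match() anchors at the start of the string
    return [ln for ln in lines if pattern.match(ln)]

# Both approaches must select exactly the same lines.
assert by_startswith() == by_regex()

# Timings vary by machine and interpreter, so no result is claimed here.
print("startswith:", timeit.timeit(by_startswith, number=100))
print("regex:     ", timeit.timeit(by_regex, number=100))
```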

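If you prefer, you can also run shell commands from within a Python script.

Solution 2

There are two popular modules for this: os and subprocess.

os is somewhat dated for this purpose, so I recommend the subprocess module, as shown below.

The following code prints the output to the console: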
```
import subprocess
# '^CLM' anchors the match to the start of each line; -i ignores case.
process = subprocess.Popen(['grep', '-i', '^CLM', 'sample.txt'],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE, universal_newlines=True)
stdout, stderr = process.communicate()
print(stdout)
```
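Above, we pass universal_newlines=True so that the output (stdout) is returned as str; without it, stdout is of type bytes.

In the grep command above, I passed -i to make the match case-insensitive. If you want to match only CLM and not clm, remove the -i flag.

I used grep to illustrate the use case; you can also use awk, sed, or any other command as needed.

As an add-on: if you want to save the output to a file, say output.txt, you can do it as follows: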
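
On Python 3.7 and later, the same idea can be written more compactly with subprocess.run. A minimal sketch, assuming grep is available on PATH; the helper name `grep_prefix` is hypothetical:

```python
import subprocess

def grep_prefix(path, prefix="CLM"):
    """Return the lines of `path` that start with `prefix`, using grep.

    Note: `prefix` is interpreted by grep as a regular expression.
    """
    result = subprocess.run(
        ["grep", "^" + prefix, path],
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output to str instead of bytes
    )
    # grep exits with 0 when lines matched, 1 when none did, >1 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Checking the exit status separately matters here because grep returning 1 only means that nothing matched, not that the command failed.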

import subprocess
with open('output.txt', 'w') as f:
    # grep writes directly to the file; wait() blocks until it has finished
    process = subprocess.Popen(['grep', '-i', '^CLM', 'file.txt'], stdout=f)
    process.wait()

If your file is very large, you can also poll the subprocess to check its execution status. See the link below for details.

Python-Shell-Commands
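
Polling a long-running subprocess might look roughly like this. It is only a sketch: the helper name `run_and_poll`, the polling interval, and the status message are all assumptions, not something taken from the linked page:

```python
import subprocess
import time

def run_and_poll(cmd, out_path, interval=0.5):
    """Run `cmd`, send its stdout to `out_path`, and poll until it exits."""
    with open(out_path, "w") as out:
        process = subprocess.Popen(cmd, stdout=out)
        # poll() returns None while the child process is still running.
        while process.poll() is None:
            print("still running...")
            time.sleep(interval)
    return process.returncode
```

It could be called as `run_and_poll(['grep', '^CLM', 'file.txt'], 'output.txt')`. Because the output goes straight to a file rather than a pipe, the loop cannot stall on a full pipe buffer.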
