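Parsing .txt files into a single .csv output

Date: 2019-05-23 00:31:04

Tags: python

I'm currently trying to parse two text files and produce a single .csv output. One file contains a list of paths/file locations; the other contains additional information about those paths/file locations.

The first text file (path.txt) contains: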

C:/Windows/System32/vssadmin.exe
C:/Users/Administrator/Desktop/google.com

The second text file (filelist.txt) contains:

-= List of files in hash: =-

$VAR1 = {
          'File' => [
                      {
                        'RootkitInfo' => 'Normal',
                        'FileVersionLabel' => '6.1.7600.16385',
                        'ProductVersion' => '6.1.7601.17514',
                        'Path' => 'C:/Windows/System32/vssadmin.exe',
                        'Signer' => 'Microsoft Windows',
                        'Size' => '210944',
                        'SHA1' => 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
                        },
                        {
                        'RootkitInfo' => 'Normal',
                        'FileVersionLabel' => '6.1.7600.16385',
                        'ProductVersion' => '6.1.7601.17514',
                        'Path' => 'C:/Users/Administrator/Desktop/steam.exe',
                        'Signer' => 'Valve Inc.',
                        'Size' => '300944',
                        'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b65d94'
                        },
                        {
                        'RootkitInfo' => 'Normal',
                        'FileVersionLabel' => '6.1.7600.16385',
                        'ProductVersion' => '6.1.7601.17514',
                        'Path' => 'C:/Users/Administrator/Desktop/google.com',
                        'Signer' => 'Valve Inc.',
                        'Size' => '300944',
                        'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b78987'
                        },
                        .
                        .
                        .
                    ]
          }

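How can I produce a .csv output containing the file paths and their corresponding hash values? Also, what if I want to add other columns/information corresponding to each path?

Sample table output: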

    <table>
      <tr>
        <th>File Path</th>
        <th>Hash Value</th> 
      </tr>
      <tr>
        <td>C:/Windows/System32/vssadmin.exe</td>
        <td>da39a3ee5e6b4b0d3255bfef95601890afd80709</td> 
      </tr>
      <tr>
        <td>C:/Users/Administrator/Desktop/google.com</td>
        <td>cf23df2207d99a74fbe169e3eba035e633b78987</td> 
      </tr>
    </table>

2 answers:

Answer 0 (score: 1)

To parse what you're calling the second .txt file, you'll need to restructure it so that it looks like a normal Python data structure. It's already very close, and there are several ways to coerce it into that form:

import ast

contents = ""      # holds the text read from the file
filestart = False

with open('filelist.txt') as fh:
    for line in fh:
        if not filestart and not line.startswith("$VAR"):
            continue              # skip everything before the $VAR1 line
        elif line.startswith("$VAR"):
            contents += "{"       # start the dictionary
            filestart = True      # to kill the first if statement
        else:
            contents += line      # fill out with rest of file

# create dictionary, we use ast here because json will fail
# (the data uses single quotes and trailing commas)
result = ast.literal_eval(contents.replace("=>", ":"))
# {'File': [{'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385',
#            'ProductVersion': '6.1.7601.17514',
#            'Path': 'C:/Windows/System32/vssadmin.exe',
#            'Signer': 'Microsoft Windows', 'Size': '210944',
#            'SHA1': 'da39a3ee5e6b4b0d3255bfef95601890afd80709'}, ...]}

files = result["File"]  # get your list from here

Now that it's in a tolerable format, I'd convert it to a dictionary of file: hash key-value pairs so that lookups against your other file are easy:

files_dict = {file['Path']: file['SHA1'] for file in files}

# now grab your other file, and lookups should be quite simple
with open("path.txt") as fh:
    results = [f"{filepath.strip()}, {files_dict.get(filepath.strip())}" for filepath in fh]

# Now you can put that to a csv
with open("paths.csv", "w") as fh:
    fh.write('File Path, Hash Value\n')  # write the header on its own line
    fh.write('\n'.join(results))

There are better ways to do this, but that can be left as an exercise for the reader.
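For the second part of the question (adding more columns for each path), one possible approach is to key the whole record by its path instead of only the hash, and let the csv module handle the formatting. This is only a sketch: the function name `write_report` and the default column choice are made up for illustration, and it assumes `records` is a list of dicts shaped like the entries under 'File' in filelist.txt.

```python
import csv

def write_report(records, wanted_paths, out_path,
                 columns=("Path", "SHA1", "Signer", "Size")):
    """Write one CSV row per wanted path, taking the listed columns from records.

    Hypothetical helper: `records` is assumed to be a list of dicts that each
    have at least a 'Path' key.
    """
    by_path = {rec["Path"]: rec for rec in records}
    with open(out_path, "w", newline="") as fh:
        # extrasaction="ignore" drops keys that are not listed in `columns`
        writer = csv.DictWriter(fh, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for path in wanted_paths:
            rec = by_path.get(path)
            if rec is None:
                # Path is not in the file list: keep the row, leave the other cells empty
                rec = {"Path": path}
            writer.writerow(rec)
```

Calling it with the parsed list of records, the stripped lines of path.txt, and an output name such as "paths.csv" should give one row per requested path. Adding a column then only means adding its key to `columns`.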

答案 1 :(得分:1)

You can build a regular expression pattern that matches what you want:

pattern = r"""{.*?(C:/Windows/System32/vssadmin.exe).*?'SHA1' => '([^']*)'.*?}"""

To use it in a loop with multiple file names, turn that pattern into a format string:

fmt = r"""{{.*?({}).*?'SHA1' => '([^']*)'.*?}}"""

Something like this:

import re

with open('filelist.txt') as f:
    s = f.read()
with open('path.txt') as f:
    for line in f:
        fname = line.strip()
        # escape the path so that characters such as '.' are matched literally
        pattern = fmt.format(re.escape(fname))
        m = re.search(pattern, s, flags=re.DOTALL)
        if m:
            print(m.groups())
        else:
            print('no match for', fname)

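This is somewhat inefficient, because the whole file is searched again for every path, and it depends on the file contents being exactly as you've shown them, including letter case.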
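If rescanning the file for every path becomes a problem, one option is to collect every Path/SHA1 pair in a single pass and then do plain dictionary lookups. The sketch below is not from the answer itself: the names `PAIR_RE` and `hashes_by_path` are invented, and it assumes that 'Path' comes before 'SHA1' inside each record and that no value contains a closing brace.

```python
import re

# A 'Path' entry followed by the first 'SHA1' entry inside the same {...} record.
# [^}]*? keeps the match from running past the closing brace of the record.
PAIR_RE = re.compile(r"'Path'\s*=>\s*'([^']*)'[^}]*?'SHA1'\s*=>\s*'([^']*)'")

def hashes_by_path(dump_text):
    """Return a {path: sha1} dict for every record found in the dump text."""
    return dict(PAIR_RE.findall(dump_text))
```

After reading filelist.txt into a string `s` as above, `hashes_by_path(s).get(fname)` would return the hash for a path, or None when the path is not listed.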


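Alternatively, without a regular expression: iterate over the lines of filelist.txt; find a Path line; extract the path with a slice and check whether it is one of the paths from path.txt; find the next SHA1 line; extract the hash with a slice. This relies on the position of the two lines relative to each other and on the position of the characters within each line. It will probably be more efficient.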

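with open('path.txt') as f:
    fnames = set(line.strip() for line in f)
with open('filelist.txt') as f:
    for line in f:
        line = line.strip()
        # "'Path' => '" is 11 characters; the value is followed by a quote and a comma
        if line.startswith("'Path'") and line[11:-2] in fnames:
            name = line[11:-2]
            while not line.startswith("'SHA1'"):
                line = next(f)
                line = line.strip()
            # SHA1 is the last key in each record, so its line has no trailing comma
            print((name, line[11:].rstrip("',")))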

This also assumes that the text files look exactly as you've shown them.
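If the fixed character positions are a concern, a line-based variant can split each line on "=>" instead of slicing. This is a rough sketch under the same layout assumptions (one key per line, each record closed by a line starting with a closing brace); `iter_records` is a hypothetical name, not something from the answer.

```python
def iter_records(lines):
    """Yield one dict per {...} record, splitting on '=>' rather than slicing."""
    record = {}
    for raw in lines:
        key, sep, value = raw.partition("=>")
        if sep:
            key = key.strip().strip("'")
            value = value.strip().rstrip(",").strip("'")
            # Lines such as "'File' => [" open a container and carry no value
            if value not in ("[", "{"):
                record[key] = value
        elif raw.strip().startswith("}") and record:
            # A closing brace ends the current record
            yield record
            record = {}
```

Because each record comes back as a full dict, any of its keys (Signer, Size and so on) is available as an extra column, not only the hash.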
