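I'm currently trying to parse two text files and produce a .csv output. One contains a list of paths/file locations, and the other contains additional information related to those paths/file locations.
The first text file (path.txt) contains:
    C:/Windows/System32/vssadmin.exe
    C:/Users/Administrator/Desktop/google.com
The second text file (filelist.txt) contains: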
    -= List of files in hash: =-
    $VAR1 = {
        'File' => [
            {
                'RootkitInfo' => 'Normal',
                'FileVersionLabel' => '6.1.7600.16385',
                'ProductVersion' => '6.1.7601.17514',
                'Path' => 'C:/Windows/System32/vssadmin.exe',
                'Signer' => 'Microsoft Windows',
                'Size' => '210944',
                'SHA1' => 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
            },
            {
                'RootkitInfo' => 'Normal',
                'FileVersionLabel' => '6.1.7600.16385',
                'ProductVersion' => '6.1.7601.17514',
                'Path' => 'C:/Users/Administrator/Desktop/steam.exe',
                'Signer' => 'Valve Inc.',
                'Size' => '300944',
                'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b65d94'
            },
            {
                'RootkitInfo' => 'Normal',
                'FileVersionLabel' => '6.1.7600.16385',
                'ProductVersion' => '6.1.7601.17514',
                'Path' => 'C:/Users/Administrator/Desktop/google.com',
                'Signer' => 'Valve Inc.',
                'Size' => '300944',
                'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b78987'
            },
            .
            .
            .
        ]
    }
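How can I produce a .csv output that contains each file path and its corresponding hash value? Also, what if I want to add other columns/information corresponding to that path?
Sample table output: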
<table>
<tr>
<th>File Path</th>
<th>Hash Value</th>
</tr>
<tr>
<td>C:/Windows/System32/vssadmin.exe</td>
<td>da39a3ee5e6b4b0d3255bfef95601890afd80709</td>
</tr>
<tr>
<td>C:/Users/Administrator/Desktop/google.com</td>
<td>cf23df2207d99a74fbe169e3eba035e633b78987</td>
</tr>
</table>
Answer 0: (score: 1)
To parse what you describe as the second .txt file (it is not really a plain text file), you will need to restructure it so that it looks like a normal Python data structure. It is very close already, and can be coerced into that shape in a number of ways:

    import ast

    contents = ""  # this will hold the read contents of that file
    filestart = False
    with open('filelist.txt') as fh:
        for line in fh:
            if not filestart and not line.startswith("$VAR"):
                continue  # skip the banner lines before the data
            elif line.startswith("$VAR"):
                contents += "{"  # start the dictionary
                filestart = True  # to kill the first if statement
            else:
                contents += line  # fill out with rest of file
    # create dictionary, we use ast here because json will fail
    result = ast.literal_eval(contents.replace("=>", ":"))
    # {'File': [{'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Windows/System32/vssadmin.exe', 'Signer': 'Microsoft Windows', 'Size': '210944', 'SHA1': 'da39a3ee5e6b4b0d3255bfef95601890afd80709'}, {'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Users/Administrator/Desktop/steam.exe', 'Signer': 'Valve Inc.', 'Size': '300944', 'SHA1': 'cf23df2207d99a74fbe169e3eba035e633b65d94'}, {'RootkitInfo': 'Normal', 'FileVersionLabel': '6.1.7600.16385', 'ProductVersion': '6.1.7601.17514', 'Path': 'C:/Users/Administrator/Desktop/google.com', 'Signer': 'Valve Inc.', 'Size': '300944', 'SHA1': 'cf23df2207d99a74fbe169e3eba035e633b78987'}]}
    files = result["File"]  # get your list from here

Now that it is in a tolerable format, I would convert it to a dictionary of file: hash key-value pairs, so that lookups against the other file are easy:

    files_dict = {file['Path']: file['SHA1'] for file in files}
    # now grab your other file, and lookups should be quite simple
    with open("path.txt") as fh:
        results = [f"{filepath.strip()}, {files_dict.get(filepath.strip())}" for filepath in fh]
    # Now you can put that to a csv
    with open("paths.csv", "w") as fh:
        fh.write('File Path, Hash Value\n')  # write the header on its own line
        fh.write('\n'.join(results))
There are better ways to do this, but that can be left as an exercise for the reader.
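For the second part of the question (adding more columns), one possible direction is the standard library's csv module. The following is only a sketch under assumptions: the helper name write_report and its columns parameter are made up for illustration, the sample list stands in for the parsed records, and the header uses the record keys (Path, SHA1) rather than the "File Path" / "Hash Value" labels from the question.

```python
import csv
import io

def write_report(records, wanted_paths, out, columns=("Path", "SHA1")):
    """Write one CSV row per wanted path, taking the named columns from its record."""
    by_path = {rec["Path"]: rec for rec in records}
    # extrasaction="ignore" drops record keys that are not in `columns`.
    writer = csv.DictWriter(out, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    for path in wanted_paths:
        # Paths missing from the file list still get a row, with empty cells.
        writer.writerow(by_path.get(path, {"Path": path}))

# Sample data standing in for the parsed records and the lines of path.txt.
files = [
    {"Path": "C:/Windows/System32/vssadmin.exe", "Signer": "Microsoft Windows",
     "Size": "210944", "SHA1": "da39a3ee5e6b4b0d3255bfef95601890afd80709"},
    {"Path": "C:/Users/Administrator/Desktop/google.com", "Signer": "Valve Inc.",
     "Size": "300944", "SHA1": "cf23df2207d99a74fbe169e3eba035e633b78987"},
]
paths = ["C:/Windows/System32/vssadmin.exe",
         "C:/Users/Administrator/Desktop/google.com"]

buffer = io.StringIO()
write_report(files, paths, buffer, columns=("Path", "SHA1", "Signer", "Size"))
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open("paths.csv", "w", newline="") as the out argument; the csv module takes care of quoting values that contain commas.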
Answer 1: (score: 1)
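You can construct a regular expression pattern that matches what you want:

    pattern = r"""{.*?(C:/Windows/System32/vssadmin.exe).*?'SHA1' => '([^']*)'.*?}"""

To use it in a loop with multiple file names, turn that pattern into a format string:

    fmt = r"""{{.*?({}).*?'SHA1' => '([^']*)'.*?}}"""

Something like this:

    import re

    with open('filelist.txt') as f:
        s = f.read()
    with open('path.txt') as f:
        for line in f:
            fname = line.strip()
            # escape the path so that characters like '.' are matched literally
            pattern = fmt.format(re.escape(fname))
            m = re.search(pattern, s, flags=re.DOTALL)
            if m:
                print(m.groups())
            else:
                print('no match for', fname)

This is somewhat inefficient, and it depends on the contents of the file being exactly as you have shown them, including letter case.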
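One way to reduce the repeated searching is to scan the text once and collect every path/hash pair into a dictionary. This is a sketch, not a drop-in replacement: the names RECORD and hashes_by_path are made up for illustration, the sample string stands in for the contents of filelist.txt, and it assumes that every record lists 'Path' before 'SHA1' and contains both, as in the question.

```python
import re

# One Path followed (lazily) by the next SHA1; DOTALL lets '.' cross line breaks.
RECORD = re.compile(r"'Path' => '([^']*)'.*?'SHA1' => '([^']*)'", flags=re.DOTALL)

def hashes_by_path(dump_text):
    """Return a {path: sha1} dictionary built from a single pass over the text."""
    return dict(RECORD.findall(dump_text))

# Shortened sample standing in for the contents of filelist.txt.
sample = """
{
'Path' => 'C:/Windows/System32/vssadmin.exe',
'Signer' => 'Microsoft Windows',
'SHA1' => 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
},
{
'Path' => 'C:/Users/Administrator/Desktop/steam.exe',
'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b65d94'
},
"""

lookup = hashes_by_path(sample)
for wanted in ["C:/Windows/System32/vssadmin.exe", "C:/missing.exe"]:
    print(wanted, lookup.get(wanted))
```

Each name from path.txt then becomes a dictionary lookup instead of a fresh search over the whole file.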
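Or, without regular expressions: iterate over the lines of filelist.txt; find a Path line; extract the path with a slice and check whether it is one of the paths from path.txt; find the next SHA1 line; extract the hash with a slice. This depends on the position of the two lines relative to each other and on the position of the characters within each line. It may be more efficient.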
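
    with open('path.txt') as f:
        fnames = set(line.strip() for line in f)
    with open('filelist.txt') as f:
        for line in f:
            line = line.strip()
            # "'Path' => '" is 11 characters long; [-2] drops the trailing "',"
            if line.startswith("'Path'") and line[11:-2] in fnames:
                name = line[11:-2]
                while not line.startswith("'SHA1'"):
                    line = next(f)
                    line = line.strip()
                # the SHA1 line has no trailing comma in the sample, so strip
                # the closing quote (and a comma, if present) instead of slicing
                print((name, line[11:].rstrip("',")))
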
This also assumes that the text files look exactly as you have shown them.
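If the fixed character offsets feel too fragile, a variation is to split each line on " => " rather than slicing at position 11. The sketch below is illustrative only: parse_field and path_hash_pairs are hypothetical helper names, the sample list stands in for the lines of filelist.txt, and it still assumes that 'Path' comes before 'SHA1' within each record.

```python
def parse_field(line):
    """Return (key, value) for a line like "'Path' => 'C:/x'," or None."""
    key, sep, value = line.strip().partition(" => ")
    if not sep:
        return None  # banner, brace, or other line without a key/value pair
    return key.strip("'"), value.rstrip(",").strip("'")

def path_hash_pairs(lines, wanted):
    """Yield (path, sha1) for every record whose path is in `wanted`."""
    current = None
    for line in lines:
        field = parse_field(line)
        if field is None:
            continue
        key, value = field
        if key == "Path":
            # Remember the path only if it is one we are looking for.
            current = value if value in wanted else None
        elif key == "SHA1" and current is not None:
            yield current, value
            current = None

# Shortened sample standing in for the lines of filelist.txt.
sample_lines = [
    "-= List of files in hash: =-",
    "$VAR1 = {",
    "'File' => [",
    "{",
    "'Path' => 'C:/Windows/System32/vssadmin.exe',",
    "'Size' => '210944',",
    "'SHA1' => 'da39a3ee5e6b4b0d3255bfef95601890afd80709'",
    "},",
    "{",
    "'Path' => 'C:/Users/Administrator/Desktop/steam.exe',",
    "'SHA1' => 'cf23df2207d99a74fbe169e3eba035e633b65d94'",
    "},",
]

pairs = list(path_hash_pairs(sample_lines, {"C:/Windows/System32/vssadmin.exe"}))
print(pairs)
```

With real files, an open file object can be passed as lines and the set built from path.txt as wanted.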