Question

I have a file 'peaks_ee.xpk' and I am trying to use the values in that file to create a dictionary in my Python code.

import re

j = 0
atom_lines = []
with open("peaks_ee.xpk", "r") as atomName:
    for name in atomName:
        # Raw string so \d and \. reach the regex engine unchanged
        float_str = re.findall(r"\d\.H\d'?", name)
        if len(float_str) > 1:
            j += 1
            value1 = 'Atom ' + str(j) + ' ' + float_str[0] + ' ' + float_str[1] + '\n'
            # append keeps file order; insert(-1, ...) would place each item before the last one
            atom_lines.append(value1)
with open("tclust.txt", "a") as tclust_atom:
    for value1 in atom_lines:
        tclust_atom.write(value1)

I am reading the file peaks_ee.xpk. Here is a sample snippet from it:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

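I want to make a dictionary with the atom names as keys. The atom names in peaks_ee.xpk are "1.H1'", "2.H8", and so on. I want the values to be the chemical shifts, for example "5.82020" and "7.61004" (these come from row 0 of peaks_ee.xpk). For example, I would like the dictionary to look like: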

dict = { "1.H1'":"5.82020", "2.H8":"7.61004"...}

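But the next row repeats 2.H8 and 1.H1', so it does not need to be added to the dictionary. The row after that (row 2) should be added, because it contains a new atom named 1.H8, so the dictionary should become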

dict = {"1.H1'":"5.82020", "2.H8":"7.61004", "1.H8":"8.13712", ...}

How can I do this?

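Edit: If I have another file, "ee_pinkH1.xpk", and I want to read it, check whether the chemical shift values in it fall within a certain range, and then print those values, would this be the code?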

Here is my full code:

import os
import sys
import re

i = 0;
contents_peak = []
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak ' + str(i) + ' '+  str(float_num[0])+ ' 0.05 ' + str(float_num[1])+ ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write('rbclust \n')
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

j = 0;
contents_atom = []
atom_lines=[]
result = {}
with open ("peaks_ee.xpk","r") as atomName:
    for name in atomName.readlines():
        for match in rex.finditer(line):
            name,shift = match.groups()
        if name not in result: 
            result[name] = float(shift)
            float_str = re.findall("\d\.H\d'?", name)
            if (len(float_str)>1):
                j = j+1
                if peakPPM = 'ee_pinkH1.xpk':
                    if 5<=float_num<=6.25:
                        value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
                    atom_lines.insert(-1,value1)

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

Answer 1

Before adding a key, just use in to check whether the key is already in the dictionary.

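shifts = {}  # avoid shadowing the built-in dict
for line in atomName:
    fields = line.split()
    # Skip header lines; peak rows have an atom name in braces in field 1
    if len(fields) < 3 or not fields[1].startswith("{"):
        continue
    atom_name = fields[1][1:-1]  # strip the surrounding braces
    if atom_name not in shifts:
        shifts[atom_name] = float(fields[2])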

Since it looks like you have more than one key-value pair to check on each line, you can repeat the same check for each pair, like this:

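shifts = {}  # avoid shadowing the built-in dict
for line in atomName:
    fields = line.split()
    # Skip header lines; peak rows have at least 10 fields and an atom name in braces
    if len(fields) < 10 or not fields[1].startswith("{"):
        continue
    # Fields 1 and 2 hold the first atom and its shift, fields 8 and 9 the second
    for name_idx, shift_idx in ((1, 2), (8, 9)):
        atom_name = fields[name_idx][1:-1]  # strip the surrounding braces
        if atom_name not in shifts:
            shifts[atom_name] = float(fields[shift_idx])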
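
To try the split-based approach without the real file, here is a minimal self-contained sketch. It assumes that peak rows always carry the first atom name in field 1 and the second in field 8, as in the sample above; the function name shifts_from_lines and the inline sample rows are illustrative only.

```python
def shifts_from_lines(lines):
    """Map each atom name to the first chemical shift seen for it."""
    shifts = {}
    for line in lines:
        fields = line.split()
        # Assumption: only peak rows have >= 10 fields with "{name}" in field 1.
        if len(fields) < 10 or not fields[1].startswith("{"):
            continue
        for name_idx, shift_idx in ((1, 2), (8, 9)):
            atom_name = fields[name_idx][1:-1]  # drop the braces
            if atom_name not in shifts:
                shifts[atom_name] = float(fields[shift_idx])
    return shifts


# A header line plus rows 0-2 copied from the sample in the question.
sample = [
    "label dataset sw sf",
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
]
print(shifts_from_lines(sample))
```

With a real file, the same function could be called as shifts_from_lines(open_file), since iterating over a file object yields its lines.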

By the way, did you mean to edit this post? I have also replied to your earlier duplicate post.

Answer 2

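You can extend the regular expression pattern to include the chemical shift and get everything you need from each match. Put parentheses around the parts of the pattern you want to keep so that they are captured.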

import re
# Raw string so the backslashes reach the regex engine unchanged
pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)
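
As a quick check of what the two groups capture, here is a small sketch run against row 0 of the sample. The variable names row and pairs are illustrative. Note that the pattern as written only matches single-digit residue numbers and single-digit hydrogen indices, which is all the sample contains.

```python
import re

# Same pattern as above, written as a raw string.
rex = re.compile(r"{(\d\.H\d'?)}\s(\d\.\d+)\s")

row = "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0"

# Each match yields a (name, shift) tuple of strings;
# fields such as {0.0} and {} do not match because they contain no ".H".
pairs = [m.groups() for m in rex.finditer(row)]
print(pairs)
```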

Iterate over all the matches; the name and shift will be in the match.groups() tuple. If the name has not been seen yet, add it to the dictionary.

with open(filepath) as atom_name:
    data = atom_name.read()
result = {}
for match in rex.finditer(data):
    name, shift = match.groups()
    #print(name,shift)
    if name not in result:
        result[name] = float(shift)

If the file is too large to read all at once, extract the information one line at a time.

result = {}
with open(filepath) as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            #print(name,shift)
            if name not in result:
                result[name] = float(shift)
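
For the follow-up in the question's edit (keeping only the shifts that fall inside a range), one possible sketch is below. It assumes the 5 to 6.25 ppm window that appears in the question's code; shifts_in_range is a hypothetical helper, and the dictionary is typed inline from the sample rows rather than read from a file.

```python
def shifts_in_range(shifts, low, high):
    """Return the atoms whose chemical shift lies in [low, high]."""
    return {name: ppm for name, ppm in shifts.items() if low <= ppm <= high}


# Values taken from the sample rows in the question.
result = {"1.H1'": 5.8202, "2.H8": 7.61004, "1.H8": 8.13712, "2.H1'": 5.90291}

# Assumed window: 5 to 6.25 ppm, as in the question's code.
for name, ppm in shifts_in_range(result, 5.0, 6.25).items():
    print(name, ppm)
```

Because the shifts are stored as floats, the chained comparison works directly on the dictionary values; comparing the list returned by re.findall against numbers would not.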
