Question

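I want my program to take a subspecies heading (e.g. 'Ablepharus bivittatus') and store it as a string key. Then I want it to collect the sequence IDs (integers) on the following lines, up to the next subspecies heading. The integers should be stored as the values for the subspecies key that was grabbed first.

I want the program to prompt the user for a string, search all dictionary keys for an exact match (case-sensitive; spelling matters here), and then return the sequence IDs.

What is the most efficient way to do this? Right now I can separate the two entities (IDs and subspecies names), but I don't know how to build a dictionary that stores these values while iterating over the text file.

Some lines contain the same name repeated several times. How can I tell the program to detect this and use only the first of the repeated subspecies names as the string key?

The text file is formatted as follows.

Thank you for your time.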

Ablepharus bivittatus   
36630
31764
31212
01996
09953
03744
14036
16094
01875
19076
09496
20583
24160
23142
26892
06533
05488
Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953
Ablepharus chernovi eiselti SCHMIDTLER 1997
Ablepharus chernovi isauriensis SCHMIDTLER 1997
Ablepharus chernovi ressli SCHMIDTLER 1997
31212
01996
09637
14036
20583
23142
21989
26892
28697
09207
09206
Ablepharus darvazi  
06245
26892

Here is some code I have been experimenting with so far.

dictionary = {}

with open("repCleanSubs2.txt") as file:
    for line in file:
        (key, val) = line.split()
        dictionary[val(key)] = val
print key(1)

'''import re
file = open('repCleanSubs2.txt')
subspecies = []
dnaIDs = []
for line in file:
    match = re.findall('^[a-zA-Z]+', line)
    if match:
        subspecies.append(line)
        #Grab sequence IDs under this line ^
        #
        #Until you reach next string match

print dnaIDs
#userInput = raw_input("Which subspecies would you like to view?: ")
#if userInput == re.match(subspecies(line)):
#   print subspecies(line)'''
# print sequence IDs from the line grabbed here ^

Answer 1

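You probably want to use file.read().splitlines() to get a list of lines.
Iterating over those lines and checking whether each one is a new subspecies heading or an ID seems the most appropriate approach.
During the iteration you can then use the "current" name as the dictionary key and append each new ID to that key's list.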
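As a small illustration of the first two points: splitlines() drops the trailing newline characters, and a regular expression is enough to tell ID lines from heading lines. The text below is a hypothetical three-line excerpt standing in for the real file, and the sketch is written so that it also runs on Python 3.

```python
import re

# Hypothetical excerpt standing in for the contents of the text file.
text = "Ablepharus darvazi  \n06245\n26892\n"

# splitlines() returns the lines without their trailing newline characters.
lines = text.splitlines()
print(lines)

# Classify each line: five leading digits means an ID, anything else a heading.
kinds = ["id" if re.match(r"\d{5}", line) else "heading" for line in lines]
print(kinds)
```

Note that the heading keeps its trailing spaces after splitlines(), which is why the name should be stripped before it is used as a key.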

This seems to do what you are asking for:

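import re

data = {}

# Read the whole file into a list of lines without trailing newlines.
with open("data.txt") as f:
    lines = f.read().splitlines()

name = ""
for l in lines:
    if re.match(r"\d{5}", l):
        # An ID line: append it to the list of the current heading.
        data[name].append(l)
    else:
        # Any other line is a new heading and becomes the current key.
        name = l.strip()
        data[name] = []

print(data)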

It produces the following output (shown here pretty-printed):

{
    "Ablepharus chernovi isauriensis SCHMIDTLER 1997": [], 
    "Ablepharus bivittatus": [
        "36630", 
        "31764", 
        "31212", 
        "01996", 
        "09953", 
        "03744", 
        "14036", 
        "16094", 
        "01875", 
        "19076", 
        "09496", 
        "20583", 
        "24160", 
        "23142", 
        "26892", 
        "06533", 
        "05488"
    ], 
    "Ablepharus chernovi ressli SCHMIDTLER 1997": [
        "31212", 
        "01996", 
        "09637", 
        "14036", 
        "20583", 
        "23142", 
        "21989", 
        "26892", 
        "28697", 
        "09207", 
        "09206"
    ], 
    "Ablepharus darvazi": [
        "06245", 
        "26892"
    ], 
    "Ablepharus chernovi eiselti SCHMIDTLER 1997": [], 
    "Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953": []
}
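Note that the IDs above are stored as strings. The question asks for integers, and converting with int() is easy, but it drops leading zeros ("01996" becomes 1996), so whether to convert probably depends on how the IDs are used afterwards. A minimal sketch, using a hypothetical one-entry dictionary in the same shape as the output above:

```python
# Hypothetical excerpt of the dictionary built from the file.
data = {"Ablepharus darvazi": ["06245", "26892"]}

# Convert every ID string to an int; leading zeros are not preserved.
as_ints = {name: [int(i) for i in ids] for name, ids in data.items()}
print(as_ints)
```

If the IDs have to be printed back in their five-digit form, keeping them as strings is likely the simpler choice.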

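I'm not sure what you mean by some lines containing the same name repeated. If you can elaborate on that and show your expected output, it can be incorporated.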
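In the meantime, here is a sketch of one possible reading, and it is only a guess at the intent: when several heading lines follow each other, only the first one is used, and its key is shortened to the first two words (genus and species). The function name build_index and the two-word rule are my own assumptions, not something stated in the question.

```python
import re


def build_index(lines):
    """Map the first heading of each run of heading lines to the IDs that follow."""
    data = {}
    name = None
    previous_was_heading = False
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if re.match(r"\d{5}$", line):
            # An ID line: attach it to the current key, if there is one.
            if name is not None:
                data[name].append(line)
            previous_was_heading = False
        else:
            if not previous_was_heading:
                # Assumption: the first two words (genus and species) form the key.
                name = " ".join(line.split()[:2])
                data.setdefault(name, [])
            # Later headings in the same run are ignored.
            previous_was_heading = True
    return data


# Hypothetical excerpt of the file from the question.
sample = [
    "Ablepharus chernovi Ablepharus chernovi chernovi DAREVSKY 1953",
    "Ablepharus chernovi eiselti SCHMIDTLER 1997",
    "31212",
    "01996",
    "Ablepharus darvazi  ",
    "06245",
]
index = build_index(sample)
print(index)
```

With this reading, the four "Ablepharus chernovi ..." lines in your file would collapse into the single key "Ablepharus chernovi". If you would rather keep each subspecies line as its own key, the two-word rule is the part to change.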

Finally, returning the sequence IDs for a key supplied by the user would look like this:

print(data[raw_input()])
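A dictionary lookup is already an exact, case-sensitive match, so there is no need to loop over the keys. The one-liner above raises a KeyError when the input matches no key, though. A slightly more forgiving sketch uses dict.get, which returns None in that case. The helper name lookup and the one-entry dictionary are hypothetical, and the user prompt is left out so the snippet runs the same way on Python 2.7 and Python 3 (where raw_input is called input).

```python
def lookup(data, key):
    """Return the IDs stored under key, or None when there is no exact match."""
    return data.get(key)


# Hypothetical excerpt of the dictionary built from the file.
data = {"Ablepharus darvazi": ["06245", "26892"]}

found = lookup(data, "Ablepharus darvazi")
missing = lookup(data, "ablepharus darvazi")  # wrong case, so no match
print(found)
print(missing)
```

In the real program the key would come from raw_input(), and a None result could be turned into a "no such subspecies" message.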

Storing strings and integers from a .txt file in a dictionary in Python 2.7
