How can I speed up my code?

Time: 2017-03-16 03:37:06

Tags: python json

This is a program that suggests a player's name to the user when the user has made a typo. It is very slow.

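First it has to issue a GET request, then it checks whether the player's name is in the JSON data; if it is, it passes. Otherwise, it takes every player's first and last name and appends them to names. It then uses get_close_matches to check whether first_name + last_name closely resembles any of the names in the list. I knew from the start that this would be very slow, but there has to be a faster way to do it; I just couldn't come up with one. Any suggestions?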

1 answer:

Answer 0 (score: 1)

Well, since I ended up giving my suggestions in the comments, I might as well post them as an answer, along with a few other ideas.

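First, take your I/O operations out of the function so that you don't waste time making a request every time the function runs. Instead, fetch the JSON once when the script starts and keep it in local memory. If possible, downloading the JSON data beforehand and opening it as a local file instead may be an even faster option.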
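A minimal sketch of that first point, assuming the feed has already been saved to disk. The helper name load_player_names and the file name active_players.json are hypothetical, and the key layout is assumed to match the code further down in this answer:

```python
import json

def load_player_names(path):
    """Read a previously downloaded feed from disk and return full names."""
    with open(path) as f:
        data = json.load(f)
    # Assumed layout: the same keys the answer's code uses on the live feed.
    entries = data["activeplayers"]["playerentry"]
    return [p["FirstName"] + " " + p["LastName"] for p in entries]

# Call this once at startup and reuse the result for every lookup, e.g.:
# ALL_NAMES = load_player_names("active_players.json")
```

The point is only that the slow part (network or disk) runs once, not once per lookup.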

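Second, each loop iteration should get a set of unique candidates, because there is no need to compare them more than once. When get_close_matches() discards a name, we know that the same name does not need to be compared again. (It would be a different story if the criteria for discarding a name depended on subsequent names, but I doubt that is the case here.)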
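One reading of this point, sketched under the assumption that the candidate list may contain repeated names: remove the repeats first, so every distinct name is scored against the query only once. The helper unique_candidates and the sample names are made up for illustration:

```python
from difflib import get_close_matches

def unique_candidates(names):
    """Drop repeated names while keeping first-seen order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique

candidates = unique_candidates(
    ["Matthew Stafford", "Aaron Rodgers", "Matthew Stafford"]
)
# Each distinct name is now compared against the query exactly once.
matches = get_close_matches("Mathew Staford", candidates)
```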

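Third, try using batches. Given that get_close_matches() is reasonably efficient, comparing against 10 candidates shouldn't be much slower than comparing against 1. But reducing the for loop from over 1 million elements to over 100K elements is a very significant boost.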
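Slicing is one common way to build such batches, and it keeps the leftover names when the total is not a multiple of the batch size. This is only a sketch; make_batches is a hypothetical helper name, and whether batching actually beats a single call over the whole list is something to measure:

```python
def make_batches(items, size=10):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = make_batches(list(range(25)), size=10)
# The last batch holds the 5 leftover items instead of being dropped.
```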

Fourth, I assume you are checking last_name == ['LastName'] and first_name == ['FirstName'] because in that case there is no typo. So why not simply break out of the function at that point?
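A rough sketch of that early exit, using a set for the exact-match test and a single get_close_matches() call over a flat list. This is a simpler variant than batching, not necessarily a faster one; suggest_flat and the sample names are hypothetical:

```python
from difflib import get_close_matches

def suggest_flat(first_name, last_name, all_names, name_set):
    """Return the name itself if it is spelled correctly, else suggestions."""
    desired_name = first_name + " " + last_name
    # Exact match: no typo, so skip the fuzzy comparison entirely.
    if desired_name in name_set:
        return desired_name
    # n=3 and cutoff=0.6 are get_close_matches' defaults, spelled out here.
    matches = get_close_matches(desired_name, all_names, n=3, cutoff=0.6)
    return "did you mean " + ", ".join(matches) + "?"

all_names = ["Matthew Stafford", "Aaron Rodgers", "Tom Brady"]
name_set = set(all_names)  # built once; membership tests are O(1) on average
```

With this shape, a correctly spelled name never reaches the fuzzy matcher at all.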

Putting it all together, I would write something like this:

from difflib import get_close_matches

# I/O operation ONCE when the script is run.
# get_request is the helper from the question; it is assumed to return the parsed JSON as a dict.
my_request = get_request("https://www.mysportsfeeds.com/api/feed/pull/nfl/2016-2017-regular/active_players.json")

# Creating batches of 10 names; this also happens only once
# As a result, the script might take longer to load but run faster.
# I'm sure there is a better way to create batches, but I don't know any.
batch = []    # This will contain 10 names.
names = []    # This will contain the batches.

for player in my_request['activeplayers']['playerentry']:
    name = player['FirstName'] + " " + player['LastName']
    batch.append(name)

    # Start a new batch once the current one is full.
    if len(batch) == 10:
        names.append(batch)
        batch = []

# Keep the final, shorter batch when the number of names is not a multiple of 10.
if batch:
    names.append(batch)

def suggest(first_name, last_name, names):

    desired_name = first_name + " " + last_name
    suggestions = []

    for batch in names:

        # Just return the name if there is no typo
        # Alternatively, you can create a flat list of names outside of the function 
        # and see if the desired_name is in the list of names to immediately 
        # terminate the function. But I'm not sure which method is faster. It's
        # a quick profiling task for you, though.
        if desired_name in batch:
            return desired_name

        # This way, we only match with new candidates, 10 at a time.
        best_matches = get_close_matches(desired_name, batch)
        suggestions.append(best_matches)

    # We need to flatten the list of suggestions to print.
    # Alternatively, you could use a for loop to append in the first place.
    suggestions = [name for batch in suggestions for name in batch]

    return "did you mean " + ", ".join(suggestions) + "?"

print(suggest("mattthews", "stafffford", names))  # should suggest Matthew Stafford