Question

I need advice on how to compare an unknown number of nested dictionaries while keeping the time cost as low as possible.

So here is an example:

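I have data from house rental agencies. A house can be listed with more than one of these agencies. For each house I have some information: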

1. Country
2. Date
3. Other information, such as the number of rooms, etc.

This data is stored like this:

DictionaryOnFirstLevel: key = Country, value = DictionaryOnSecondLevel
DictionaryOnSecondLevel: key = Date, value = instance of class House including price, Country, Date etc.
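A minimal sketch of that two-level layout, assuming a simple `House` dataclass and an `Agency` object that exposes `house_dict` (the class names and the fields `name` and `rooms` are hypothetical; the question only names price, Country and Date). The loop in the question iterates over the second-level value, so this sketch assumes each date maps to a list of houses rather than a single instance.

```python
from dataclasses import dataclass, field


@dataclass
class House:
    # Hypothetical fields; the question only names price, Country and Date.
    name: str
    country: str
    date: str
    price: float
    rooms: int = 0


@dataclass
class Agency:
    # house_dict[country][date] -> list of House objects
    house_dict: dict = field(default_factory=dict)

    def add(self, house):
        # Create the inner dict and list on first use, then append.
        by_date = self.house_dict.setdefault(house.country, {})
        by_date.setdefault(house.date, []).append(house)


agency = Agency()
agency.add(House("Seaview Cottage", "Ireland", "2014-06-01", 120.0, 3))
agency.add(House("Old Mill", "Ireland", "2014-06-01", 95.0, 2))
print(len(agency.house_dict["Ireland"]["2014-06-01"]))  # 2
```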

So what I want is to find identical houses (two identical houses are not the same object) and compare their prices and other data.

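Since I know the country and the date, I don't need to compare every house with every other house. I don't have to compare objects in Ireland with objects in Turkey, and the same goes for dates: I don't have to compare objects with different dates.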
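Because only houses with the same country and date can match, one way to skip everything else is to intersect the keys at each level before comparing anything. A minimal sketch, assuming both dictionaries have the country -> date -> list-of-houses shape described above; `matching_pairs` is a hypothetical helper, and `same_house` is a stand-in for the asker's approximate `equals` method.

```python
def matching_pairs(dict_a, dict_b, same_house):
    """Yield (h1, h2) pairs from two country -> date -> [houses] dicts
    that share a country and a date and satisfy same_house(h1, h2)."""
    # dict.keys() views support set operations, so '&' keeps only shared keys.
    for country in dict_a.keys() & dict_b.keys():
        dates_a, dates_b = dict_a[country], dict_b[country]
        for date in dates_a.keys() & dates_b.keys():
            for h1 in dates_a[date]:
                for h2 in dates_b[date]:
                    if same_house(h1, h2):
                        yield h1, h2


# Houses are plain strings here just to keep the example short.
a = {"Ireland": {"d1": ["x", "y"]}, "Turkey": {"d1": ["z"]}}
b = {"Ireland": {"d1": ["y"], "d2": ["x"]}}
print(list(matching_pairs(a, b, lambda h1, h2: h1 == h2)))  # [('y', 'y')]
```

To cover every agency rather than just two, the same function could be called for each pair produced by `itertools.combinations(agencies, 2)`.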

for date in first_agency.house_dict['Ireland']:
    if date not in second_agency.house_dict['Ireland']:  # to save some time
        continue
    for h1 in first_agency.house_dict['Ireland'][date]:
        for h2 in second_agency.house_dict['Ireland'][date]:
            if h1.equals(h2):  # equals does an approximate comparison of house names and other attributes
                # do some calculations and stuff
                pass

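The code above only works for 2 agencies (the first and the second) and the country 'Ireland'. I only take the 'Ireland' key, so I don't have to work with the other countries' dates, because houses there have no chance of matching.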

So could you help me improve my code so that it compares all the agencies?

Answer 1

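First of all, I think the approach of many nested if + for loops is a bad one, because you repeat a bunch of code and it's hard to test.

You'd be better off splitting it into several functions: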

def get_all_duplicate_houses_on_date(date):
    """Return a list of duplicate-house lists (a list of lists)."""
    houses = get_all_houses_on_date(date)
    all_duplicates = []
    for house in houses:
        all_duplicates.append(get_duplicates(house, houses))
    return all_duplicates

def get_duplicates(house, houselist):
    """Return all duplicates of house in houselist, excluding house itself."""
    duplicates = []
    for other_house in houselist:
        if other_house is not house and house.equals(other_house):
            duplicates.append(other_house)
    return duplicates

def get_all_houses_on_date(date):
    """Return all houses in every country on a given date."""
    # all_countries and all_agencies are assumed to be module-level lists.
    all_houses = []
    for country in all_countries:
        all_houses.extend(get_all_houses_from_country_on_date(country, date))
    return all_houses

def get_all_houses_from_country_on_date(country, date):
    """Return all houses from all agencies in a given country
       on a given date."""
    country_houses = []
    for agency in all_agencies:
        country_houses.extend(get_agency_houses_in_country_on_date(agency, country, date))
    return country_houses

def get_agency_houses_in_country_on_date(agency, country, date):
    """Return all houses from the given agency in a given country
       on a given date, or an empty list."""
    return agency.house_dict.get(country, {}).get(date, [])

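Now, this is completely hacked together and I haven't confirmed that it has no major flaws, but the details of the code aren't the point. The point is that you create several testable functions, each of which performs one distinct task. Once you have the lists of duplicates you can do whatever you like with them, but that should be another function.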
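As an illustration of what "testable" buys you, here is a sketch of unit tests for the smallest helper, using the standard `unittest` module and `types.SimpleNamespace` as a stub agency (the stub, the test names and the data are hypothetical). The helper is repeated so the block runs on its own.

```python
import unittest
from types import SimpleNamespace


def get_agency_houses_in_country_on_date(agency, country, date):
    """Return the houses the agency lists for country/date, or []."""
    return agency.house_dict.get(country, {}).get(date, [])


class AgencyLookupTest(unittest.TestCase):
    def setUp(self):
        # Stub agency: only the house_dict attribute is needed.
        self.agency = SimpleNamespace(
            house_dict={"Ireland": {"2014-06-01": ["house-a", "house-b"]}}
        )

    def test_known_country_and_date(self):
        self.assertEqual(
            get_agency_houses_in_country_on_date(self.agency, "Ireland", "2014-06-01"),
            ["house-a", "house-b"],
        )

    def test_missing_country_returns_empty_list(self):
        self.assertEqual(
            get_agency_houses_in_country_on_date(self.agency, "Turkey", "2014-06-01"),
            [],
        )

    def test_missing_date_returns_empty_list(self):
        self.assertEqual(
            get_agency_houses_in_country_on_date(self.agency, "Ireland", "2015-01-01"),
            [],
        )
```

Saved to a file, these tests can be run with `python -m unittest path/to/file.py`.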

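When it comes to optimization, it's probably much easier to try to optimize each of the functions above on its own. My naive implementation above compares every house on a given date with every other house on that date, so for n houses it makes O(n^2) calls to `equals`; not great. It also pools houses from every country, so it compares houses that, as the question points out, can never match. Doing better than O(n^2) will probably take some creativity.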
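One way to cut down the number of comparisons, following the question's observation that only houses with the same country and date can match, is to pool the houses from all agencies into buckets keyed by `(country, date)` and compare pairs only within each bucket. A sketch, assuming each agency exposes the `house_dict` layout from the question, that only houses from different agencies should be paired, and that the comparison is symmetric; `find_duplicates` is a hypothetical name and `same_house` stands in for `equals`. The work is then roughly k^2/2 comparisons per bucket of size k instead of comparing every house against every other house on a date, though it is still quadratic inside a bucket.

```python
from collections import defaultdict
from itertools import combinations
from types import SimpleNamespace


def find_duplicates(agencies, same_house):
    """Return (house_a, house_b) pairs of matching houses listed by
    different agencies, comparing only houses with the same country and date."""
    # Pool every agency's houses into buckets keyed by (country, date),
    # remembering which agency each house came from.
    buckets = defaultdict(list)
    for agency in agencies:
        for country, by_date in agency.house_dict.items():
            for date, houses in by_date.items():
                for house in houses:
                    buckets[(country, date)].append((agency, house))

    pairs = []
    for entries in buckets.values():
        # combinations() visits each unordered pair once, so a symmetric
        # same_house() is never called twice for the same two houses.
        for (agency_a, h1), (agency_b, h2) in combinations(entries, 2):
            if agency_a is not agency_b and same_house(h1, h2):
                pairs.append((h1, h2))
    return pairs


# Stub agencies with plain strings as houses, to keep the example short.
first = SimpleNamespace(house_dict={"Ireland": {"d1": ["x", "y"]}})
second = SimpleNamespace(house_dict={"Ireland": {"d1": ["y"]}, "Turkey": {"d1": ["x"]}})
print(find_duplicates([first, second], lambda h1, h2: h1 == h2))  # [('y', 'y')]
```

If `equals` can be reduced to an exact, hashable key (for example a normalized name), the buckets could be keyed on that too, which shrinks them further.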
