Creating a density map efficiently in Python

Time: 2016-04-20 01:58:17

Tags: python algorithm python-2.7 geospatial

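I was hoping for some help making my code run faster.

Basically I have a square grid of lat, long points in a list, insideoceanlist. Then there is a directory of data files containing lat, long coords, each representing the lightning strikes for a particular day. The idea is that for each day, we want to know how many lightning strikes there were around each point on the square grid. At the moment it is just two for loops, so for every point on the square grid, you check how far away every lightning strike was for that day. If it was within 40 km, I add one to that point to make a density map.

The starting grid has the overall shape of a rectangle, made up of squares with width 0.11 and length 0.11. The entire rectangle is about 50x30. Lastly I have a shapefile which outlines the 'forecast zones' in Australia, and if any point in the grid is outside this zone then we omit it. So all the leftover points (insideoceanlist) are the ones in Australia.

There are around 100000 points on the square grid, and even on a slow day there are around 1000 lightning strikes, so it takes a long time to process. Is there a way to do this more efficiently? I'd really appreciate any advice.

By the way, I changed list2 into list3 because I heard that iterating over lists is faster than iterating over arrays in Python.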

for i in range(len(list1)): # list1 is a list of data files containing lat,long coords of lightning strikes for each day
    dict_density = {}
    for k in insideoceanlist: # insideoceanlist is a grid of ~100000 lat,long points
        dict_density[k] = 0
    list2 = np.loadtxt(list1[i], delimiter=",") # opens one of the files containing lat,long coords and puts it into an array
    list3 = map(list, list2) # converts the array into a list
    # the following part is what I wanted to improve
    for j in insideoceanlist:
        for l in list3:
            if great_circle(l, j).meters < 40000: # great_circle is a function which measures the distance between two lat,long points
                dict_density[j] += 1
    #
    filename = 'example' + str(i) + '.txt'
    with open(filename, 'w') as f: # the with block closes the file on exit
        for m in range(len(insideoceanlist)):
            f.write('%s\n' % (dict_density[insideoceanlist[m]])) # writes each point in the same order as the insideoceanlist

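3 Answers:

Answer 0 (score: 3)

To elaborate a bit on @DanGetz's answer, here is some code that uses the strike data as the driver, rather than iterating over the entire grid for each strike point. I'm assuming you're centered on Australia's median point, with 0.11 degree grid squares, even though the size of a degree varies by latitude!

Some back-of-the-envelope computation with a quick reference to Wikipedia tells me that your 40 km distance is a ±4 grid-square range from north to south, and a ±5 grid-square range from east to west. (It drops to 4 squares in the lower latitudes, but ... meh!)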
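The back-of-the-envelope figures can be reproduced in a few lines. This is only a rough sketch: it assumes a spherical-Earth approximation in which one degree of latitude is about 111 km and a degree of longitude shrinks with the cosine of the latitude. The helper name search_half_width is made up for illustration.

```python
import math

KM_PER_DEG_LAT = 111.0   # assumed rough constant, good to about 1%
CELL_DEG = 0.11          # grid spacing from the question
RADIUS_KM = 40.0         # proximity threshold from the question

def search_half_width(lat_deg):
    """Return (rows, cols): grid steps needed to cover RADIUS_KM at this latitude."""
    km_per_row = CELL_DEG * KM_PER_DEG_LAT
    km_per_col = km_per_row * math.cos(math.radians(lat_deg))
    rows = math.ceil(RADIUS_KM / km_per_row)
    cols = math.ceil(RADIUS_KM / km_per_col)
    return rows, cols

for lat in (-10.0, -25.0, -40.0):
    print(lat, search_half_width(lat))
```

Under these assumptions the row range stays at 4 everywhere, while the column range is 4 in the north of the grid and grows to 5 near 40 degrees south.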

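The trick here, as mentioned, is to convert from strike position (lat/lon) to grid square in a direct, formulaic manner. Figure out the position of one corner of the grid, subtract that position from the strike, then divide by the size of a grid square (0.11 degrees), truncate, and you have your row/col indexes. Now visit the surrounding squares in a window of ±4 rows by ±5 columns, which is at most (2 * 4 + 1) * (2 * 5 + 1) = 99 squares to check for distance. Increment the squares that are within range.

The result is that I'm doing at most 99 visits per strike times 1000 strikes (or however many you have), as opposed to visiting 100,000 grid squares times 1000 strikes. This is a significant performance gain.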

Note that you don't describe your incoming data format, so I just randomly generated numbers. You'll want to fix that. ;-)

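#!python3

# Assumption: great_circle is the one from geopy (pip install geopy).
from geopy.distance import great_circle
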
"""
Per WikiPedia (https://en.wikipedia.org/wiki/Centre_points_of_Australia)

Median point
============

The median point was calculated as the midpoint between the extremes of
latitude and longitude of the continent.

    24 degrees 15 minutes south latitude, 133 degrees 25 minutes east
    longitude (24°15′S 133°25′E); position on SG53-01 Henbury 1:250 000
    and 5549 James 1:100 000 scale maps.

"""
MEDIAN_LAT = -(24.00 + 15.00/60.00)
MEDIAN_LON = (133 + 25.00/60.00)

"""
From the OP:

The starting grid has the overall shape of a rectangle, made up of
squares with width of 0.11 and length 0.11. The entire rectange is about
50x30. Lastly I have a shapefile which outlines the 'forecast zones' in
Australia, and if any point in the grid is outside this zone then we
omit it. So all the leftover points (insideoceanlist) are the ones in
Australia.
"""

DELTA_LAT = 0.11
DELTA_LON = 0.11

GRID_WIDTH = 50.0 # degrees
GRID_HEIGHT = 30.0 # degrees

GRID_ROWS = int(GRID_HEIGHT / DELTA_LAT) + 1
GRID_COLS = int(GRID_WIDTH / DELTA_LON) + 1

LAT_SIGN = 1.0 if MEDIAN_LAT >= 0 else -1.0
LON_SIGN = 1.0 if MEDIAN_LON >= 0 else -1.0

GRID_LOW_LAT = MEDIAN_LAT - (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_HIGH_LAT = MEDIAN_LAT + (LAT_SIGN * GRID_HEIGHT / 2.0)
GRID_MIN_LAT = min(GRID_LOW_LAT, GRID_HIGH_LAT)
GRID_MAX_LAT = max(GRID_LOW_LAT, GRID_HIGH_LAT)

GRID_LOW_LON = MEDIAN_LON - (LON_SIGN * GRID_WIDTH / 2.0)
GRID_HIGH_LON = MEDIAN_LON + (LON_SIGN * GRID_WIDTH / 2.0)
GRID_MIN_LON = min(GRID_LOW_LON, GRID_HIGH_LON)
GRID_MAX_LON = max(GRID_LOW_LON, GRID_HIGH_LON)

GRID_PROXIMITY_KM = 40.0

"""https://en.wikipedia.org/wiki/Longitude#Length_of_a_degree_of_longitude"""
_Degree_sizes_km = (
    (0,  110.574, 111.320),
    (15, 110.649, 107.551),
    (30, 110.852, 96.486),
    (45, 111.132, 78.847),
    (60, 111.412, 55.800),
    (75, 111.618, 28.902),
    (90, 111.694, 0.000),
)

# For the Australia situation, +/- 15 degrees means that our worst
# case scenario is about 40 degrees south. At that point, a single
# degree of longitude is smallest, with a size of about 80 km. That
# in turn means a 40 km distance window will span half a degree or so.
# Since grid squares are 0.11 degrees across, we have to check +/- 5
# cols.

GRID_SEARCH_COLS = 5

# Latitude degrees are nice and constant-like at about 110km. That means
# a .11 degree grid square is 12km or so, making our search range +/- 4
# rows.

GRID_SEARCH_ROWS = 4

def make_grid(rows, cols):
    return [[0 for col in range(cols)] for row in range(rows)]

Grid = make_grid(GRID_ROWS, GRID_COLS)

def _col_to_lon(col):
    return GRID_LOW_LON + (LON_SIGN * DELTA_LON * col)

Col_to_lon = [_col_to_lon(c) for c in range(GRID_COLS)]

def _row_to_lat(row):
    return GRID_LOW_LAT + (LAT_SIGN * DELTA_LAT * row)

Row_to_lat = [_row_to_lat(r) for r in range(GRID_ROWS)]

def pos_to_grid(pos):
    lat, lon = pos

    if lat < GRID_MIN_LAT or lat >= GRID_MAX_LAT:
        print("Lat limits:", GRID_MIN_LAT, GRID_MAX_LAT)
        print("Position {} is outside grid.".format(pos))
        return None

    if lon < GRID_MIN_LON or lon >= GRID_MAX_LON:
        print("Lon limits:", GRID_MIN_LON, GRID_MAX_LON)
        print("Position {} is outside grid.".format(pos))
        return None

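    # Divide by the signed step so this is the inverse of _row_to_lat and
    # _col_to_lon; without the sign, southern latitudes give negative rows.
    row = int((lat - GRID_LOW_LAT) / (LAT_SIGN * DELTA_LAT))
    col = int((lon - GRID_LOW_LON) / (LON_SIGN * DELTA_LON))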

    return (row, col)


def visit_nearby_grid_points(pos, dist_km):
    cell = pos_to_grid(pos)
    if cell is None:
        return  # strike lies outside the grid

    row, col = cell

    # Check every grid point in the search window around the strike,
    # including the strike's own row and column (dr == 0 or dc == 0).
    for dr in range(-GRID_SEARCH_ROWS, GRID_SEARCH_ROWS + 1):
        r = row + dr
        if not 0 <= r < GRID_ROWS:
            continue  # avoid wrapping around via negative indexes
        for dc in range(-GRID_SEARCH_COLS, GRID_SEARCH_COLS + 1):
            c = col + dc
            if not 0 <= c < GRID_COLS:
                continue
            gridpos = Row_to_lat[r], Col_to_lon[c]
            # Compare kilometers with kilometers (dist_km is in km).
            if great_circle(pos, gridpos).kilometers <= dist_km:
                Grid[r][c] += 1

def get_pos_from_line(line):
    """
    FIXME: Don't know the format of your data, just random numbers
    """
    import random
    return (random.uniform(GRID_LOW_LAT, GRID_HIGH_LAT),
            random.uniform(GRID_LOW_LON, GRID_HIGH_LON))

with open("strikes.data", "r") as strikes:
    for line in strikes:
        pos = get_pos_from_line(line)
        visit_nearby_grid_points(pos, GRID_PROXIMITY_KM)

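Answer 1 (score: 1)

If you know the formula that generates the points on your grid, you can quickly find the grid point closest to a given point by inverting that formula.

Below is a motivating example that isn't quite right for your purposes, because the Earth is a sphere, not a plane or a cylinder. If you can't easily invert the grid-point formula to find the closest grid point, then you could do the following:

  • Create a second grid (let's call it G2) defined by a simple formula like the one below, with boxes big enough that you can be confident the closest grid point to any point in a box will be either in that same box or in one of its 8 neighboring boxes.
  • Create a dict that stores which points of the original grid (G1) fall in which box of the G2 grid.
  • Take the point p you're trying to classify, and find the G2 box it would fall into.
  • Compare p with all the G1 points in this G2 box, and in all of that box's immediate neighbors.
  • Of these, choose the G1 point
  • that is closest to p.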
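The steps above can be sketched roughly as follows. This is an illustration rather than a drop-in solution: the box size BOX, the helper names, and the sample points are all made up, and a plain squared difference in degrees stands in for whatever real distance function (for example a great-circle distance) you would use.

```python
import math
from collections import defaultdict

BOX = 1.0  # assumed G2 box size in degrees; must be larger than the G1 spacing

def box_of(point):
    """Index of the G2 box containing a (lat, lon) point."""
    lat, lon = point
    return (math.floor(lat / BOX), math.floor(lon / BOX))

def build_index(g1_points):
    """Map each G2 box to the G1 points that fall inside it."""
    index = defaultdict(list)
    for pt in g1_points:
        index[box_of(pt)].append(pt)
    return index

def nearest_g1(p, index):
    """Closest G1 point to p among p's box and its 8 neighbours (None if empty)."""
    bi, bj = box_of(p)
    candidates = [pt
                  for di in (-1, 0, 1)
                  for dj in (-1, 0, 1)
                  for pt in index.get((bi + di, bj + dj), ())]
    if not candidates:
        return None
    # Squared difference in degrees: a stand-in for a real distance measure.
    return min(candidates, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

g1 = [(-24.0, 133.0), (-24.11, 133.0), (-24.0, 133.11)]  # made-up G1 points
index = build_index(g1)
print(nearest_g1((-24.1, 133.01), index))
```

The index is built once per grid, so each lookup only touches the handful of G1 points in nine boxes instead of the whole grid.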

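Motivating example with a perfect flat grid

If you have a perfect square grid on a flat surface that isn't rotated, with sides of length d, then its points can be defined by a simple mathematical formula. Their latitude values will all be of the form

lat0 + d * i

for some integer value i, where lat0 is the lowest-numbered latitude, and their longitude values will be of the same form:

long0 + d * j

for some integer j. To find the closest grid point for a given (lat, long) pair, you can find its latitude and longitude separately. The closest latitude number on your grid will be where

i = round((lat - lat0) / d)

and likewise j = round((long - long0) / d) for the longitude.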

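So one way forward is to plug that into the formula above and get

grid_point = (lat0 + d * round((lat - lat0) / d),
              long0 + d * round((long - long0) / d))

and then just increment the count in your dict at that grid point. This should make your code much faster than before, because instead of checking thousands of grid points for distance, you find the grid point directly with a couple of calculations.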
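As a minimal sketch of that idea, assuming a flat grid with a made-up origin (LAT0, LONG0) and spacing D, the count can be kept in a Counter. The sketch keys on the integer pair (i, j) rather than on the float coordinates, so that two strikes snapping to the same grid point always hit the same key.

```python
from collections import Counter

LAT0, LONG0, D = -39.25, 108.0, 0.11   # assumed grid origin and spacing

def nearest_index(lat, long):
    """Integer (i, j) of the grid point closest to (lat, long) on a flat grid."""
    return (round((lat - LAT0) / D), round((long - LONG0) / D))

density = Counter()
strikes = [(-39.25, 108.0), (-39.20, 108.12), (-39.21, 108.10)]  # made-up data
for lat, long in strikes:
    density[nearest_index(lat, long)] += 1

print(density)
```

Note that this counts each strike only at its nearest grid point; counting every grid point within 40 km still needs a pass over the neighbouring points.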

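You can probably make this a little faster by using the i and j numbers as indexes into a multidimensional array, instead of using grid_point as a key in a dict.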

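Answer 2 (score: 0)

Have you tried using Numpy for the indexing? You can use multi-dimensional arrays, and the indexing should be faster because Numpy arrays are essentially a Python wrapper around C arrays.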
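One way this could look, as a sketch only: the grid shape, corner, and spacing below are assumed values, and density_grid is a hypothetical helper. It counts strikes per grid cell with array indexing; turning that into a "within 40 km" count would need a further step, such as summing over a window of neighbouring cells.

```python
import numpy as np

ROWS, COLS = 273, 455                  # assumed grid shape
LAT0, LON0, D = -39.25, 108.0, 0.11    # assumed south-west corner and spacing

def density_grid(strikes):
    """Count strikes per grid cell; strikes is an (N, 2) array of lat, lon."""
    strikes = np.asarray(strikes, dtype=float)
    rows = np.floor((strikes[:, 0] - LAT0) / D).astype(int)
    cols = np.floor((strikes[:, 1] - LON0) / D).astype(int)
    # Drop strikes that fall outside the grid.
    inside = (rows >= 0) & (rows < ROWS) & (cols >= 0) & (cols < COLS)
    grid = np.zeros((ROWS, COLS), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(grid, (rows[inside], cols[inside]), 1)
    return grid

sample = [[-39.2, 108.05], [-39.2, 108.05], [-20.0, 120.0], [10.0, 10.0]]
print(density_grid(sample).sum())
```

np.add.at is used instead of `grid[rows, cols] += 1` because the latter applies only one increment when the same cell appears several times in the index arrays.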

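If you need further speed increases, take a look at Cython, a Python to optimized C converter. It is especially good for multi-dimensional indexing, and should be able to speed up this type of code by about an order of magnitude. It will add one additional dependency to your code, but it's a quick install and not too difficult to implement.

(Benchmarks), (Tutorial using Numpy with Cython)

Also, on a quick side note, use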

for listI in list1:
    ...
    list2 = np.loadtxt(listI, delimiter=',')
 # or if that doesn't work, at least use xrange() rather than range()

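In Python 2, you should only use range() when you explicitly need the list that it generates. In your case it shouldn't make much difference, because it is the outermost loop.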
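Since the loop in the question also uses the index i to build the output file name, enumerate() is one way to get both the item and its index without indexing into the list. A small sketch with hypothetical file names:

```python
day_files = ['day0.csv', 'day1.csv', 'day2.csv']   # hypothetical file names

outputs = []
for i, path in enumerate(day_files):
    # path is the file to load; i is only needed to build the output name.
    outputs.append((path, 'example' + str(i) + '.txt'))

print(outputs)
```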