Question

I want to find the sums of the rows that correspond to indices matched by another NumPy array.

The example below demonstrates this better.

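I would prefer the output to be a dictionary. For example, given

A = np.array(['a-1','b-1','b-1','c-2','a-1','b-1','c-2'])
b = np.array([1.21,2.34,1.2,2.8,10.0,0.9,8.4])

the desired result is

d = {'a-1': 11.21, 'b-1': 4.44, 'c-2': 11.2}

The result is the sum of the elements of b at the indices where the same value occurs in A. Is there an efficient way to do this? My arrays are large (on the order of millions of elements).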
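To pin down exactly what is being asked, here is a minimal reference sketch using a plain Python loop. The function name grouped_sum_loop is made up for illustration; it is only a baseline that spells out the intended result, and it would likely be too slow for arrays with millions of elements.

```python
from collections import defaultdict

import numpy as np

A = np.array(['a-1', 'b-1', 'b-1', 'c-2', 'a-1', 'b-1', 'c-2'])
b = np.array([1.21, 2.34, 1.2, 2.8, 10.0, 0.9, 8.4])


def grouped_sum_loop(keys, values):
    """Sum values per key with a plain loop (hypothetical baseline, not optimized)."""
    totals = defaultdict(float)
    for key, value in zip(keys, values):
        # Convert NumPy scalars to plain Python types so the dict is easy to inspect.
        totals[str(key)] += float(value)
    return dict(totals)


d = grouped_sum_loop(A, b)
```

Each key in d maps to the sum of the b entries at the positions where that key appears in A.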

Answer 1

Approach #1

We can use a combination of np.unique and np.bincount:

In [48]: unq, ids = np.unique(A, return_inverse=True)

In [49]: dict(zip(unq, np.bincount(ids, b)))
Out[49]: {'a-1': 11.210000000000001, 'b-1': 4.4400000000000004, 'c-2': 11.199999999999999}

Here np.unique gives a unique integer mapping for each of the strings in A. Those integers are then fed to np.bincount, which uses them as bins for a bin-based weighted summation, with the weights taken from b.
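The np.unique plus np.bincount idea can be wrapped in a small self-contained function. This is a sketch rather than the answer's exact code: the name grouped_sum_bincount is hypothetical, and it assumes keys and weights are 1-D arrays of equal length.

```python
import numpy as np


def grouped_sum_bincount(keys, weights):
    """Sum weights per distinct key (hypothetical helper; assumes 1-D inputs)."""
    # unq: sorted distinct keys; ids: for each element of keys, its position in unq.
    unq, ids = np.unique(keys, return_inverse=True)
    # bincount adds weights[i] into bin ids[i], giving one total per distinct key.
    sums = np.bincount(ids, weights=weights, minlength=len(unq))
    return dict(zip(unq.tolist(), sums.tolist()))


A = np.array(['a-1', 'b-1', 'b-1', 'c-2', 'a-1', 'b-1', 'c-2'])
b = np.array([1.21, 2.34, 1.2, 2.8, 10.0, 0.9, 8.4])
d = grouped_sum_bincount(A, b)
```

The sorting inside np.unique dominates the cost, while the summation itself is a single linear pass.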

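Approach #2 (specific case)

Assuming that the strings in A are always 3 characters long, a faster way is to convert those strings to numbers and then use those as the input to np.unique. The idea is that np.unique works faster with numbers than with strings.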

Hence, the implementation would be:

In [141]: n = A.view(np.uint8).reshape(-1,3).dot(256**np.arange(3))

In [142]: unq, st, ids = np.unique(n, return_index=1, return_inverse=1)

In [143]: dict(zip(A[st], np.bincount(ids, b)))
Out[143]: {'a-1': 11.210000000000001, 'b-1': 4.4400000000000004, 'c-2': 11.199999999999999}

The magic part is that the reshaped uint8 view is still a view of A (no copy is made), so it should be very efficient:

In [150]: np.shares_memory(A,A.view(np.uint8).reshape(-1,3))
Out[150]: True

Alternatively, we could use the axis argument of np.unique (functionality added in NumPy 1.13.0) on the 2D uint8 view.
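One caveat worth illustrating: the uint8 view with reshape(-1, 3) relies on each character occupying one byte. On Python 3, np.array of str values produces a Unicode dtype with 4 bytes per character, so the sketch below first converts to a byte-string dtype. That conversion makes a copy, unlike the zero-copy view in the answer, and the helper names are hypothetical. It assumes every key is exactly width ASCII characters, and it shows both the integer-packing variant and the np.unique(axis=0) variant.

```python
import numpy as np


def _byte_codes(keys, width):
    # Assumption: all keys are exactly `width` ASCII characters.
    # astype('S<width>') stores one byte per character (this makes a copy).
    as_bytes = np.ascontiguousarray(keys.astype('S%d' % width))
    return as_bytes.view(np.uint8).reshape(-1, width)


def grouped_sum_packed(keys, weights, width=3):
    """Pack each key's bytes into one integer, then group on the integers."""
    codes = _byte_codes(keys, width).astype(np.int64)
    n = codes.dot(256 ** np.arange(width, dtype=np.int64))
    _, st, ids = np.unique(n, return_index=True, return_inverse=True)
    # keys[st] recovers one original string per distinct integer code.
    return dict(zip(keys[st].tolist(), np.bincount(ids, weights=weights).tolist()))


def grouped_sum_rows(keys, weights, width=3):
    """Group on unique rows of the byte matrix (requires NumPy >= 1.13)."""
    codes = _byte_codes(keys, width)
    _, st, ids = np.unique(codes, axis=0, return_index=True, return_inverse=True)
    # ravel() guards against NumPy versions that return a 2-D inverse with axis=0.
    return dict(zip(keys[st].tolist(), np.bincount(ids.ravel(), weights=weights).tolist()))


A = np.array(['a-1', 'b-1', 'b-1', 'c-2', 'a-1', 'b-1', 'c-2'])
b = np.array([1.21, 2.34, 1.2, 2.8, 10.0, 0.9, 8.4])
d_packed = grouped_sum_packed(A, b)
d_rows = grouped_sum_rows(A, b)
```

With the default int64 packing, width should stay at 7 or below to avoid overflow.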

Answer 2

Another approach, using pandas:

import pandas as pd
df = pd.DataFrame(data=[pd.Series(A),pd.Series(b)]).transpose()
res = df.groupby(0).sum()

which gives

res
Out[62]: 
         1
0         
a-1  11.21
b-1   4.44
c-2  11.20

You can then get the dictionary you want with:

res_dict = res[1].to_dict()

which gives

Out[64]: 
{'a-1': 11.210000000000001,
 'b-1': 4.4400000000000004,
 'c-2': 11.199999999999999}
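As a possible shortcut, the same grouping can be written without building a two-column DataFrame, by grouping a Series of the values directly by the key array. This is a sketch of an alternative, not the answer's code; it relies on Series.groupby accepting an array of the same length as the Series.

```python
import numpy as np
import pandas as pd

A = np.array(['a-1', 'b-1', 'b-1', 'c-2', 'a-1', 'b-1', 'c-2'])
b = np.array([1.21, 2.34, 1.2, 2.8, 10.0, 0.9, 8.4])

# Group the values in b by the labels in A, sum each group,
# and convert the resulting Series (indexed by label) to a dict.
res_dict = pd.Series(b).groupby(A).sum().to_dict()
```

This keeps b as a float column throughout, whereas the transposed DataFrame above holds both columns as generic objects.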

Answer 3

The numpy_indexed package (disclaimer: I am its author) contains functionality to perform these kinds of operations in an efficient and elegant manner:

import numpy_indexed as npi
k, v = npi.group_by(A).sum(b)
d = dict(zip(k, v))

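I find the pandas grouping syntax rather clunky, and there should be no need to reorganize your data into a new data structure to perform such an elementary operation.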

Efficiently sum elements of a numpy array corresponding to indices matched by another array

3 answers: