Question

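I want to sum all the values from the third column onward and save the result, together with the first and second columns, to a new csv file. I would like to use pandas, which I think is more efficient.

Only values between 0 and 2 can be added together.

If a cell holds a value other than 0.5, 1 or 2, or a character, it is ignored in the sum.

Example of the csv file: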

encounterId|chartTime|11885|67187|6711|6711|6710|1356|1357|1358|1359|1360|1361|1362|1366|140|140

325|2014-01-01 00:00:00|0
325|2014-01-01 01:00:00|0|0|0
325|2014-01-01 02:00:00|0
325|2014-01-01 03:00:00|0|0|0
325|2014-01-01 04:00:00|0
325|2014-01-01 05:00:00|1
325|2014-01-01 06:00:00|0|0|0
325|2014-01-01 07:00:00|1|0|0.5|1
325|2014-01-01 08:00:00|0
325|2014-01-01 09:00:00|1|0|0
325|2014-01-01 10:00:00|0
325|2014-01-01 11:00:00|1|0|0
325|2014-01-01 12:00:00|0
325|2014-01-01 13:00:00|0|0|0.5|1
325|2014-01-01 14:00:00|0
325|2014-01-01 15:00:00|0

What I am looking for:

323|2013-06-03 00:00:00|0
323|2013-06-03 01:00:00|1
323|2013-06-03 02:00:00|1.5
323|2013-06-03 03:00:00|1.5
323|2013-06-03 04:00:00|0
323|2013-06-03 05:00:00|0.5
323|2013-06-03 06:00:00|0
323|2013-06-03 07:00:00|3.5
323|2013-06-03 08:00:00|0.5

I tried it without pandas, which gave me some strange results.

Answer 1

As suggested in a previous answer here, you can use sum and set the parameter axis=1.
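As a minimal sketch of that suggestion (the small frame and the column names `A`, `B`, `C` and `total` are made up for illustration), `DataFrame.sum(axis=1)` adds across the columns of each row and skips missing values by default:

```python
import numpy as np
import pandas as pd

# Illustrative data only: two key columns followed by a ragged set of values.
df = pd.DataFrame({
    "ID": [323, 323, 323],
    "date": ["2013-06-03 00:00:00", "2013-06-03 01:00:00", "2013-06-03 02:00:00"],
    "A": [0.0, 1.0, 1.0],
    "B": [0.0, np.nan, 0.0],
    "C": [0.0, np.nan, 0.5],
})

# axis=1 sums along each row; NaN cells are skipped (skipna=True by default).
df["total"] = df[["A", "B", "C"]].sum(axis=1)
print(df[["ID", "date", "total"]])
```

Selecting the value columns explicitly keeps the ID and date columns out of the arithmetic.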

Answer 2

Use this:

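import pandas as pd
from io import StringIO

# Sample data inline; rows have a varying number of value columns
csvfile = StringIO("""323|2013-06-03 00:00:00|0|0|0
323|2013-06-03 01:00:00|1|
323|2013-06-03 02:00:00|1|0|0.5|86
323|2013-06-03 03:00:00|1|0|0.5|0
323|2013-06-03 04:00:00|0
323|2013-06-03 05:00:00|0|0|0.5|0
323|2013-06-03 06:00:00|0
323|2013-06-03 07:00:00|1|0|0.5|2
323|2013-06-03 08:00:00|0|0.5""")

# Explicit names let read_csv pad the short rows with NaN
df = pd.read_csv(csvfile, sep='|', names=['ID','date','A','B','C','D'])

# Move the first two columns into the index so only the values remain
df_out = df.set_index(['ID','date'])

# Replace anything outside (0, 2] with 0, sum each row, and write the result
df_out.where((df_out>0) & (df_out<=2), 0)\
      .sum(axis=1)\
      .reset_index()\
      .to_csv('outfile.csv', index=False, header=False)

# Show the file (in a Jupyter notebook on Windows: !type outfile.csv)
print(open('outfile.csv').read())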

Answer 3

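Note that pd.read_csv() throws an error when it reads a csv with a variable number of columns unless you supply the column names beforehand. This should do it: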

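import pandas as pd
import numpy as np

# Supplying names up front lets read_csv handle rows of different lengths
df = pd.read_csv('sample.txt', names=['Index','Date','Val1','Val2','Val3','Val4'], sep='|')

# Blank out values above 2 so the sum skips them
df[df[['Val1','Val2','Val3','Val4']]>2] = np.nan

# Sum the value columns (third column onward) for each row
df['Final'] = df.iloc[:,2:].sum(axis=1)

# Keep only the two key columns and the total
df = df[['Index','Date','Final']]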

Gives:

   Index                 Date  Final
0    323  2013-06-03 00:00:00    0.0
1    323  2013-06-03 01:00:00    1.0
2    323  2013-06-03 02:00:00    1.5
3    323  2013-06-03 03:00:00    1.5
4    323  2013-06-03 04:00:00    0.0
5    323  2013-06-03 05:00:00    0.5
6    323  2013-06-03 06:00:00    0.0
7    323  2013-06-03 07:00:00    3.5
8    323  2013-06-03 08:00:00    0.5
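If the number of value columns is not known in advance, one option is to scan the data once for the widest row and generate the names from that. This is only a sketch, not part of the answer: the helper name `read_ragged` and the generated column names `Val1`, `Val2`, and so on are assumptions for illustration.

```python
import csv
from io import StringIO

import pandas as pd


def read_ragged(text, sep="|"):
    """Read delimiter-separated text whose rows have differing field counts."""
    # First pass: find the largest number of fields on any row.
    width = max(len(row) for row in csv.reader(StringIO(text), delimiter=sep))
    # Two key columns, then one generated name per remaining field.
    names = ["Index", "Date"] + [f"Val{i}" for i in range(1, width - 1)]
    # Second pass: with explicit names, short rows are padded with NaN.
    return pd.read_csv(StringIO(text), sep=sep, names=names)


sample = """323|2013-06-03 00:00:00|0|0|0
323|2013-06-03 01:00:00|1|
323|2013-06-03 02:00:00|1|0|0.5|86"""

df = read_ragged(sample)
print(df.columns.tolist())
```

The extra pass costs one more read of the data, but it avoids hard-coding a column count that may change between files.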

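Here is a more concise way (very similar to @Scott Boston's answer, but it avoids creating a separate dataframe). With the first two columns of the csv set as the index, the rest of the dataframe contains only float values and can be filtered conditionally: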

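# Read the file and move the two key columns into the index
df = pd.read_csv('sample.txt', names=['Index','Date','Val1','Val2','Val3','Val4'], sep='|').set_index(['Index','Date'])

# Values outside (0, 2] become NaN and are skipped by the row sum
df['Final'] = df[(df>0) & (df<=2)].sum(axis=1)

# Write the key columns and the total, without header or row index
df.reset_index()[['Index','Date','Final']].to_csv('output.csv', index=False, header=False)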

Gives:

323,2013-06-03 00:00:00,0.0
323,2013-06-03 01:00:00,1.0
323,2013-06-03 02:00:00,1.5
323,2013-06-03 03:00:00,1.5
323,2013-06-03 04:00:00,0.0
323,2013-06-03 05:00:00,0.5
323,2013-06-03 06:00:00,0.0
323,2013-06-03 07:00:00,3.5
323,2013-06-03 08:00:00,0.5
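The question also asks that anything other than 0.5, 1 or 2, including stray characters, be left out of the sum, whereas the range filter above accepts any number in (0, 2]. A sketch of a stricter variant follows; the helper name `sum_allowed`, the `ALLOWED` list and the sample rows (with an `x` and a `1.7` added on purpose) are assumptions, not part of the answer.

```python
from io import StringIO

import pandas as pd

ALLOWED = [0.5, 1.0, 2.0]  # the only values the question wants counted


def sum_allowed(df, key_cols=("Index", "Date")):
    """Sum, per row, only the cells whose value is in ALLOWED."""
    vals = df.drop(columns=list(key_cols))
    # Turn non-numeric cells (letters, blanks) into NaN instead of raising.
    vals = vals.apply(pd.to_numeric, errors="coerce")
    # Keep whitelisted values; everything else becomes NaN and is skipped.
    total = vals.where(vals.isin(ALLOWED)).sum(axis=1)
    return df[list(key_cols)].assign(Final=total)


sample = """323|2013-06-03 02:00:00|1|0|0.5|86
323|2013-06-03 07:00:00|1|x|0.5|2
323|2013-06-03 08:00:00|0|1.7"""

df = pd.read_csv(StringIO(sample), sep="|",
                 names=["Index", "Date", "Val1", "Val2", "Val3", "Val4"])
out = sum_allowed(df)
print(out)
```

Zeros are dropped by the whitelist as well, which does not change the totals.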

Answer 4

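How about this?

# Row by row: overwrite the third column with the sum of columns 3 onward
for idx, row in df.iterrows():
    df.loc[idx, df.columns[2]] = pd.to_numeric(row.iloc[2:], errors='coerce').sum()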

Sum of csv values using pandas
