ValueError: arrays must all be same length in python using pandas DataFrame

Posted: 2017-07-12 08:05:05

Tags: python-3.x pandas dataframe resampling

I'm new to Python and am using DataFrame from the pandas package (Python 3.6).

I set it up like this:

df = DataFrame({'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4, 'list5': list5, 'list6': list6})

and it raises ValueError: arrays must all be same length

So I checked the lengths of all the arrays, and list1 and list2 each have one more element than the other lists. If I want to add one value to each of the other four lists (list3, list4, list5, list6) using pd.resample, how should I write the code?

Also, these lists are time series sampled at 1-minute intervals.

Does anybody have an idea, or can anyone help me out here?

Thanks in advance.

EDIT: I changed it as EdChum suggested and added a time list at the front. It looks like this:

2017-04-01 0:00 895.87  730 12.8    4   19.1    380
2017-04-01 0:01 894.4   730 12.8    4   19.1    380
2017-04-01 0:02 893.08  730 12.8    4   19.3    380
2017-04-01 0:03 890.41  730 12.8    4   19.7    380
2017-04-01 0:04 889.28  730 12.8    4   19.93   380

Then I ran:

df.resample('1min', how='mean', fill_method='pad')

And it gives me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
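The TypeError appears because resample needs a time-based index, and this frame still has the default RangeIndex. A minimal sketch, assuming the timestamp column is named "time" and using placeholder names for the value columns (both hypothetical, not from the question): parse the timestamps with pd.to_datetime, move them into the index with set_index, then resample. In recent pandas versions the how= and fill_method= arguments of resample have been removed, so the aggregation and the fill are chained as methods instead.

```python
import pandas as pd

# Hypothetical column names; the question's data has a timestamp column
# followed by six value columns (only two are reproduced here).
df = pd.DataFrame({
    "time": ["2017-04-01 00:00", "2017-04-01 00:01", "2017-04-01 00:02",
             "2017-04-01 00:03", "2017-04-01 00:04"],
    "list1": [895.87, 894.4, 893.08, 890.41, 889.28],
    "list5": [19.1, 19.1, 19.3, 19.7, 19.93],
})

# resample only works on a DatetimeIndex/TimedeltaIndex/PeriodIndex,
# so parse the strings and move them into the index first.
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Method-chained form of resample('1min', how='mean', fill_method='pad'):
# average each 1-minute bin, then forward-fill any empty bins.
out = df.resample("1min").mean().ffill()
print(out)
```

Because this sample is already at a regular 1-minute spacing, the result has the same five rows; resample only changes the shape when bins are empty or the rule differs from the data's spacing.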

2 answers:

Answer 0 (score: 3)

I'd just construct a Series for each list and then concat them all:

In [38]:
import pandas as pd
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1, name='list1')
s2 = pd.Series(l2, name='list2')
df = pd.concat([s1,s2], axis=1)
df

Out[38]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Because you can pass a name argument to the Series constructor, each Series becomes a named column in the df, and concat places NaN wherever the column lengths don't match.
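Applied to the six lists from the question, the same idea can be written with a list comprehension so each Series picks up its column name. A sketch, with placeholder values standing in for list1 to list6 (the real lists aren't shown in the question):

```python
import pandas as pd

# Placeholder data: list1 and list2 are one element longer than the rest,
# mirroring the length mismatch described in the question.
lists = {
    "list1": [1, 2, 3, 4],
    "list2": [5, 6, 7, 8],
    "list3": [9, 10, 11],
    "list4": [12, 13, 14],
    "list5": [15, 16, 17],
    "list6": [18, 19, 20],
}

# name= becomes the column label; concat aligns on the default 0..n-1 index
# and fills positions missing from the shorter Series with NaN.
df = pd.concat([pd.Series(v, name=k) for k, v in lists.items()], axis=1)
print(df)
```

Passing a dict of Series (rather than plain lists) to pd.DataFrame aligns the same way, which is why the original DataFrame call only fails when given raw lists.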

resample is meant for data with a DatetimeIndex whose frequency you want to change (for example, aggregating 1-minute samples into 1-hour bins), which is not what you want here. What you are describing is reindexing, which I think is unnecessary and messy:

In [40]:
import pandas as pd
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1)
s2 = pd.Series(l2)
df = pd.DataFrame({'list1':s1.reindex(s2.index), 'list2':s2})
df

Out[40]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Here you'd need to know the longest length and then reindex every Series to that index. If you just concat, it automatically aligns the lengths and fills missing elements with NaN.
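To make the comparison concrete, here is a sketch of both routes on the answer's example data: the reindex route has to find the longest Series first, while concat does the alignment itself. The variable names are illustrative only.

```python
import pandas as pd

lists = {"list1": list("abc"), "list2": [1, 2, 3, 4]}
series = {k: pd.Series(v) for k, v in lists.items()}

# Reindex route: find the longest Series, then reindex every Series to
# that index so all columns have equal length (missing slots become NaN).
longest = max(series.values(), key=len).index
df_reindexed = pd.DataFrame({k: s.reindex(longest) for k, s in series.items()})

# Concat route: no length bookkeeping; rename(k) sets each Series' name,
# which becomes the column label.
df_concat = pd.concat([s.rename(k) for k, s in series.items()], axis=1)

print(df_reindexed.equals(df_concat))
```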

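Answer 1 (score: 0)

According to this documentation, doing this with pd.resample() looks very difficult: you would have to compute a frequency that adds exactly one value to your df, and the function really doesn't seem designed for that ^^ (it seems meant for easy reshaping, e.g. from 1 minute to 30 seconds or to 1 hour)! You'd better try what EdChum did :P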
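If each list does carry a known start time at 1-minute spacing (an assumption; the question only says they are 1-minute series), one way to get the "add one value" behaviour without resample is to give every Series its own DatetimeIndex, let concat align them on time, and forward-fill the gap. This swaps resample for pd.date_range plus concat; the start time, names, and values below are hypothetical.

```python
import pandas as pd

# Hypothetical: both series start at the same minute, but the longer one
# has one extra trailing sample, as in the question.
start = "2017-04-01 00:00"
long_vals = [895.87, 894.4, 893.08, 890.41, 889.28]
short_vals = [19.1, 19.1, 19.3, 19.7]

long_s = pd.Series(long_vals, name="list1",
                   index=pd.date_range(start, periods=len(long_vals), freq="1min"))
short_s = pd.Series(short_vals, name="list5",
                    index=pd.date_range(start, periods=len(short_vals), freq="1min"))

# concat aligns on timestamps, so list5's missing last minute becomes NaN;
# ffill() then pads it with the previous value (like fill_method='pad').
df = pd.concat([long_s, short_s], axis=1).ffill()
print(df)
```

Whether padding with the previous minute's value is appropriate depends on the data; leaving the NaN in place (dropping the ffill() call) is the more conservative choice.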