Creating opening and closing stock with pandas

Time: 2017-01-04 10:53:17

Tags: python pandas

I have the following data:

date                 qty  p_id  type    
2014-08-04 21:04:00   3   a     inward  
2014-08-04 22:04:00   3   a     outward 
2014-08-04 21:04:00   10  b     inward  
2014-08-04 10:04:00   5   b     outward 
2014-10-04 21:04:00   40  c     inward  
2014-11-04 21:04:00   5   c     outward 
2014-10-05 21:04:00   10  c     inward  
2014-09-05 21:04:00   4   b     outward

Here is the code I have tried so far. It does not look efficient, and the resulting data is not right either.

df = pd.DataFrame({ 
    'date': ['2014-08-04 21:04:00','2014-08-04 22:04:00','2014-08-04 21:04:00','2014-08-04 10:04:00','2014-10-04 21:04:00','2014-11-04 21:04:00','2014-10-05 21:04:00','2014-09-05 21:04:00'], 
    'p_id'  :['a','a','b','b','c','c','c','b'],
    'qty' :[3,3,10,5,40,5,10,4], 
    'type' :['inward','outward','inward','outward','inward','outward','inward','outward'] 
})
inward = df['type'] == 0
outward = df['type'] == 1

df.date = pd.to_datetime(df.date)
df.set_index('date', inplace=True)
df.type = df.type.map({0:'inward', 1:'outward'})

df.groupby(['p_id', 'type']).resample('D')['qty'].sum().unstack(1, fill_value=0)
df1 = df.groupby(['p_id', 'type']).resample('D')['qty'].sum().unstack(1, fill_value=0).reset_index()

df1.sort_values(['date', 'p_id'])
df1['opening'] = df1['closing'] = 0
for i in range(1, len(df1)):
    df1.loc[i, 'opening'] = (df1.loc[i-1, 'closing'])
    df1.loc[i, 'closing'] = (df1.loc[i, 'inward'] + df1.loc[i, 'opening']) - df1.loc[i, 'outward']    

I am trying to get the following result, but failed.

Date        open    inward  outward close   p_id
2014-08-04  0       3       3       0       a
2014-08-04  0       10      5       5       b
2014-08-04  0       40      5       35      c
2014-08-05  5       0       4       1       b
2014-08-05  35      10      0       45      c
2014-08-06  1       0       0       1       b
2014-08-06  45      0       0       45      c

2 answers:

Answer 0 (score: 1)

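The question is not very clear, but I think the following code should get you on the right track. Everything is commented, so it should be clear what is going on.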

import pandas as pd

df = pd.DataFrame({
    'date': ['2014-08-04 21:04:00','2014-08-04 22:04:00','2014-08-04 21:04:00','2014-08-04 10:04:00','2014-10-04 21:04:00','2014-11-04 21:04:00','2014-10-05 21:04:00','2014-09-05 21:04:00'],
    'p_id'  :['a','a','b','b','c','c','c','b'],
    'qty' :[3,3,10,5,40,5,10,4],
    'type' :['inward','outward','inward','outward','inward','outward','inward','outward']
})

# convert the datetime strings to datetime objects
df.date = pd.to_datetime(df.date)
# keep only the calendar date (drop the time of day)
df.date = df.date.dt.date

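# pivot so that each movement type becomes a column; sum the quantities,
# because pivot_table would otherwise average them (its default aggfunc is "mean")
df = pd.pivot_table(data=df, columns="type", values="qty", index=["p_id", "date"], aggfunc="sum")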

# replace nans with zeros
df = df.fillna(0)

# move multiindex back to the columns and start a new, default index
df = df.reset_index()

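# add the opening and closing calculation (a plain loop is not efficient, but it is not the problematic part)
# note: the balance is carried from one row to the next, across p_id values as well
# start from float zeros so the float results can be assigned without a dtype change
df["opening"] = 0.0
df["closing"] = 0.0
for i in range(1, len(df)):
    df.loc[i, 'opening'] = df.loc[i-1, 'closing']
    df.loc[i, 'closing'] = (df.loc[i, 'inward'] + df.loc[i, 'opening']) - df.loc[i, 'outward']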

# reorder the columns and set the index to match the desired output layout
df = df[["date","inward","outward","opening","closing","p_id"]]
df = df.set_index("date")
print(df)

This should produce what you wanted in the first place:

type        inward  outward  opening  closing p_id
date
2014-08-04     3.0      3.0      0.0      0.0    a
2014-08-04    10.0      5.0      0.0      5.0    b
2014-09-05     0.0      4.0      5.0      1.0    b
2014-10-04    40.0      0.0      1.0     41.0    c
2014-10-05    10.0      0.0     41.0     51.0    c
2014-11-04     0.0      5.0     51.0     46.0    c
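
Note that the loop above carries the balance from one product into the next: c opens at 1.0, which is b's closing. If the balance is supposed to restart at zero for every p_id, as the desired table in the question suggests, a grouped cumulative sum can replace the loop. The following is only a sketch under that assumption; the names `daily`, `net` and `result` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2014-08-04 21:04:00', '2014-08-04 22:04:00', '2014-08-04 21:04:00',
             '2014-08-04 10:04:00', '2014-10-04 21:04:00', '2014-11-04 21:04:00',
             '2014-10-05 21:04:00', '2014-09-05 21:04:00'],
    'p_id': ['a', 'a', 'b', 'b', 'c', 'c', 'c', 'b'],
    'qty': [3, 3, 10, 5, 40, 5, 10, 4],
    'type': ['inward', 'outward', 'inward', 'outward',
             'inward', 'outward', 'inward', 'outward'],
})
# Drop the time of day so that movements are grouped per calendar day.
df['date'] = pd.to_datetime(df['date']).dt.normalize()

# One row per (p_id, date), one column per movement type, quantities summed.
daily = df.pivot_table(index=['p_id', 'date'], columns='type', values='qty',
                       aggfunc='sum', fill_value=0).sort_index()

# Net movement per day, accumulated separately within each product.
net = daily['inward'] - daily['outward']
daily['closing'] = net.groupby(level='p_id').cumsum()
# The opening balance is the closing balance before the day's movement.
daily['opening'] = daily['closing'] - net

result = daily.reset_index()[['date', 'opening', 'inward', 'outward', 'closing', 'p_id']]
print(result)
```

With the sample data, c now opens at 0 and closes at 40, 50 and 45 instead of inheriting b's balance.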

Answer 1 (score: 0)

I am not sure about your result table (the definition of opening and closing).

import datetime as dt
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

TESTDATA=StringIO("""date;qty;p_id;type
2014-08-04 21:04:00;3;a;inward
2014-08-04 22:04:00;3;a;outward
2014-08-04 21:04:00;10;b;inward
2014-08-04 10:04:00;5;b;outward
2014-10-04 21:04:00;40;c;inward
2014-11-04 21:04:00;5;c;outward
2014-10-05 21:04:00;10;c;inward
2014-09-05 21:04:00;4;b;outward""")

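df = pd.read_csv(TESTDATA, sep=";")
# parse the timestamps, then keep only the calendar date as a string
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(lambda x: dt.datetime.strftime(x, '%Y-%m-%d'))

# one row per (date, p_id), one column per movement type
df = pd.pivot_table(df, columns=['type'], values=['qty'], index=['date', 'p_id'])
df.reset_index(inplace=True, drop=False)
# flatten the two-level column index left over from pivoting
df.columns = ['date', 'p_id', 'inward', 'outward']
df.fillna(0, inplace=True)
df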

This gives:

         date p_id  inward  outward
0  2014-08-04    a     3.0      3.0
1  2014-08-04    b    10.0      5.0
2  2014-09-05    b     0.0      4.0
3  2014-10-04    c    40.0      0.0
4  2014-10-05    c    10.0      0.0
5  2014-11-04    c     0.0      5.0
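
The desired table in the question also has rows for days without any movement, with the balance carried forward. One possible reading, and it is only a guess at the intended definition, is that every product gets a gap-free daily calendar between its first and last movement, with open equal to the previous day's close. A sketch of that, starting from the table above (re-typed here with real datetimes instead of strings; `daily_balance`, `pieces` and `full` are hypothetical names):

```python
import pandas as pd

# The table from above, re-typed so that this sketch is self-contained.
table = pd.DataFrame({
    'date': pd.to_datetime(['2014-08-04', '2014-08-04', '2014-09-05',
                            '2014-10-04', '2014-10-05', '2014-11-04']),
    'p_id': ['a', 'b', 'b', 'c', 'c', 'c'],
    'inward': [3.0, 10.0, 0.0, 40.0, 10.0, 0.0],
    'outward': [3.0, 5.0, 4.0, 0.0, 0.0, 5.0],
})

def daily_balance(group):
    # Put one product on a gap-free daily calendar; days without movement get 0.
    days = pd.date_range(group['date'].min(), group['date'].max(), freq='D')
    g = group.set_index('date')[['inward', 'outward']].reindex(days, fill_value=0.0)
    # Closing balance is the running total of the net movement.
    g['close'] = (g['inward'] - g['outward']).cumsum()
    # Opening balance is the previous day's closing balance (0 on the first day).
    g['open'] = g['close'].shift(1, fill_value=0.0)
    return g

# Process each product separately, then stack the pieces back together.
pieces = {p_id: daily_balance(g) for p_id, g in table.groupby('p_id')}
full = pd.concat(pieces, names=['p_id', 'date']).reset_index()
print(full.head(10))
```

Whether each calendar should instead run to a common end date for all products depends on how the report is meant to be read; that would only change the `days` range.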