Question

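I want to visualize in a chart how many machines a job shop needs at a given time, with a continuous time axis on the x-axis and the number of shifts on the y-axis.

In the dataframe below you can find an example of my data. Here you see the Shift_IDs (which are unique) and the start and end time of each shift. Over the course of a day, I would like to know how many machines are needed in a given interval. The interval can be 5 minutes, a quarter of an hour, half an hour, or an hour.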

df:
   Shift_ID     Shift_Time_Start       Shift_Time_End
0         1   2016-03-22 9:00:00   2016-03-22 9:35:00
1         2   2016-03-22 9:20:00  2016-03-22 10:20:00
2         3   2016-03-22 9:40:00  2016-03-22 10:14:00
3         4  2016-03-22 10:00:00  2016-03-22 10:31:00

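In the example of the quarter 9:30-9:45, I need 3 machines to be able to run every shift at that specific time. The desired output would look something like this: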

df2:

                                    Interval  Count
0    2016-03-22 9:00:00 - 2016-03-22 9:15:00      1
1    2016-03-22 9:15:00 - 2016-03-22 9:30:00      2
2    2016-03-22 9:30:00 - 2016-03-22 9:45:00      3
3   2016-03-22 9:45:00 - 2016-03-22 10:00:00      2
4  2016-03-22 10:00:00 - 2016-03-22 10:15:00      2
5  2016-03-22 10:15:00 - 2016-03-22 10:30:00      2
6  2016-03-22 10:30:00 - 2016-03-22 10:45:00      1

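With this dataframe, I could round down to the lower boundary of each interval and then plot it in a chart.

I am stuck on how to "see" whether a shift lies in multiple intervals. Do you have any ideas on how to tackle this?

Note: all datetime values are of course of datetime type.

Edit after the solutions of MaxU and knightofni

I used MaxU's plotting code to plot both of your results. Both seem to do fine with 15 minutes, but please take a look at your results with 5 minutes:

MaxU:

knightofni:

Edit 2, April 4, 2015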

Answer 1

You can do it this way:

Code:

import io
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# load sample data into DF (data frame)
data="""\
idx;Shift_ID;Shift_Time_Start;Shift_Time_End
0;1;2016-03-22 09:00:00;2016-03-22 09:35:00
1;2;2016-03-22 09:20:00;2016-03-22 10:20:00
2;3;2016-03-22 09:40:00;2016-03-22 10:14:00
3;4;2016-03-22 10:00:00;2016-03-22 10:31:00
4;5;2016-03-22 08:11:00;2016-03-22 09:35:00
4;6;2016-03-23 14:11:00;2016-03-23 14:23:00
"""
df = pd.read_csv(io.StringIO(data), sep=';', index_col=0,
                 parse_dates=['Shift_Time_Start','Shift_Time_End'])


# time interval
freq = '10min'

# prepare resulting DF with desired intervals
a  = pd.DataFrame({
  'begin': pd.date_range(df.Shift_Time_Start.min(),
                         df.Shift_Time_End.max(), freq=freq)
})
# resample to a 5-minute grid; each row then spans `freq` from its start,
# so consecutive windows overlap
a = a.set_index('begin').resample(rule='5min').first().reset_index()

a['end'] = a.begin + pd.Timedelta(freq)

# count number of unique Shift_ID's in `DF` for each interval in `a`
def f(x):
    return  df[( (x.begin >= df.Shift_Time_Start) \
                 & \
                 (x.begin <= df.Shift_Time_End)
               ) \
               | \
               ( (x.end >= df.Shift_Time_Start) \
                 & \
                 (x.end <= df.Shift_Time_End)
               ) \
              ].Shift_ID.nunique()


a['count'] = a.apply(f, axis=1)
# remove rows without any shifts
a = a[a['count'] > 0].reset_index(drop=True)


a['interval'] = a.begin.dt.strftime('%d.%m %H:%M').astype(str) + \
                ' - ' + \
                a.end.dt.strftime('%d.%m %H:%M').astype(str)

a = a.set_index('interval')[['count']]
print(a)

matplotlib.style.use('ggplot')

a.plot(kind='bar', alpha=0.75)
fig = plt.gcf()
fig.subplots_adjust(bottom=0.2)

plt.show()

Source data set:

In [135]: df
Out[135]:
     Shift_ID    Shift_Time_Start      Shift_Time_End
idx
0           1 2016-03-22 09:00:00 2016-03-22 09:35:00
1           2 2016-03-22 09:20:00 2016-03-22 10:20:00
2           3 2016-03-22 09:40:00 2016-03-22 10:14:00
3           4 2016-03-22 10:00:00 2016-03-22 10:31:00
4           5 2016-03-22 08:11:00 2016-03-22 09:35:00
4           6 2016-03-23 14:11:00 2016-03-23 14:23:00

In [136]: a
Out[136]:
                           count
interval
22.03 08:10 - 22.03 08:20      1
22.03 08:15 - 22.03 08:25      1
22.03 08:20 - 22.03 08:30      1
22.03 08:25 - 22.03 08:35      1
22.03 08:30 - 22.03 08:40      1
22.03 08:35 - 22.03 08:45      1
22.03 08:40 - 22.03 08:50      1
22.03 08:45 - 22.03 08:55      1
22.03 08:50 - 22.03 09:00      2
22.03 08:55 - 22.03 09:05      2
22.03 09:00 - 22.03 09:10      2
22.03 09:05 - 22.03 09:15      2
22.03 09:10 - 22.03 09:20      3
22.03 09:15 - 22.03 09:25      3
22.03 09:20 - 22.03 09:30      3
22.03 09:25 - 22.03 09:35      3
22.03 09:30 - 22.03 09:40      4
22.03 09:35 - 22.03 09:45      4
22.03 09:40 - 22.03 09:50      2
22.03 09:45 - 22.03 09:55      2
22.03 09:50 - 22.03 10:00      3
22.03 09:55 - 22.03 10:05      3
22.03 10:00 - 22.03 10:10      3
22.03 10:05 - 22.03 10:15      3
22.03 10:10 - 22.03 10:20      3
22.03 10:15 - 22.03 10:25      2
22.03 10:20 - 22.03 10:30      2
22.03 10:25 - 22.03 10:35      1
22.03 10:30 - 22.03 10:40      1
23.03 14:05 - 23.03 14:15      1
23.03 14:10 - 23.03 14:20      1
23.03 14:15 - 23.03 14:25      1
23.03 14:20 - 23.03 14:30      1
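
As a complement to the listing above, here is a minimal sketch that uses non-overlapping windows and the usual interval-overlap test: a shift counts for a window when it starts before the window ends and ends after the window begins. The function name `count_active_shifts` and the half-open window convention are assumptions of this sketch, not part of the original answer. Note that the check in `f` above only looks at whether a window's endpoints fall inside a shift, so a shift lying strictly inside a window would be missed; the test below covers that case as well.

```python
import pandas as pd

# The four shifts from the question
df = pd.DataFrame({
    "Shift_ID": [1, 2, 3, 4],
    "Shift_Time_Start": pd.to_datetime([
        "2016-03-22 09:00", "2016-03-22 09:20",
        "2016-03-22 09:40", "2016-03-22 10:00"]),
    "Shift_Time_End": pd.to_datetime([
        "2016-03-22 09:35", "2016-03-22 10:20",
        "2016-03-22 10:14", "2016-03-22 10:31"]),
})


def count_active_shifts(shifts, freq):
    """Count the shifts that overlap each non-overlapping window of length `freq`."""
    # Window starts, aligned to the grid; each window is [begin, begin + freq)
    begin = pd.date_range(shifts["Shift_Time_Start"].min().floor(freq),
                          shifts["Shift_Time_End"].max(), freq=freq)
    end = begin + pd.Timedelta(freq)

    starts = shifts["Shift_Time_Start"].to_numpy()
    ends = shifts["Shift_Time_End"].to_numpy()

    # Rows are windows, columns are shifts; True where the two overlap
    overlaps = ((starts[None, :] < end.to_numpy()[:, None])
                & (ends[None, :] > begin.to_numpy()[:, None]))

    return pd.DataFrame({"begin": begin, "end": end,
                         "count": overlaps.sum(axis=1)})


result = count_active_shifts(df, "15min")
print(result)
```

With the four shifts from the question and 15-minute windows this gives the counts 1, 2, 3, 2, 3, 2, 1. The 10:00 - 10:15 window comes out as 3 rather than the 2 shown in the question's table, because shifts 2, 3 and 4 are all running during that window.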

Answer 2

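This is not too easy. I can't really think of a fully vectorized way to do it, but here are two approaches that would work.

1- Reorganize your data so that you have only one datetime column. The goal is to have one row per shift_ID per minimum interval. Then you can do a timegrouper groupby.

Working example:

Recreate your DataFrame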

import pandas as pd
import arrow

data = {
    'Shift_ID' : [1,2,3,4],
    'Shift_Time_Start' : [arrow.get('2016-03-22 09:00:00').datetime, 
                   arrow.get('2016-03-22 09:20:00').datetime,
                   arrow.get('2016-03-22 09:40:00').datetime,
                   arrow.get('2016-03-22 10:00:00').datetime
                   ],

    'Shift_Time_End' : [arrow.get('2016-03-22 09:35:00').datetime, 
                   arrow.get('2016-03-22 10:20:00').datetime,
                   arrow.get('2016-03-22 10:14:00').datetime,
                   arrow.get('2016-03-22 10:31:00').datetime
                   ],   
        }


df = pd.DataFrame(data)
min_int = '5min'  # 5-minute grid ('5T' is the older alias for the same frequency)
df

Shift_ID    Shift_Time_End  Shift_Time_Start
0   1   2016-03-22 09:35:00+00:00   2016-03-22 09:00:00+00:00
1   2   2016-03-22 10:20:00+00:00   2016-03-22 09:20:00+00:00
2   3   2016-03-22 10:14:00+00:00   2016-03-22 09:40:00+00:00
3   4   2016-03-22 10:31:00+00:00   2016-03-22 10:00:00+00:00

Create the new Df

new_data = {'time' : [], 'Shift_ID': []} # dict to hold the data

for _, row in df.iterrows():
    # creates a list of all dates of this shift, from start to end
    dates = pd.date_range(row.Shift_Time_Start, row.Shift_Time_End, freq=min_int)
    for date in dates:
        new_data['time'].append(date)
        new_data['Shift_ID'].append(row.Shift_ID)

# creating the new df    
newdf = pd.DataFrame(new_data).set_index('time')
newdf.head()


Shift_ID
time    
2016-03-22 09:00:00+00:00   1
2016-03-22 09:05:00+00:00   1
2016-03-22 09:10:00+00:00   1
2016-03-22 09:15:00+00:00   1
2016-03-22 09:20:00+00:00   1
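
The same "one row per shift per minimum interval" layout can also be built without the explicit nested loop. The following is only a sketch under a few assumptions: it uses `DataFrame.explode` (pandas 1.1 or later for `ignore_index`), plain timezone-naive timestamps instead of the `arrow` objects above, and the name `expanded` is made up for illustration.

```python
import pandas as pd

# The four shifts from the question, as timezone-naive timestamps
df = pd.DataFrame({
    "Shift_ID": [1, 2, 3, 4],
    "Shift_Time_Start": pd.to_datetime([
        "2016-03-22 09:00", "2016-03-22 09:20",
        "2016-03-22 09:40", "2016-03-22 10:00"]),
    "Shift_Time_End": pd.to_datetime([
        "2016-03-22 09:35", "2016-03-22 10:20",
        "2016-03-22 10:14", "2016-03-22 10:31"]),
})
min_int = "5min"

# One list of grid timestamps per shift, then one row per timestamp
expanded = df.assign(
    time=[list(pd.date_range(start, end, freq=min_int))
          for start, end in zip(df["Shift_Time_Start"], df["Shift_Time_End"])]
).explode("time", ignore_index=True)

# explode leaves an object column, so convert it back to datetime
expanded["time"] = pd.to_datetime(expanded["time"])

newdf = expanded[["time", "Shift_ID"]].set_index("time")
print(newdf.head())
```

The resulting `newdf` has the same shape as the one built by the loop, so the groupby step that follows works on it unchanged.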

Groupby Timegrouper

# Group on the time index, resampling every min_int
# (in our case 5 minutes), then count the unique Shift_IDs per bin.
# pd.Grouper replaces pd.TimeGrouper, which was removed in pandas 1.0.
newdf.groupby(pd.Grouper(freq=min_int)).agg({'Shift_ID': 'nunique'})

    Shift_ID
time    
2016-03-22 09:00:00+00:00   1
2016-03-22 09:05:00+00:00   1
2016-03-22 09:10:00+00:00   1
2016-03-22 09:15:00+00:00   1
2016-03-22 09:20:00+00:00   2
2016-03-22 09:25:00+00:00   2
2016-03-22 09:30:00+00:00   2
2016-03-22 09:35:00+00:00   2
2016-03-22 09:40:00+00:00   2

This reads as: at 9:15 there is one shift ongoing, while at 9:20 there are 2.

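This is not exactly your desired output, but I would argue it is easier to plot. If you want to match your desired output, that should be fairly easy (just create a shifted copy of the date column using .shift).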
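
To make the `.shift` remark concrete, here is a small sketch that turns per-bin counts into "begin - end" interval labels. The `counts` series is a hand-written stand-in for the first rows of the groupby result, and the column names `begin`, `end` and `Interval` are choices made for this illustration.

```python
import pandas as pd

step = pd.Timedelta("5min")

# Stand-in for the groupby result: one count per 5-minute bin
counts = pd.Series(
    [1, 1, 1, 1, 2],
    index=pd.date_range("2016-03-22 09:00", periods=5, freq="5min"),
    name="Count",
)

out = counts.rename_axis("begin").reset_index()

# The end of each bin is the start of the next one; the last bin has no
# successor, so its end is filled in as begin + step
out["end"] = out["begin"].shift(-1).fillna(out["begin"] + step)

fmt = "%Y-%m-%d %H:%M:%S"
out["Interval"] = out["begin"].dt.strftime(fmt) + " - " + out["end"].dt.strftime(fmt)

print(out[["Interval", "Count"]])
```

Since the bins are evenly spaced, `out["begin"] + step` alone would give the same `end` column; the shifted copy is shown because it is what the answer suggests.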

Edit

Link to notebook with code
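
For the continuous time axis the question asks about, a different technique from the one in this answer may be worth a look: treat every shift start as +1 and every shift end as -1, sort the events, and take the running sum. This is a sketch only; the names `events` and `active` are made up here, and a shift is assumed to stop counting at its end time.

```python
import pandas as pd

# The four shifts from the question
df = pd.DataFrame({
    "Shift_ID": [1, 2, 3, 4],
    "Shift_Time_Start": pd.to_datetime([
        "2016-03-22 09:00", "2016-03-22 09:20",
        "2016-03-22 09:40", "2016-03-22 10:00"]),
    "Shift_Time_End": pd.to_datetime([
        "2016-03-22 09:35", "2016-03-22 10:20",
        "2016-03-22 10:14", "2016-03-22 10:31"]),
})

# +1 when a shift starts, -1 when it ends
events = pd.concat([
    pd.Series(1, index=pd.DatetimeIndex(df["Shift_Time_Start"])),
    pd.Series(-1, index=pd.DatetimeIndex(df["Shift_Time_End"])),
])

# Net change per timestamp (sorted by time), then the running total:
# the number of shifts active from each timestamp onwards
active = events.groupby(level=0).sum().cumsum()
print(active)
```

The result changes only at shift boundaries, so it can be drawn as a step line, for example with `active.plot(drawstyle="steps-post")`, without choosing an interval length at all.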
