Insert rows based on a value and update other columns?

Time: 2017-08-28 17:29:43

Tags: python pandas matplotlib

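I am new to the pandas module and use it at work for data analysis. I have an Excel sheet that imports data from an Access database every day; a new record is inserted each time a machine goes down. The table basically shows the uptime percentage of each machine.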

ID | Area | Machine | Week | UTPercent
--------------------------------------
1  |  A1  |   M1    |   1  |  80
2  |  A1  |   M1    |   4  |  90
3  |  A2  |   M2    |   4  |  70
4  |  A2  |   M2    |   8  |  82

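As you can see above, if the current week is 8, the table skips weeks 2, 3, 5, 6, 7 and 8 for Machine1 and weeks 1, 2, 3, 5, 6 and 7 for Machine2. How can I add the rows in between and set UTPercent to 100% for all of them? In other words, this is what I need: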

ID  | Area | Machine | Week | UTPercent
--------------------------------------
1   |  A1  |   M1    |   1  |  80
2   |  A1  |   M1    |   2  |  100
3   |  A1  |   M1    |   3  |  100
4   |  A1  |   M1    |   4  |  90
5   |  A1  |   M1    |   5  |  100
6   |  A1  |   M1    |   6  |  100
7   |  A1  |   M1    |   7  |  100
8   |  A1  |   M1    |   8  |  100
9   |  A2  |   M2    |   1  |  100
10  |  A2  |   M2    |   2  |  100
11  |  A2  |   M2    |   3  |  100
12  |  A2  |   M2    |   4  |  70
13  |  A2  |   M2    |   5  |  100
14  |  A2  |   M2    |   6  |  100
15  |  A2  |   M2    |   7  |  100
16  |  A2  |   M2    |   8  |  82

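Also, if I plot a bar chart for Machine1 in Area1 only, how do I add data labels? I made a bar chart of week (x-axis) versus uptime percentage (y-axis), and I need the weeks as my data labels.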

Here is what I have done so far:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel("targetFolder.xlsx", sheet_name=0)

area1 = df.loc[df['Area'] == 'A1']

# the data

data = list(area1['UTPercent'])
weekNum = list(area1['Week'])  # weeks from the same filtered rows as data

## the bars
fig = plt.figure()
ax1 = fig.add_subplot(111)
plotData = ax1.bar(weekNum, data, width = 0.45, 
color='#556B2F')

# adding labels and title
ax1.set_xlabel("Weeks")
ax1.set_ylabel("Uptime Percentage")
ax1.set_title("Metrology Area", weight='bold')

fig.tight_layout()
plt.show()

1 Answer:

Answer 0: (score: 0)

For the first question, I would do something like this (assuming your table is named uptimes):

INSERT INTO uptimes (Week, Machine, Area, UTPercent)
    (SELECT SeqValue AS Week,
            machines.Machine,
            machines.Area,
            100 AS UTPercent
     FROM
         (SELECT (TWO_1.SeqValue + TWO_2.SeqValue + TWO_4.SeqValue + TWO_8.SeqValue + TWO_16.SeqValue + TWO_32.SeqValue) SeqValue
          FROM
              (SELECT 0 SeqValue
               UNION ALL SELECT 1 SeqValue) TWO_1
          CROSS JOIN
              (SELECT 0 SeqValue
               UNION ALL SELECT 2 SeqValue) TWO_2
          CROSS JOIN
              (SELECT 0 SeqValue
               UNION ALL SELECT 4 SeqValue) TWO_4
          CROSS JOIN
              (SELECT 0 SeqValue
               UNION ALL SELECT 8 SeqValue) TWO_8
          CROSS JOIN
              (SELECT 0 SeqValue
               UNION ALL SELECT 16 SeqValue) TWO_16
          CROSS JOIN
              (SELECT 0 SeqValue
               UNION ALL SELECT 32 SeqValue) TWO_32
          HAVING SeqValue <=
              (SELECT max(week)
               FROM uptimes)
          AND SeqValue > 0) AS integers
     LEFT JOIN
         (SELECT Machine,
                 Area
          FROM uptimes
          GROUP BY 1,
                   2) AS machines ON 1=1
     LEFT JOIN uptimes ON uptimes.week = integers.SeqValue
     AND machines.Machine = uptimes.Machine
     WHERE uptimes.week IS NULL);

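How it works:

  1. Generate the integers from 1 up to the highest week in your table (done with the UNIONs)
  2. Get all machines and areas from your table (SELECT Machine, Area ...)
  3. Cross join the two to get every possible combination (JOIN ON 1=1)
  4. Filter out the combinations that already exist (WHERE uptimes.week IS NULL)
  5. Insert the result into the table (INSERT INTO)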
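
Since the question is about pandas, the steps above can also be sketched directly on a DataFrame. This is only a minimal sketch, not part of the original answer: the function name `fill_missing_weeks` and its parameters are hypothetical, the sample rows are copied from the question's first table, and it assumes the current week equals the highest week in the data unless you pass one explicitly.

```python
import pandas as pd

# Sample rows copied from the question's first table (ID column omitted).
df = pd.DataFrame({
    "Area": ["A1", "A1", "A2", "A2"],
    "Machine": ["M1", "M1", "M2", "M2"],
    "Week": [1, 4, 4, 8],
    "UTPercent": [80, 90, 70, 82],
})


def fill_missing_weeks(frame, current_week=None, default=100):
    """Hypothetical helper: one row per machine for weeks 1..current_week."""
    if current_week is None:
        # Assumption: the current week is the highest week in the data.
        current_week = int(frame["Week"].max())

    # Steps 1-3: cross join every (Area, Machine) pair with every week number.
    machines = frame[["Area", "Machine"]].drop_duplicates()
    weeks = pd.DataFrame({"Week": range(1, current_week + 1)})
    grid = (
        machines.assign(_key=1)
        .merge(weeks.assign(_key=1), on="_key")
        .drop(columns="_key")
    )

    # Steps 4-5: keep the recorded values and use the default for the gaps.
    recorded = frame[["Area", "Machine", "Week", "UTPercent"]]
    full = grid.merge(recorded, on=["Area", "Machine", "Week"], how="left")
    full["UTPercent"] = full["UTPercent"].fillna(default)

    # Renumber the rows so ID runs from 1 in sorted order.
    full = full.sort_values(["Area", "Machine", "Week"]).reset_index(drop=True)
    full.insert(0, "ID", range(1, len(full) + 1))
    return full


filled = fill_missing_weeks(df)
print(filled.to_string(index=False))
```

Unlike the SQL version, this builds a new DataFrame rather than inserting into the source table, so the Excel sheet itself is left unchanged.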

For the other question, try using the pandas plot function:

    df = pd.read_excel("targetFolder.xlsx", sheet_name=0)
    area1 = df[df.Area == 'A1']
    area1.set_index('Week')['UTPercent'].plot(kind='bar')
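
The plot call above draws the bars but does not add the data labels the question asks for. One possible way, sketched here with matplotlib's `Axes.annotate`, is to write a text label above each bar. The sample rows and the "Week N" label text are assumptions made for illustration; swap in whatever value you want to display.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed sample rows for machine M1 in area A1, based on the question's tables.
area1 = pd.DataFrame({
    "Week": [1, 2, 3, 4],
    "UTPercent": [80, 100, 100, 90],
})

fig, ax = plt.subplots()
bars = ax.bar(area1["Week"], area1["UTPercent"], width=0.45, color="#556B2F")

# Put a label above each bar; here the label is the week number.
for bar, week in zip(bars, area1["Week"]):
    ax.annotate(
        f"Week {week}",
        xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
        xytext=(0, 3),  # shift the text 3 points above the bar top
        textcoords="offset points",
        ha="center",
        va="bottom",
    )

ax.set_xlabel("Weeks")
ax.set_ylabel("Uptime Percentage")
ax.set_title("Metrology Area", weight="bold")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the chart.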