Why are pandas plots so different for very similar DataFrames?

Time: 2018-02-01 19:45:50

Tags: python pandas dataframe plot

The following code snippet, shown together with part of its printed output, runs without problems:

import pandas as pd
from   matplotlib import pyplot as plt
import numpy as np
import csv

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 3), index=ts.index, columns=list('ABC'))
print (df)
print (df.index)
print (df.columns)
#df = df.cumsum()   # this also is ok
df.plot()
plt.show()

                   A         B         C
2000-01-01  0.882544 -0.841398  1.745238
2000-01-02  1.798310  1.049662 -0.115292
2000-01-03  1.223243 -0.086322 -0.565204
2000-01-04 -0.587905 -0.609485  0.296248
2000-01-05 -1.603916 -0.397210  0.007550
2000-01-06 -0.821833  0.112760 -0.082558   
     ...
     ...
2002-09-22 -0.530537  0.373358  2.920919
2002-09-23  0.121657  0.634864 -0.964255
2002-09-24  1.153799  2.468507 -2.087136
2002-09-25 -1.079853  0.684926  1.556522
2002-09-26 -2.163454  0.874373  1.942925

[1000 rows x 3 columns]
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10',
               ...
               ...
               '2002-09-17', '2002-09-18', '2002-09-19', '2002-09-20',
               '2002-09-21', '2002-09-22', '2002-09-23', '2002-09-24',
               '2002-09-25', '2002-09-26'],
              dtype='datetime64[ns]', length=1000, freq='D')
Index(['A', 'B', 'C'], dtype='object')

It produces a nice plot, as expected.

However, when I run the following code

import datetime

df = pd.read_csv(fullFileNameOutCSV, header=0, names=headerCSV, usecols=['SEK/EUR', 'SEK/DKK', 'SEK/NOK'])
pd.Timedelta(datetime.timedelta(days=1))
df.index = pd.date_range('2000-01-01', periods=4888)
df.index = pd.to_datetime(df.index)
print(df)
print(df.index)
print(df.columns)
df.plot()
plt.show()

            SEK/EUR  SEK/DKK  SEK/NOK
2000-01-01   9.4696   7.4501   8.8550
2000-01-02   9.4025   7.4495   8.7745
2000-01-03   9.3050   7.4452   8.7335
2000-01-04   9.1800   7.4431   8.6295
2000-01-05   9.1650   7.4433   8.5900
2000-01-06   9.0985   7.4433   8.5585
     ...
     ...
2013-05-14   9.8188   7.4444   9.5858
2013-05-15   9.8005   7.4428   9.5655
2013-05-16   9.7823   7.4427   9.5548
2013-05-17   9.7825   7.4415   9.5628
2013-05-18   9.7645   7.4419   9.5620
2013-05-19   9.8030   7.4428   9.5705

[4888 rows x 3 columns]
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10',
               ...
               ...
               '2013-05-10', '2013-05-11', '2013-05-12', '2013-05-13',
               '2013-05-14', '2013-05-15', '2013-05-16', '2013-05-17',
               '2013-05-18', '2013-05-19'],
              dtype='datetime64[ns]', length=4888, freq='D')
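the plot is not what I expected. I have tried many variations of this snippet to get a reasonable plot, but so far I have not found code that produces a figure similar to the first one. Note the scale of the vertical axis: what is causing it? Why don't I get a plot similar to the one from the first snippet?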

3 Answers:

Answer 0: (score: 0)

Inserting the following code at the point where the strings are modified for output to the CSV file works fine. Eureka :-)

if '.' not in nums[3]:
    # Append '.00' to integer strings
    nums[3] = nums[3] + '.00'  
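As a minimal sketch of the same idea in a runnable form, the fix can be wrapped in a small helper. The name `ensure_decimal`, the `decimals` parameter, and the sample record are hypothetical; the answer only shows the check on `nums[3]`.

```python
def ensure_decimal(text, decimals=2):
    """Return text with a fractional part appended if it has none, e.g. '9' -> '9.00'."""
    text = text.strip()
    if '.' not in text:
        # Same rule as in the answer: integer strings get '.00' appended
        text = text + '.' + '0' * decimals
    return text


# Hypothetical record: a date followed by three rate strings
nums = ['2000-01-01', '7.4501', '8.8550', '9']
nums[3] = ensure_decimal(nums[3])
print(nums)
```

Strings that already contain a decimal point are returned unchanged, so the helper can be applied to every rate field before the row is written to the CSV file.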

Answer 1: (score: 0)

Here is the corrected plot after the code modification:

https://github.com/StratusNetwork/OCN

Answer 2: (score: 0)

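I use the following function to compute the mean and variance for each currency. Had I been using it all along, the 3 outliers would have been easy to spot, as they are to me now.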

def updateMeanVar(x,k,mu,vr):
    '''
     Purpose: Update the running estimates of the mean and of the sum of
              squared deviations from the mean (recursive algorithm)
       Inputs:
          x -- new value (x_k)                 x_1, x_2, ...
          k -- counter (index) for new value   1, 2, ...
         mu -- previously estimated mean (x_k not included)
         vr -- previous sum of squared deviations (x_k not included)
       Outputs:
         mu -- updated mean (with x_k included)
         vr -- updated sum of squared deviations (with x_k included);
               divide by k - 1 to get the sample variance

      Refs.
        Donald E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming,
        chapter 4.2.2, page 232. Addison-Wesley, Boston, third edition, 1998.
    '''
    delta = x - mu
    mu += delta/k
    vr += delta*(x - mu)  
    return mu,vr
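The following sketch shows one way such a recursive update could be used to spot outliers. It is an illustration under assumptions, not the author's actual script: the helper names `mean_std` and `flag_outliers`, the 3-sigma threshold, and the sample rates are all made up. Note that the recursion, as written above, accumulates the sum of squared deviations, so the sketch divides by n - 1 before taking the square root.

```python
import math


def update_mean_var(x, k, mu, s):
    """One recursive step: updated mean and sum of squared deviations."""
    delta = x - mu
    mu += delta / k
    s += delta * (x - mu)
    return mu, s


def mean_std(values):
    """Run the recursion over all values; return (mean, sample std)."""
    mu, s = 0.0, 0.0
    for k, x in enumerate(values, start=1):
        mu, s = update_mean_var(x, k, mu, s)
    n = len(values)
    std = math.sqrt(s / (n - 1)) if n > 1 else 0.0
    return mu, std


def flag_outliers(values, n_sigma=3.0):
    """Indices of values more than n_sigma standard deviations from the mean."""
    mu, std = mean_std(values)
    if std == 0.0:
        return []
    return [i for i, x in enumerate(values) if abs(x - mu) > n_sigma * std]


# Made-up data: plausible rates plus one value with a misplaced decimal point
rates = [9.47, 9.40, 9.31, 9.18, 9.17, 9.10] * 5 + [945.0]
print(mean_std(rates))
print(flag_outliers(rates))
```

A single wildly scaled value inflates both the mean and the standard deviation, which is the same effect that stretches the vertical axis of the plot; checking these statistics before plotting makes such values visible early.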

Below is part of the output for the European Central Bank (ECB) currency exchange rates, including today's estimates.

Processing ECB exchange rates ...
Saving data ...

ECB working days:  4889
strt date: 1999-01-04
stop date: 2018-02-02
Exchange rates stored in:
  Sweden  (SEK/EUR): E:/Data/ECB_ExchangeRates/Sweden/Rates.txt  (mean=  9.235,std=  0.496)
  Denmark (DKK/EUR): E:/Data/ECB_ExchangeRates/Denmark/Rates.txt (mean=  7.448,std=  0.011)
  Norway  (NOK/EUR): E:/Data/ECB_ExchangeRates/Norway/Rates.txt  (mean=  8.238,std=  0.565)
All rates stored in: E:/Data/ECB_ExchangeRates/AllRates.txt
                     E:/Data/ECB_ExchangeRates/AllRates.csv

And the summary plot.