Working with dates

Time: 2017-11-30 01:58:34

Tags: python pandas dataframe

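I have been working on this code for a few days and keep running into problems with reindexing this dataframe. I am pulling game logs for MLB players and formatting them for my purposes so that I can export them to Excel. I am trying to line up every date between April 2 and October 1 (the 2017 MLB regular season), adding a blank row for each date on which the player did not play. Earlier today this code actually worked with a single URL, but when I tried using multiple URLs in a loop, the output was blank. When I comment out the line that reindexes the dataframe by date, the code runs correctly.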

import pandas as pd
from bs4 import BeautifulSoup
import requests
import time
import datetime
import re
import numpy as np


start_date = '2017-04-02'
end_date = '2017-10-01'
idx = pd.date_range(start_date,end_date)
dates = [d.strftime('%Y-%m-%d') for d in idx]

url = 'http://www.fangraphs.com/statsd.aspx?playerid=7859&position=OF&type=1&gds=2017-04-02&gde=2017-10-01&season='
resp = requests.get(url)    
soup = BeautifulSoup(resp.text, "lxml")
table = soup.find('table', attrs={'class' : 'rgMasterTable'})
df = pd.read_html(str(table))[0]


top2rows = df[0:1]
df = df.replace(['Total','Date'], np.nan).dropna()
df = df.reindex(index=df.index[::-1])
df.set_index(df.columns[0],inplace=True)
#df = df.reindex(dates)
df = pd.concat([top2rows, df], ignore_index=False, join='inner')
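For reference, the result I am after can be sketched on a tiny frame. This is only an illustration under stated assumptions: the `log` frame, its `PA` column, and its values are made up and do not come from Fangraphs, and the date range is shortened to one week.

```python
import pandas as pd

# Hypothetical game log: three games in a one-week window (made-up values).
log = pd.DataFrame(
    {"Date": ["2017-04-02", "2017-04-04", "2017-04-07"], "PA": [4, 5, 3]}
).set_index("Date")

# Same construction as in my code, over a shorter range.
idx = pd.date_range("2017-04-02", "2017-04-08")
dates = [d.strftime("%Y-%m-%d") for d in idx]

# reindex keeps the rows whose label appears in `dates` and adds
# NaN-filled rows for the dates that have no game.
full = log.reindex(dates)
print(full)
```

This works here because every date label in `log` appears only once.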

When I run my code with the second reindex line (df = df.reindex(dates)) uncommented, this is the traceback I get.

Traceback (most recent call last):
  File "Fangraphs Bug Fixer.py", line 36, in <module>
    df = df.reindex(dates)
  File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2733, in reindex
    **kwargs)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2515, in reindex
    fill_value, copy).__finalize__(self)
  File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2679, in _reindex_axes
    fill_value, limit, tolerance)
  File "C:\Python36\lib\site-packages\pandas\core\frame.py", line 2690, in _reindex_index
    allow_dups=False)
  File "C:\Python36\lib\site-packages\pandas\core\generic.py", line 2627, in _reindex_with_indexers
    copy=copy)
  File "C:\Python36\lib\site-packages\pandas\core\internals.py", line 3886, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "C:\Python36\lib\site-packages\pandas\core\indexes\base.py", line 2836, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
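The error message suggests that the index being reindexed contains the same label more than once. A minimal sketch of how that condition can be checked and reproduced is below. The `log` frame and its values are made up, and the repeated date is an assumption for illustration only: I have not confirmed where duplicates in the real table would come from (two games on one date and leftover header rows are both only guesses). Dropping all but the first row per date is one possible workaround, not necessarily the right one, since it discards data.

```python
import pandas as pd

# Hypothetical game log in which one date appears twice (made-up values).
log = pd.DataFrame(
    {"Date": ["2017-04-02", "2017-04-04", "2017-04-04"], "PA": [4, 5, 3]}
).set_index("Date")

dates = [d.strftime("%Y-%m-%d")
         for d in pd.date_range("2017-04-02", "2017-04-05")]

# List the index labels that occur more than once.
dupes = log.index[log.index.duplicated(keep=False)].unique().tolist()
print(dupes)

# Reindexing a frame whose index has duplicate labels raises ValueError.
try:
    log.reindex(dates)
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print(exc)

# One possible workaround (it discards rows): keep the first row per date,
# then reindex against the full date list.
deduped = log[~log.index.duplicated(keep="first")]
full = deduped.reindex(dates)
print(full)
```

Printing `df.index[df.index.duplicated(keep=False)]` on the real frame just before the reindex line would show whether this is what happens with the scraped table.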

0 answers:

No answers yet