How do I read a JSON file into a DataFrame?

Asked: 2016-12-22 18:44:52

Tags: python json pandas dataframe

I'm new to Python, so could anyone help me?

I have the following content in a JSON file (i.e. file12.json):

{
    "TimeSeries": {
        "Row": [
            {
                "CLOSE": 41.85,
                "TIMESTAMP": "2016-09-22T00:00:00+00:00"
            },
            {
                "CLOSE": 41.37,
                "TIMESTAMP": "2016-09-23T00:00:00+00:00"
            },
            {
                "CLOSE": 40.88,
                "TIMESTAMP": "2016-09-26T00:00:00+00:00"
            },
            {
                "CLOSE": 40.98,
                "TIMESTAMP": "2016-09-27T00:00:00+00:00"
            },
            {
                "CLOSE": 44.33,
                "TIMESTAMP": "2016-12-21T00:00:00+00:00"
            }
        ]
    }
}

I'm trying to create a structured DataFrame that looks like this:

      CLOSE        TIMESTAMP
0     41.85        2016-09-22T00:00:00+00:00 
1     41.37        2016-09-23T00:00:00+00:00 
2     40.88        2016-09-26T00:00:00+00:00
3     40.98        2016-09-27T00:00:00+00:00 

If I wanted to do the same thing with a CSV file, I would simply use read_csv, but read_json produces a different output.

I used this code...

import pandas as pd

file = pd.read_json('file12.json')
print(file)

...but the format is not what I want. I get the following:

TimeSeries
Row  [{u'CLOSE': 41.85, u'TIMESTAMP': u'2016-09-22T...

...i.e. everything ends up in a single row rather than in a formatted table.

Can anyone help me? Please :-)

2 answers:

Answer 0: (score: 3)

In McKinney's Python for Data Analysis, he says:

  How you convert a JSON object or list of objects to a DataFrame or some other data structure for analysis will be up to you.

Try this (untested code, YMMV):

import json
import pandas as pd
with open('file12.json') as json_data:
   obj = json.load(json_data)
   frame = pd.DataFrame(obj['TimeSeries']['Row'], columns=['CLOSE', 'TIMESTAMP'])
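
Since the answer flags its code as untested, here is a self-contained sketch that exercises the same steps. It writes two of the question's rows to a temporary file (a stand-in for file12.json, an assumption made only so the example runs anywhere) and then builds the frame the same way as above.

```python
import json
import os
import tempfile

import pandas as pd

# Two rows copied from the question's file12.json, to keep the sketch short.
sample = {
    "TimeSeries": {
        "Row": [
            {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
            {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"},
        ]
    }
}

# Write the sample to a temporary file standing in for file12.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file12.json")
    with open(path, "w") as fh:
        json.dump(sample, fh)

    # Same steps as the answer: parse the file into a nested dict.
    with open(path) as json_data:
        obj = json.load(json_data)

# The list under TimeSeries -> Row is a list of records, one dict per row.
frame = pd.DataFrame(obj["TimeSeries"]["Row"], columns=["CLOSE", "TIMESTAMP"])
print(frame)
```

With this approach the TIMESTAMP column stays as plain strings, because pd.DataFrame does no date parsing on its own.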

Answer 1: (score: 2)

The value of the Row key, as a JSON string:

In [454]: txt1="""[
     ...:             {
     ...:                 "CLOSE": 41.85,
     ...:                 "TIMESTAMP": "2016-09-22T00:00:00+00:00"
     ...:             },
     ...:             {
     ...:                 "CLOSE": 41.37,
     ...:                 "TIMESTAMP": "2016-09-23T00:00:00+00:00"
     ...:             },
     ...:             {
     ...:                 "CLOSE": 40.88,
     ...:                 "TIMESTAMP": "2016-09-26T00:00:00+00:00"
     ...:             },
     ...:             {
     ...:                 "CLOSE": 40.98,
     ...:                 "TIMESTAMP": "2016-09-27T00:00:00+00:00"
     ...:             },
     ...:             {
     ...:                 "CLOSE": 44.33,
     ...:                 "TIMESTAMP": "2016-12-21T00:00:00+00:00"
     ...:             }
     ...:         ]"""

Parsing it gives a list:

In [449]: json.loads(txt1)
Out[449]: 
[{'CLOSE': 41.85, 'TIMESTAMP': '2016-09-22T00:00:00+00:00'},
 {'CLOSE': 41.37, 'TIMESTAMP': '2016-09-23T00:00:00+00:00'},
 {'CLOSE': 40.88, 'TIMESTAMP': '2016-09-26T00:00:00+00:00'},
 {'CLOSE': 40.98, 'TIMESTAMP': '2016-09-27T00:00:00+00:00'},
 {'CLOSE': 44.33, 'TIMESTAMP': '2016-12-21T00:00:00+00:00'}]

And loading it into pandas (the dates are interpreted as datetime64, since convert_dates=True is the default):

In [451]: df=pd.read_json(txt1)
In [452]: df
Out[452]: 
   CLOSE  TIMESTAMP
0  41.85 2016-09-22
1  41.37 2016-09-23
2  40.88 2016-09-26
3  40.98 2016-09-27
4  44.33 2016-12-21
In [453]: df.dtypes
Out[453]: 
CLOSE               float64
TIMESTAMP    datetime64[ns]
dtype: object
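
A note for readers on newer pandas: as far as I know, recent releases (2.1 and later) discourage passing a literal JSON string straight to read_json and expect a path or file-like object instead. Wrapping the string in io.StringIO should work on both old and new versions. A minimal sketch with two of the rows above:

```python
from io import StringIO

import pandas as pd

# Two of the rows from txt1 above, kept short for illustration.
txt1 = """[
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"}
]"""

# StringIO turns the string into a file-like object that read_json accepts.
df = pd.read_json(StringIO(txt1))
print(df)
print(df.dtypes)
```

The exact datetime dtype reported for TIMESTAMP (time-zone aware or not) may differ from the session above depending on the pandas version.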

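But as @Alex shows, you get more control over the conversion by parsing with json.loads first and then loading just the part of the dictionary you need. obj['TimeSeries']['Row'] is exactly this list.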
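
To make "more control" concrete, here is a small sketch under the assumption that you want time-zone-aware timestamps. It parses the nested JSON with json.loads, builds the frame from the inner list, and converts the TIMESTAMP column explicitly with pd.to_datetime. The variable names are only illustrative.

```python
import json

import pandas as pd

# The nested structure from the question, shortened to two rows.
txt = """{"TimeSeries": {"Row": [
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"}
]}}"""

obj = json.loads(txt)

# Build the frame from the inner list of records only.
df = pd.DataFrame(obj["TimeSeries"]["Row"])

# At this point TIMESTAMP still holds strings; convert it explicitly
# and keep the UTC offset rather than relying on read_json's defaults.
df["TIMESTAMP"] = pd.to_datetime(df["TIMESTAMP"], utc=True)
print(df.dtypes)
```

Doing the conversion yourself means you decide which columns become dates and how time zones are handled.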

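You can even strip off the outer layers with a JSON round trip (txt is assumed here to hold the full JSON text from the question):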

In [455]: dd = json.loads(txt)
In [456]: dd
Out[456]: 
{'TimeSeries': {'Row': [{'CLOSE': 41.85,
    'TIMESTAMP': '2016-09-22T00:00:00+00:00'},
   {'CLOSE': 41.37, 'TIMESTAMP': '2016-09-23T00:00:00+00:00'},
   {'CLOSE': 40.88, 'TIMESTAMP': '2016-09-26T00:00:00+00:00'},
   {'CLOSE': 40.98, 'TIMESTAMP': '2016-09-27T00:00:00+00:00'},
   {'CLOSE': 44.33, 'TIMESTAMP': '2016-12-21T00:00:00+00:00'}]}}
In [457]: pd.read_json(json.dumps(dd['TimeSeries']['Row']))
Out[457]: 
   CLOSE  TIMESTAMP
0  41.85 2016-09-22
1  41.37 2016-09-23
2  40.88 2016-09-26
3  40.98 2016-09-27
4  44.33 2016-12-21
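
If the round trip through json.dumps feels indirect, pandas also has json_normalize, which can walk down to a nested list of records through its record_path argument. This is an alternative that neither answer mentions, sketched here with two of the question's rows; it assumes a pandas version that exposes pd.json_normalize at the top level (1.0 or later).

```python
import json

import pandas as pd

# The nested structure from the question, shortened to two rows.
txt = """{"TimeSeries": {"Row": [
    {"CLOSE": 41.85, "TIMESTAMP": "2016-09-22T00:00:00+00:00"},
    {"CLOSE": 41.37, "TIMESTAMP": "2016-09-23T00:00:00+00:00"}
]}}"""

dd = json.loads(txt)

# record_path lists the keys to follow to reach the list of row dicts.
df = pd.json_normalize(dd, record_path=["TimeSeries", "Row"])
print(df)
```

Unlike read_json, json_normalize leaves TIMESTAMP as strings, so any date conversion has to be done afterwards.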