Creating a DataFrame from an orgmode table

Time: 2019-04-16 18:49:09

Tags: pandas dataframe

Is it possible to create a Pandas DataFrame from an orgmode (ASCII) table?

So I have this:

data = """\
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| binance         | DNT   | district0x        |            1998 | Buy | 0 |
| binance         | TNT   | Tierion           |        1855.143 | Buy | 0 |
| binance         | VIB   | Viberate          |             999 | Buy | 0 |
| Coinexchange.io | BUZZ  | BuzzCoin          |          500000 | Buy | 0 |
| Coinexchange.io | ECC   | ECC               |       81094.078 | Buy | 0 |
| Coinexchange.io | ESP   | Espers            | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON  | Mooncoin          |       1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS  | FedoraCoin        |         4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise             |            5000 | Buy | 0 |
| Coinexchange.io | VSX   | Vsync             |            5000 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
| Cryptopia       | DGB   | DigiByte          |           10000 | Buy | 0 |
| Cryptopia       | XBY   | XTRABYTES         |  17458.51615734 | Buy | 0 |
"""

and I create a Pandas DataFrame from it like this:

import io
import pandas as pd
from tabulate import tabulate  # <- just for demo purpose (printing out df)

data = """\
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| binance         | DNT   | district0x        |            1998 | Buy | 0 |
| binance         | TNT   | Tierion           |        1855.143 | Buy | 0 |
| binance         | VIB   | Viberate          |             999 | Buy | 0 |
| Coinexchange.io | BUZZ  | BuzzCoin          |          500000 | Buy | 0 |
| Coinexchange.io | ECC   | ECC               |       81094.078 | Buy | 0 |
| Coinexchange.io | ESP   | Espers            | 509079.92787805 | Buy | 0 |
| Coinexchange.io | MOON  | Mooncoin          |       1496999.5 | Buy | 0 |
| Coinexchange.io | TIPS  | FedoraCoin        |         4989997 | Buy | 0 |
| Coinexchange.io | VOISE | Voise             |            5000 | Buy | 0 |
| Coinexchange.io | VSX   | Vsync             |            5000 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
| Cryptopia       | DGB   | DigiByte          |           10000 | Buy | 0 |
| Cryptopia       | XBY   | XTRABYTES         |  17458.51615734 | Buy | 0 |
"""

raw_data = io.StringIO(data)
df = pd.read_csv(raw_data, sep='|', header=None)   # << Relevant line
print(tabulate(df))

This is what I get:

 0  nan  binance          BTC    Bitcoin                 3.86e-06   Buy  0  nan
 1  nan  binance          DNT    district0x           1998          Buy  0  nan
 2  nan  binance          TNT    Tierion              1855.14       Buy  0  nan
 3  nan  binance          VIB    Viberate              999          Buy  0  nan
 4  nan  Coinexchange.io  BUZZ   BuzzCoin           500000          Buy  0  nan
 5  nan  Coinexchange.io  ECC    ECC                 81094.1        Buy  0  nan
 6  nan  Coinexchange.io  ESP    Espers             509080          Buy  0  nan
 7  nan  Coinexchange.io  MOON   Mooncoin                1.497e+06  Buy  0  nan
 8  nan  Coinexchange.io  TIPS   FedoraCoin              4.99e+06   Buy  0  nan
 9  nan  Coinexchange.io  VOISE  Voise                5000          Buy  0  nan
10  nan  Coinexchange.io  VSX    Vsync                5000          Buy  0  nan
11  nan  Coinexchange.io  XP     Experience Points  100000          Buy  0  nan
12  nan  Cryptopia        BTC    Bitcoin                 1e-08      Buy  0  nan
13  nan  Cryptopia        DGB    DigiByte            10000          Buy  0  nan
14  nan  Cryptopia        XBY    XTRABYTES           17458.5        Buy  0  nan

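But this is not ideal, because I have to strip the extra whitespace from every string column. I also have to drop the first and last columns, which are empty.

So is there a more convenient way?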
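For reference, the manual cleanup described above might look like the following sketch. It uses a shortened copy of the table, and the slicing and stripping steps are an assumption about what that cleanup involves, not code from the original post.

```python
import io
import pandas as pd

# Shortened copy of the table, for illustration only.
data = """\
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
"""

df = pd.read_csv(io.StringIO(data), sep='|', header=None)

# Each row starts and ends with '|', so the first and last columns are empty.
df = df.iloc[:, 1:-1].copy()

# Strip the padding from every column that holds strings.
for col in df.columns:
    if pd.api.types.is_string_dtype(df[col]):
        df[col] = df[col].str.strip()

print(df)
```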

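1 Answer:

Answer 0: (score: 1)

You can pass a regular expression to the sep parameter. The C parser does not support regular-expression separators (any separator longer than one character, other than '\s+'), so specify engine='python':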

df = pd.read_csv(raw_data, sep=r'\s*\|\s*', header=None, engine='python')
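The regex separator takes care of the padding, but each row still begins and ends with `|`, so the parsed frame keeps an empty first and last column. The sketch below shows one way to drop them, using a shortened copy of the table. The `dropna` call and the column renumbering are additions for illustration and are not part of the original answer.

```python
import io
import pandas as pd

# Shortened copy of the table, for illustration only.
data = """\
| binance         | BTC   | Bitcoin           |      0.00000386 | Buy | 0 |
| Coinexchange.io | XP    | Experience Points |          100000 | Buy | 0 |
| Cryptopia       | BTC   | Bitcoin           |            1e-8 | Buy | 0 |
"""

# The regex consumes the whitespace on both sides of every pipe.
df = pd.read_csv(io.StringIO(data), sep=r'\s*\|\s*', header=None, engine='python')

# The leading and trailing pipes yield columns that are entirely NaN; drop them.
df = df.dropna(axis=1, how='all')

# Renumber the remaining columns from 0 (optional).
df.columns = range(df.shape[1])

print(df)
```

Note that `dropna(axis=1, how='all')` would also remove a genuinely empty data column. If that is a concern, slicing with `df.iloc[:, 1:-1]` removes only the two outer columns.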