Question

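I am using pandas.read_sql() to fetch data from a PostgreSQL database. The SQL query is usually built with many columns, and I only want to retrieve specific columns, with one column used as the index. Take an example table test_table like this: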

column1 column2 column3
1       2       3
2       4       6
3       6       9

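I tried using the index_col and columns parameters of pandas.read_sql() to get column1 as the index and column2 as the data (ignoring column3!). But it always returns the whole table. Writing columns=['column1', 'column2'] does not change anything either.

I am using Python 2.7.6 and pandas 0.17.1. Thanks for your help!

Example code: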

import pandas
import psycopg2
import sqlalchemy


def connect():
    connString = (
        "dbname=test_db "
        "host=localhost "
        "port=5432 "
        "user=postgres "
        "password=password"
    )
    return psycopg2.connect(connString)

engine = sqlalchemy.create_engine(
            'postgresql://',
            creator=connect)
sql = (
    'SELECT '
    'column1, '
    'column2, '
    'column3 '
    'FROM test_table'
)
data = pandas.read_sql(
    sql,
    engine,
    index_col=['column1'],
    columns=['column2'])
print(data)

Answer 1

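I think the columns parameter has no effect for you because you are passing an SQL statement rather than a table name.

As stated in the pandas documentation:

columns : list, default: None. List of column names to select from sql table (only used when reading a table).

So I guess if you try: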

pandas.read_sql('test_table', engine, index_col=['column1'], columns=['column2'])

the columns parameter will actually take effect.
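
If you would rather keep passing an SQL statement, two alternatives are to name only the columns you need in the SELECT itself, or to subset the DataFrame after reading. The following is a minimal sketch, not the asker's setup: it assumes an in-memory SQLite database as a stand-in for the PostgreSQL test_table so that it runs without a server, and the variable names are illustrative.

```python
import sqlite3

import pandas

# Assumption: an in-memory SQLite table stands in for the PostgreSQL test_table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (column1 INTEGER, column2 INTEGER, column3 INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 4, 6), (3, 6, 9)],
)
conn.commit()

# Option 1: select only the needed columns in the query itself.
data = pandas.read_sql(
    "SELECT column1, column2 FROM test_table",
    conn,
    index_col="column1",
)

# Option 2: read every column, then keep only column2 afterwards.
full = pandas.read_sql(
    "SELECT column1, column2, column3 FROM test_table",
    conn,
    index_col="column1",
)
subset = full[["column2"]]

print(data)
```

Option 1 is usually preferable because the unused columns are never transferred from the database.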
