Question

I just upgraded pandas from 0.23.4 to 0.24.0 (Python 2.7.12), and many of my pd.read_sql queries broke. It looks MySQL-related, but strangely the errors only started after I updated pandas. Any ideas?

Here is my MySQL table:

CREATE TABLE `xlations_topic_update_status` (
  `run_ts` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Here is my query:

import pandas as pd
from sqlalchemy import create_engine
db_engine = create_engine('mysql+mysqldb://<><>/product_analytics', echo=False)
pd.read_sql('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]

Here is the error:

OperationalError: (_mysql_exceptions.OperationalError) (1059, "Identifier name 'select max(run_ts) from product_analytics.xlations_topic_update_status;' is too long") [SQL: 'DESCRIBE `select max(run_ts) from product_analytics.xlations_topic_update_status;`']

I get the same error for other, more complex queries as well, but I won't post them here.

Answer 1

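According to the documentation, the first argument is either a string (a table name) or a SQLAlchemy Selectable (a select or text object). In other words, pd.read_sql() is delegating to pd.read_sql_table() and treating the entire query string as a table identifier.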

Wrap the query string in a text() construct first:

from sqlalchemy import text  # text() marks the string as a SQL statement
stmt = text('select max(run_ts) from product_analytics.xlations_topic_update_status')
pd.read_sql(stmt, con=db_engine).values[0][0]

This way pd.read_sql() delegates to pd.read_sql_query(). Another option is to call pd.read_sql_query() directly.
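To make the reasoning concrete, here is a toy model of the dispatch this answer describes. It is not pandas source code: `has_table`, `choose_reader`, `TextClause`, and `KNOWN_TABLES` are hypothetical names made up for illustration, and the assumption that only plain strings are probed as table names is mine. The 64-character limit is MySQL's maximum length for a table identifier.

```python
# Toy model only: NOT pandas source code. All names here are hypothetical.
MAX_IDENTIFIER_LENGTH = 64  # MySQL's limit for table names
KNOWN_TABLES = {"xlations_topic_update_status"}
QUERY = "select max(run_ts) from product_analytics.xlations_topic_update_status"


class TextClause(object):
    """Stand-in for the object returned by sqlalchemy.text()."""

    def __init__(self, sql):
        self.sql = sql


def has_table(name):
    """Mimic a backend that rejects over-long identifiers, as in error 1059."""
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError("Identifier name %r is too long" % name)
    return name in KNOWN_TABLES


def choose_reader(sql):
    """Return the name of the reader this toy dispatcher would pick."""
    # Assumption: only plain strings are probed as possible table names.
    if isinstance(sql, str) and has_table(sql):
        return "read_sql_table"
    return "read_sql_query"


print(choose_reader("xlations_topic_update_status"))  # a real table name
print(choose_reader(TextClause(QUERY)))               # wrapped query
try:
    choose_reader(QUERY)                              # bare query string
except ValueError as exc:
    print("probe failed:", exc)
```

In this sketch the bare query string fails during the table-name probe, while the wrapped query never reaches the probe, which mirrors why text() avoids the error in the question.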

Answer 2

Try using pd.read_sql_query(sql, con) instead of pd.read_sql(...).

So:

pd.read_sql_query('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]
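If you want to try this without a MySQL server, the following is a minimal sketch that uses an in-memory SQLite database as a stand-in. The sample rows, the TEXT column type, and the `latest_run` alias are assumptions for illustration, not part of the original setup; the MySQL behaviour itself is not reproduced here.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL database (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xlations_topic_update_status (run_ts TEXT)")
conn.executemany(
    "INSERT INTO xlations_topic_update_status (run_ts) VALUES (?)",
    [("2019-01-01 00:00:00",), ("2019-01-02 00:00:00",)],  # made-up sample rows
)
conn.commit()

# read_sql_query always treats its first argument as a SQL statement,
# so no table-name lookup is involved.
query = "select max(run_ts) as latest_run from xlations_topic_update_status"
df = pd.read_sql_query(query, con=conn)

# .iloc[0, 0] picks the single scalar, like .values[0][0] in the question.
latest_run = df.iloc[0, 0]
print(latest_run)

conn.close()
```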

Pandas 0.24 read_sql operational error
