I am running SQL queries against a database in Azure Databricks using a SQLAlchemy session. My query contains a user-defined function, but when I run the query it returns:

> infoMessages=["*org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Undefined function: my_method
Example below:
    import traceback

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker


    class DatabaseQuery(DatabaseLibrary):
        def __init__(self):
            self.table_model = None
            self.column_names = []
            self.conn = None
            self.meta = None
            self.engine = None
            self.session = None
            self.query_list = []

        def connect_database(self, region, token, database, http_path):
            try:
                dbfs_engine = create_engine(
                    "databricks+pyhive://token:"
                    + token
                    + "@"
                    + region
                    + "xxxxxx/"
                    + database,
                    connect_args={"http_path": http_path},
                    echo=True,
                )
                self._set_metadata_databricks(dbfs_engine)
                Session = sessionmaker(bind=dbfs_engine)
                self.session = Session()
                self.engine = dbfs_engine
                self.conn = dbfs_engine.connect()
            except Exception:
                traceback.print_exc()
                raise


    def my_method(name):
        return name.upper()


    query = "select my_method(names.NAME) from db.names"
    result = self.session.execute(query).fetchall()
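For context, my_method is just an uppercase transform. As a fallback (not what I'm after), I can of course apply it client-side in Python after fetching; a minimal sketch, with hypothetical rows standing in for what `session.execute(...).fetchall()` would return:

```python
def my_method(name):
    # Plain-Python version of the UDF: uppercase the name.
    return name.upper()

# Hypothetical rows in the shape fetchall() returns: one-element tuples.
rows = [("alice",), ("Bob",)]
result = [my_method(name) for (name,) in rows]
print(result)  # ['ALICE', 'BOB']
```

But this defeats the point of pushing the function into the query itself.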
With pyspark I can do this easily by simply registering the udf on the spark object:
    convert_maximo_date = udf(common.convert_maximo_date)
    self.spark.udf.register("convert_maximo_date", convert_maximo_date)
Is it possible to do something similar with a SQLAlchemy connection, so that I can execute queries that use user-defined functions?