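I have a JDBC query that runs 8-20 times slower than the same query in Python (using cx_Oracle). I would expect some normal percentage difference between languages, but 8 times slower or more suggests I am doing something wrong.
I am not including the actual query because it reveals some business logic (table structure, naming, etc.), but I don't think it should matter much, since I am comparing Python and Java execution times rather than trying to analyze the query itself. For context, it is a left join across 7 tables and pulls values from 25 columns of those tables.
The query runs against tables with millions of rows and returns 28,705 rows, as expected.
The Python code actually does more than the Java code, because in Python I iterate over the results and extract the values into objects. In the Java code I just loop over the ResultSet to fetch the data and do not process the rows in any other way.
I am hoping there is some obvious or non-obvious setting missing from my Java code that would bring it more in line with the Python execution time.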
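I tried what I thought were the obvious fixes, which are also the ones most other SO posts focus on.
I have timings for various configurations (fetch size, JDBC driver version, and Java version, as listed in the results below), and all of them run many times slower than the Python code.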
Note: for brevity, the output has been cleaned up slightly to remove the driver version from each line.
Java code:
public void runJdbc() throws Exception {
    Connection conn = DriverManager.getConnection(url, username, password);
    // Pull driver version info
    OracleDatabaseMetaData meta = (OracleDatabaseMetaData)(conn.getMetaData());
    String name = meta.getDriverName();
    String version = meta.getDriverVersion();
    String driverInfo = name + '.' + version;
    // Run the query for each fetch size and time operations
    int sizing[] = {100, 1000, 5000, 10000, 25000};
    for(int s : sizing) {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(s);
        // query
        long start = System.currentTimeMillis();
        ResultSet rs = stmt.executeQuery(query);
        long querySeconds = (System.currentTimeMillis() - start) / 1000;
        // iterate over results just to fetch them, no additional processing for now
        int count = 0;
        long start2 = System.currentTimeMillis();
        while ( rs.next() ) {
            count++;
        }
        long rsSeconds = (System.currentTimeMillis() - start) / 1000;
        System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds );
    }
    conn.close();
}
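The timings in this method are truncated to whole seconds by the integer division, so a phase that takes 1.9 seconds is reported as 1. As a minimal sketch of finer-grained phase timing, assuming a hypothetical helper class named PhaseTimer (not part of my test program) and using Thread.sleep as a stand-in for the JDBC calls, something like this could be used:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original test program.
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
    private long phaseStart = System.nanoTime();

    // Record the time elapsed since the previous mark (or construction) under this name.
    void mark(String phase) {
        long now = System.nanoTime();
        nanosByPhase.merge(phase, now - phaseStart, Long::sum);
        phaseStart = now;
    }

    // Elapsed seconds for one phase, keeping the fractional part.
    double seconds(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000_000.0;
    }

    // Sum of all recorded phases, in seconds.
    double totalSeconds() {
        long total = 0;
        for (long nanos : nanosByPhase.values()) {
            total += nanos;
        }
        return total / 1_000_000_000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        Thread.sleep(20);            // stand-in for stmt.executeQuery(query)
        timer.mark("query");
        Thread.sleep(30);            // stand-in for the rs.next() loop
        timer.mark("resultSet");
        System.out.printf("query.seconds=%.3f, resultSet.seconds=%.3f, overall.seconds=%.3f%n",
                timer.seconds("query"), timer.seconds("resultSet"), timer.totalSeconds());
    }
}
```

Because each mark starts the next phase, the per-phase values always add up to the overall value, which avoids mixing up the start variables.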
Execution times:
Python:
Completed query. rows=28705, queryTime=11.48
--- 13.22 seconds ---
Java 8:
java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)
Driver 11.2.0.1.0
Execution completed: fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=107
Execution completed: fetchSize=1000, rows=28705, query.seconds=4, resultSet.seconds=98
Execution completed: fetchSize=5000, rows=28705, query.seconds=18, resultSet.seconds=100
Execution completed: fetchSize=10000, rows=28705, query.seconds=34, resultSet.seconds=108
Execution completed: fetchSize=25000, rows=28705, query.seconds=100, resultSet.seconds=117
Driver 19.3.0.0.0
Execution completed: fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=121
Execution completed: fetchSize=1000, rows=28705, query.seconds=5, resultSet.seconds=109
Execution completed: fetchSize=5000, rows=28705, query.seconds=19, resultSet.seconds=111
Execution completed: fetchSize=10000, rows=28705, query.seconds=37, resultSet.seconds=109
Execution completed: fetchSize=25000, rows=28705, query.seconds=95, resultSet.seconds=108
Java 12:
java version "12.0.2" 2019-07-16
Java(TM) SE Runtime Environment (build 12.0.2+10)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)
Driver 11.2.0.1.0
Execution completed: fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=115
Execution completed: fetchSize=1000, rows=28705, query.seconds=5, resultSet.seconds=105
Execution completed: fetchSize=5000, rows=28705, query.seconds=18, resultSet.seconds=101
Execution completed: fetchSize=10000, rows=28705, query.seconds=36, resultSet.seconds=100
Execution completed: fetchSize=25000, rows=28705, query.seconds=85, resultSet.seconds=98
Driver 19.3.0.0.0
Execution completed: fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=118
Execution completed: fetchSize=1000, rows=28705, query.seconds=4, resultSet.seconds=107
Execution completed: fetchSize=5000, rows=28705, query.seconds=21, resultSet.seconds=109
Execution completed: fetchSize=10000, rows=28705, query.seconds=37, resultSet.seconds=109
Execution completed: fetchSize=25000, rows=28705, query.seconds=94, resultSet.seconds=109
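I would expect the JDBC execution to be closer in time to the Python execution, perhaps within +/- 25%. What I am actually seeing is that the JDBC query is 8 to 20 times slower than the Python query.
In the worst case, when I extract the data from the ResultSet in the Java code, it runs 20 times slower than the Python code. If I only iterate over the ResultSet in Java, it is about 8 times slower.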
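The extraction code behind the 20x figure is not shown here. As a rough sketch of what a like-for-like extraction could look like, mirroring what the Python DataObject does with setattr, the following pairs each column label with its value. The class name RowExtractor and its methods are assumptions for illustration; only standard java.sql calls (getMetaData, getColumnCount, getColumnLabel, getObject) are used.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the code used for the measurements.
class RowExtractor {

    // Column labels in select-list order (JDBC column indexes are 1-based).
    static List<String> columnLabels(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {
            labels.add(md.getColumnLabel(i));
        }
        return labels;
    }

    // Pair labels with values, similar to DataObject.__init__ in the Python code.
    static Map<String, Object> toRow(List<String> labels, Object[] values) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            row.put(labels.get(i), values[i]);
        }
        return row;
    }

    // Drain the ResultSet into one map per row.
    static List<Map<String, Object>> extractAll(ResultSet rs) throws SQLException {
        List<String> labels = columnLabels(rs);
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Object[] values = new Object[labels.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = rs.getObject(i + 1);
            }
            rows.add(toRow(labels, values));
        }
        return rows;
    }

    public static void main(String[] args) {
        // No database needed to see the mapping itself; the labels and values are made up.
        Map<String, Object> row = toRow(Arrays.asList("ID", "NAME"), new Object[] {1, "a"});
        System.out.println(row);
    }
}
```

In the timing loop, extractAll(rs) would take the place of the bare while (rs.next()) loop.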
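Based on the comments, I changed the code and the command line to fix the timing and increase the heap. The overall run time is still poor compared to Python.
Code changes: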
long rsSeconds = (System.currentTimeMillis() - start2) / 1000;
long overall = (System.currentTimeMillis() - start) / 1000;
stmt.close();
System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds + ", overall.seconds=" + overall );
Timings:
java -Xms2g -Xmx32g -jar target/JdbcTest-1.0-SNAPSHOT-shaded.jar
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=128, overall.seconds=129
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=1000, rows=28705, query.seconds=5, resultSet.seconds=120, overall.seconds=126
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=5000, rows=28705, query.seconds=19, resultSet.seconds=89, overall.seconds=108
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=10000, rows=28705, query.seconds=36, resultSet.seconds=69, overall.seconds=105
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=25000, rows=28705, query.seconds=92, resultSet.seconds=13, overall.seconds=105
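To relate these timings to the fetch size, here is a small arithmetic sketch of how many fetch batches each setting implies for 28,705 rows. It assumes one batch per round trip and ignores any extra end-of-data call the driver may make; the class name FetchRoundTrips is made up for illustration, and the overall seconds are copied from the output above.

```java
// Hypothetical arithmetic helper; the figures come from the timings listed above.
class FetchRoundTrips {

    // Ceiling division: batches needed to deliver `rows` rows at `fetchSize` rows per batch.
    static int batches(int rows, int fetchSize) {
        return (rows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        int rows = 28705;
        int[] fetchSizes = {100, 1000, 5000, 10000, 25000};
        int[] overallSeconds = {129, 126, 108, 105, 105};
        for (int i = 0; i < fetchSizes.length; i++) {
            System.out.println("fetchSize=" + fetchSizes[i]
                    + ", batches=" + batches(rows, fetchSizes[i])
                    + ", overall.seconds=" + overallSeconds[i]);
        }
    }
}
```

Under that assumption the batch count drops from 288 to 2 across the five settings, while the overall time only moves from 129 to 105 seconds.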
The Python code is:
import cx_Oracle
import time

class DataObject:
    def __init__(self, columns, row):
        index = 0
        for key in columns:
            value = row[index]
            index += 1
            setattr(self, key, value)

    def __str__(self):
        s = ''
        for k, v in self.__dict__.items():
            s += f'{k}={v}, '
        return s

class Oracle:
    def __init__(self, args):
        self.args = args

    def run(self):
        start = time.time()
        # connect
        conn = cx_Oracle.connect(self.args.ora_uid, self.args.ora_pwd, self.args.ora_dsn)
        cursor = conn.cursor()
        # run the query
        cursor.execute(self.args.query)
        columns = [column[0] for column in cursor.description]
        # fetch results
        results = cursor.fetchall()
        count = cursor.rowcount
        print(f'Query returned {count} rows')
        # create objects from the data
        dao = []
        for r in results:
            dao.append(DataObject(columns, r))
        # cleanup
        cursor.close()
        elapsed = time.time() - start
        print(f'Query completed. rows={len(dao)}, seconds={elapsed:.2f}')
        # output to validate
        for d in dao:
            print(d)
        return(columns, dao)
The output is:
Query returned 28705 rows
Query completed. rows=28705, seconds=11.34
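To put the two sets of numbers side by side, here is a small arithmetic sketch that uses only figures reported in this post (28,705 rows, 11.34 seconds for Python, and the overall seconds of the updated Java run). The class name ThroughputCheck is made up for illustration.

```java
// Hypothetical arithmetic helper; all inputs are figures reported in this post.
class ThroughputCheck {

    static double rowsPerSecond(int rows, double seconds) {
        return rows / seconds;
    }

    static double slowdown(double javaSeconds, double pythonSeconds) {
        return javaSeconds / pythonSeconds;
    }

    public static void main(String[] args) {
        int rows = 28705;
        double pythonSeconds = 11.34;
        int[] javaOverallSeconds = {129, 126, 108, 105, 105};

        System.out.printf("python: %.0f rows/s%n", rowsPerSecond(rows, pythonSeconds));
        for (int overall : javaOverallSeconds) {
            System.out.printf("java overall=%ds: %.0f rows/s, %.1fx slower%n",
                    overall, rowsPerSecond(rows, overall), slowdown(overall, pythonSeconds));
        }
    }
}
```

On these figures the Java runs come out at roughly 9 to 11 times the Python run time, even though the Python time also includes building the objects.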
private static final String url = "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=***)(PORT=***))(CONNECT_DATA=(SERVICE_NAME=***)(SERVER=DEDICATED)))";