How can I extract data by inner-joining two DB2 tables using pyspark?

Asked: 2017-11-05 09:26:06

Tags: db2, pyspark

I am using the following query to extract data from DB2:

SELECT A.Emp_ID, B1.Manager_Name, B1.Manager_Phone, B1.Manager_mail
FROM Employee A
INNER JOIN Manager_DETAIL B1
  ON (B1.EMP_ID = A.EMP_ID
      OR B1.Manager_mail = A.SuperVisor_mail)
 AND B1.Join_year = '2017' AND B1.QTR = 'Q1'
 AND B1.Dept_Name IN ('support')

How can I do the same thing using pyspark?

I tried the following code:

tab_A = spark.read.jdbc("My Connection String", "Employee",
                        properties={"user": "my user id",
                                    "password": "my password",
                                    "driver": "com.ibm.db2.jcc.DB2Driver"})

tab_A.registerTempTable('data_table')
# query to get columns necessary to create indexes
sql = "SELECT * FROM data_table"
A = spark.sql(sql)

tab_B = spark.read.jdbc("My Connection String", "Manager_DETAIL",
                        properties={"user": "my user id",
                                    "password": "my password",
                                    "driver": "com.ibm.db2.jcc.DB2Driver"})

tab_B.registerTempTable('data_table1')
# query to get columns necessary to create indexes
sql = "SELECT * FROM data_table1"
B1 = spark.sql(sql)

# inner join over the two registered temp tables, written as plain SQL
C = spark.sql("""
    SELECT A.Emp_ID, B1.Manager_Name, B1.Manager_Phone, B1.Manager_mail
    FROM data_table A
    INNER JOIN data_table1 B1
      ON (B1.EMP_ID = A.EMP_ID OR B1.Manager_mail = A.SuperVisor_mail)
     AND B1.Join_year = '2017' AND B1.QTR = 'Q1'
     AND B1.Dept_Name IN ('support')
""")

But I keep getting an invalid syntax error.
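An invalid-syntax error in this kind of setup usually comes from an unterminated query string or from mixing DataFrame-expression operators (==, |, &) into the SQL text, which the SQL parser does not accept. As an alternative to registering temp tables and calling spark.sql, the same inner join can be sketched with the DataFrame API. This is only a minimal sketch, assuming tab_A and tab_B were loaded as above and that the column names taken from the question (Emp_ID, SuperVisor_mail, Manager_mail, Join_year, QTR, Dept_Name) match the DB2 schema:

from pyspark.sql import functions as F

# alias the two DataFrames read from DB2 so columns can be qualified as A.* / B1.*
emp = tab_A.alias("A")    # Employee
mgr = tab_B.alias("B1")   # Manager_DETAIL

# same condition as the DB2 query: match on EMP_ID or on the supervisor mail,
# then restrict Manager_DETAIL to 2017 / Q1 / the 'support' department
cond = (
    ((F.col("B1.EMP_ID") == F.col("A.EMP_ID")) |
     (F.col("B1.Manager_mail") == F.col("A.SuperVisor_mail")))
    & (F.col("B1.Join_year") == "2017")
    & (F.col("B1.QTR") == "Q1")
    & F.col("B1.Dept_Name").isin("support")
)

result = (emp.join(mgr, on=cond, how="inner")
             .select("A.Emp_ID", "B1.Manager_Name",
                     "B1.Manager_Phone", "B1.Manager_mail"))

result.show()

Because the join condition is built from Column expressions rather than a string, there is no quoting to get wrong, and Spark plans it through the same Catalyst optimizer as the equivalent SQL.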

0 Answers:

There are no answers yet.