I am using the following query to pull data from DB2:
SELECT A.Emp_ID, B1.Manager_Name, B1.Manager_Phone, B1.Manager_mail
FROM Employee A
INNER JOIN Manager_DETAIL B1
ON (B1.EMP_ID = A.EMP_ID
OR B1.Manager_mail = A.SuperVisor_mail)
AND B1.Join_year = '2017' AND B1.QTR = 'Q1'
AND B1.Dept_Name IN ('support')
How can I do the same thing with PySpark?
I tried this code:
tab_A = spark.read.jdbc("My Connection String", "Employee",
                        properties={"user": "my user id",
                                    "password": "my password",
                                    "driver": "com.ibm.db2.jcc.DB2Driver"})
tab_A.registerTempTable('data_table')
# query to get columns necessary to create indexes
sql = "SELECT * FROM data_table"
A = spark.sql(sql)
tab_B = spark.read.jdbc("My Connection String", "Manager_DETAIL",
                        properties={"user": "my user id",
                                    "password": "my password",
                                    "driver": "com.ibm.db2.jcc.DB2Driver"})
tab_B.registerTempTable('data_table1')
# query to get columns necessary to create indexes
sql = "SELECT * FROM data_table1"
B1 = spark.sql(sql)
C=spark.sql("SELECT A.Emp_ID,B1.Manager_Name,B1.Manager_Phone,B1.Manager_mail
FROM A
INNER JOIN B1
ON (B1.EMP_ID == A.EMP_ID) |\
(B1.Manager_mail == A.SuperVisor_mail) \
& (B1.Join_year == '2017' & B1.QTR == 'Q1') \
& B1.Dept_Name IN ('support')")
But I get an "invalid syntax" error.
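A minimal sketch of one way to make the attempt above work, not a definitive fix. The "invalid syntax" message is most likely a Python SyntaxError: a string in plain double quotes cannot span several lines, so the query text needs triple quotes. Beyond that, `|` and `&` are Python operators meant for DataFrame column expressions; inside a SQL string, Spark SQL expects `OR` and `AND`. The names `A` and `B1` used in the query also have to be registered view names, not Python variables. The helper name `join_employee_manager` and its parameters are hypothetical, and the parentheses around the `OR` are an assumption about the intended grouping (without them, `AND` binds tighter than `OR`).

```python
# Spark SQL text: plain SQL operators (=, OR, AND, IN), not Python's |, &.
# Assumption: the two match conditions are OR-ed first, then the filters apply.
JOIN_SQL = """
SELECT A.Emp_ID, B1.Manager_Name, B1.Manager_Phone, B1.Manager_mail
FROM A
INNER JOIN B1
  ON (B1.EMP_ID = A.EMP_ID OR B1.Manager_mail = A.SuperVisor_mail)
 AND B1.Join_year = '2017'
 AND B1.QTR = 'Q1'
 AND B1.Dept_Name IN ('support')
"""


def join_employee_manager(spark, url, properties):
    """Hypothetical helper: load both DB2 tables over JDBC and join them.

    `spark` is an existing SparkSession, `url` the JDBC connection string,
    and `properties` the dict with user, password and driver.
    """
    # Register each table under the same name the SQL text refers to.
    employee = spark.read.jdbc(url, "Employee", properties=properties)
    employee.createOrReplaceTempView("A")

    manager = spark.read.jdbc(url, "Manager_DETAIL", properties=properties)
    manager.createOrReplaceTempView("B1")

    # The join itself is evaluated by Spark, not pushed down to DB2.
    return spark.sql(JOIN_SQL)
```

It could then be called as `C = join_employee_manager(spark, "My Connection String", {"user": "my user id", "password": "my password", "driver": "com.ibm.db2.jcc.DB2Driver"})`, reusing the placeholders from the question.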