Group and count values by dayofweek in a Spark SQL DataFrame

Date: 2016-12-27 01:05:39

Tags: python apache-spark pyspark apache-spark-sql spark-dataframe

I have loaded a DataFrame. It looks like this:

uber_converted.show()

+--------------------+--------------------+-------------------+----------+---------+--------------------+
|dispatching_base_num|         pickup_date|affiliated_base_num|locationID|     zone|             borough|
+--------------------+--------------------+-------------------+----------+---------+--------------------+
|              B02765|2015-05-08 19:05:...|             B02764|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B00013|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       107|Manhattan|            Gramercy|
|              B02765|2015-05-08 19:06:...|             B02765|       137|Manhattan|            Kips Bay|
|              B02765|2015-05-08 19:06:...|             B02765|       220|    Bronx|Spuyten Duyvil/Ki...|
|              B02765|2015-05-08 19:06:...|             B02765|       138|   Queens|   LaGuardia Airport|
|              B02765|2015-05-08 19:06:...|             B02749|       143|Manhattan| Lincoln Square West|
|              B02765|2015-05-08 19:06:...|             B02765|       244|Manhattan|Washington Height...|
|              B02765|2015-05-08 19:06:...|             B02617|       262|Manhattan|      Yorkville East|
|              B02765|2015-05-08 19:06:...|             B02765|       144|Manhattan| Little Italy/NoLiTa|
|              B02765|2015-05-08 19:06:...|             B00381|       209|Manhattan|             Seaport|
|              B02765|2015-05-08 19:06:...|             B02765|       234|Manhattan|            Union Sq|
|              B02765|2015-05-08 19:06:...|             B02765|       163|Manhattan|       Midtown North|
|              B02765|2015-05-08 19:06:...|             B02765|       181| Brooklyn|          Park Slope|
|              B02765|2015-05-08 19:06:...|             B02765|       116|Manhattan|    Hamilton Heights|
|              B02765|2015-05-08 19:06:...|             B02765|       236|Manhattan|Upper East Side N...|
|              B02765|2015-05-08 19:06:...|             B02765|       140|Manhattan|     Lenox Hill East|
|              B02765|2015-05-08 19:07:...|             B02765|       162|Manhattan|        Midtown East|
|              B02765|2015-05-08 19:07:...|             B02788|       263|Manhattan|      Yorkville West|
|              B02765|2015-05-08 19:07:...|             B02765|       181| Brooklyn|          Park Slope|
+--------------------+--------------------+-------------------+----------+---------+--------------------+

I need to group and count by day of the week using the pickup_date field. The result should look like this:

dayofweek   count
1         -> 234 (Monday)
2         -> 343 (Tuesday)

and so on...

Any help is greatly appreciated!

1 answer:

Answer 0 (score: 0)

You can use date_format:

from pyspark.sql.functions import date_format

# "u" in the SimpleDateFormat pattern is the day-of-week number (1 = Monday).
df.groupBy(date_format(df["pickup_date"], "u").alias("dayofweek")) \
  .count() \
  .show()
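For illustration, the grouping logic above can be sketched in plain Python (no Spark needed): the `"u"` pattern corresponds to the ISO day-of-week number, which Python exposes as `datetime.isoweekday()` (1 = Monday ... 7 = Sunday). The timestamps below are a hypothetical sample standing in for the `pickup_date` column. Note that Spark 2.3+ also has a built-in `dayofweek` function, but it numbers days with 1 = Sunday, so it would not match the expected output (1 = Monday) shown in the question.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of pickup timestamps (the real values come from the
# pickup_date column of the DataFrame shown above).
pickups = [
    "2015-05-08 19:05:00",  # a Friday
    "2015-05-08 19:06:00",  # a Friday
    "2015-05-11 08:30:00",  # a Monday
]

# isoweekday(): 1 = Monday ... 7 = Sunday, the same numbering that the
# date_format(..., "u") pattern produces in the Spark query.
counts = Counter(
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").isoweekday() for ts in pickups
)

for dow in sorted(counts):
    print(dow, counts[dow])  # prints: 1 1, then 5 2
```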