Using "Over Aggregation" after a "Group Aggregation" subquery in Flink SQL streaming

Asked: 2021-07-07 15:17:58

Tags: apache-flink flink-streaming flink-sql

I am using the data from https://github.com/ververica/sql-training/wiki/Setting-up-the-Training-Environment and I want to find the 10 most popular routes in each 30-minute window.

First I counted the number of rides per route (Group Aggregation):

  SELECT
    T1.Starting_areaId,
    T1.Ending_areaId,
    TUMBLE_END(T1.matchTime, INTERVAL '30' MINUTE) AS End_Time,
    COUNT(T1.Starting_areaId) AS No_Rides
  FROM (
    SELECT * FROM Rides
    MATCH_RECOGNIZE(
      PARTITION BY taxiId, rideId
      ORDER BY rideTime
      MEASURES
        toAreaId(P.lon, P.lat) AS Starting_areaId,
        toAreaId(D.lon, D.lat) AS Ending_areaId,
        MATCH_ROWTIME() AS matchTime
      AFTER MATCH SKIP PAST LAST ROW
      PATTERN(P D)
      DEFINE
        P AS P.isStart = true,
        D AS D.isStart = false
    )
  ) AS T1
  GROUP BY 
    T1.Starting_areaId,
    T1.Ending_areaId,
    TUMBLE(T1.matchTime, INTERVAL '30' MINUTE)

But when I try to rank the routes by number of rides with an Over Aggregation, using the following query:

SELECT
  T2.Starting_areaId,
  T2.Ending_areaId,
  T2.End_Time,
  T2.No_Rides,
  RANK() OVER(
    PARTITION BY T2.End_Time
    ORDER BY T2.No_Rides
  ) AS Ranking
FROM (
  SELECT
    T1.Starting_areaId,
    T1.Ending_areaId,
    TUMBLE_END(T1.matchTime, INTERVAL '30' MINUTE) AS End_Time,
    COUNT(T1.Starting_areaId) AS No_Rides
  FROM (
    SELECT * FROM Rides
    MATCH_RECOGNIZE(
      PARTITION BY taxiId, rideId
      ORDER BY rideTime
      MEASURES
        toAreaId(P.lon, P.lat) AS Starting_areaId,
        toAreaId(D.lon, D.lat) AS Ending_areaId,
        MATCH_ROWTIME() AS matchTime
      AFTER MATCH SKIP PAST LAST ROW
      PATTERN(P D)
      DEFINE
        P AS P.isStart = true,
        D AS D.isStart = false
    )
  ) AS T1
  GROUP BY 
    T1.Starting_areaId,
    T1.Ending_areaId,
    TUMBLE(T1.matchTime, INTERVAL '30' MINUTE)
) AS T2

I get this error:

[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.table.api.TableException: OVER windows' ordering in stream mode must be defined on a time attribute.
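
For context, the error arises because a streaming OVER aggregation must be ordered by a time attribute, and No_Rides is an ordinary column. One possible direction, offered only as a sketch, is Flink's Top-N query pattern: ROW_NUMBER() in an OVER clause combined with an outer filter on the row number, which the planner treats as a Top-N rather than a plain Over Aggregation. The sketch assumes the Group Aggregation query above has been registered as a view; the name RideCounts is hypothetical and does not come from the training environment. It also swaps RANK() for ROW_NUMBER() and sorts in descending order so that the busiest routes come first.

```sql
-- Assumption: RideCounts is a hypothetical view wrapping the Group Aggregation
-- query above, with columns Starting_areaId, Ending_areaId, End_Time, No_Rides.
SELECT
  Starting_areaId,
  Ending_areaId,
  End_Time,
  No_Rides,
  Ranking
FROM (
  SELECT
    Starting_areaId,
    Ending_areaId,
    End_Time,
    No_Rides,
    -- ROW_NUMBER() replaces RANK(); DESC puts the highest ride count first
    ROW_NUMBER() OVER (
      PARTITION BY End_Time
      ORDER BY No_Rides DESC
    ) AS Ranking
  FROM RideCounts
)
-- The filter on the row number is what makes this a Top-N query
WHERE Ranking <= 10;
```

Whether this is accepted depends on the Flink version and planner in use, so it should be checked against the documentation for that release.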

0 answers