Question

I am trying to parameterize a value in Hive instead of hard-coding it in the query. Here is the query:

select * from employee where sal >30000

However, rather than hard-coding 30000, I need the value to come from the same query, as shown below. When I try this, I run into a problem:

select * from employee where sal > (select max(sal) from employee)

Any help would be appreciated.

Thanks

Answer 1

You can try a Hive query of this form. It returns the employees whose salary equals the maximum salary.

SELECT e1.* FROM employee e1 
JOIN
(SELECT MAX(sal) as max_sal FROM employee) e2
ON e1.sal = e2.max_sal;

Example:

Table: employee
id  fname   sal
1   AAA     15000
2   BBB     35000
3   CCC     12000
4   DDD     35000
5   EEE     9000

Query output:

2   BBB     35000
4   DDD     35000

Answer 2

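Hive does not support this kind of subquery, and it does not allow computed variables: variables in Hive are plain text substitution, with no evaluation. You can compute the predicate in the shell and then pass it to your Hive script, as in this answer: https://stackoverflow.com/a/37821218/2700344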
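The shell approach can be sketched roughly as follows. This is only a minimal sketch, assuming the `hive` CLI is on the PATH; the function name `run_with_threshold` is hypothetical, and the `max(sal)` query is a stand-in for whatever query really produces the threshold.

```shell
#!/bin/sh
# Hypothetical helper: compute the threshold in one Hive call,
# then pass it to a second Hive call as a plain-text variable.
run_with_threshold() {
    # -S (silent) keeps log chatter out of stdout so only the value is captured.
    # Placeholder query: replace with the real source of the threshold.
    max_sal=$(hive -S -e "select max(sal) from employee") || return 1

    # --hivevar defines a variable; Hive substitutes ${hivevar:max_sal} as text.
    # Single quotes stop the shell from expanding ${...} itself.
    hive --hivevar max_sal="$max_sal" \
         -e 'select * from employee where sal > ${hivevar:max_sal}'
}
```

Calling `run_with_threshold` from a shell then runs both steps in order. Because the value is substituted as text, it should be a plain literal (a number here) by the time it reaches the second query.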

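If you want to do it within the same Hive query, there is nothing wrong with computing a subquery, cross joining with it, and then filtering. The subquery is computed first; its result is then placed in the distributed cache and applied in the filter of each mapper that reads the table: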

with sub as ( -- this is an example only and makes no sense as written;
              -- replace it with the real query:
              -- of course there are no rows with sal > max(sal) in the same table
  select max(S.sal) as MaxSal from employee S
)
select *
  from employee e
       cross join sub s
 where e.sal > s.MaxSal;

If you write it without CROSS JOIN, simply as from employee e, sub s, or as a JOIN without an ON condition, it is still the same cross join. It is better to write it explicitly using cross join.

Get a value from a subquery in Hive
