SUM() logic problem when joining with GROUP BY

Asked: 2016-10-24 14:41:52

Tags: sql join teradata

Query:

sel TableName, DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB
        from dbc.tablesize
        group by 1,2
        order by Size_in_GB desc

Result:

+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP       | A      |  28,350.01 |
| CPC       | B      |  19,999.37 |
| SDF       | C      |  13,263.67 |
| DB1400    | D      |  13,200.26 |
+-----------+--------+------------+

From the simple query above I can see that table WRP in database A is close to 28,350 GB.

Now I am trying to join another table, dbc.indices, so that I can filter on the column IndexType, but the Size_in_GB now changes for all the tables.

sel a.TableName,a.DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB from dbc.tablesize a
join dbc.indices b on a.TableName = b.TableName and a.DatabaseName=b.DatabaseName
--where b.indexType='P'
group by 1,2
order by Size_in_GB desc

The result is this:

+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP       | A      |  56,700.02 |
| CPC       | B      |  39,998.74 |
| DB1400    | D      |  39,600.78 |
+-----------+--------+------------+

Now the size of the same table is doubled, i.e. WRP shows 56,700 GB (and similarly for the other tables).

I am not sure what is wrong with the logic I used for the join.
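The doubling can be reproduced in miniature. The following sketch uses sqlite3 with hypothetical, stripped-down versions of dbc.tablesize and dbc.indices (the table contents are illustrative, not real dictionary data): one tablesize row joined to two index rows gets counted twice by SUM.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Mini tablesize: one row of 10 GB for table WRP.
cur.execute("CREATE TABLE tablesize (TableName TEXT, CurrentPerm REAL)")
cur.execute("INSERT INTO tablesize VALUES ('WRP', 10.0)")

# Mini indices: one row per index, so a table with two indexes appears twice.
cur.execute("CREATE TABLE indices (TableName TEXT, IndexType TEXT)")
cur.executemany("INSERT INTO indices VALUES (?, ?)",
                [("WRP", "P"), ("WRP", "S")])

# The join fans each tablesize row out once per matching index row,
# so SUM sees every CurrentPerm value twice.
doubled = cur.execute("""
    SELECT SUM(a.CurrentPerm)
    FROM tablesize a
    JOIN indices b ON a.TableName = b.TableName
""").fetchone()[0]
print(doubled)  # 20.0, twice the true size of 10.0
```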

P.S. - My goal is to find all tables larger than 100 GB in size with indexType set to 'P'.

Edit - sharing the relevant columns from DBC.INDICES:

+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| DatabaseName | TableName  | IndexNumber | IndexType | UniqueFlag |   IndexName   | ColumnName | ColumnPosition |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| Some DB      | Some Table |           1 | P         | N          | IndexNamehere | ColumnA    |              1 |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+

4 Answers:

Answer 0 (score: 2)

What is confusing?

You apparently have tables with more than one index. Each index row makes the table appear multiple times in the aggregation.

What you want is this:

> My goal is to find all tables larger than 100 GB in size with indexType set to 'P'

I would suggest moving the index comparison into the where clause:

select t.TableName, t.DatabaseName,
       sum(t.CurrentPerm/(1024*1024*1024)) as Size_in_GB
from dbc.tablesize t
where exists (select 1
              from dbc.indices i
              where t.TableName = i.TableName and t.DatabaseName = i.DatabaseName and
                    i.indexType = 'P'
             )
group by 1,2
order by Size_in_GB desc

If you want to add the size filter, you can add having Size_in_GB > 100 before the order by.
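The point of EXISTS is that it only tests whether a match exists; it never multiplies the driving rows. A minimal sketch of that behavior, again using sqlite3 with hypothetical mini-versions of the dictionary tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tablesize (TableName TEXT, CurrentPerm REAL)")
cur.execute("INSERT INTO tablesize VALUES ('WRP', 10.0)")
cur.execute("CREATE TABLE indices (TableName TEXT, IndexType TEXT)")
cur.executemany("INSERT INTO indices VALUES (?, ?)",
                [("WRP", "P"), ("WRP", "S")])

# EXISTS filters tablesize rows without duplicating them, even though
# two index rows match the table.
correct = cur.execute("""
    SELECT SUM(CurrentPerm)
    FROM tablesize t
    WHERE EXISTS (SELECT 1 FROM indices i
                  WHERE i.TableName = t.TableName
                    AND i.IndexType = 'P')
""").fetchone()[0]
print(correct)  # 10.0, the true size
```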

Answer 1 (score: 1)

Your key is probably duplicated in the dbc.indices table. dbc.indices has multiple entries for a single TableName, so when you join to dbc.tablesize, the tablesize rows are duplicated. SUM is then applied over the duplicated rows, hence the wrong calculation.

Try it this way:

SELECT a.TableName,
       a.DatabaseName,
       Sum(CurrentPerm / ( 1024 * 1024 * 1024 )) AS Size_in_GB
FROM   dbc.tablesize a
       JOIN (SELECT DISTINCT b.TableName,
                             b.DatabaseName
             FROM   dbc.indices b
             --where b.indexType='P'
             ) b
         ON a.TableName = b.TableName
            AND a.DatabaseName = b.DatabaseName

GROUP  BY a.TableName,
          a.DatabaseName
ORDER  BY Size_in_GB DESC 
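The SELECT DISTINCT derived table works because it collapses the index side to one row per (TableName, DatabaseName) before the join, so no fan-out can occur. A small sqlite3 sketch of the same idea (hypothetical mini-tables, as above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tablesize (TableName TEXT, CurrentPerm REAL)")
cur.execute("INSERT INTO tablesize VALUES ('WRP', 10.0)")
cur.execute("CREATE TABLE indices (TableName TEXT, IndexType TEXT)")
cur.executemany("INSERT INTO indices VALUES (?, ?)",
                [("WRP", "P"), ("WRP", "S")])

# DISTINCT reduces the join side to one row per table name,
# so each tablesize row matches at most once.
dedup = cur.execute("""
    SELECT SUM(a.CurrentPerm)
    FROM tablesize a
    JOIN (SELECT DISTINCT TableName FROM indices) b
      ON a.TableName = b.TableName
""").fetchone()[0]
print(dedup)  # 10.0, no double counting
```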

Answer 2 (score: 0)

dbc.IndicesV (never use the old deprecated non-V views) has one row per index per column.

You simply add conditions to restrict it to a single row: where IndexType = 'P' and ColumnPosition = 1.

It is more efficient to aggregate early, i.e. to aggregate before the join:

select dt.*
from 
 (
   select TableName, DatabaseName,
      sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB
   from dbc.TableSizeV
   group by 1,2
   having Size_in_GB > 100
 ) as dt
join dbc.IndicesV b 
  on dt.TableName = b.TableName
 and dt.DatabaseName=b.DatabaseName
where IndexType = 'P' 
  and ColumnPosition = 1
order by Size_in_GB desc;

But why do you filter on IndexType='P' - don't you care about other objects > 100 GB (NoPI/columnar tables, join indexes)? By the way, this does not return all tables with a PI; only IndexNumber=1 would do that.

Depending on your needs, you might be better off joining to dbc.TablesV.
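The early-aggregation pattern can be sketched the same way: sum inside a derived table first, then join to an index side that the ColumnPosition = 1 filter has reduced to one row per table. Again a sqlite3 sketch with hypothetical mini-tables (a two-column 'P' index gives indices two rows for WRP):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tablesize (TableName TEXT, CurrentPerm REAL)")
# Two per-AMP rows for WRP, true total 15.0.
cur.executemany("INSERT INTO tablesize VALUES (?, ?)",
                [("WRP", 10.0), ("WRP", 5.0)])
cur.execute("CREATE TABLE indices "
            "(TableName TEXT, IndexType TEXT, ColumnPosition INTEGER)")
cur.executemany("INSERT INTO indices VALUES (?, ?, ?)",
                [("WRP", "P", 1), ("WRP", "P", 2)])

# Aggregate first; ColumnPosition = 1 keeps one index row per table,
# so the subsequent join cannot multiply the summed rows.
rows = cur.execute("""
    SELECT dt.TableName, dt.Size_in_GB
    FROM (SELECT TableName, SUM(CurrentPerm) AS Size_in_GB
          FROM tablesize
          GROUP BY TableName) dt
    JOIN indices b ON dt.TableName = b.TableName
    WHERE b.IndexType = 'P' AND b.ColumnPosition = 1
""").fetchall()
print(rows)  # [('WRP', 15.0)]
```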

Answer 3 (score: 0)

> P.S. - My goal is to find all tables larger than 100 GB in size with indexType set to 'P'

If you only want to find tables for which certain index rows exist, then don't join at all. Use EXISTS instead. That puts your condition in the WHERE or HAVING clause where it belongs, and your condition has no problem with duplicated rows (in your case: it no longer matters whether a table has more than one matching index).

select tablename, databasename, sum(currentperm/(1024*1024*1024)) as size_in_gb 
from dbc.tablesize ts
group by tablename, databasename
having sum(currentperm/(1024*1024*1024)) > 100
and exists
(
  select *
  from dbc.indices i
  where i.tablename = ts.tablename and i.databasename = ts.databasename
  and i.indexType = 'P'
)
order by Size_in_GB desc;