SQL aggregation question

Question

I have three tables:

unmatched_purchases table:
unmatched_purchases_id --primary key
purchases_id --foreign key to events table
location_id --which store
purchase_date
item_id --item purchased

purchases table:
purchases_id --primary key
location_id --which store
customer_id

credit_card_transactions:
transaction_id --primary key
trans_timestamp --timestamp of when the transaction occurred
item_id --item purchased
customer_id
location_id

All three tables are very large. The purchases table has 590,130,404 records (yes, over half a billion). unmatched_purchases has 192,827,577 records. credit_card_transactions has 79,965,740 records.

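I need to find out how many purchases in the unmatched_purchases table match entries in the credit_card_transactions table. I need to do this one location at a time (i.e. run the query for location_id = 123, then run it for location_id = 456). A "match" is defined as: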

1) same customer_id
2) same item_id
3) the trans_timestamp is within a certain window of the purchase_date
  (EG if the purchase_date is Jan 3, 2005 
  and the trans_timestamp is 11:14PM Jan 2, 2005, that's close enough)

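I need the following aggregates:

1) How many unmatched purchases there are for that location

2) How many of those unmatched purchases could be matched with credit_card_transactions for that location

So, what query (or queries) would get this information without taking forever to run?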

Note: all three tables are indexed on location_id.

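Edit: As it turns out, the credit_card_purchases table has been partitioned on location_id, so that will help speed things up for me. I am asking our DBA whether the others could be partitioned too, but that decision is out of my hands.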

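Clarification: I only need to run this for a few of our many locations, not for every one of them separately. I need to run it for 3 locations. We have 155 location_ids in our system, but some of them are not used in this part of the system.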

Answer 1

Try this (I have no idea how fast it will be; that depends on your indexes):

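  -- Count(Distinct ...) keeps a purchase from being counted twice
  -- when several transactions fall inside its window.
  Select Count(Distinct u.unmatched_purchases_id) TotalPurchases,
     Count(Distinct Case When c.transaction_id Is Not Null
          Then u.unmatched_purchases_id End) MatchablePurchases
  From unmatched_purchases u
     Join purchases p
        On p.purchases_id = u.purchases_id
     Left Join credit_card_transactions c
        On c.customer_id = p.customer_id
           And c.item_id = u.item_id
           And c.location_id = u.location_id
           -- window on both sides of the purchase date
           And c.trans_timestamp Between u.purchase_date - @DelayThreshold
                                     And u.purchase_date + @DelayThreshold
  Where u.location_id = @Location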

Answer 2

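At a minimum, you will need more indexes. I would propose at least the following:

an index on unmatched_purchases.purchases_id, an index on purchases.location_id, and another index on credit_card_transactions.(location_id, customer_id, item_id, trans_timestamp).

Without those indexes, IMO there is not much hope.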
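As a rough sketch, the three indexes proposed above might be created as follows. The index names are made up for illustration, and the statements assume Oracle syntax; on a partitioned table your DBA may prefer a LOCAL index, which is not shown here.

```sql
-- Hypothetical index names; only the table and column names come from the question.

-- Supports the join from unmatched_purchases to purchases.
CREATE INDEX up_purchases_id_ix
  ON unmatched_purchases (purchases_id);

-- The question says location_id is already indexed on all three tables,
-- so this one may already exist under another name (Oracle rejects a
-- second index on the same column list).
CREATE INDEX p_location_id_ix
  ON purchases (location_id);

-- Composite index covering every column used by the match condition.
CREATE INDEX cct_match_ix
  ON credit_card_transactions (location_id, customer_id, item_id, trans_timestamp);
```

On tables of this size each build will take a while and a fair amount of space, so it is worth agreeing them with the DBA first.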

Answer 3

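I would suggest querying all locations at once. It will cost you 3 full scans (one per table) plus a sort. I bet this will be faster than querying the locations one by one.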
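A single pass over all locations could look roughly like the sketch below. It assumes that unmatched_purchases joins to purchases on purchases_id, that a match must be at the same location, and that the window is given in days through a bind variable named :window_days (a name invented here).

```sql
-- Sketch only: one query for every location instead of one query per location.
SELECT u.location_id,
       COUNT(DISTINCT u.unmatched_purchases_id) AS total_purchases,
       -- DISTINCT avoids double counting a purchase that matches
       -- more than one transaction inside the window.
       COUNT(DISTINCT CASE WHEN c.transaction_id IS NOT NULL
                           THEN u.unmatched_purchases_id END) AS matchable_purchases
FROM   unmatched_purchases u
JOIN   purchases p
       ON p.purchases_id = u.purchases_id
LEFT JOIN credit_card_transactions c
       ON  c.customer_id = p.customer_id
       AND c.item_id     = u.item_id
       AND c.location_id = u.location_id
       AND c.trans_timestamp BETWEEN u.purchase_date - :window_days
                                 AND u.purchase_date + :window_days
GROUP  BY u.location_id
ORDER  BY u.location_id;
```

If only a handful of locations matter, adding a filter such as `WHERE u.location_id IN (...)` keeps the same shape while cutting the volume.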

But if you would rather not guess, you at least need to look at the EXPLAIN PLAN output and a 10046 trace of the query...
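For reference, this is one common way to get both in Oracle. The SELECT inside EXPLAIN PLAN is only a stand-in for whichever query you are testing, and :location_id is an illustrative bind name.

```sql
-- 1) Ask the optimizer for its plan without running the query.
EXPLAIN PLAN FOR
SELECT COUNT(*)
FROM   unmatched_purchases u
WHERE  u.location_id = :location_id;

-- Show the plan that was just stored in PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 2) Switch on extended SQL trace (event 10046) for this session.
--    Level 12 records bind values and wait events.
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the real query here ...

-- Switch tracing off again.
ALTER SESSION SET EVENTS '10046 trace name context off';
```

The trace file is written on the database server, so you may need the DBA's help to retrieve and format it (typically with tkprof).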

Answer 4

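The query itself ought to be straightforward; the tricky part is getting it to perform. I would question why you need to run it once per location, since it would probably be more efficient to evaluate every location in a single query.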

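The join is going to be the big challenge, but the aggregation ought to be straightforward. I would guess that your best hope performance-wise is a hash join on the customer and item columns, with a subsequent filter operation on the date range. You might have to put the customer and item join into an inline view and then try to stop the date predicate from being pushed into the inline view.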
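One way to try that shape is sketched below. The hints are ordinary Oracle optimizer hints, but whether the optimizer honors them here, and whether the date filter really stays outside the view, is an assumption that has to be checked in the execution plan. The bind names :location_id and :window_days are invented for the example.

```sql
-- Sketch only: equi-join on customer and item inside an inline view,
-- date-window filter applied afterwards in the outer query.
SELECT /*+ NO_MERGE(v) */
       COUNT(DISTINCT v.unmatched_purchases_id) AS matchable_purchases
FROM  (SELECT /*+ USE_HASH(p c) */
              u.unmatched_purchases_id,
              u.purchase_date,
              c.trans_timestamp
       FROM   unmatched_purchases u
       JOIN   purchases p
              ON p.purchases_id = u.purchases_id
       JOIN   credit_card_transactions c
              ON  c.customer_id = p.customer_id
              AND c.item_id     = u.item_id
       WHERE  u.location_id = :location_id
       AND    c.location_id = :location_id) v
-- Date range kept outside the view so it acts as a filter on the
-- hash join result rather than as part of the join condition.
WHERE  v.trans_timestamp BETWEEN v.purchase_date - :window_days
                             AND v.purchase_date + :window_days;
```

If the plan shows the date predicate inside the view anyway, the hints are not doing what was intended and a different approach is needed.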

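Hash joins are much more efficient if you can arrange for the tables being equi-joined to share the same hash partitioning key on all of the join columns.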

As to whether to use the location index...

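Whether the index is worth using depends on the clustering factor of the location index, which you can read from the user_indexes table. Can you post the clustering factor along with the number of blocks that the table contains? That gives a measure of how the values for each location are distributed throughout the table. You could also extract the execution plan for a query such as: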

select some_other_column
from   my_table
where  location_id in (value 1, value 2, value 3)

... and see whether Oracle thinks the index is useful.
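The clustering factor and block count asked for above can be read from the data dictionary along these lines. The table name literal is just one of the three tables from the question; the figures reflect the last time optimizer statistics were gathered.

```sql
-- Compare each index's clustering factor with the table's block and row counts.
SELECT i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
FROM   user_indexes i
JOIN   user_tables  t
       ON t.table_name = i.table_name
WHERE  i.table_name = 'UNMATCHED_PURCHASES';
```

As a rule of thumb, a clustering factor close to the number of blocks means rows for the same location sit together and the index is cheap to use; a value close to the number of rows means they are scattered and a full scan may well win.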
