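Efficiently selecting rows where a row exists in a related table

Time: 2017-12-07 21:30:26

Tags: mysql performance percona mysql-5.7 percona-xtradb-cluster

I have a recurring pattern in the system I'm currently working on where, for example, I need to select all users who have orders under a list of possible companies. Or I need to select users if a record exists that flags the user.

My users table contains 430,825 records, so this shouldn't be that hard to work with. I'm close right now: I have a query that gets the .047s execution time I'm looking for, but if I add just one more piece to it, it becomes slow.

Here is my current, fast query: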

select `UserID`
from `users`
where (`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or `UserID` in (select *
        from (select `UserID`
            from `invoices`
            where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and `__Active` = 1) `a`)
    or `UserID` in (select *
        from (select `UserID`
            from `quoterequests`
            where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and `__Active` = 1) `a`))
and (`UserID` in (select *
        from (select `UserID`
            from `userassociations`
            where `_Email` = 'brian@yeet.com'
            and `__Active` = 1) `a`))
and (`UserID` in (select *
        from (select `UserID`
            from `usercustomerflags`
            where `CustomerFlagID` in (10, 27, 17, 1, 2, 3, 4, 5, 6)
            and `__Active` = 1) `a`)
    or not exists (select 1
        from `usercustomerflags`
        where `__Active` = 1
        and `users`.`UserID` = `UserID`))
and `Deleted` = 0
order by `DateTimeAdded` desc
limit 50;

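(The extra select * from (...) wrappers are there because of this: https://stackoverflow.com/a/1434712/728236)

In the middle, I narrow the users down further by email address, while also checking other related tables for emails that might be associated with the user. For example, the next piece searches for users by the addresses used when a quote was sent to the customer, including their CC addresses.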

select `UserID`
from `users`
where (`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or `UserID` in (select *
        from (select `UserID`
            from `invoices`
            where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and `__Active` = 1) `a`)
    or `UserID` in (select *
        from (select `UserID`
            from `quoterequests`
            where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and `__Active` = 1) `a`))
and (`UserID` in (select *
        from (select `UserID`
            from `userassociations`
            where `_Email` = 'brian@yeet.com'
            and `__Active` = 1) `a`)
    or `UserID` in (select *
        from (select `UserID`
            from `userquotesemails`
            where `Email` = 'brian@yeet.com'
            and `__Active` = 1) `a`))
and (`UserID` in (select *
        from (select `UserID`
            from `usercustomerflags`
            where `CustomerFlagID` in (10, 27, 17, 1, 2, 3, 4, 5, 6)
            and `__Active` = 1) `a`)
    or not exists (select 1
        from `usercustomerflags`
        where `__Active` = 1
        and `users`.`UserID` = `UserID`))
and `Deleted` = 0
order by `DateTimeAdded` desc
limit 50;

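I've added the additional table to search for emails, but now the query takes 3.016 seconds, which is far slower. What seems odd is that, as I was building up this query, this last piece appeared to be the tipping point for performance. What causes that?

EXPLAIN output for the first and second queries, respectively (header rows omitted; in MySQL 5.7 the columns are id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra):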

+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
|  1 | PRIMARY            | <subquery6>       |  | ALL            |                                                                                             |                              |      |                       |   | 0.00  | Using temporary; Using filesort |
|  1 | PRIMARY            | users             |  | eq_ref         | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers                                        | PRIMARY                      |  144 | <subquery6>.UserID    | 1 | 50.00 | Using where                     |
|  6 | MATERIALIZED       | userassociations  |  | ref            | userassociations_UserID,userassociations__Email                                             | userassociations__Email      | 1026 | const                 | 3 | 10.00 | Using where                     |
| 10 | DEPENDENT SUBQUERY | usercustomerflags |  | ref            | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx |  144 | sterling.users.UserID | 1 | 10.00 | Using where                     |
|  8 | DEPENDENT SUBQUERY | usercustomerflags |  | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           | usercustomerflags_UserID_idx |  144 | func                  | 1 | 4.95  | Using where                     |
|  4 | DEPENDENT SUBQUERY | quoterequests     |  | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest        |  145 | func                  | 2 | 5.00  | Using where                     |
|  2 | DEPENDENT SUBQUERY | invoices          |  | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_UserID_idx           |  145 | func                  | 1 | 3.33  | Using where                     |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+

+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
|  1 | PRIMARY            | users             |  | ref | fk_users_1_idx,users_Customers                                                              | users_Customers                |    4 | const                 | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags |  | ref | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx   |  144 | sterling.users.UserID |      1 | 10.00  | Using where |
| 10 | SUBQUERY           | usercustomerflags |  | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           |                                |      |                       |   3509 | 4.94   | Using where |
|  8 | SUBQUERY           | userquotesemails  |  | ref | userquotesemails_Email__Active,userquotesemails_UserID                                      | userquotesemails_Email__Active | 1027 | const,const           |      1 | 100.00 |             |
|  6 | SUBQUERY           | userassociations  |  | ref | userassociations_UserID,userassociations__Email                                             | userassociations__Email        | 1026 | const                 |      3 | 10.00  | Using where |
|  4 | SUBQUERY           | quoterequests     |  | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID        |  144 | const                 |  16702 | 10.00  | Using where |
|  2 | SUBQUERY           | invoices          |  | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_CompanyID_idx          |  144 | const                 |  17678 | 10.00  | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+

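Also, I have tried using joins, e.g. joining the invoices table and so on, but then I ran into the problem of receiving duplicate user rows for every invoice/quoterequest joined, and grouping/distinct and sorting of the resulting data became extremely slow, on the order of minutes.

I've also tried the "exists" version of the first query, as suggested by the documentation at https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html, like so: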

select `UserID`
from `users`
where (`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or exists (select 1
        from `invoices`
        where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and `__Active` = 1
        and `users`.`UserID` = `UserID`)
    or exists (select 1
        from `quoterequests`
        where `CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and `__Active` = 1
        and `users`.`UserID` = `UserID`))
and (exists (select 1
        from `userassociations`
        where `_Email` = 'brian@yeet.com'
        and `__Active` = 1
        and `users`.`UserID` = `UserID`))
and (exists (select 1
        from `usercustomerflags`
        where `CustomerFlagID` in (10, 27, 17, 1, 2, 3, 4, 5, 6)
        and `__Active` = 1
        and `users`.`UserID` = `UserID`)
    or not exists (select 1
        from `usercustomerflags`
        where `__Active` = 1
        and `users`.`UserID` = `UserID`))
and `Deleted` = 0
order by `DateTimeAdded` desc
limit 50;

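But that brings me up to 5.516 seconds, so it is definitely not the right direction.

What is the most efficient way to select data the way I'm trying to? Or do I need to restructure some tables to get the performance I'm looking for?

I think I've isolated the smallest sub-problem and the bottleneck. Here is my lightweight query: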

select `users`.`UserID`, `users`.`_Customer`
from `users`
left join `userassociations` on `userassociations`.`UserID` = `users`.`UserID`
and `userassociations`.`__Active` = 1
where (`users`.`Email` = 'brian@stumpyinc.com'
    or `userassociations`.`_Email` = 'brian@stumpyinc.com')
and `users`.`Deleted` = 0
order by `users`.`DateTimeAdded` desc
limit 50;

And its EXPLAIN:

+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users            |  | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers         |   4 | const                 | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations |  | ref | userassociations_UserID                                | userassociations_UserID | 144 | sterling.users.UserID |      1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+

This query takes about 1.5 seconds to execute.

CREATE TABLE `users` (
  `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
    ...
  `Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  `DateTimeAdded` datetime DEFAULT NULL,
    ...
  `Deleted` int(1) NOT NULL DEFAULT '0',
    ...
  `_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
  `_InvoiceCount` int(11) NOT NULL DEFAULT '0',
  `_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  PRIMARY KEY (`UserID`),
  UNIQUE KEY `UserID_UNIQUE` (`UserID`),
    ...
  KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
  KEY `unify_email_INDEX` (`Email`(191),`UserID`),
    ...
  KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
    ...
  KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
  FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
    ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE `userassociations` (
   `UserAssociationID` binary(16) NOT NULL,
   `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
   `__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
   `__Active` tinyint(1) NOT NULL DEFAULT '1',
   `__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
   `__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
   PRIMARY KEY (`UserAssociationID`),
   KEY `userassociations_UserID` (`UserID`),
   KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
   KEY `userassociations___UserID` (`__UserID`),
   KEY `userassociations__Email` (`_Email`),
   CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

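Hmm... so it looks like that does work, but I found a pair of tables where it doesn't seem as efficient: my users and invoices tables.

I have these indexes:

users:    INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`)
invoices: INDEX(`UserID`, `__Active`)
invoices: INDEX(`CompanyID`)
users:    INDEX(`UserID`, `Deleted`)

and the query

select `users`.`UserID`, `users`.`DateTimeAdded`
from `users`
join `invoices` on `invoices`.`UserID` = `users`.`UserID`
and `invoices`.`__Active` = 1
where `invoices`.`CompanyID` = '3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and `users`.`Deleted` = 0
order by `DateTimeAdded` desc
limit 200;

This query alone takes 0.3 seconds, which feels slow to me because it isn't taking full advantage of the indexes, especially since users has only 430,997 rows and invoices has 194,180. It seems like this should be a very simple query.

Edit: it's actually worse than that. If the given CompanyID only contains ~4 rows, this query takes 3.5 seconds.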

2 Answers:

Answer 0: (score: 1)

For that smaller problem:

( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
    from  `users` AS u
    where  u.`Email` = 'brian@stumpyinc.com'
      and  u.`Deleted` = 0
      AND EXISTS ( SELECT * FROM `userassociations`
                       WHERE UserId = u.UserID
                         AND __Active = 1 )
    order by  u.`DateTimeAdded` desc
    limit  50
)
UNION DISTINCT
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
    from  `users` AS u
    JOIN  `userassociations` AS ua
         ON  ua.`UserID` = u.`UserID`
        and  ua.`__Active` = 1
    where  ua.`_Email` = 'brian@stumpyinc.com' 
      and  u.`Deleted`=0
    order by  u.`DateTimeAdded` desc
    limit  50
)
order by `DateTimeAdded` desc
limit  50

It needs these indexes:

u:  INDEX(Email, Deleted, DateTimeAdded)  -- date last
ua: INDEX(UserId, __Active)   -- either order
ua: INDEX(_Email)
u:  INDEX(UserID, Deleted)
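
The shorthand above could be turned into DDL roughly as follows. This is only a sketch: the index names are made up for illustration, and the full-length `Email` key assumes the row format of `users` permits index keys longer than 767 bytes (if it does not, the `(191)` prefix would have to stay). `ua: INDEX(_Email)` already exists in the posted schema as `userassociations__Email`, so it is not repeated here.

```sql
-- Sketch only: the index names below are hypothetical.
-- Assumes `users` allows index keys longer than 767 bytes;
-- otherwise use `Email`(191) as in the existing indexes.
ALTER TABLE `users`
  ADD INDEX `users_Email_Deleted_Added` (`Email`, `Deleted`, `DateTimeAdded`),
  ADD INDEX `users_UserID_Deleted` (`UserID`, `Deleted`);

-- (_Email) is already indexed by `userassociations__Email`.
ALTER TABLE `userassociations`
  ADD INDEX `userassociations_UserID__Active` (`UserID`, `__Active`);
```

Running the query under EXPLAIN before and after adding the indexes should show whether the optimizer actually picks them up.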

(If you get a syntax error, let me know. If it is too slow, please provide the EXPLAIN.)
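
The same OR-to-UNION idea may carry over to the slow query in the question, where the email can match in either `userassociations` or `userquotesemails`. The following is a sketch that has not been run against the real schema. The aliases `e`, `u`, `i`, `q` and `f` are introduced here for illustration, and the company and flag conditions are simply carried over as correlated EXISTS checks.

```sql
-- Sketch: collect candidate UserIDs from both email sources first,
-- then apply the remaining filters to that (presumably small) set.
select u.`UserID`
from (
    select `UserID`
    from `userassociations`
    where `_Email` = 'brian@yeet.com'
      and `__Active` = 1
    union distinct
    select `UserID`
    from `userquotesemails`
    where `Email` = 'brian@yeet.com'
      and `__Active` = 1
) as e
join `users` as u on u.`UserID` = e.`UserID`
where u.`Deleted` = 0
  and (u.`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or exists (select 1
        from `invoices` as i
        where i.`UserID` = u.`UserID`
          and i.`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
          and i.`__Active` = 1)
    or exists (select 1
        from `quoterequests` as q
        where q.`UserID` = u.`UserID`
          and q.`CompanyID` in ('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
          and q.`__Active` = 1))
  and (exists (select 1
        from `usercustomerflags` as f
        where f.`UserID` = u.`UserID`
          and f.`CustomerFlagID` in (10, 27, 17, 1, 2, 3, 4, 5, 6)
          and f.`__Active` = 1)
    or not exists (select 1
        from `usercustomerflags` as f
        where f.`UserID` = u.`UserID`
          and f.`__Active` = 1))
order by u.`DateTimeAdded` desc
limit 50;
```

Because UNION DISTINCT yields each `UserID` at most once and `UserID` is the primary key of `users`, the join should not produce duplicate user rows. Whether the optimizer really starts from the derived table is something only EXPLAIN on the actual data can confirm.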

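Index prefixes (Email(191)) are usually useless. If you can, get rid of the prefix. Here are 5 ways to avoid it: http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes

The PK is a UNIQUE key, so get rid of the second one: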

PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),

That smells like a UUID; use ascii (ascii_general_ci), not utf8mb4:

... char(36) COLLATE utf8mb4_unicode_ci

INT(1) takes 4 bytes; use TINYINT for flags.
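
Two of these schema suggestions could be applied along the following lines. This is a sketch against the `users` definition posted in the question, not a tested migration. The character-set change for `UserID` is deliberately left out, because the foreign key columns in other tables that reference `users`.`UserID` would have to be converted together with it.

```sql
-- Sketch only: try on a copy of the table first.
ALTER TABLE `users`
  DROP INDEX `UserID_UNIQUE`,                    -- duplicates the PRIMARY KEY
  MODIFY `Deleted` TINYINT NOT NULL DEFAULT 0;   -- 1-byte flag instead of a 4-byte INT
```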

Answer 1: (score: 0)

Pretty sure you can replace all of the subqueries in the WHERE clause with a combination of INNER and LEFT JOINs. Try this: