Question

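I'm trying to write a stored procedure that inserts data, but with a few fairly simple checks that seem like good practice.

The table currently has 300 columns: a sequential primary_key_id, one column we want to check before inserting (say, address), a child_of column that is used when there is new data (what we are inserting), and then the remaining 297 columns.

So, let's say the table currently looks like this: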

----------------------------------------------------------------------
|PK    |Address             |child_of    |other_attr_1|other_attr2|...
----------------------------------------------------------------------
|1     | 123 Main St        |NULL        |...         |...        |...
|2     | 234 South Rd       |NULL        |...         |...        |...
|3     | 345 West Rd        |NULL        |...         |...        |...
----------------------------------------------------------------------

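We want to add the row below, where the address has a new value, new, in the other_attr_1 column. We would use child_of to reference the primary_key_id of the previous record for that address. This should allow a basic history (I hope).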

|4     | 123 Main St        |1           |new         |...        |...

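How do I check for duplicates in the stored procedure? Do I iterate over every incoming parameter and compare it with what is already in the database, if a record is there?

Here is the code I have so far: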

USE [databaseINeed]
-- SET some_stuff ON --or off :)
-- ....
-- GO
CREATE Procedure [dbo].[insertNonDuplicatedData]
  @address text, @other_attr_1 numeric = NULL, @other_attr_2 numeric = NULL, @other_attr_3 numeric = NULL,....;
AS
BEGIN TRY
  -- If the address already exists, lets check for updated data
  IF EXISTS (SELECT 1 FROM tableName WHERE address = @address)  
    BEGIN 
      -- Look at the incoming data vs the data already in the record

      --HERE IS WHERE I THINK THE CODE SHOULD GO, WITH SOMETHING LIKE the following pseudocode:
      if any attribute parameter values is different than what is already stored
        then Insert into tableName (address, child_of, attrs) Values (@address, THE_PRIMARY_KEY_OF_THE_RECORD_THAT_SHARES_THE_ADDRESS, @other_attrs...)    

      RETURN
    END       
  -- We don't have any data like this, so lets create a new record altogther
  ELSE
    BEGIN
      -- Every time a SQL statement is executed it returns the number of rows that were affected.  By using "SET NOCOUNT ON" within your stored procedure you can shut off these messages and reduce some of the traffic.
      SET NOCOUNT ON
      INSERT INTO tableName (address, other_attr_1, other_attr_2, other_attr_3, ...)
      VALUES(@address,@other_attr_1,@other_attr_2,@other_attr_3,...)
    END
END TRY
BEGIN CATCH
  ...
END CATCH

I tried adding a CONSTRAINT to the table covering all 297 attributes that need to be unique when checked against the address column:

ALTER TABLE tableName ADD CONSTRAINT
  uniqueAddressAttributes UNIQUE -- tried also with NONCLUSTERED
   (other_attr_1,other_attr_2,...)

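But I got this error:

ERROR: cannot use more than 32 columns in an index SQL state: 54011

I think I may be going down the wrong path by trying to rely on a unique constraint.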

Answer 1

Having this many columns is definitely not good practice. That said, you can try using INTERSECT to check all the values at once:

-- I assume you get the last id to set the 
-- THE_PRIMARY_KEY_OF_THE_RECORD_THAT_SHARES_THE_ADDRESS
DECLARE @PK int = (SELECT MAX(PK) FROM tableName WHERE address = @address)

-- No need for an EXISTS(), just check the @PK
IF @PK IS NOT NULL 
BEGIN

    IF EXISTS(
        -- List of attributes from table
        -- Possibly very poor performance to get the row by ntext
        SELECT other_attr_1, other_attr_2 ... FROM tableName WHERE PK = @PK
        INTERSECT
        -- List of attributes from variables
        SELECT @other_attr_1, @other_attr_2 ...
    )
    BEGIN
        Insert into tableName (address, child_of, attrs) Values 
        (@address, @PK, @other_attr_1, @other_attr_2 ...)   
    END

END

Answer 2

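With that many columns, you might consider hashing all of the columns at insert time and storing the result in (yet another) column. In the stored procedure, you can compute the same hash over the input parameters and then check for a matching hash, instead of doing a field-by-field comparison across all of those fields.

You will probably need to do some data conversion to get your 300ish columns all to nvarchar so that they can be concatenated as input to the HASHBYTES function. Also, if any of the columns can be NULL, you will have to decide how to treat them. For example, if field 216 of an existing record is NULL and the row being added is exactly the same except that field 216 is an empty string, is that a match?

Also, with that many columns, the concatenation may run over the maximum input size of the HASHBYTES function, so you may need to break it up into hashes of several smaller chunks.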
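A minimal sketch of the hashing idea, assuming SQL Server 2012 or later (for CONCAT and SHA2_256). The table variable @demo, the attr_hash column, the '|' delimiter and the '<NULL>' marker are all hypothetical choices made for illustration, and three attribute columns stand in for the 297. NULL is mapped to a marker so that it does not hash the same as an empty string. As far as I know, the HASHBYTES input limit is 8000 bytes on SQL Server 2014 and earlier, which is where the chunking concern applies.

```sql
-- Hypothetical stand-in for tableName; attr_hash is the extra column.
DECLARE @demo TABLE (
    PK           int IDENTITY(1, 1) PRIMARY KEY,
    address      nvarchar(200) NOT NULL,
    child_of     int NULL,
    other_attr_1 numeric(18, 0) NULL,
    other_attr_2 numeric(18, 0) NULL,
    other_attr_3 numeric(18, 0) NULL,
    attr_hash    varbinary(32) NOT NULL  -- SHA2_256 digests are 32 bytes
);

-- Incoming values, as the procedure parameters would supply them.
DECLARE @address nvarchar(200) = N'123 Main St';
DECLARE @other_attr_1 numeric(18, 0) = 10;
DECLARE @other_attr_2 numeric(18, 0) = NULL;
DECLARE @other_attr_3 numeric(18, 0) = 30;

-- Convert every value to nvarchar, mark NULLs explicitly, and delimit the
-- fields so that ('1', '23') and ('12', '3') do not produce the same input.
DECLARE @hash varbinary(32) = HASHBYTES('SHA2_256', CONCAT(
    ISNULL(CONVERT(nvarchar(40), @other_attr_1), N'<NULL>'), N'|',
    ISNULL(CONVERT(nvarchar(40), @other_attr_2), N'<NULL>'), N'|',
    ISNULL(CONVERT(nvarchar(40), @other_attr_3), N'<NULL>')
));

-- Latest existing record for this address, if any.
DECLARE @PK int = (SELECT MAX(PK) FROM @demo WHERE address = @address);

-- Insert when the address is new or the stored hash differs.
IF @PK IS NULL
   OR NOT EXISTS (SELECT 1 FROM @demo WHERE PK = @PK AND attr_hash = @hash)
BEGIN
    INSERT INTO @demo (address, child_of, other_attr_1, other_attr_2, other_attr_3, attr_hash)
    VALUES (@address, @PK, @other_attr_1, @other_attr_2, @other_attr_3, @hash);
END;

SELECT PK, address, child_of, attr_hash FROM @demo;
```

If the real attributes include free text, a value containing the delimiter or the NULL marker could still make two different rows produce the same input, so those two choices would need more care than this sketch gives them.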

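All that said, does your architecture really need this 300-column structure? If you could get away from it, I wouldn't be getting quite so creative here.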

Answer 3

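I don't have enough rep to comment, so I'm posting this as an answer.

Eric's SQL should be changed from IF EXISTS to IF NOT EXISTS.

I think the desired logic should be:

If an address record already exists, check whether any attribute differs.
If any attribute differs, insert a new address record, storing the primary key of the latest existing address record in the child_of column.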
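The comparison step in that logic can be tried on its own. This is a minimal sketch, assuming a hypothetical table variable @stored with only two attribute columns standing in for the 297. It relies on INTERSECT treating two NULLs as equal, unlike the = operator, so NULL attributes do not show up as false differences.

```sql
-- Hypothetical stand-in for tableName with just two attribute columns.
DECLARE @stored TABLE (
    PK           int PRIMARY KEY,
    other_attr_1 numeric(18, 0) NULL,
    other_attr_2 numeric(18, 0) NULL
);
INSERT INTO @stored (PK, other_attr_1, other_attr_2) VALUES (1, 10, NULL);

-- Incoming values, as the procedure parameters would supply them.
DECLARE @PK int = 1;
DECLARE @other_attr_1 numeric(18, 0) = 10;
DECLARE @other_attr_2 numeric(18, 0) = NULL;

-- INTERSECT yields a row only if every column matches (NULL matches NULL).
-- An empty result therefore means at least one attribute differs.
IF NOT EXISTS (
    SELECT other_attr_1, other_attr_2 FROM @stored WHERE PK = @PK
    INTERSECT
    SELECT @other_attr_1, @other_attr_2
)
BEGIN
    PRINT 'different: insert a child row';
END
ELSE
BEGIN
    PRINT 'identical: skip the insert';
END;
```

With the values above, the stored row and the incoming values match, so the ELSE branch runs. Setting @other_attr_2 to any non-NULL value empties the intersection and takes the insert branch.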

Refactoring Chris & Eric's SQL:

USE [databaseINeed]
-- SET some_stuff ON --or off :)
-- ....
-- GO
CREATE Procedure [dbo].[insertNonDuplicatedData]
  @address text, @other_attr_1 numeric = NULL, @other_attr_2 numeric = NULL, @other_attr_3 numeric = NULL,....;
AS
BEGIN TRY
  -- If the address already exists, lets check for updated data
  IF EXISTS (SELECT 1 FROM tableName WHERE address = @address)  
    BEGIN 
      -- Look at the incoming data vs the data already in the record

      -- Compare the incoming attributes with the latest record for this address

        DECLARE @PK int = (SELECT MAX(PK) FROM tableName WHERE address = @address)
        IF NOT EXISTS(
            -- List of attributes from table
            -- Possibly very poor performance to get the row by ntext
            SELECT other_attr_1, other_attr_2 ... FROM tableName WHERE PK = @PK
            INTERSECT
            -- List of attributes from variables
            SELECT @other_attr_1, @other_attr_2 ...
        )
        BEGIN
            -- @simplyink: existing address record has different combination of (297 column) attribute values
            --          at least one attribute column is different (no intersection)
            Insert into tableName (address, child_of, attrs) Values 
            (@address, @PK, @other_attr_1, @other_attr_2 ...)   
        END


      RETURN
    END       
  -- We don't have any data like this, so lets create a new record altogther
  ELSE
    BEGIN
      -- Every time a SQL statement is executed it returns the number of rows that were affected.  By using "SET NOCOUNT ON" within your stored procedure you can shut off these messages and reduce some of the traffic.
      SET NOCOUNT ON
      INSERT INTO tableName (address, other_attr_1, other_attr_2, other_attr_3, ...)
      VALUES(@address,@other_attr_1,@other_attr_2,@other_attr_3,...)
    END
END TRY
BEGIN CATCH
  ...
END CATCH

Checking for duplicates in an insert stored procedure
