Better practice: delete and insert vs. update

Time: 2015-07-23 12:40:39

Tags: sql-server database-performance database-administration

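I need to parse data from a CSV file (~60,000 rows) and write it to an MSSQL table (each row is a date/time and a value, which is a decimal number). I get one such CSV file every day. The problem is that each daily file contains data for the past 5 days, so it includes dates I have already written, and I need to replace those rows with the newer data from the file.

I am trying to decide between two approaches: bulk-deleting the old data that has to be rewritten when a new CSV file arrives and then INSERTing the new rows, or finding each record by date, time, and ID and updating it.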

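1. Which is the better practice, in the sense of causing less fragmentation and fewer maintenance problems in the database?
2. Which is cheaper from a performance point of view?

If I have to choose between the two, I prefer keeping my database in good condition with high performance, because the files are written at night anyway.

Edit: If I add a maintenance plan that rebuilds the indexes daily, after the bulk delete and the insert of the new data, is that enough to avoid fragmentation problems, or is there something I am missing?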

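3 answers:

Answer 0: (score: 1)

The faster and better way is to delete all the old data, import the new data with SSIS (or with a bulk insert if you do not use SSIS), and then rebuild the fragmented indexes. Take this script as an example.

Answer 1: (score: 0)

I would insert all the data from your CSV file and then delete the duplicates.

The following code deletes the duplicates. I hope it helps you :)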
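A minimal sketch of that delete, bulk insert, and rebuild sequence might look like the following. The table names `dbo.Readings` and `dbo.ReadingsStaging`, their columns, the file path, and the CSV layout (comma-separated with one header row) are all hypothetical and not taken from the question. It also assumes that "old data" means every target row at or after the earliest timestamp in the new file.

```sql
-- Sketch only: dbo.Readings, dbo.ReadingsStaging, the columns, and the file
-- path are hypothetical names chosen for illustration.
SET XACT_ABORT ON;  -- roll back the whole transaction on any run-time error

BEGIN TRANSACTION;

-- 1. Load the daily CSV into an empty staging table.
TRUNCATE TABLE dbo.ReadingsStaging;

BULK INSERT dbo.ReadingsStaging
FROM 'C:\import\daily.csv'          -- assumed path
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,            -- assumes a single header row
    TABLOCK
);

-- 2. Delete the overlapping range from the target table.
DECLARE @FromDateTime datetime2(0) =
    (SELECT MIN(ReadingDateTime) FROM dbo.ReadingsStaging);

DELETE FROM dbo.Readings
WHERE ReadingDateTime >= @FromDateTime;

-- 3. Insert the fresh rows.
INSERT INTO dbo.Readings (ReadingDateTime, ReadingValue)
SELECT ReadingDateTime, ReadingValue
FROM dbo.ReadingsStaging;

COMMIT TRANSACTION;

-- 4. Rebuild the indexes on the target table after the load.
ALTER INDEX ALL ON dbo.Readings REBUILD;
```

Running the delete and the insert in one transaction means readers never see the table with the last five days missing. Whether that matters for a nightly load is a judgment call.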

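-- For each date that occurs more than once, keep the row with the
-- highest id and delete the other rows for that date.
DELETE c
FROM your_table AS c
JOIN (
    SELECT MAX(a.id) AS id, a.date
    FROM your_table AS a
    GROUP BY a.date
    HAVING COUNT(*) > 1
) AS b
    ON c.date = b.date
   AND c.id <> b.id;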

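Answer 2: (score: 0)

Here is a MERGE technique that uses a staging table loaded with the parsed CSV data. Alternatively, you can use a table-valued parameter as the source instead of a staging table.

As for fragmentation, it depends mostly on how many new rows are inserted within the date range that already exists in the target table. If there are no new rows in that range, fragmentation is insignificant (less than 3%, as the script below shows). If fragmentation does become a problem, you can run an index REBUILD or REORGANIZE after the ETL.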

CREATE TABLE dbo.Test(
      TestDateTime datetime2(0) NOT NULL 
        CONSTRAINT PK_Test PRIMARY KEY
    , TestData int NOT NULL
    );
CREATE TABLE dbo.TestStaging(
      TestDateTime datetime2(0) NOT NULL
        CONSTRAINT PK_TestStaging PRIMARY KEY
    , TestData int NOT NULL
    );
GO

--load 10 days into main table (123421 rows, one every 7 seconds)
WITH 
    t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
    ,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
    ,t256K AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t4 AS c)
INSERT INTO dbo.Test WITH(TABLOCKX) (TestDateTime, TestData) 
SELECT DATEADD(second, num*7, CAST('2015-07-01T00:00:00' AS datetime2(0))), num
FROM t256K
WHERE num <= 123420;
GO

--load 4 most recent days with new values plus 1 new day into staging table
WITH 
    t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
    ,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
    ,t256K AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t4 AS c)
INSERT INTO dbo.TestStaging WITH(TABLOCKX) (TestDateTime, TestData) 
SELECT DATEADD(second, num*7, CAST('2015-07-07T00:00:06' AS datetime2(0))), num
FROM t256K
WHERE num <= 61710;
GO

--show fragmentation before MERGE
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'DETAILED');
GO

MERGE dbo.Test AS target
USING dbo.TestStaging AS source ON
    source.TestDateTime = target.TestDateTime
WHEN MATCHED THEN
    UPDATE SET TestData = source.TestData
WHEN NOT MATCHED BY target THEN
    INSERT (TestDateTime, TestData) VALUES (source.TestDateTime, source.TestData);
GO

--show fragmentation after MERGE
SELECT * 
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'DETAILED');
GO
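
The post-ETL REBUILD or REORGANIZE step mentioned above could be sketched as follows, reusing `dbo.Test` and `PK_Test` from the script. The 5% and 30% thresholds are a commonly cited rule of thumb and are an assumption here, not something the answer prescribes. Tune them for your own workload.

```sql
-- Sketch only: the 5% / 30% thresholds are illustrative assumptions.
DECLARE @frag float;

-- Read leaf-level fragmentation of the table's indexes
-- (dbo.Test has only the clustered primary key PK_Test).
SELECT @frag = MAX(avg_fragmentation_in_percent)
FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'LIMITED')
WHERE index_level = 0;

IF @frag >= 30.0
    ALTER INDEX PK_Test ON dbo.Test REBUILD;       -- heavy fragmentation
ELSE IF @frag >= 5.0
    ALTER INDEX PK_Test ON dbo.Test REORGANIZE;    -- moderate fragmentation
-- below 5%: leave the index alone
GO
```

Checking the fragmentation first avoids paying for a nightly rebuild when the load only updated existing rows and appended one new day at the end of the index.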