Split a string into multiple rows and columns

Posted: 2021-02-09 20:56:14

Tags: sql sql-server azure-sql-database

I have a dataset similar to the one below (the real table has more columns; these are just for illustration).

PersonId  LocationId  StartDate   AttendanceString
123       987         2018-09-01  XXXXZZZZ######PPLL
234       678         2018-10-01  PPPPLL######ZZZZXX

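What I need to do is split the AttendanceString column into multiple rows and columns. The attendance string has to be broken into 2-character chunks, and each chunk split into 2 separate columns representing a morning and an afternoon session. An example makes this clearer, so take the first record. The desired result is: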

PersonId  LocationId  StartDate   MorningAttendanceString  AfternoonAttendanceString
123       987         2018-09-01  X                        X
123       987         2018-09-02  X                        X
123       987         2018-09-03  Z                        Z
123       987         2018-09-04  Z                        Z
123       987         2018-09-05  #                        #

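For each string, we need to iterate until we reach the last character, adding one new record per day (each successive record is one day later), with separate morning and afternoon values.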
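The per-day mapping can be expressed without a loop: day n takes the characters at positions 2n - 1 (morning) and 2n (afternoon), and its date is StartDate + (n - 1). The sketch below applies that to the first sample record only; the hard-coded list of day numbers 1..9 assumes the 18-character sample string, and the temp table name #OneRecord is purely illustrative.

```sql
-- Sketch: split one attendance string into (date, morning, afternoon) rows
-- without a WHILE loop. The values are the first sample record.
DECLARE @StartDate date = '2018-09-01';
DECLARE @AttendanceString varchar(1000) = 'XXXXZZZZ######PPLL';

DROP TABLE IF EXISTS #OneRecord;  -- illustrative temp table name

SELECT
    DATEADD(day, d.n - 1, @StartDate)            AS DayDate,        -- day n is StartDate + (n - 1)
    SUBSTRING(@AttendanceString, 2 * d.n - 1, 1) AS MorningValue,   -- odd positions 1, 3, 5, ...
    SUBSTRING(@AttendanceString, 2 * d.n, 1)     AS AfternoonValue  -- even positions 2, 4, 6, ...
INTO #OneRecord
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9)) AS d(n)  -- day numbers for 18 characters
WHERE 2 * d.n <= LEN(@AttendanceString);  -- only complete character pairs

SELECT * FROM #OneRecord ORDER BY DayDate;
```

The first five rows match the desired result above; the remaining four continue with `#`, `P` and `L` through 2018-09-09.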

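I was able to implement the required logic with the code at the end of this post. However, this can involve roughly 90,000-100,000 source records, most of which need to be split into 365 records, so we're talking about 33-35 million records to create. I'm using a cursor with a WHILE loop to produce these results; even though cursors are generally to be avoided, I don't think that is the problem here. It typically takes about 30 minutes to run on an Azure SQL Database in the S6 Standard tier.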

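Is there any option I'm missing that would make this more efficient? Ideally I'd like to reduce the time it takes to process the data. I can't really use STRING_SPLIT, because it needs a specific separator character to split the string on.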

DECLARE @LocationID as int;
DECLARE @AttendanceString as varchar(1000);
DECLARE @StartDate as date;
DECLARE @PersonId as int;

DECLARE @MorningValue as char(1);
DECLARE @AfternoonValue as char(1);

DECLARE @DayDate as date;
DECLARE @AttendanceCursor as cursor;

DECLARE @i as int;

-- Read every source row from 2019-08-01 onwards, one row at a time
SET @AttendanceCursor = CURSOR LOCAL FAST_FORWARD FOR
SELECT 
    PersonId,
    LocationId, 
    StartDate, 
    AttendanceString
FROM 
    SourceTable
WHERE 
    StartDate >= '2019-08-01'

BEGIN
    
    SET NOCOUNT ON
    OPEN @AttendanceCursor 
    -- Variable order must match the cursor's SELECT list
    FETCH NEXT FROM @AttendanceCursor INTO @PersonId, @LocationId, @StartDate, @AttendanceString;
    WHILE @@FETCH_STATUS = 0
    BEGIN

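        SET @i = 1;                 -- position of the current morning character
        SET @DayDate = @StartDate;  -- date of the current day

        -- Each iteration consumes one character pair: morning at @i, afternoon at @i + 1
        WHILE (@i < len(@AttendanceString))
        BEGIN
            SET @MorningValue = SUBSTRING(@AttendanceString,@i,1);
            SET @AfternoonValue = SUBSTRING(@AttendanceString,@i+1,1);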
            
            BEGIN TRY
                INSERT INTO FinalTable
                SELECT @DayDate, @LocationId, @PersonId, @MorningValue, @AfternoonValue
            END TRY
            BEGIN CATCH
               ...
            END CATCH
            
            SET @i = @i+2;
            SET @DayDate = DATEADD(DD,1,@DayDate );
        END

        FETCH NEXT FROM @AttendanceCursor INTO @PersonId, @LocationId, @StartDate, @AttendanceString;

    END  

    CLOSE @AttendanceCursor;
    DEALLOCATE @AttendanceCursor;
END

1 answer:

Answer 0 (score: 3)

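I rewrote it as a set-based query instead of a cursor. This whole script, including generating 100k test records, runs in about 40 seconds on my Azure SQL DB, which is Serverless Gen5 with only 1 vCore. Work through the script to make sure you understand it.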

Note that I'm dropping tables because this is a test rig, not production code:

------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------

DROP TABLE IF EXISTS dbo.sourceTable
DROP TABLE IF EXISTS dbo.finalTable
GO

CREATE TABLE dbo.sourceTable
(
    PersonId            INT IDENTITY PRIMARY KEY,
    LocationId          INT,
    StartDate           DATE,
    AttendanceString    VARCHAR(1000)
)
GO

CREATE TABLE dbo.finalTable
(
    DayDate         DATE, 
    LocationId      INT, 
    PersonId        INT, 
    MorningValue    CHAR(1), 
    AfternoonValue  CHAR(1)
)
GO

-- Generate some test data
SET IDENTITY_INSERT dbo.sourceTable ON

INSERT INTO dbo.sourceTable ( PersonId, LocationId, StartDate, AttendanceString )
VALUES
    ( 123, 987, '2018-09-01', 'XXXXZZZZ######PPLL' ),
    ( 234, 678, '2018-10-01', 'PPPPLL######ZZZZXX' ),
    ( 567, 999, '2018-10-01', 'abcdefghijklmnopqr' )

SET IDENTITY_INSERT dbo.sourceTable OFF
GO

-- Setup END
------------------------------------------------------------------------------------------------



------------------------------------------------------------------------------------------------
-- Test Data START
------------------------------------------------------------------------------------------------

;WITH cte AS (
SELECT 1 rn, 1 locationId, CAST( '1 Jan 2018' AS DATE ) startDate, REPLACE( NEWID(), '-', '' ) AttendanceString
UNION ALL
SELECT rn + 1, rn % 42, DATEADD( day, 1, startDate ), REPLACE( NEWID(), '-', '' )
FROM cte
WHERE rn < 100
)
INSERT INTO dbo.sourceTable ( LocationId, StartDate, AttendanceString )
SELECT LocationId, StartDate, AttendanceString 
FROM cte
ORDER BY 1;
GO 1000


-- Test Data END
------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------
-- Rewritten query START
------------------------------------------------------------------------------------------------





DROP TABLE IF EXISTS #tmp;

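-- Number generator: n = day number, x = position of the morning character, y = position of the afternoon character.
-- n < 20 yields 20 days (strings up to 40 characters), enough for the 32-character test strings;
-- longer strings need a higher limit, plus OPTION (MAXRECURSION ...) once it exceeds the default of 100 levels.
;WITH cte AS (
SELECT 1 n, 1 x, 2 y
UNION ALL
SELECT n + 1, x + 2, y + 2
FROM cte
WHERE n < 20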
)
SELECT
    personId,
    locationId,
    DATEADD( day, c.n - 1, startDate ) xdate,
    SUBSTRING ( attendanceString, c.x, 1 ) a,
    SUBSTRING ( attendanceString, c.y, 1 ) b

INTO #tmp

FROM dbo.sourceTable s
    CROSS APPLY cte c
WHERE c.y <= LEN(attendanceString);


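-- Spot checks: PersonId 999 is one of the generated test rows
select *
from sourceTable
WHERE personId = 999

select *
from #tmp
WHERE personId = 999

-- LocationId 999 is the 'abcdefghijklmnopqr' sample row
select *
from #tmp
WHERE locationId = 999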

-- Rewritten query END
------------------------------------------------------------------------------------------------

Modified version of the script to extend the attendance ID: here
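One possible variation, sketched here under assumptions rather than taken from the answer: replace the recursive CTE with a numbers (tally) table built by cross-joining three 10-row VALUES lists. That yields day numbers 1..500, enough for a full varchar(1000) string (so a 365-day string of 730 characters is covered), with no recursion limit to manage. The temp tables #source and #final are illustrative stand-ins for dbo.sourceTable and dbo.finalTable, seeded with the two sample rows from the question.

```sql
-- Sketch: set-based split using a tally table instead of a recursive CTE.
-- #source / #final are hypothetical stand-ins for dbo.sourceTable / dbo.finalTable.
DROP TABLE IF EXISTS #source;
DROP TABLE IF EXISTS #final;

CREATE TABLE #source (PersonId int, LocationId int, StartDate date, AttendanceString varchar(1000));
CREATE TABLE #final  (DayDate date, LocationId int, PersonId int, MorningValue char(1), AfternoonValue char(1));

INSERT INTO #source (PersonId, LocationId, StartDate, AttendanceString)
VALUES (123, 987, '2018-09-01', 'XXXXZZZZ######PPLL'),
       (234, 678, '2018-10-01', 'PPPPLL######ZZZZXX');

-- Cross-joining three 10-row lists gives 1,000 numbers; the WHERE keeps 1..500
-- (2 characters per day, so 500 days covers a varchar(1000) string).
WITH digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
days AS (
    SELECT a.d * 100 + b.d * 10 + c.d + 1 AS n
    FROM digits a CROSS JOIN digits b CROSS JOIN digits c
    WHERE a.d * 100 + b.d * 10 + c.d + 1 <= 500
)
INSERT INTO #final (DayDate, LocationId, PersonId, MorningValue, AfternoonValue)
SELECT
    DATEADD(day, t.n - 1, s.StartDate),             -- day n is StartDate + (n - 1)
    s.LocationId,
    s.PersonId,
    SUBSTRING(s.AttendanceString, 2 * t.n - 1, 1),  -- morning character
    SUBSTRING(s.AttendanceString, 2 * t.n, 1)       -- afternoon character
FROM #source AS s
JOIN days AS t
    ON 2 * t.n <= LEN(s.AttendanceString);          -- one row per complete character pair

SELECT * FROM #final ORDER BY PersonId, DayDate;
```

To use it against the real tables, drop the temp-table setup and point the INSERT at the target table and the SELECT at the source table (adding the question's StartDate filter if needed). Its speed relative to the recursive-CTE version has not been measured here.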