SSIS script transformation slows down after 400k records

Time: 2013-08-07 21:51:36

Tags: ssis

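I have an SSIS script transformation that I use as the final destination for inserting data into a SQL Server table. I use a script transformation rather than a SQL Server Destination because I don't know in advance which columns the target table will have.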

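In a Foreach Loop container, I look for Access databases (in 97 format). The rest of the control flow basically creates a new SQL database and a table. The Access files are what we call "minute" databases; they contain minute data collected by another process. I need to create a new SQL DB named after the 'minute' db and a table named 'MINUTE', whose columns are created from certain information in the Access db. For each of our customers, the number of parameters they have at their site determines the number of columns I need to create in the SQL MINUTE table.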

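In the data flow I have two key components: an OLE DB Source component (Source - Minute Table) and a script transformation (Destination - Minute Table).

"Source - Minute Table" gets the data from the Access database. "Destination - Minute Table" transforms the data and inserts it into the appropriate database and table.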

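Everything works as expected. I tested on a database with 491,000+ records and it took 1 minute. However, I'm now testing with one of our larger customers, who has over 50 parameters, and the Access database contains over 2 million records. The package flies until I reach roughly 477,000 records, and then it pretty much comes to a halt. I can wait 10 minutes, or even longer, before the record count updates, and then the waiting starts again.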

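I've done a lot of research and followed all the recommendations and guidelines I found. My data source is not sorted. I use a SQL command instead of Table in the OLE DB Source, and so on. I've changed the values of DefaultBufferMaxRows and DefaultBufferSize many times and get the same results.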

Code:

Public Class ScriptMain
Inherits UserComponent

Private conn As SqlConnection
Private cmd As SqlCommand
Private DBName As SqlParameter
Private columnsForInsert As SqlParameter
Private tableValues As SqlParameter
Private numberOfParams As Integer
Private db As String
Private folderPath As String
Private dbConn As String
Private folder As String
Private columnParamIndex As Integer
Private columnDate As DateTime
Private columnMinValue As Double
Private columnStatus As String
Private columnCnt1 As Int16
Private dateAdded As Boolean = False
Private columnStatusCnt As String
Private columnsConstructed As Boolean = False
Private buildValues As StringBuilder
Private columnValues As StringBuilder
Private i As Integer = 0

'This method is called once, before rows begin to be processed in the data flow.
'
'You can remove this method if you don't need to do anything here.
Public Overrides Sub PreExecute()
    MyBase.PreExecute()

    Try
        'Dim dbConnection As String = "Server=(local)\SQLExpress;Database=DataConversion;User ID=sa;Password=sa123;"
        'conn = New SqlConnection(dbConnection)
        'conn.Open()
        'cmd = New SqlCommand("dbo.InsertValues", conn) With {.CommandType = CommandType.StoredProcedure}

        'columnsForInsert = New SqlParameter("@Columns", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(columnsForInsert)

        'DBName = New SqlParameter("@DBName", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(DBName)

        'tableValues = New SqlParameter("@Values", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        'cmd.Parameters.Add(tableValues)

        db = Variables.varMinFileName.ToString
        folder = Variables.varMinFolderName.ToString
        folderPath = folder & "\" & db & ".mdb"
        dbConn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & folderPath

        Using SourceDataAdapter As OleDbDataAdapter = New OleDbDataAdapter("SELECT DISTINCT PARAM_INDEX FROM [MINUTE];", dbConn)
            Dim SourceDatatable As New DataTable

            SourceDataAdapter.Fill(SourceDatatable)

            numberOfParams = SourceDatatable.Rows.Count
        End Using

        'columnValues.Append("dtmTime, ")
        buildValues = New StringBuilder
        columnValues = New StringBuilder

        columnValues.Append("dtmTime, ")

    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    Finally

    End Try
End Sub

' This method is called after all the rows have passed through this component.
'
' You can delete this method if you don't need to do anything here.
Public Overrides Sub PostExecute()
    MyBase.PostExecute()
    '
    ' Add your code here
    '
    buildValues = Nothing
    columnValues = Nothing
End Sub

Public Overrides Sub Input0_ProcessInput(Buffer As Input0Buffer)
    While Buffer.NextRow()
        Input0_ProcessInputRow(Buffer)
    End While
End Sub

'This method is called once for every row that passes through the component from Input0.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim column As IDTSInputColumn100
    Dim rowType As Type = Row.GetType()
    Dim columnValue As PropertyInfo
    Dim result As Object
    Dim rtnValue As String = Variables.varMinFileName.Replace("_", "")
    Dim colName As String

    Try
        For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
            columnValue = rowType.GetProperty(column.Name)

            colName = column.Name.ToString

            If Not colName.Contains("NULL") Then
                'If Not columnValue Is Nothing Then
                Select Case column.Name.ToString
                    Case "PARAM_INDEX"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.PARAMINDEX
                        columnParamIndex = CType(result, Byte)
                        If columnsConstructed = False And i <= numberOfParams - 1 Then
                            columnValues.Append(String.Format("VALUE_{0}, STATUS_{0}, ", columnParamIndex.ToString))
                        End If
                        Exit Select
                    Case "dtmTIME"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.dtmTIME
                        columnDate = CType(result, DateTime)
                        If dateAdded = False Then ' only need to add once since rows are vertical
                            buildValues.Append("'" & columnDate & "', ")
                            dateAdded = True
                        End If
                        Exit Select
                    Case "MIN_VALUE"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINVALUE
                        columnMinValue = CType(result, Double)
                        buildValues.Append(columnMinValue & ", ")
                        Exit Select
                    Case "MIN_STATUS"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINSTATUS
                        columnStatus = CType(result, String)
                        Exit Select
                    Case "MIN_CNT_1"
                        'result = columnValue.GetValue(Row, Nothing)
                        result = Row.MINCNT1
                        columnCnt1 = CType(result, Byte)
                        columnStatusCnt = columnStatus & "010" & columnCnt1.ToString.PadLeft(5, "0"c) & "-----"
                        buildValues.Append("'" & columnStatusCnt & "', ")
                    Case Else
                        Exit Select
                End Select
                'End If
            End If
        Next

        If i = numberOfParams - 1 Then
            If columnsConstructed = False Then
                columnValues.Remove(columnValues.Length - 2, 1)
            End If

            buildValues.Remove(buildValues.Length - 2, 1)

            Dim valueResult As String = buildValues.ToString()

            SetStoredProc()

            cmd.Parameters("@Columns").Value = columnValues.ToString
            cmd.Parameters("@DBName").Value = "[" & rtnValue & "].[dbo].[MINUTE]"
            cmd.Parameters("@Values").Value = valueResult
            cmd.ExecuteNonQuery()

            buildValues.Clear()

            columnsConstructed = True
            dateAdded = False
            columnParamIndex = 0
            columnMinValue = 0
            columnStatus = String.Empty
            columnCnt1 = 0

            i = 0
            conn.Close()
            conn.Dispose()
        Else
            i += 1
        End If
    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    Finally
        'buildValues = Nothing
        'columnValues = Nothing
    End Try
End Sub

Private Sub SetStoredProc()
    Try
        Dim dbConnection As String = "Server=(local)\SQLExpress;Database=DataConversion;User ID=sa;Password=sa123;"
        conn = New SqlConnection(dbConnection)
        conn.Open()
        cmd = New SqlCommand("dbo.InsertValues", conn) With {.CommandType = CommandType.StoredProcedure}

        columnsForInsert = New SqlParameter("@Columns", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(columnsForInsert)

        DBName = New SqlParameter("@DBName", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(DBName)

        tableValues = New SqlParameter("@Values", SqlDbType.VarChar, -1) With {.Direction = ParameterDirection.Input}
        cmd.Parameters.Add(tableValues)
    Catch ex As Exception
        Dim writer As New StreamWriter("C:\MinuteLog.log", True, System.Text.Encoding.ASCII)

        writer.WriteLine(ex.Message)
        writer.Close()
        writer.Dispose()
    End Try
End Sub
End Class

Since I can't upload images here, I've added a link to a blog post I created with plenty of screenshots to help illustrate the problem described here: SSIS slows down during transformation task

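Any help in determining why my package slows down after 400k records and doesn't process all 2+ million records in a reasonable time is much appreciated!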

Thanks, 麦

2 answers:

Answer 0: (score: 2)

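This may not be very helpful, but my guess is that you are running out of memory. In my experience, once SSIS has to page to disk, performance collapses.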

Can you somehow batch the work into several smaller runs?
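The batching idea can be sketched in a few lines of Python. This is only an illustration of splitting a row stream into fixed-size chunks, not SSIS code; the helper name `batches` and the chunk size are assumptions made for the example.

```python
from itertools import islice

def batches(rows, size):
    """Yield successive lists of at most `size` items from any iterable.

    Hypothetical helper: each chunk could be handed to one smaller run,
    so that no single run has to hold the whole data set in memory.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Ten stand-in rows split into runs of at most four rows each.
chunks = list(batches(range(10), 4))
```

Because the generator pulls rows lazily, only one chunk is materialized at a time.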

Answer 1: (score: 1)

The full solution, with screenshots, can be viewed on my blog - SSIS slowdown solved

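To keep SSIS from slowing down when a large number of records are transformed and inserted into SQL Server as my destination, I redesigned my SSIS package. Instead of doing an insert in the data transformation task for every record that comes through the buffer, I removed that task and used a stored procedure to do a bulk insert. To accomplish this, I read the data from each Access database into a table called "MINUTE" in my SQL Server instance. This minute table has the same schema as the Access DB, and I let SSIS import all the data into it. Once the data is imported, I execute my stored procedure, which transforms the data in this minute table (horizontal records) and then bulk inserts it into my new destination MINUTE SQL table (one vertical record).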
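To make the reshaping concrete, here is a minimal Python sketch of the same transformation done in memory: source rows with one parameter each are grouped by timestamp into a single record with `VALUE_n` and `STATUS_n` fields. The function name and sample rows are hypothetical. The status layout (status, then "000", then the count zero-padded to 5 digits as the question's script does with `PadLeft`, then "-----") follows the code shown in this thread.

```python
from collections import defaultdict

def pivot_minute_rows(rows):
    """Group (dtmTime, PARAM_INDEX, MIN_VALUE, MIN_STATUS, MIN_CNT_1) rows
    into one dict per timestamp with VALUE_n / STATUS_n keys."""
    wide = defaultdict(dict)
    for dtm_time, param_index, min_value, min_status, min_cnt_1 in rows:
        record = wide[dtm_time]
        record[f"VALUE_{param_index}"] = str(min_value)
        # status + "000" + count zero-padded to 5 digits + "-----"
        record[f"STATUS_{param_index}"] = f"{min_status}000{min_cnt_1:05d}-----"
    return dict(wide)

# Hypothetical sample data: two parameters at 00:00, one at 00:01.
sample = [
    ("2013-01-01 00:00", 1, 12.5, "A", 3),
    ("2013-01-01 00:00", 2, 7.0, "B", 12),
    ("2013-01-01 00:01", 1, 12.6, "A", 4),
]
result = pivot_minute_rows(sample)
```

The stored procedure performs the equivalent work as one set-based statement on the server, which is why it avoids the per-row round trips of the original script.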

The stored procedure that performs the bulk insert and transforms the data looks like this:

CREATE PROCEDURE [dbo].[InsertMinuteBulk]
 -- Add the parameters for the stored procedure here
 (@Columns varchar(MAX), @DBName varchar(4000))
 AS
 BEGIN
 DECLARE @SQL varchar(MAX)

 -- Inside the dynamic SQL string, doubled single quotes ('') produce literal quotes.
 -- RIGHT(...) zero-pads MIN_CNT_1 to 5 digits; LEFT(...) would always return '00000'.
SET @SQL = ';WITH Base AS (
 SELECT dtmTime,
 param_index,
 CONVERT(nvarchar(16), MIN_VALUE) AS [VALUE_],
 CONVERT(nvarchar(3), MIN_STATUS) + ''000'' + RIGHT(replicate(''0'',5) + CONVERT(nvarchar(5), MIN_CNT_1),5) + ''-----'' AS [STATUS_]
 FROM [DataConversion].[dbo].[MINUTE]
 )
 ,norm AS (
 SELECT dtmTime, ColName + CONVERT(varchar, param_index) AS ColName, ColValue
 FROM Base
 UNPIVOT (ColValue FOR ColName IN ([VALUE_], [STATUS_])) AS pvt
 )
 INSERT INTO ' + @DBName + '
SELECT *
 FROM norm
 PIVOT (MIN(ColValue) FOR ColName IN (' + @Columns + ')) AS pvt'

EXEC (@SQL);
END
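The post does not show how the @Columns argument is put together. As a hedged sketch, the Python helper below (a hypothetical name, not part of the original package) builds a bracketed column list from the distinct parameter indexes, matching the `VALUE_n` / `STATUS_n` names that the procedure produces by concatenating ColName with param_index.

```python
def build_pivot_columns(param_indexes):
    """Return a string such as "[VALUE_1], [STATUS_1], [VALUE_2], [STATUS_2]".

    Assumption: the destination table's columns follow dtmTime in this same
    order, because the procedure inserts with SELECT * and relies on position.
    """
    parts = []
    for idx in sorted(set(param_indexes)):
        parts.append(f"[VALUE_{idx}]")
        parts.append(f"[STATUS_{idx}]")
    return ", ".join(parts)

# Duplicates and ordering in the input do not matter.
columns = build_pivot_columns([2, 1, 2, 3])
```

Since the procedure concatenates @Columns and @DBName directly into dynamic SQL, both values should come only from trusted sources such as the package's own variables.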

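In the Data Flow task, "Minute Data Source" is an ADO.NET source that feeds the data into my SQL Server destination, "Minute Data Destination".

In the control flow, the final task, "Bulk Insert Minute Data", executes the bulk insert stored procedure.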

The package now runs without interruption and is quite fast, considering the size of the data I'm reading, transforming, and inserting.

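I have run the package as an SSIS job, and it takes 38 minutes to finish converting 7 months (that is, 7 minute Access DBs) of minute data, with over 2 million rows in each Access DB.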