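How to filter a flat file source using a Script Component

Time: 2015-05-14 11:37:20

Tags: ssis-2012

I have the following situation: I have thousands of text files in the format shown below. The column names are listed on separate lines, and the values in each data row are separated by a pipe (|).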

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd

#Some Text
#Some Text
#Some Text
#Some Text
#Some Text
START-OF-FIELDS
Field1
Field2
Field3
------
FieldN
END-OF-FIELDS
TIMESTARTED=Tue May 12 16:04:42 JST 2015
START-OF-DATA
Field1Value|Field2value|Field3Value|...|Field N Value
Field1Value|Field2value|Field3Value|...|Field N Value
------|...........|----|-------
END-OF-DATA
DATARECORDS=30747
TIMEFINISHED=Tue May 12 16:11:53 JST 2015
END-OF-FILE

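I already have a corresponding SQL Server table that I can load the data into as the destination. Because I am new to SSIS, I have not been able to write a Script Component that filters the source text file so that it can be loaded into the SQL Server table.

Thanks in advance!

2 answers:

Answer 0: (score: 0)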

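There are several ways to do this. If the file format is constant, the Flat File Connection Manager Editor has some useful properties. For example, add a new flat file connection to the connection managers and set its "Header rows to skip" property. For the file above you could set it to 18 (the sample as shown has 18 lines up to and including START-OF-DATA), so that reading starts at the pipe-delimited ("|") rows.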

Another useful setting: open the Flat File Connection Manager, click Columns in the side menu, and set the column delimiter to a pipe ("|").
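Since the number of comment and field lines may differ between files, the value for "Header rows to skip" can be checked rather than counted by hand. The sketch below is written in Python purely for illustration (inside SSIS the same logic would have to be ported to C# or VB.NET). The function name `header_rows_to_skip` is hypothetical, and it assumes the START-OF-DATA marker appears on a line of its own, as in the sample from the question.

```python
from typing import Iterable

# Marker line taken from the sample file in the question.
DATA_MARKER = "START-OF-DATA"


def header_rows_to_skip(lines: Iterable[str]) -> int:
    """Return the number of leading lines before the first data row.

    The count includes the START-OF-DATA marker line itself.
    """
    for index, line in enumerate(lines):
        if line.strip() == DATA_MARKER:
            return index + 1
    raise ValueError("no START-OF-DATA marker found")


# The header from the question, with one shortened data row.
SAMPLE = """START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd

#Some Text
#Some Text
#Some Text
#Some Text
#Some Text
START-OF-FIELDS
Field1
Field2
Field3
------
FieldN
END-OF-FIELDS
TIMESTARTED=Tue May 12 16:04:42 JST 2015
START-OF-DATA
a|b|c
END-OF-DATA
"""

if __name__ == "__main__":
    print(header_rows_to_skip(SAMPLE.splitlines()))  # prints 18
```

If the result differs from file to file, a fixed "Header rows to skip" value is not enough and the lines need to be filtered in code instead.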

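However, if the file format can change, for example a variable number of header lines, you can use a Script Task to remove every line that is not pipe-delimited, such as the header and footer lines.

For example, you can use a method such as File.ReadAllLines, edit or remove lines as needed, and then save the file.

Information about that method is here: https://msdn.microsoft.com/en-us/library/s2tte0y1%28v=vs.110%29.aspx

E.g., removing the last line in a Script Task: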

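using System.IO;
using System.Text;

// Read every line of the input file into memory.
string[] lines = File.ReadAllLines("input.txt");
StringBuilder sb = new StringBuilder();
int count = lines.Length - 1; // all except last line
for (int i = 0; i < count; i++)
{
    sb.AppendLine(lines[i]);
}
// Write the remaining lines to a new file.
File.WriteAllText("output.txt", sb.ToString());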
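The same read, filter, write idea can be extended to keep only the rows between the START-OF-DATA and END-OF-DATA markers from the question's file, so the header and footer disappear regardless of how many lines they span. The following is a minimal sketch in Python for illustration only; in a Script Task it would need to be ported to C# or VB.NET. The function names `extract_data_lines` and `filter_file` are hypothetical, and the sketch assumes that each marker sits on a line of its own and that the files are UTF-8.

```python
from typing import Iterable, Iterator


def extract_data_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the pipe-delimited lines between the data markers."""
    in_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "START-OF-DATA":
            in_data = True
        elif line == "END-OF-DATA":
            in_data = False
        elif in_data and "|" in line:
            yield line


def filter_file(src_path: str, dst_path: str) -> int:
    """Copy the data rows of src_path to dst_path and return the row count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in extract_data_lines(src):
            dst.write(line + "\n")
            count += 1
    return count
```

The cleaned output file can then be read by an ordinary Flat File Source with the pipe as the column delimiter and no header rows to skip.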

Answer 1: (score: 0)

Use a VB script in an SSIS Script Component configured as a source:


Imports System
Imports System.Data
Imports System.Math
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper



<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
    Inherits UserComponent
    'Private strSourceDirectory As String
    'Private strSourceFileName As String
    Private strSourceSystem As String
    Private strSourceSubSystem As String
    Private dtBusinessDate As Date


    Public Overrides Sub PreExecute()
        MyBase.PreExecute()
        '
        ' Add your code here for preprocessing or remove if not needed
        ''

    End Sub

    Public Overrides Sub PostExecute()
        MyBase.PostExecute()
        '
        ' Add your code here for postprocessing or remove if not needed
        ' You can set read/write variables here, for example:
        Dim strSourceDirectory As String = Me.Variables.GLOBALSourceDirectory.ToString()
        Dim strSourceFileName As String = Me.Variables.GLOBALSourceFileName.ToString()
        'Dim strSourceSystem As String = Me.Variables.GLOBALSourceSystem.ToString()
        'Dim strSourceSubSystem As String = Me.Variables.GLOBALSourceSubSystem.ToString()
        'Dim dtBusinessDate As Date = Me.Variables.GLOBALBusinessDate.Date


    End Sub

    Public Overrides Sub CreateNewOutputRows()
        '
        ' Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
        ' For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
        '
        ' Read-only package variables that hold the source location.
        Dim strSourceDirectory As String = Me.Variables.GLOBALSourceDirectory.ToString()
        Dim strSourceFileName As String = Me.Variables.GLOBALSourceFileName.ToString()
        'Dim strSourceSystem As String = Me.Variables.GLOBALSourceSystem.ToString()
        'Dim strSourceSubSystem As String = Me.Variables.GLOBALSourceSubSystem.ToString()
        'Dim dtBusinessDate As Date = Me.Variables.GLOBALBusinessDate.Date

        'Example path: "C:\QRM_SourceFiles\BBG_BONDS_OUTPUT_YYYYMMDD.txt"
        ' Path.Combine copes with a directory that lacks a trailing backslash.
        Dim sourcePath As String = Path.Combine(strSourceDirectory, strSourceFileName)
        Dim lineIndex As Integer = 0
        ' Using closes the reader even if an exception is thrown.
        Using sr As New StreamReader(sourcePath)
            While (Not sr.EndOfStream)
                Dim line As String = sr.ReadLine()
                If (lineIndex <> 0) Then 'remove header row
                    Dim columnArray As String() = line.Split("|"c)
                    ' Col24 is the highest index read below, so require at least 25 fields.
                    ' This also skips header and footer lines that contain no pipes.
                    If (columnArray.Length >= 25) Then
                        Output0Buffer.AddRow()
                        Output0Buffer.Col0 = columnArray(0)
                        Output0Buffer.Col3 = columnArray(3)
                        Output0Buffer.Col4 = columnArray(4)
                        Output0Buffer.Col5 = columnArray(5)
                        Output0Buffer.Col6 = columnArray(6)
                        Output0Buffer.Col7 = columnArray(7)
                        Output0Buffer.Col8 = columnArray(8)
                        Output0Buffer.Col9 = columnArray(9)
                        Output0Buffer.Col10 = columnArray(10)
                        Output0Buffer.Col11 = columnArray(11)
                        Output0Buffer.Col12 = columnArray(12)
                        Output0Buffer.Col13 = columnArray(13)
                        Output0Buffer.Col14 = columnArray(14)
                        Output0Buffer.Col15 = columnArray(15)
                        Output0Buffer.Col16 = columnArray(16)
                        Output0Buffer.Col17 = columnArray(17)
                        Output0Buffer.Col18 = columnArray(18)
                        Output0Buffer.Col19 = columnArray(19)
                        Output0Buffer.Col20 = columnArray(20)
                        Output0Buffer.Col21 = columnArray(21)
                        Output0Buffer.Col22 = columnArray(22)
                        Output0Buffer.Col23 = columnArray(23)
                        Output0Buffer.Col24 = columnArray(24)
                    End If
                End If
                lineIndex = lineIndex + 1
            End While
        End Using

    End Sub

End Class

End of code
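The script above hard-codes the column positions. Because the question's files list their column names between START-OF-FIELDS and END-OF-FIELDS, the names could instead be read from the file and paired with each data row. The sketch below shows that idea in Python for illustration only; it is not part of the answer and would need to be ported to VB.NET or C# for a Script Component. The function name `parse_export` is hypothetical. It assumes that rows have no trailing delimiter and that rows whose value count does not match the field list should be skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_export(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (field names, records) parsed from one export file."""
    fields: List[str] = []
    records: List[Dict[str, str]] = []
    section: Optional[str] = None  # None, "fields" or "data"
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "START-OF-FIELDS":
            section = "fields"
        elif line == "START-OF-DATA":
            section = "data"
        elif line in ("END-OF-FIELDS", "END-OF-DATA"):
            section = None
        elif section == "fields" and line and not line.startswith("#"):
            fields.append(line)
        elif section == "data" and line:
            values = line.split("|")
            # Assumed policy: ignore rows with an unexpected number of values.
            if len(values) == len(fields):
                records.append(dict(zip(fields, values)))
    return fields, records
```

Keying each value by its field name makes the mapping to the SQL Server table columns explicit and independent of column order in the file.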