Reading a CSV with double-quoted fields that contain extra quotes

Time: 2017-12-19 15:00:34

Tags: .net vb.net csv

My CSV file looks like this:

"Name1", "A test, which "fails" all the time"
"Name2", "A test, which "fails" all the time"
"Name3", "A test, which "fails" all the time"

My code is:

Using parser As New FileIO.TextFieldParser(filepath)
        parser.Delimiters = New String() {","}
        parser.HasFieldsEnclosedInQuotes = True
        parser.TrimWhiteSpace = False
        Dim currentRow As String()
        While Not parser.EndOfData
            Try
                currentRow = parser.ReadFields()

            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                MsgBox("Line " & ex.Message &
                "is not valid and will be skipped.")
            Finally

            End Try

        End While
    End Using

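The error I get is: Line 1 cannot be parsed using the current delimiters. is not valid and will be skipped. At first I thought the commas were the problem, but the real problem appears to be the quotes inside the quoted field.

Any ideas on how to read this file?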

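PS. The files my code has to handle usually do not have quotes inside quotes, so I am looking for a fast, reliable, general way to read the file. From what I have read, regular expressions are rather heavyweight for this.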

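4 answers:

Answer 0 (score: 0)

This file contains invalid CSV and generally cannot be parsed, so you should fix the source of the mess. If you cannot do that, you can write a method that tries to repair the line: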

Function FixRowFieldsQuoteIssue(parser As TextFieldParser) As String()
    If Not parser.HasFieldsEnclosedInQuotes Then Return Nothing ' this fix only applies to quoted fields

    Dim errorLine As String = parser.ErrorLine
    If String.IsNullOrWhiteSpace(errorLine) Then Return Nothing ' empty line, so there is no quote issue to fix

    errorLine = errorLine.Trim()
    If Not errorLine.StartsWith("""") Then Return Nothing ' must start with a quote, otherwise the fix is not supported

    Dim lineFields As New List(Of String)
    Dim insideField As Boolean = False
    Dim currentField As New List(Of Char)

    For i As Int32 = 0 To errorLine.Length - 1
        Dim c As Char = errorLine(i)
        Dim isQuote As Boolean = (c = """"c)

        If insideField Then
            If isQuote Then
                If i = errorLine.Length - 1 OrElse 
                    parser.Delimiters.Contains(errorLine(i + 1)) Then
                    ' delimiter follows, this is a valid end field quote
                    ' can be improved by skipping spaces until delimiter
                    insideField = False
                    lineFields.Add(String.Concat(currentField))
                    currentField = New List(Of Char)
                Else
                    ' next char not a delimiter, this is invalid
                    ' add this quote to regular field-chars to fix it
                    currentField.Add(c)
                End If
            Else
                ' regular char, add it to the current field chars
                currentField.Add(c)
            End If
        ElseIf isQuote Then
            insideField = True
        End If
    Next

    Return lineFields.ToArray()
End Function

Call it from the Catch block:

Dim allRowFields As New List(Of String())

Using parser As New FileIO.TextFieldParser("filePath")
    parser.Delimiters = New String() {","}
    parser.HasFieldsEnclosedInQuotes = True
    parser.TrimWhiteSpace = False

    While Not parser.EndOfData
        Try
            Dim currentRowFields As String() = parser.ReadFields()
            allRowFields.Add(currentRowFields)
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            Dim fixedFields As String() = FixRowFieldsQuoteIssue(parser)
            If fixedFields IsNot Nothing Then
                allRowFields.Add(fixedFields)
            Else
                MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
            End If
        End Try
    End While
End Using
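
For comparison, here is a minimal sketch of what "fixing the source" would involve if you control the code that writes the file. The usual CSV convention is to double any quote that appears inside a quoted field. The helper name QuoteCsvField is hypothetical and is not part of the question's code.

```vbnet
Imports System

Module CsvQuoting
    ' Hypothetical helper (not from the question): wraps a value in quotes
    ' and doubles every embedded quote, the usual CSV escaping convention.
    Function QuoteCsvField(value As String) As String
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        Dim line As String = QuoteCsvField("Name1") & "," &
                             QuoteCsvField("A test, which ""fails"" all the time")
        ' Prints: "Name1","A test, which ""fails"" all the time"
        Console.WriteLine(line)
    End Sub
End Module
```

A line written this way has no stray quotes, so the fallback above should not be needed for it.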

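Answer 1 (score: 0)

Because the CSV data is malformed, you need to parse it manually. Fortunately, since you only have two fields and the first field does not contain the invalid formatting, you can do this by finding the index of the first comma and splitting the line there.

Here is a simple example: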

Private Function Parse_CSV(ByVal csv As String) As DataTable
  'Create a new instance of a DataTable and create the two columns
  Dim dt As DataTable = New DataTable("CSV")
  dt.Columns.AddRange({New DataColumn("Column1"), New DataColumn("Column2")})

  'Placeholder variable for the separator
  Dim separator As Integer = -1

  'Iterate through each line in the data
  For Each line As String In csv.Split({Environment.NewLine}, StringSplitOptions.None)
    'Get the first instance of a comma
    separator = line.IndexOf(","c)

    'Check to make sure the data has two fields
    If separator = -1 Then
      Throw New MissingFieldException("The current line is missing a separator: " & line)
    ElseIf separator = line.Length - 1 Then
      Throw New MissingFieldException("The separator cannot appear at the end of the line, this is occurring at: " & line)
    Else
      'Add the two fields to the DataTable. Trim() drops the space after the comma;
      'Trim(""""c) drops all leading and trailing quote characters (inner quotes are kept)
      dt.Rows.Add(line.Substring(0, separator).Trim().Trim(""""c),
                  line.Substring(separator + 1).Trim().Trim(""""c))
    End If
  Next

  'Return the data
  Return dt
End Function

Fiddle: Live Demo

Answer 2 (score: 0)

This splits your CSV line into 2 columns and leaves the inner quotes in place. Replace xline with one line of your CSV.

Dim xdata As New List(Of KeyValuePair(Of String, String))
Dim xline As String = """Name3"", ""A test, which ""fails"" all the time"""
Dim FirstCol As Integer = Strings.InStr(xline, ",") ' 1-based position of the first comma
Dim key As String = Strings.Left(xline, FirstCol - 1).Replace(Chr(34), "") ' first column, quotes removed
Dim value As String = Strings.Mid(xline, FirstCol + 2).Remove(0, 1) ' skip the comma, the space and the opening quote
value = value.Remove(value.Length - 1, 1) ' drop the closing quote
xdata.Add(New KeyValuePair(Of String, String)(key, value))
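
The same idea can be wrapped in a function and applied line by line. This is only a sketch, assuming exactly two columns and that the first column never contains a comma, as in the sample data. The names SplitTwoColumns and StripOuterQuotes are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module TwoColumnSplit
    ' Hypothetical helper: splits one line at its first comma and
    ' strips only the outer quotes of each column.
    Function SplitTwoColumns(line As String) As KeyValuePair(Of String, String)
        Dim comma As Integer = line.IndexOf(","c)
        If comma < 0 Then Throw New FormatException("No comma found in: " & line)
        Dim key As String = StripOuterQuotes(line.Substring(0, comma).Trim())
        Dim value As String = StripOuterQuotes(line.Substring(comma + 1).Trim())
        Return New KeyValuePair(Of String, String)(key, value)
    End Function

    ' Removes one leading and one trailing quote when both are present.
    Function StripOuterQuotes(field As String) As String
        If field.Length >= 2 AndAlso field.StartsWith("""") AndAlso field.EndsWith("""") Then
            Return field.Substring(1, field.Length - 2)
        End If
        Return field
    End Function

    Sub Main()
        Dim lines As String() = {
            """Name1"", ""A test, which ""fails"" all the time""",
            """Name2"", ""A plain description"""
        }
        For Each line As String In lines
            Dim pair = SplitTwoColumns(line)
            ' Prints: Name1 | A test, which "fails" all the time
            '         Name2 | A plain description
            Console.WriteLine(pair.Key & " | " & pair.Value)
        Next
    End Sub
End Module
```

Unlike the fixed offsets above, this version does not depend on there being exactly one space after the comma.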

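Answer 3 (score: 0)

You can try Cinchoo ETL, an open-source library for reading and writing CSV files.

There are a couple of ways to parse the file.

Method 1: Specify the column names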

using (var parser = new ChoCSVReader("NestedQuotes.csv")
    .WithFields("name", "desc")
    )
{
    foreach (dynamic x in parser)
        Console.WriteLine(x.name + "-" + x.desc);
}

Method 2: Access by index (without specifying column names)

using (var parser = new ChoCSVReader("NestedQuotes.csv"))
{
    foreach (dynamic x in parser)
        Console.WriteLine(x[0] + "-" + x[1]);
}

Hope it helps.

For more help, read the following CodeProject article: https://www.codeproject.com/Articles/1145337/Cinchoo-ETL-CSV-Reader