Parsing text in a C# application without third-party libraries

Time: 2019-01-16 12:44:47

Tags: c# .net

For example, given a line:

name, tax, company.

To separate the fields, I need to use the Split method.

string[] text = File.ReadAllLines("file.csv", Encoding.Default);
foreach (string line in text)
{
    string[] words = line.Split(',');
    foreach (string word in words)
    {
        Console.WriteLine(word);
    }
}
Console.ReadKey();

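But how do I split on commas when some of them are enclosed in quotes:

name, tax, "company, Ariel";
"name, surname", tax, company;
and so on.

So that the result looks like this:

Max | 12.3 | company, Ariel
Alex, Smith | 13.1 | Oriflame

Keep in mind that the input will not always be in the ideal format shown in the example. A line may contain three quotation marks, for instance, or no commas at all. The program must not fail under any circumstances. If a line cannot be parsed, it should print a message saying so.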
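
One way to meet these requirements without any library is a small character-by-character scanner. The sketch below is only an illustration: the `CsvLine.TryParse` name, the trimming of whitespace around fields, and the treatment of a doubled quote as an escaped quote are all assumptions, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Hypothetical helper: splits one line on commas, keeping commas that sit
    // inside double quotes. Reports problems through 'error' instead of throwing.
    public static bool TryParse(string line, out List<string> fields, out string error)
    {
        fields = new List<string>();
        error = null;
        if (line == null)
        {
            error = "Line is null.";
            return false;
        }

        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                // Assumption: "" inside a quoted field stands for one literal quote.
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');
                    i++;
                }
                else
                {
                    inQuotes = !inQuotes;
                }
            }
            else if (c == ',' && !inQuotes)
            {
                // A comma outside quotes ends the current field.
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        if (inQuotes)
        {
            // An odd number of quotes (e.g. three) leaves a quote open.
            error = "Unbalanced quotation marks: " + line;
            fields.Clear();
            return false;
        }

        fields.Add(current.ToString().Trim());
        return true;
    }
}
```

A caller could print `string.Join(" | ", fields)` on success and the `error` text otherwise. A line without commas comes back as a single field, so checking `fields.Count` against the expected column count is left to the caller.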

3 answers:

Answer 0: (score: 0)

Split the line on double quotes first. Then split the first string on commas.
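
A rough sketch of this idea, under stated assumptions: after splitting on `"`, the segments at odd indexes were inside quotes and the ones at even indexes were outside, so only the latter are split on commas. The `QuoteSplit.TrySplit` name is made up for illustration, and this simple version drops empty fields (such as the middle one in `a,,b`) and does not handle escaped quotes.

```csharp
using System;
using System.Collections.Generic;

public static class QuoteSplit
{
    // Hypothetical helper illustrating "split on quotes, then on commas".
    public static bool TrySplit(string line, out List<string> fields)
    {
        fields = new List<string>();
        string[] segments = line.Split('"');

        // N quotes produce N + 1 segments, so an even segment count
        // means an odd number of quotes: the line is unbalanced.
        if (segments.Length % 2 == 0)
        {
            return false;
        }

        for (int i = 0; i < segments.Length; i++)
        {
            if (i % 2 == 1)
            {
                // Odd index: text that was inside quotes, kept whole.
                fields.Add(segments[i]);
            }
            else
            {
                // Even index: text outside quotes, split on commas.
                // Empty pieces next to a quoted segment are skipped.
                foreach (string part in segments[i].Split(','))
                {
                    string trimmed = part.Trim();
                    if (trimmed.Length > 0)
                    {
                        fields.Add(trimmed);
                    }
                }
            }
        }
        return true;
    }
}
```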

Answer 1: (score: 0)

You can use TextFieldParser from Microsoft.VisualBasic.FileIO

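// Requires: using System.Collections.Generic; using System.Globalization;
//           using Microsoft.VisualBasic.FileIO;
var list = new List<Data>();
var isHeader = true;
using (TextFieldParser parser = new TextFieldParser(filePath))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.Delimiters = new string[] { "," };
    // Commas inside double quotes stay part of the field.
    parser.HasFieldsEnclosedInQuotes = true;

    while (!parser.EndOfData)
    {
        string[] parts = parser.ReadFields();
        if (isHeader)
        {
            // Skip the header row (Name,Tax,Company).
            isHeader = false;
            continue;
        }

        list.Add(new Data
        {
            People = parts[0],
            // Invariant culture so "12.3" parses regardless of the machine's locale.
            Tax = Double.Parse(parts[1], CultureInfo.InvariantCulture),
            Company = parts[2]
        });
    }
}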

where Data is defined as

public class Data
{
    public string People{get;set;}
    public double Tax{get;set;}
    public string Company{get;set;}
}

Note that you need to reference the Microsoft.VisualBasic assembly and add using Microsoft.VisualBasic.FileIO;

Sample data
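
Since the question asks that the program never crash on bad input, the read loop can be wrapped so that malformed lines are reported rather than thrown. The following is a minimal sketch, assuming a hypothetical `SafeCsv.ReadRows` helper that takes a `TextReader` and collects messages in a list. `MalformedLineException`, `ErrorLine`, and `EndOfData` are members of Microsoft.VisualBasic.FileIO.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class SafeCsv
{
    // Hypothetical helper: returns the rows that parsed and appends one
    // message to 'errors' for each line the parser rejected.
    public static List<string[]> ReadRows(TextReader reader, List<string> errors)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // Record the problem and keep reading the remaining lines.
                    errors.Add("Line " + ex.LineNumber + " could not be parsed: " + parser.ErrorLine);
                }
            }
        }
        return rows;
    }
}
```

It could be called with `new StreamReader("file.csv")` for a file or a `StringReader` for in-memory text. Converting the tax column with `double.TryParse` instead of `Double.Parse` would cover non-numeric values in the same spirit.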

Name,Tax,Company
Max,12.3,"company, Ariel"
Ariel,13.1,"company, Oriflame"


Answer 2: (score: 0)

The code below may help. It is not the most efficient, but I use it to "see" how parsing proceeds when a particular line causes problems.

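// Requires: using System; using System.IO; using System.Linq; using System.Text;
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
string[] datArr;
string tmpStr;
foreach (string line in text)
{
    // "!@@@@!" is a placeholder delimiter assumed not to occur in the data.
    ParseString(line, ",", "!@@@@!", out datArr, out tmpStr);
    foreach (string s in datArr)
    {
        Console.WriteLine(s);
    }
}
Console.ReadKey();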

private static void ParseString(string inputString, string origDelim, string newDelim, out string[] retArr, out string retStr)
{
    string tmpStr = inputString;
    retArr = new[] {""};
    retStr = "";

    if (!string.IsNullOrWhiteSpace(tmpStr))
    {
        //If there is only one Quote character in the line, ignore/remove it:
        if (tmpStr.Count(f => f == '"') == 1)
            tmpStr = tmpStr.Replace("\"", "");

        string[] tmpArr = tmpStr.Split(new[] {origDelim}, StringSplitOptions.None);
        var inQuote = 0;

        StringBuilder lineToWrite = new StringBuilder();
        foreach (var s in tmpArr)
        {
            if (s.Contains("\""))
                inQuote++;

            switch (inQuote)
            {
                case 1:
                    //Begin quoted text
                    lineToWrite.Append(lineToWrite.Length > 0
                        ? newDelim + s.Replace("\"", "")
                        : s.Replace("\"", ""));

                    if (s.Length > 4 && s.Substring(0, 2) == "\"\"" && s.Substring(s.Length - 2, 2) != "\"\"")
                    {
                        //if string has two quotes at the beginning and is > 4 characters and the last two characters are NOT quotes,
                        //inquote needs to be incremented.
                        inQuote++;
                    }
                    else if ((s.Substring(0, 1) == "\"" && s.Substring(s.Length - 1, 1) == "\"" &&
                              s.Length > 1) || (s.Count(x => x == '\"') % 2 == 0))
                    {
                        //if string has more than one character and both begins and ends with a quote, then it's ok and counter should be reset.
                        //if string has an EVEN number of quotes, it should be ok and counter should be reset.
                        inQuote = 0;
                    }
                    else
                    {
                        inQuote++;
                    }

                    break;
                case 2:
                    //text between the quotes
                    //If we are here the origDelim value was found between the quotes
                    //include origDelim so there is no data loss.
                    //Example quoted text: "Dr. Mario, Sr, MD";
                    //      ", Sr" would be handled here
                    //      ", MD" would be handled in case 3 end of quoted text.
                    lineToWrite.Append(origDelim + s);
                    break;
                case 3:
                    //End quoted text
                    //If we are here the origDelim value was found between the quotes
                    //and we are at the end of the quoted text
                    //include origDelim so there is no data loss.
                    //Example quoted text: "Dr. Mario, MD"
                    //      ", MD" would be handled here.
                    lineToWrite.Append(origDelim + s.Replace("\"", ""));
                    inQuote = 0;
                    break;
                default:
                    lineToWrite.Append(lineToWrite.Length > 0 ? newDelim + s : s);
                    break;

            }

        }

        if (lineToWrite.Length > 0)
        {
            retStr = lineToWrite.ToString();
            // Split the rebuilt line on the placeholder delimiter to get the fields.
            retArr = retStr.Split(new[] {newDelim}, StringSplitOptions.None);
        }

    }
}