Splitting a string into three columns with a regular expression

Date: 2014-09-25 11:17:52

Tags: c# regex

My string looks like this:

rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines

My goal is to split this string into three columns so that I can put it into a database table:

    --------------------------------------------------------------
   | COL 1    | COL 2             | COL 3                         |
    --------------------------------------------------------------
   | rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6               |
    --------------------------------------------------------------
   | rta_geo5 | 09/24/14 15:10:38 | restarting                    |
    --------------------------------------------------------------
   | rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines |
    --------------------------------------------------------------

Can I use the following statement?

string[] substrings = Regex.Split(input, pattern);

I just need the right regular expression.

4 answers:

Answer 0: (score: 1)

Instead of splitting, you can use named groups in the regex.

Pattern:

Regex ptrn = new Regex(@"^(?<col1>[^:]+):\s+(?<col2>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+-\s+(?<col3>[^\r\n]+?)\s*$", 
    RegexOptions.ExplicitCapture|RegexOptions.IgnoreCase|RegexOptions.Multiline);

Usage:

string s = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

var matches = ptrn.Matches(s);

Accessing the groups (requires using System.Linq and System.Text.RegularExpressions):

matches.OfType<Match>()
     .Select(match => new string[] 
      { 
         match.Groups["col1"].Value, 
         match.Groups["col2"].Value,
         match.Groups["col3"].Value 
      })
     .ToList().ForEach(a=>System.Console.WriteLine(string.Join("\t|\t",a)));

Or:

foreach (Match match in matches)
{
    string col1 = match.Groups["col1"].Value;
    string col2 = match.Groups["col2"].Value;
    string col3 = match.Groups["col3"].Value;
    System.Console.WriteLine(col1 + "\t|\t" + col2 + "\t|\t" + col3);
}

Output:

rta_geo5    |   09/24/14 15:10:38   |   Reset_count = 6
rta_geo5    |   09/24/14 15:10:38   |   restarting
rta_geo5    |   09/24/14 15:10:38   |   memory allocation: 3500 lines

Answer 1: (score: 0)

Split on:

(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)

Demo:

http://regex101.com/r/xF7iD7/1
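
The pattern above is meant to be handed to Regex.Split. A minimal sketch of how that might look in C# follows. It assumes the input is first broken into lines, because splitting the whole multi-line string at once would leave the third column of one line and the first column of the next in the same array element. The class name SplitDemo and the helper SplitLine are hypothetical, and the pattern is used exactly as written, so it only works for identifiers ending in "geo5".

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Pattern copied from the answer; note the hard-coded "geo5" lookbehind.
    private const string Pattern =
        @"(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)";

    // Hypothetical helper: splits a single log line into its columns.
    public static string[] SplitLine(string line)
    {
        return Regex.Split(line, Pattern);
    }

    public static void Main()
    {
        string input =
            "rta_geo5: 09/24/14 15:10:38 - Reset_count = 6\n" +
            "rta_geo5: 09/24/14 15:10:38 - restarting\n" +
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

        // Split into lines first, accepting both "\r\n" and "\n" endings.
        string[] lines = input.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            string[] columns = SplitLine(line);
            Console.WriteLine(string.Join(" | ", columns));
        }
    }
}
```

The colon inside "memory allocation: 3500 lines" is not treated as a separator, because it is neither preceded by "geo5" nor part of a " - " that follows a time stamp.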

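Answer 2: (score: 0)

I wouldn't use a regex (or String.Split) for this, but a loop that parses each line. I'd also use a custom class that maps to the database table, to improve reusability.

The class (simplified):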

public class Data
{
    public string Token1 { get; set; } // use a meaningful name
    public string Token2 { get; set; } // use a meaningful name
    public DateTime Date { get; set; } // use a meaningful name

    public override string ToString()
    {
        return string.Format("Token1:[{0}] Date:[{1}] Token2:[{2}]", 
            Token1,
            Date.ToString("MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture), 
            Token2);
    }
}

Your sample string:

string data = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6
rta_geo5: 09/24/14 15:10:38 - restarting
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

Now you can parse it in a loop into a List<Data>, using plain string methods:

// Accept both "\r\n" and "\n"; Environment.NewLine alone would miss "\n"-only input on Windows
string[] lines = data.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
List<Data> allData = new List<Data>();
foreach (string line in lines)
{
    string token1 = null, token2 = null;
    DateTime dt;
    int firstColonIndex = line.IndexOf(": ");
    if (firstColonIndex >= 0)
    {
        token1 = line.Remove(firstColonIndex);
        firstColonIndex += 2; // start next search after first token to find DateTime
        int indexOfMinus = line.IndexOf(" - ", firstColonIndex);
        if (indexOfMinus >= 0)
        {
            string datePart = line.Substring(firstColonIndex, indexOfMinus - firstColonIndex);
            if (DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
            {
                indexOfMinus += 3;  // start next search after DateTime to get last token
                token2 = line.Substring(indexOfMinus);
                Data d = new Data { Token1 = token1, Token2 = token2, Date = dt };
                allData.Add(d);
            }
        }
    }
}

Test:

foreach (Data d in allData)
    Console.WriteLine(d.ToString());

Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[Reset_count = 6]
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[restarting]
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[memory allocation: 3500 lines]

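This approach is more verbose than the others, but it is more efficient and easier to maintain. It also lets you log lines that fail to parse, or fall back to a different parsing method for them.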
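
As a rough illustration of that last point, the parsing steps can be wrapped in a Try-style method so that the caller decides what happens to lines that do not fit. This is only a sketch: the names LineParser, TryParseLine and rejected are hypothetical, and the line layout "<source>: <MM/dd/yy HH:mm:ss> - <message>" is assumed from the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LineParser
{
    // Hypothetical helper: returns false instead of throwing when a line
    // does not match the assumed "<source>: <date> - <message>" layout.
    public static bool TryParseLine(string line, out string source,
                                    out DateTime timestamp, out string message)
    {
        source = null;
        message = null;
        timestamp = default(DateTime);

        int colon = line.IndexOf(": ", StringComparison.Ordinal);
        if (colon < 0) return false;

        int dateStart = colon + 2;
        int dash = line.IndexOf(" - ", dateStart, StringComparison.Ordinal);
        if (dash < 0) return false;

        string datePart = line.Substring(dateStart, dash - dateStart);
        if (!DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp))
            return false;

        source = line.Substring(0, colon);
        message = line.Substring(dash + 3);
        return true;
    }

    public static void Main()
    {
        string[] lines =
        {
            "rta_geo5: 09/24/14 15:10:38 - restarting",
            "this line is malformed"
        };

        var rejected = new List<string>();
        foreach (string line in lines)
        {
            string source, message;
            DateTime timestamp;
            if (TryParseLine(line, out source, out timestamp, out message))
            {
                Console.WriteLine(source + " | " +
                    timestamp.ToString("MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture) +
                    " | " + message);
            }
            else
            {
                rejected.Add(line); // log it, or hand it to a fallback parser
            }
        }

        Console.WriteLine("Rejected lines: " + rejected.Count);
    }
}
```

Keeping the rejected lines in a separate list means a bad line never aborts the whole import, and the list can be written to a log or retried with a looser parser.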

Answer 3: (score: 0)

Well, I've thought about this and I'm not 100% sure, but try:

(rta_geo5): (.*?) - (.*)

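This should split each line into 3 groups as required. However, it assumes that the leading identifier is always (rta_geo5).

[edit] - I noticed that one of the other answers references an online regex service, so you can try my regex there: http://regex101.com/r/xF7iD7/1 (sorry, I don't have an account yet, but I will create one now). That said, as far as the rta_geo5 part is concerned, you could of course make it fully generic: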

(.*): (.*) - (.*)

and see how that works.
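
To show how these groups could be read from C#, here is a small sketch that applies the generic pattern one line at a time with Regex.Match. It deliberately swaps in an anchored, lazy variant, ^(.*?): (.*?) - (.*)$, which is an assumption and not the exact regex from this answer: the greedy version handles the sample lines, but if a message itself contained " - ", the greedy second group would run up to the last " - " instead of the first. The names GroupDemo and ToColumns are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Lazy, anchored variant of the generic pattern (an assumption, see above).
    private static readonly Regex LinePattern =
        new Regex(@"^(.*?): (.*?) - (.*)$");

    // Hypothetical helper: returns the three columns, or null if the line does not match.
    public static string[] ToColumns(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        string[] lines =
        {
            "rta_geo5: 09/24/14 15:10:38 - Reset_count = 6",
            "rta_geo5: 09/24/14 15:10:38 - restarting",
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines"
        };

        foreach (string line in lines)
        {
            string[] columns = ToColumns(line);
            if (columns != null)
                Console.WriteLine(string.Join(" | ", columns));
        }
    }
}
```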