Best way to find which cell of a string array contains a given text

Asked: 2009-05-13 03:21:20

Tags: c# arrays methods find gedcom

I have a block of text taken from a GEDCOM file.

The text is flat and is basically broken up into "nodes".

I am splitting each node on the \r\n characters, which subdivides it into its individual parts (the number of "lines" can vary).

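I know that address 0 will always be the ID, but after that anything can be anywhere, so I want to test each cell of the array to see whether it contains the right tag for me to process.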

An example of what two nodes look like:

0 @ind23815@ INDI <<<<<<<<<<<<<<<<<<< Start of node 1
1 NAME Lawrence /Hucstepe/
2 DISPLAY Lawrence Hucstepe
2 GIVN Lawrence
2 SURN Hucstepe
1 POSITION -850,-210
2 BOUNDARY_RECT (-887,-177),(-813,-257)
1 SEX M
1 BIRT 
2 DATE 1521
1 DEAT Y
2 DATE 1559
1 NOTE     * Born: Abt 1521, Kent, England
2 CONT     * Marriage: Jane Pope 17 Aug 1546, Kent, England
2 CONT     * Died: Bef 1559, Kent, England
2 CONT 
1 FAMS @fam08318@
0 @ind23816@ INDI  <<<<<<<<<<<<<<<<<<<<<<< Start of Node 2
1 NAME Jane /Pope/
2 DISPLAY Jane Pope
2 GIVN Jane
2 SURN Pope
1 POSITION -750,-210
2 BOUNDARY_RECT (-787,-177),(-713,-257)
1 SEX F
1 BIRT 
2 DATE 1525
1 DEAT Y
2 DATE 1609
1 NOTE     * Born: Abt 1525, Tenterden, Kent, England
2 CONT     * Marriage: Lawrence Hucstepe 17 Aug 1546, Kent, England
2 CONT     * Died: 23 Oct 1609
2 CONT 
1 FAMS @fam08318@
0 @ind23817@ INDI  <<<<<<<<<<< start of Node 3

So when I'm done, I have an array that looks like this:

address , string
0 = "1 NAME Lawrence /Hucstepe/"
1 = "2 DISPLAY Lawrence Hucstepe"
2 = "2 GIVN Lawrence"
3 = "2 SURN Hucstepe"
4 = "1 POSITION -850,-210"
5 = "2 BOUNDARY_RECT (-887,-177),(-813,-257)"
6 = "1 SEX M"
7 = "1 BIRT "
8 = "1 FAMS @fam08318@"

So my question is: what is the best way to search the above array to see which cell has the SEX tag, the NAME tag, or the FAMS tag?

This is the code I have:

private int FindIndexinArray(string[] Arr, string search)
{
    int Val = -1;
    for (int i = 0; i < Arr.Length; i++)
    {
        if (Arr[i].Contains(search))
        {
            Val = i;
        }
    }
    return Val;
}

But it seems inefficient, because I end up calling it twice to make sure it doesn't return -1.

Like so:

if (FindIndexinArray(SubNode, "1 BIRT ") != -1)
{
    // add birthday to Struct
    I.BirthDay = SubNode[FindIndexinArray(SubNode, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}

Sorry this is a long post, but hopefully you guys have some expert advice.

4 answers:

Answer 0: (score: 3)

How about a simple regular expression?

^(\d)\s=\s\"\d\s(SEX|BIRT|FAMS){1}.*$

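The first group captures the address and the second group captures the tag.

Also, it may be faster to dump all the array items into a single string and run the regex over the whole thing at once.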
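As a rough sketch of the regex idea applied to the array elements themselves (rather than to the "address = string" listing in the question), something like the following might work. The class name TagRegexSketch, the method IndexOfTag, and the adapted pattern are all assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRegexSketch
{
    // Assumed pattern: a level number, whitespace, then one of the wanted tags.
    // Group 1 captures the tag. Adjust the alternation to the tags you need.
    private static readonly Regex TagPattern =
        new Regex(@"^\d+\s+(NAME|SEX|BIRT|FAMS)\b");

    // Returns the index of the first line whose tag equals the requested one,
    // or -1 when no line carries that tag.
    public static int IndexOfTag(string[] lines, string tag)
    {
        for (int i = 0; i < lines.Length; i++)
        {
            Match m = TagPattern.Match(lines[i]);
            if (m.Success && m.Groups[1].Value == tag)
            {
                return i;
            }
        }
        return -1;
    }
}
```

Because the pattern is anchored at the start of the line, a tag name that merely appears inside NOTE or CONT text is not matched.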

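Answer 1: (score: 3)

You can use the static FindAll method of the Array class. Note that it returns the matching strings themselves rather than their indexes, if that works for you.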

string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
Array.FindAll(test, item => item.Contains("Sex") || item.Contains("Drugs") || item.Contains("Computer"));

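The => denotes a lambda expression, which is essentially an anonymous method written inline instead of being declared separately. If lambdas give you the creeps, you can also do it this way: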

//Declare a method 

     private bool HasTag(string s)
     {
         return s.Contains("Sex") || s.Contains("Drugs") || s.Contains("Computer");
     }

     string[] test = { "Sex", "Love", "Rock and Roll", "Drugs", "Computer"};
     Array.FindAll(test, HasTag);
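Since the question asks for the position of the cell rather than its contents, Array.FindIndex may be a closer fit than FindAll: it returns the index of the first element that satisfies the predicate, or -1 if none does. A minimal sketch, where the FindIndexSketch class and the IndexOfTag helper are hypothetical names:

```csharp
using System;

public static class FindIndexSketch
{
    // Returns the index of the first element that contains the given text,
    // or -1 when nothing matches (the same convention as the question's method).
    public static int IndexOfTag(string[] lines, string tag)
    {
        return Array.FindIndex(lines, line => line.Contains(tag));
    }
}
```

Passing the level together with the tag, for example "1 SEX", reduces the chance of matching the same word inside free text such as a NOTE line.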

Answer 2: (score: 0)

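"But it seems inefficient, because I end up calling it twice to make sure it doesn't return -1"

Copy the returned value into a variable before testing it, so the method is only called once.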

int IndexResults = FindIndexinArray(SubNode, "1 BIRT ");
if (IndexResults != -1)
{
    // add birthday to Struct (the date is on the line after "1 BIRT ")
    I.BirthDay = SubNode[IndexResults + 1].Replace("2 DATE ", "").Trim();
}
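The same idea can be wrapped in a small helper that also guards against a BIRT line with no DATE line after it. This is only a sketch under the assumption that the date always sits on the line immediately following "1 BIRT"; BirthDateSketch and GetBirthDate are hypothetical names.

```csharp
using System;

public static class BirthDateSketch
{
    // Returns the text of the "2 DATE" line that directly follows "1 BIRT",
    // or null when there is no BIRT line or no DATE line after it.
    public static string GetBirthDate(string[] subNode)
    {
        // Look the index up once and reuse it.
        int index = Array.FindIndex(
            subNode, line => line.StartsWith("1 BIRT", StringComparison.Ordinal));
        if (index == -1 || index + 1 >= subNode.Length)
        {
            return null;
        }

        string next = subNode[index + 1];
        if (!next.StartsWith("2 DATE", StringComparison.Ordinal))
        {
            return null;
        }
        return next.Replace("2 DATE ", "").Trim();
    }
}
```

A caller could then write I.BirthDay = BirthDateSketch.GetBirthDate(SubNode); and handle a null result as "no birth date recorded".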

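Answer 3: (score: 0)

If you are only interested in the first match, the for loop in the FindIndexinArray method should break as soon as a match is found. As written, it keeps scanning and ends up returning the index of the last match.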
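A minimal sketch of that change, returning from inside the loop instead of storing the index and continuing. The names FirstMatchSketch and FindFirstIndex are made up for this example; the body otherwise mirrors the question's method.

```csharp
public static class FirstMatchSketch
{
    // Returns the index of the first element containing the search text,
    // or -1 when no element contains it.
    public static int FindFirstIndex(string[] arr, string search)
    {
        for (int i = 0; i < arr.Length; i++)
        {
            if (arr[i].Contains(search))
            {
                return i; // stop at the first match
            }
        }
        return -1;
    }
}
```

This matters for tags that repeat within a node, such as "2 DATE", which appears under both BIRT and DEAT in the sample data.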