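How do I use IEnumerable.GroupBy when the comparison involves multiple properties of neighboring elements?

Time: 2019-05-13 14:55:26

Tags: c# linq

How can I group "adjacent" sites?

Given the data: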

List<Site> sites = new List<Site> {
    new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
    new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
    new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
    new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
    new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
    new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
};

I want the result:

[
    [
        Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
        Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m }
    ],
    [
        Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m }
    ],
    [
        Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m }
    ],
    [
        Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
        Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m }
    ]
]

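I tried GroupBy with a custom comparer that checks that the RouteIds match and that the first site's end milepost equals the next site's start milepost. My hash function only looks at RouteId, so all sites within a route land in the same bucket. However, I think the comparer is expected to be transitive (if A = B and B = C, then A = C), so C will not be grouped with A and B, because in my adjacency case A does not equal C.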

4 Answers:

Answer 0: (score: 3)

First, let the Site class be (for debugging / demonstration)

public class Site {
  public Site() { }

  public string RouteId;
  public Decimal StartMilepost;
  public Decimal EndMilepost;

  public override string ToString() => $"{RouteId} {StartMilepost}..{EndMilepost}";
}

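Well, as you can see, we have to break a rule: equality must be transitive, i.e. whenever

A equals B
B equals C

then

A equals C

That is not the case in your example. However, if we sort the sites by StartMilepost, we can technically implement IEqualityComparer<Site> like this: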

public class MySiteEqualityComparer : IEqualityComparer<Site> {
  public bool Equals(Site x, Site y) {
    if (ReferenceEquals(x, y))
      return true;
    else if (null == x || null == y)
      return false;
    else if (x.RouteId != y.RouteId)
      return false;
    else if (x.StartMilepost <= y.StartMilepost && x.EndMilepost >= y.StartMilepost)
      return true;
    else if (y.StartMilepost <= x.StartMilepost && y.EndMilepost >= x.StartMilepost)
      return true;

    return false;
  }

  public int GetHashCode(Site obj) {
    return obj == null
      ? 0
      : obj.RouteId == null
         ? 0
         : obj.RouteId.GetHashCode();
  }
}

Then GroupBy as usual; note that OrderBy is required, because the order of comparison matters here. Suppose we have

A = {RouteId="X", StartMilepost=0.00m, EndMilepost=1.00m}
B = {RouteId="X", StartMilepost=1.00m, EndMilepost=2.00m}
C = {RouteId="X", StartMilepost=2.00m, EndMilepost=3.00m}

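Here A == B and B == C, but A != C, so the comparer is not transitive and the grouping depends on which items get compared. Note that GroupBy compares each item with the key of an existing group, which is the first item placed in that group: in the order A, B, C, item B joins A's group, but C is compared with A, not with B, and starts a new group. The approach therefore only works when every site in a run touches or overlaps the first site of that run, as in the sample data.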

Code:

 List<Site> sites = new List<Site> {
    new Site { RouteId="A", StartMilepost=0.00m, EndMilepost=1.00m },
    new Site { RouteId="A", StartMilepost=1.00m, EndMilepost=2.00m },
    new Site { RouteId="A", StartMilepost=5.00m, EndMilepost=7.00m },
    new Site { RouteId="B", StartMilepost=3.00m, EndMilepost=5.00m },
    new Site { RouteId="B", StartMilepost=11.00m, EndMilepost=13.00m },
    new Site { RouteId="B", StartMilepost=13.00m, EndMilepost=14.00m },
  };

  var result = sites
    .GroupBy(item => item.RouteId)
    .Select(group => group
        // Required Here, since MySiteEqualityComparer breaks the rules
       .OrderBy(item => item.StartMilepost)  
       .GroupBy(item => item, new MySiteEqualityComparer())
       .ToArray())
    .ToArray();

  // Let's have a look
  var report = string.Join(Environment.NewLine, result
    .Select(group => string.Join(Environment.NewLine, 
                                 group.Select(g => string.Join("; ", g)))));

  Console.Write(report);

Result:

A 0.00..1.00; A 1.00..2.00
A 5.00..7.00
B 3.00..5.00
B 11.00..13.00; B 13.00..14.00
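
To see how such a comparer behaves on a run of three touching sites, here is a minimal sketch. The names OverlapComparer and ChainDemo are made up for this illustration, and the comparer is a condensed copy of the logic in MySiteEqualityComparer. It relies on the assumption that LINQ's GroupBy tests each element against the key of an existing group (the group's first element), which matches the behavior I know of in current .NET implementations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Site {
  public string RouteId;
  public decimal StartMilepost;
  public decimal EndMilepost;
}

// Condensed copy of the overlap logic from MySiteEqualityComparer (illustrative name).
public class OverlapComparer : IEqualityComparer<Site> {
  public bool Equals(Site x, Site y) {
    if (ReferenceEquals(x, y)) return true;
    if (x == null || y == null) return false;
    if (x.RouteId != y.RouteId) return false;
    return (x.StartMilepost <= y.StartMilepost && x.EndMilepost >= y.StartMilepost)
        || (y.StartMilepost <= x.StartMilepost && y.EndMilepost >= x.StartMilepost);
  }

  // Hash on RouteId only, so every site of a route is a candidate for the same group.
  public int GetHashCode(Site obj) => obj?.RouteId?.GetHashCode() ?? 0;
}

public static class ChainDemo {
  // Returns the size of each group produced for three sites that touch end to start.
  public static int[] GroupSizes() {
    var chain = new List<Site> {
      new Site { RouteId = "X", StartMilepost = 0m, EndMilepost = 1m },
      new Site { RouteId = "X", StartMilepost = 1m, EndMilepost = 2m },
      new Site { RouteId = "X", StartMilepost = 2m, EndMilepost = 3m },
    };
    return chain
      .OrderBy(s => s.StartMilepost)
      .GroupBy(s => s, new OverlapComparer())
      .Select(g => g.Count())
      .ToArray();
  }

  public static void Main() {
    // The third site is compared with the first one (the group key), not with the second.
    Console.WriteLine(string.Join(", ", GroupSizes())); // prints "2, 1"
  }
}
```

So a comparer-based GroupBy is fine for the sample data, where no run is longer than two sites, but longer runs need an approach that compares each site with its immediate predecessor.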

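Answer 1: (score: 2)

Here are two implementations that are intended to work regardless of the order of the Sites. Note that a new site is only merged into a group when some site already seen has an EndMilepost equal to its StartMilepost, so a site that arrives before its predecessor still starts its own group. You can use the LINQ Aggregate function: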

return sites.GroupBy(x => x.RouteId)
            .SelectMany(x =>
            {
                var groupedSites = new List<List<Site>>();
                var aggs = x.Aggregate(new List<Site>(), (contiguous, next) =>
                {
                    if (contiguous.Count == 0 || contiguous.Any(y => y.EndMilepost == next.StartMilepost))
                    {
                        contiguous.Add(next);
                    }
                    else if (groupedSites.Any(y => y.Any(z => z.EndMilepost == next.StartMilepost)))
                    {
                        var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == next.StartMilepost));
                        var el = groupedSites.ElementAt(groupMatchIndex);
                        el.Add(next);
                        groupedSites[groupMatchIndex] = el;
                    }
                    else
                    {
                        groupedSites.Add(contiguous);
                        contiguous = new List<Site>();
                        contiguous.Add(next);
                    }
                    return contiguous;
                }, final => { groupedSites.Add(final); return final; });
                return groupedSites;
            });

Or, using just foreach:

return sites.GroupBy(x => x.RouteId)
            .SelectMany(x =>
            {
                var groupedSites = new List<List<Site>>();
                var aggList = new List<Site>();
                foreach (var item in x)
                {
                    if (aggList.Count == 0 || aggList.Any(y => y.EndMilepost == item.StartMilepost))
                    {
                        aggList.Add(item);
                        continue;
                    }

                    var groupMatchIndex = groupedSites.FindIndex(y => y.Any(z => z.EndMilepost == item.StartMilepost));
                    if (groupMatchIndex > -1)
                    {
                        var el = groupedSites.ElementAt(groupMatchIndex);
                        el.Add(item);
                        groupedSites[groupMatchIndex] = el;
                        continue;
                    }

                    groupedSites.Add(aggList);
                    aggList = new List<Site>();
                    aggList.Add(item);
                }

                groupedSites.Add(aggList);
                return groupedSites;
            });

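Answer 2: (score: 1)

Here is an extension method for grouping lists of one specific class (Site). It is implemented with an inner iterator function, GetGroup, which produces one group of adjacent sites. This function is called in a while loop to produce all the groups.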

public static IEnumerable<IEnumerable<Site>> GroupAdjacent(
    this IEnumerable<Site> source)
{
    var ordered = source
        .OrderBy(item => item.RouteId)
        .ThenBy(item => item.StartMilepost);
    IEnumerator<Site> enumerator;
    bool finished = false;
    Site current = null;
    using (enumerator = ordered.GetEnumerator())
    {
        while (!finished)
        {
            yield return GetGroup();
        }
    }

    IEnumerable<Site> GetGroup()
    {
        if (current != null) yield return current;
        while (enumerator.MoveNext())
        {
            var previous = current;
            current = enumerator.Current;
            if (previous != null)
            {
                if (current.RouteId != previous.RouteId) yield break;
                if (current.StartMilepost != previous.EndMilepost) yield break;
            }
            yield return current;
        }
        finished = true;
    }
}

Usage example:

var allGroups = sites.GroupAdjacent();
foreach (var group in allGroups)
{
    foreach (var item in group)
    {
        Console.WriteLine(item);
    }
    Console.WriteLine();
}

Output:

A 0,00..1,00
A 1,00..2,00

A 5,00..7,00

B 3,00..5,00

B 11,00..13,00
B 13,00..14,00
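
One thing worth knowing about GroupAdjacent: the inner groups are lazy and share a single enumerator, so each group has to be fully enumerated before the next one is requested (as the nested foreach above does). Calling ToList() or Count() on the outer sequence alone would, as far as I can tell, never finish, because the finished flag is only set while an inner group is being enumerated. Below is a sketch of one way to materialize the result safely. GroupAdjacentToLists is a hypothetical helper name, and GroupAdjacent is repeated (slightly condensed) only so that the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Site {
  public string RouteId;
  public decimal StartMilepost;
  public decimal EndMilepost;
}

public static class SiteGrouping {
  // GroupAdjacent from the answer above, condensed; behavior is meant to be unchanged.
  public static IEnumerable<IEnumerable<Site>> GroupAdjacent(this IEnumerable<Site> source) {
    var ordered = source.OrderBy(s => s.RouteId).ThenBy(s => s.StartMilepost);
    IEnumerator<Site> enumerator;
    bool finished = false;
    Site current = null;
    using (enumerator = ordered.GetEnumerator()) {
      while (!finished) yield return GetGroup();
    }

    IEnumerable<Site> GetGroup() {
      if (current != null) yield return current;
      while (enumerator.MoveNext()) {
        var previous = current;
        current = enumerator.Current;
        if (previous != null &&
            (current.RouteId != previous.RouteId ||
             current.StartMilepost != previous.EndMilepost)) yield break;
        yield return current;
      }
      finished = true;
    }
  }

  // Hypothetical helper: copies each inner group into a list before the next group is requested.
  public static List<List<Site>> GroupAdjacentToLists(this IEnumerable<Site> source) =>
    source.GroupAdjacent().Select(g => g.ToList()).ToList();

  public static void Main() {
    var sites = new List<Site> {
      new Site { RouteId = "A", StartMilepost = 0.00m, EndMilepost = 1.00m },
      new Site { RouteId = "A", StartMilepost = 1.00m, EndMilepost = 2.00m },
      new Site { RouteId = "A", StartMilepost = 5.00m, EndMilepost = 7.00m },
      new Site { RouteId = "B", StartMilepost = 3.00m, EndMilepost = 5.00m },
      new Site { RouteId = "B", StartMilepost = 11.00m, EndMilepost = 13.00m },
      new Site { RouteId = "B", StartMilepost = 13.00m, EndMilepost = 14.00m },
    };
    var groups = sites.GroupAdjacentToLists();
    Console.WriteLine(string.Join(", ", groups.Select(g => g.Count))); // prints "2, 1, 1, 2"
  }
}
```

The Select call runs g.ToList() on each group as soon as it is produced, which is what keeps the shared enumerator moving forward.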

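Answer 3: (score: 1)

I was surprised that GroupBy has no overload taking a Func<..., bool> for grouping in place, without the hassle of implementing a custom class.

So I created one: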

public static IEnumerable<IEnumerable<T>> GroupBy<T>(this IEnumerable<T> source, Func<T, T, bool> func)
{
    var items = new List<T>();
    foreach (var item in source)
    {
        if (items.Count != 0)
            if (!func(items[items.Count - 1], item)) // compare with the previous item, not the group's first
            {
                yield return items;
                items = new List<T>();
            }
        items.Add(item);
    }
    if (items.Count != 0)
        yield return items;
}

Usage:

var result = sites.GroupBy((x, y) => x.RouteId == y.RouteId &&
    x.StartMilepost <= y.EndMilepost && x.EndMilepost >= y.StartMilepost).ToList();

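This should produce the desired result.

A few words about the implementation. In the extension method above you supply a delegate that should return true when x and y belong in the same group. The method is dumb: it simply compares neighboring items in the order they arrive. Your input is already ordered, but you may need OrderBy / ThenBy first before using it on other data.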
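
For runs longer than two sites it matters whether the delegate is called with the first item of the current group or with the immediately preceding item. The sketch below uses the preceding item, so a chain such as 0..1, 1..2, 2..3 stays together. GroupAdjacentBy and AdjacentGrouping are hypothetical names chosen for this example (also to avoid shadowing the built-in GroupBy), and the tuple-based sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGrouping {
  // Starts a new group whenever adjacent(previous, current) returns false.
  // The input is consumed in order; sort it first if needed.
  public static IEnumerable<List<T>> GroupAdjacentBy<T>(
      this IEnumerable<T> source, Func<T, T, bool> adjacent) {
    var group = new List<T>();
    foreach (var item in source) {
      if (group.Count != 0 && !adjacent(group[group.Count - 1], item)) {
        yield return group;
        group = new List<T>();
      }
      group.Add(item);
    }
    if (group.Count != 0) yield return group;
  }

  // Groups made-up sample sites and returns the size of each group.
  public static int[] Demo() {
    var sites = new[] {
      (Route: "A", Start: 0m, End: 1m),
      (Route: "A", Start: 1m, End: 2m),
      (Route: "A", Start: 2m, End: 3m),
      (Route: "A", Start: 5m, End: 7m),
      (Route: "B", Start: 3m, End: 5m),
    };
    return sites
      .OrderBy(s => s.Route).ThenBy(s => s.Start)
      .GroupAdjacentBy((prev, next) => prev.Route == next.Route && prev.End == next.Start)
      .Select(g => g.Count)
      .ToArray();
  }

  public static void Main() =>
    Console.WriteLine(string.Join(", ", Demo())); // prints "3, 1, 1"
}
```

Each yielded list is complete by the time the caller sees it, so the result can be passed to ToList() or Count() without any ordering constraints on the consumer.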