Get all items except the trailing ones that satisfy a condition?

Posted: 2017-03-30 16:40:48

Tags: c# linq generic-collections

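My specific requirement is that I have an IEnumerable<IEnumerable<string>>, and I want to "take" all items in the outer enumeration except any "empty" trailing items, where "empty" means that all the strings are null/empty or that the inner enumeration is empty. Note that I want to keep any empty items that appear before the last non-empty item. For example: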

Item 1: a, b, c
Item 2: (nothing)
Item 3: a, f, g, h
Item 4: (nothing)
Item 5: (nothing)

I want to keep items 1-3 but trim items 4 and 5.

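More generally, I have an enumeration of items, and I want to trim any trailing items that satisfy a condition, that is, the items that come after the last item that does not satisfy it.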
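Stated directly, the general requirement is: find the last item that does not satisfy the condition and keep everything up to and including it. A minimal sketch of that definition, assuming a hypothetical helper named `TrimTrailing`; it materializes the whole sequence into a `List<T>`, so treat it as a reference for the intended behaviour rather than a memory-efficient solution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrimByLastIndex
{
    // Hypothetical helper: keep everything up to and including the last item
    // that does NOT satisfy the condition; drop whatever follows it.
    public static List<T> TrimTrailing<T>(IEnumerable<T> source, Func<T, bool> condition)
    {
        var list = source.ToList();
        // FindLastIndex returns -1 when every item satisfies the condition
        // (or the list is empty), in which case nothing is kept.
        int lastKept = list.FindLastIndex(item => !condition(item));
        return list.GetRange(0, lastKept + 1);
    }
}
```

For example, `TrimTrailing(new[] { 1, 0, 2, 0, 0 }, x => x == 0)` keeps `1, 0, 2`: the interior `0` survives because a non-matching item follows it.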

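In case it helps to pick a suitable solution: the outer enumeration usually contains anywhere from a few hundred to a few hundred thousand items, while each inner enumeration contains only a few items. There will probably be only a few empty items to trim.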

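My current solution puts all the outer items in a list (after transforming them with .Select(...)), and then, in a loop, keeps removing the last item while it is empty, until it reaches a non-empty item.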
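A sketch of the list-based approach just described, assuming the projection is passed in as `selector` and the emptiness test as `isEmpty` (both hypothetical parameter names, as is the class name). Removing the last element of a `List<T>` does not shift any other elements, so the trimming loop only costs as much as the number of trailing empty items; the main cost is materializing the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrimByRemoval
{
    // Materialize the projected items into a List, then pop empty items
    // off the end until a non-empty one (or an empty list) is reached.
    public static List<TResult> TrimTrailingEmpty<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, TResult> selector,
        Func<TResult, bool> isEmpty)
    {
        var list = source.Select(selector).ToList();
        // RemoveAt on the last index does not move any other elements.
        while (list.Count > 0 && isEmpty(list[list.Count - 1]))
            list.RemoveAt(list.Count - 1);
        return list;
    }
}
```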

2 answers:

Answer 0 (score: 4)

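There is no standard LINQ solution that is efficient here. I would use a custom "LINQ-like" extension method such as this:

using System;
using System.Collections.Generic;

public static class EnumerableExtensions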
{
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                skipBuffer.Add(item);
            else
            {
                if (skipBuffer.Count > 0)
                {
                    foreach (var skipped in skipBuffer)
                        yield return skipped;
                    skipBuffer.Clear();
                }
                yield return item;
            }
        }
    }
}

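It needs extra space only to buffer the longest run of consecutive items that satisfy the skip predicate, whereas the LINQ Reverse method has to buffer the entire input sequence.

Usage would be: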

var result = input.SkipLastWhile(e => !e.Any());
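The predicate `!e.Any()` only treats an inner sequence with no elements as empty; the question also counts inner sequences whose strings are all null or empty. A sketch, assuming a hypothetical helper `IsEmptyItem`, that pairs the answer's method (repeated here so the snippet compiles on its own) with a predicate covering both cases. Because `SkipLastWhile` releases buffered items as soon as a non-matching item arrives, it also works on sequences far too long to reverse.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // The answer's SkipLastWhile: matching items are held in a buffer and only
    // yielded once a non-matching item follows them, so a trailing run of
    // matching items is never yielded.
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
            {
                skipBuffer.Add(item);
            }
            else
            {
                foreach (var skipped in skipBuffer)
                    yield return skipped;
                skipBuffer.Clear();
                yield return item;
            }
        }
    }

    // Hypothetical predicate for the question's definition of "empty": the
    // inner sequence has no elements, or every string in it is null or "".
    // Enumerable.All returns true for an empty sequence, so one call covers both.
    public static bool IsEmptyItem(IEnumerable<string> inner) =>
        inner.All(string.IsNullOrEmpty);
}
```

Usage would then be `input.SkipLastWhile(e => EnumerableExtensions.IsEmptyItem(e))`.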

Answer 1 (score: 2)

How about this?

var trimmedItems = items.Reverse().SkipWhile(e => !e.Any()).Reverse();

If you have a very large data set, this will need more memory than the solution you came up with, but it's easy to read and follow.
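One thing to watch for when copying this line: if `items` is declared as a `List<T>`, `items.Reverse()` binds to the instance method `List<T>.Reverse()`, which reverses the list in place and returns `void`, so the chain does not compile. A sketch, assuming the outer collection is a `List<List<string>>` (the class and method names are hypothetical), that calls `Enumerable.Reverse` explicitly and leaves the input list untouched:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ReverseTrim
{
    // Reverse().SkipWhile().Reverse(), with the first Reverse called as the
    // static LINQ method so it cannot bind to the in-place List<T>.Reverse().
    // Both Reverse calls buffer the whole sequence.
    public static List<List<string>> TrimTrailingEmpty(List<List<string>> items) =>
        Enumerable.Reverse(items)
            .SkipWhile(e => !e.Any())
            .Reverse()
            .ToList();
}
```

Casting with `items.AsEnumerable().Reverse()` works equally well; either way the extension method is chosen because the static type is no longer `List<T>`.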

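juharr's suggestion is only slightly more complicated, and it should perform better when you have a lot of items, because it avoids reversing the sequence a second time:

var trimmedItems = items.Take(items.Reverse().SkipWhile(e => !e.Any()).Count());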

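Here is the benchmark code I used. It is meant to be run in LINQPad, but you can change the results.Dump(); call to write the results to the console or somewhere else if you prefer. Also, I used IEnumerable<string> instead of IEnumerable<IEnumerable<string>> just for simplicity, but that shouldn't affect the relative performance of the algorithms: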

/* This is a benchmarking template I use in LINQPad when I want to do a
 * quick performance test. Just give it a couple of actions to test and
 * it will give you a pretty good idea of how long they take compared
 * to one another. It's not perfect: You can expect a 3% error margin
 * under ideal circumstances. But if you're not going to improve
 * performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var items = new[] { "a, b, c",
    "",
    "a, f, g, h",
    "",
    ""}.AsEnumerable();
    var manyitems = Enumerable.Range(1, 10000).SelectMany(i => items);

    var actions = new[]
    {
        new TimedAction("Control", () =>
        {
            // ToList() is the one thing that all of these have to do.
            manyitems.ToList();
        }),
        new TimedAction("Reverse().SkipWhile().Reverse()", () =>
        {
            manyitems.Reverse().SkipWhile(e => !e.Any()).Reverse().ToList();
        }),
        new TimedAction("Take(Reverse().SkipWhile().Count())", () =>
        {
            manyitems.Take(manyitems.Reverse().SkipWhile(e => !e.Any()).Count()).ToList();
        }),
        new TimedAction("SkipLastWhile", () =>
        {
            manyitems.SkipLastWhile(e => !e.Any()).ToList();
        }),
        // Add tests as desired
    };
    const int TimesToRun = 100; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}

public static class EnumerableExtensions
{
    public static IEnumerable<T> SkipLastWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var skipBuffer = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                skipBuffer.Add(item);
            else
            {
                foreach (var skipped in skipBuffer)
                    yield return skipped;
                skipBuffer.Clear();
                yield return item;
            }
        }
    }
}

#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for (int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult { Message = action.Message };
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for (int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message { get; set; }
    public double DryRun1 { get; set; }
    public double DryRun2 { get; set; }
    public double FullRun1 { get; set; }
    public double FullRun2 { get; set; }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion

Results:

[Image: Benchmark Results]

If your IEnumerable is backed by a List, the benchmark differences are even more pronounced, because LINQ can apply some extra optimizations to Reverse():

var manyitems = Enumerable.Range(1, 10000).SelectMany(i => items).ToList().AsEnumerable();

[Image: Benchmark Results with List-backed IEnumerable]
