Remove duplicates from a list based on some object properties

Time: 2017-10-17 22:27:44

Tags: java list filter

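I have a List of Metric objects, where each Metric holds several properties such as metricName, namespace, fleet, type, component, firstSeenTime, lastSeenTime, and so on. The list contains duplicates that are identical in every property except firstSeenTime and lastSeenTime. I am looking for an elegant way to filter this list so that, when such duplicates exist, only the metric with the most recent lastSeenTime is returned.

Something better than this: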

private List<Metric> processResults(List<Metric> metrics) {
    List<Metric> results = new ArrayList<>();

    for (Metric incomingMetric: metrics) {

        // We need to implement "contains" below so that only properties
        // other than the two dates are checked.
        if (results.contains(incomingMetric)) {
            int index = results.indexOf(incomingMetric);
            Metric existing = results.get(index); 
            if (incomingMetric.getLastSeen().after(existing.getLastSeen())) {
                results.set(index, incomingMetric);
            } else {
                // do nothing, metric in results is already the latest 
            }
        } else {
            // add incomingMetric to results for the first time
            results.add(incomingMetric);
        }
    }

    return results;
}

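The results.contains check is done by iterating over all the metrics in results and checking whether each object matches on every property except the two dates.

Is there a better approach, both for elegance and for performance?

3 answers:

Answer 0: (score: 1)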

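I'm not sure how you are producing the List<Metric>. But if you can maintain a Map<String, Metric> instead of that list, you could try the following approach.

The key of this map would be a combination of all the values you need to compare (excluding the date properties).

Key: "{metricName}${type}$....."

To do this, you can maintain an extra property in the Metric object, exposed through a getter. Calling the getter returns the key.

Then, before putting a metric into the map, check whether its key already exists. If it does, fetch the Metric stored under that key and compare the dates to find the latest Metric object. If the incoming one is the latest, replace the stored object with it.

PS: Measure the execution time of both approaches, so that you can find the best one.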
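The steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the trimmed-down Metric class, its getKey() method, the "$" separator, and the latestPerKey name are all hypothetical, and only three of the identifying properties are included. Map.merge performs the "check, compare, replace" step in one call, and a LinkedHashMap is used so the result keeps first-seen order.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetricMapDedup {

    // Hypothetical, trimmed-down Metric; the real class has more properties.
    public static final class Metric {
        public final String metricName;
        public final String namespace;
        public final String type;
        public final Date lastSeenTime;

        public Metric(String metricName, String namespace, String type, Date lastSeenTime) {
            this.metricName = metricName;
            this.namespace = namespace;
            this.type = type;
            this.lastSeenTime = lastSeenTime;
        }

        // Combines every property that identifies a metric, leaving out the dates.
        // Assumes the values themselves never contain the "$" separator.
        public String getKey() {
            return metricName + "$" + namespace + "$" + type;
        }
    }

    // Keeps, for each key, the metric with the most recent lastSeenTime.
    public static List<Metric> latestPerKey(List<Metric> metrics) {
        Map<String, Metric> latest = new LinkedHashMap<>();
        for (Metric incoming : metrics) {
            latest.merge(incoming.getKey(), incoming,
                    (existing, candidate) ->
                            candidate.lastSeenTime.after(existing.lastSeenTime) ? candidate : existing);
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Metric> input = new ArrayList<>();
        input.add(new Metric("cpu", "ns", "gauge", new Date(1000L)));
        input.add(new Metric("cpu", "ns", "gauge", new Date(2000L)));
        input.add(new Metric("mem", "ns", "gauge", new Date(500L)));

        for (Metric m : latestPerKey(input)) {
            System.out.println(m.getKey() + " lastSeen=" + m.lastSeenTime.getTime());
        }
    }
}
```

Each metric costs one hash lookup, so the whole pass is roughly linear in the list size, instead of the list scan that results.contains performs for every element.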

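Answer 1: (score: 1)

In Java, the most elegant way to express a comparison is the Comparator interface. You could remove the duplicates with something like the following: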

public List<Metric> removeDuplicates(List<Metric> metrics) {

    List<Metric> copy = new ArrayList<>(metrics);
    //first sort the metrics list from most recent to older
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(new Comparator<Metric>() {

        @Override
        public int compare(Metric o1, Metric o2) {
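            int result = 0;
            // compare the two metrics given your rules; return 0 only when
            // every property other than the two dates matches, because the
            // TreeSet treats a result of 0 as "duplicate" and keeps the
            // element that was added first (the most recent one)
            return result;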
        }
    });

    for(Metric metric : copy) {
        set.add(metric);
    }

    // Arrays.asList(set.toArray()) would produce a List<Object>, so copy the set instead
    List<Metric> result = new ArrayList<>(set);
    return result;
 }

class SortComparator implements Comparator<Metric> {

    @Override
    public int compare(Metric o1, Metric o2) {
        int result = 0;
        if(o2.getLastSeenTime() != null && o1.getLastSeenTime() != null) {
            result = o2.getLastSeenTime().compareTo(o1.getLastSeenTime());
        }
        return result;
    }

}

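The advantage of this approach is that you can write a family of comparators and provide a factory that picks the best way to compare metrics at runtime, so that instances are removed or kept depending on runtime conditions:

public List<Metric> removeDuplicates(List<Metric> metrics, Comparator<Metric> comparator) {

    List<Metric> copy = new ArrayList<>(metrics);
    // sort from most recent to oldest so the newest duplicate is added first
    Collections.sort(copy, new SortComparator());

    Set<Metric> set = new TreeSet<Metric>(comparator);
    for(Metric metric : copy) {
        set.add(metric);
    }
    List<Metric> result = new ArrayList<>(set);
    return result;
 }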
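As an illustration of what such a comparator might look like, here is a small self-contained sketch. The trimmed-down Metric class, the IDENTITY and NEWEST_FIRST constants, and the choice of only two identifying properties are assumptions made for the example, and it assumes no property is null. IDENTITY returns 0 exactly when the non-date properties match, which is what makes the TreeSet drop the later (older) duplicates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MetricComparatorDedup {

    // Hypothetical, trimmed-down Metric; the real class has more properties.
    public static final class Metric {
        private final String metricName;
        private final String type;
        private final Date lastSeenTime;

        public Metric(String metricName, String type, Date lastSeenTime) {
            this.metricName = metricName;
            this.type = type;
            this.lastSeenTime = lastSeenTime;
        }

        public String getMetricName() { return metricName; }
        public String getType() { return type; }
        public Date getLastSeenTime() { return lastSeenTime; }
    }

    // Compares only the identifying (non-date) properties.
    public static final Comparator<Metric> IDENTITY =
            Comparator.comparing(Metric::getMetricName).thenComparing(Metric::getType);

    // Orders metrics from most recent to oldest.
    public static final Comparator<Metric> NEWEST_FIRST =
            Comparator.comparing(Metric::getLastSeenTime).reversed();

    public static List<Metric> removeDuplicates(List<Metric> metrics, Comparator<Metric> identity) {
        List<Metric> copy = new ArrayList<>(metrics);
        copy.sort(NEWEST_FIRST);

        // TreeSet.add ignores an element that compares equal to one already present,
        // so the first (most recent) metric of each group is the one that survives.
        Set<Metric> set = new TreeSet<>(identity);
        for (Metric metric : copy) {
            set.add(metric);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Metric> input = new ArrayList<>();
        input.add(new Metric("cpu", "gauge", new Date(1000L)));
        input.add(new Metric("mem", "gauge", new Date(500L)));
        input.add(new Metric("cpu", "gauge", new Date(2000L)));

        for (Metric m : removeDuplicates(input, IDENTITY)) {
            System.out.println(m.getMetricName() + " lastSeen=" + m.getLastSeenTime().getTime());
        }
    }
}
```

Note that the result comes back in the order defined by the identity comparator, not in the order of the input list.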

Answer 2: (score: 0)

Thanks for the answers. I went with the map approach, since it does not involve extra sorts and copies.
