Fastest way to sort two vectors (keys/values) at the same time?

Time: 2014-01-21 07:10:11

Tags: c++ sorting c++11 vector stl

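For the purposes of a supercomputing simulation, I have a structure that contains two large std::vectors (billions of elements): one std::vector of "keys" (64-bit integers) and one std::vector of "values". I cannot use a std::map, because in the simulations I'm considering vectors are far better optimized than std::map. I also cannot use a vector of pairs, because separate vectors provide other optimizations and better cache efficiency. Furthermore, I cannot use any extra memory.

Given these constraints, what is the most optimized way to sort both vectors by increasing key value? (Template metaprogramming and crazy compile-time tricks are welcome.)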

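3 answers:

Answer 0 (score: 3)

Two ideas off the top of my head:

  • Implement a quicksort and apply it to the "keys" vector, but modify the code so that every time it swaps two elements of the keys vector, it performs the same swap on the values vector.

  • Or, perhaps more in the spirit of C++, write a custom "wrapper" iterator that iterates over both vectors at once (returning a std::pair when dereferenced). Maybe Boost has one? You could then combine it with std::sort and a custom comparison function that looks only at the "keys".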
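A minimal sketch of the first idea, assuming both vectors have the same length and that the key type supports `<`: a Hoare-partition quicksort on the keys in which every swap is mirrored on the values. The names `quicksort_pairs` and `sort_by_key` are illustrative, not taken from the answer. Like any plain quicksort it has an O(n²) worst case, which std::sort avoids by falling back to heapsort (introsort).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hoare-style quicksort on keys[lo..hi] (inclusive). Every swap of two keys
// is mirrored on the values vector, so each value stays with its key.
template <typename Key, typename Value>
void quicksort_pairs(std::vector<Key>& keys, std::vector<Value>& values,
                     std::ptrdiff_t lo, std::ptrdiff_t hi)
{
    while (lo < hi) {
        const Key pivot = keys[lo + (hi - lo) / 2];
        std::ptrdiff_t i = lo, j = hi;
        while (i <= j) {
            while (keys[i] < pivot) ++i;
            while (pivot < keys[j]) --j;
            if (i <= j) {
                std::swap(keys[i], keys[j]);
                std::swap(values[i], values[j]);  // the mirrored swap
                ++i;
                --j;
            }
        }
        // Recurse into the smaller part and loop on the larger one,
        // which bounds the recursion depth by O(log n).
        if (j - lo < hi - i) {
            quicksort_pairs(keys, values, lo, j);
            lo = i;
        } else {
            quicksort_pairs(keys, values, i, hi);
            hi = j;
        }
    }
}

// Sorts both vectors in place by increasing key.
template <typename Key, typename Value>
void sort_by_key(std::vector<Key>& keys, std::vector<Value>& values)
{
    assert(keys.size() == values.size());
    if (!keys.empty())
        quicksort_pairs(keys, values, 0,
                        static_cast<std::ptrdiff_t>(keys.size()) - 1);
}
```

Calling `sort_by_key(keys, values)` reorders both vectors in place; apart from the recursion stack, no memory proportional to n is allocated.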

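Edit:

I used the first suggestion to solve a similar problem back when I was a C programmer. For obvious reasons it is far from ideal, but it is probably the quickest way to get something working.

I haven't tried such a wrapper iterator with std::sort, but TemplateRex says in the comments that it won't work, and I'm happy to defer to him on that.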

Answer 1 (score: 0)

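I think the problem can be split into two separate parts:

  1. How to make an efficient iterator for a virtual map
  2. Which sorting algorithm to use

    Iterator

    The main problem in implementing the iterator is how to return key/value pairs without creating unnecessary copies. This can be done by using different types for value_type and reference. My implementation: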

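    #include <boost/iterator/iterator_facade.hpp>
    #include <cassert>
    #include <cstddef>
    #include <stdexcept>
    #include <utility>

    template <typename Keys, typename Values>
    class virtual_map
    {
    public:
        typedef typename Keys::value_type key_type;
        typedef typename Values::value_type mapped_type;
        typedef std::pair<key_type, mapped_type> value_type;
        typedef std::pair<const key_type&, const mapped_type&> const_proxy;

        // Proxy "reference" to one key and its value. Unlike a plain
        // std::pair<key_type&, mapped_type&>, assignment writes through to the
        // elements and swap() takes proxies by value, so std::iter_swap can
        // swap the temporaries returned by operator* (found via ADL).
        struct proxy
        {
            key_type& first;
            mapped_type& second;

            proxy(key_type& k, mapped_type& v) : first(k), second(v) {}
            proxy(const proxy&) = default;

            proxy& operator=(const proxy& other)
            { first = other.first; second = other.second; return *this; }
            proxy& operator=(const value_type& v)
            { first = v.first; second = v.second; return *this; }
            proxy& operator=(value_type&& v)
            { first = std::move(v.first); second = std::move(v.second); return *this; }

            // Copies out a real value; std::sort keeps such temporaries.
            operator value_type() const { return value_type(first, second); }
            operator const_proxy() const { return const_proxy(first, second); }

            friend void swap(proxy a, proxy b)
            {
                using std::swap;
                swap(a.first, b.first);
                swap(a.second, b.second);
            }
        };

        class iterator :
            public boost::iterator_facade<iterator, value_type, boost::random_access_traversal_tag, proxy>
        {
            friend class boost::iterator_core_access;

        public:
            iterator(virtual_map* map_, std::size_t offset_) :
                offset(offset_),
                map(map_)
            {}

        private:
            bool equal(const iterator& other) const
            {
                assert(this->map == other.map);
                return this->offset == other.offset;
            }

            void increment() { ++offset; }
            void decrement() { --offset; }

            // std::ptrdiff_t is the facade's default difference_type; names from
            // a dependent base class are not found by unqualified lookup.
            void advance(std::ptrdiff_t n) { offset += n; }

            proxy dereference() const { return proxy(map->keys[offset], map->values[offset]); }

            std::ptrdiff_t distance_to(const iterator& other_) const
            {
                return static_cast<std::ptrdiff_t>(other_.offset) - static_cast<std::ptrdiff_t>(offset);
            }

            std::size_t offset;
            virtual_map* map;
        };

    public:
        virtual_map(Keys& keys_, Values& values_) :
            keys(keys_),
            values(values_)
        {
            if(keys_.size() != values_.size())
                throw std::runtime_error("different size");
        }

        iterator begin() { return iterator(this, 0); }
        iterator end() { return iterator(this, keys.size()); }

    protected:
        Keys& keys;
        Values& values;
    };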
    

    Usage example:

    #include <algorithm>
    #include <iterator>
    #include <string>
    #include <vector>

    int main()
    {
        std::vector<int> keys_ = { 17, 2, 13, 4, 51, 78, 49, 37, 1 };
        std::vector<std::string> values_ = { "17", "2", "13", "4", "51", "78", "49", "37", "1" };

        typedef virtual_map<std::vector<int>, std::vector<std::string>> map;

        map map_(keys_, values_);

        // Compare keys only; every element move carries the value along.
        std::sort(std::begin(map_), std::end(map_), [](map::const_proxy left_, map::const_proxy right_)
        {
            return left_.first < right_.first;
        });

        return 0;
    }
    

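    Sorting algorithm

    Without further details it is hard to say which approach is better. What memory constraints do you have? Can concurrency be used?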
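One candidate that respects a "no extra memory" constraint is an in-place MSD radix sort (American flag sort) that mirrors every swap on the values. This is a sketch under assumptions, not a tuned implementation: it assumes the keys are unsigned 64-bit integers (signed keys would need the sign bit flipped before extracting each byte), and the names `radix_sort_by_key`, `american_flag_sort` and `insertion_sort_pairs` are hypothetical. Whether it beats a comparison sort on real data has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

namespace detail {

// Insertion sort on keys[0, n); every key move is applied to values as well.
template <typename Value>
void insertion_sort_pairs(std::uint64_t* keys, Value* values, std::size_t n)
{
    for (std::size_t i = 1; i < n; ++i) {
        std::uint64_t k = keys[i];
        Value v = std::move(values[i]);
        std::size_t j = i;
        for (; j > 0 && keys[j - 1] > k; --j) {
            keys[j] = keys[j - 1];
            values[j] = std::move(values[j - 1]);
        }
        keys[j] = k;
        values[j] = std::move(v);
    }
}

// Distributes the range by the byte at `shift`, then recurses per bucket.
template <typename Value>
void american_flag_sort(std::uint64_t* keys, Value* values, std::size_t n, int shift)
{
    if (n < 64) { insertion_sort_pairs(keys, values, n); return; }

    std::size_t count[256] = {};
    for (std::size_t i = 0; i < n; ++i)
        ++count[(keys[i] >> shift) & 0xFF];

    std::size_t head[256], tail[256];
    std::size_t sum = 0;
    for (int b = 0; b < 256; ++b) {
        head[b] = sum;
        sum += count[b];
        tail[b] = sum;
    }

    // In-place permutation: each swap puts one element into its final bucket.
    for (std::size_t b = 0; b < 256; ++b) {
        while (head[b] < tail[b]) {
            std::size_t d = (keys[head[b]] >> shift) & 0xFF;
            if (d == b) {
                ++head[b];
            } else {
                std::swap(keys[head[b]], keys[head[d]]);
                std::swap(values[head[b]], values[head[d]]);  // mirrored swap
                ++head[d];
            }
        }
    }

    if (shift == 0) return;
    for (int b = 0; b < 256; ++b) {
        std::size_t begin = tail[b] - count[b];
        american_flag_sort(keys + begin, values + begin, count[b], shift - 8);
    }
}

} // namespace detail

template <typename Value>
void radix_sort_by_key(std::vector<std::uint64_t>& keys, std::vector<Value>& values)
{
    assert(keys.size() == values.size());
    if (!keys.empty())
        detail::american_flag_sort(keys.data(), values.data(), keys.size(), 56);
}
```

The extra memory is three 256-entry counter arrays per recursion level and at most eight levels, independent of n. On the concurrency question: after the first pass the 256 top-level buckets are disjoint, so they could be sorted by separate threads.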

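Answer 2 (score: 0)

There are some problems:

  • Iterating two sequences together requires a pair that represents references to the elements of both sequences; the pair itself is not a reference. Hence, algorithms that rely on real references will not work.
  • Performance degrades (the sequences are only loosely coupled).

An implementation using a pair of references and std::sort: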

// Copyright (c) 2014 Dieter Lucking. Distributed under the Boost
// software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

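#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>
#include <new>
#include <random>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>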

// None
// ============================================================================

/// A void type
struct None {

    None()
    {}

    /// Explicit conversion to None.
    template <typename T>
    explicit None(const T&)
    {}

    template <typename T>
    None& operator = (const T&) {
        return *this;
    }

    /// Never null.
    None* operator & () const;
};

extern None& none();
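// std::addressof bypasses the overloaded operator& (plain &none() would recurse forever).
inline None* None::operator & () const { return std::addressof(none()); }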

None& none() {
    static None result;
    return result;
}


// IteratorAdaptorTraits
// ============================================================================

namespace Detail {

    // IteratorAdaptorTraits
    // =====================

    template <typename Iterator, typename ReturnType, bool IsReference>
    struct IteratorAdaptorTraits;

    // No reference
    // ============

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, false>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef ReturnType value_type;
        typedef None reference;
        typedef None pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &none();
        }
    };

    // Reference
    // =========

    template <typename Iterator, typename ReturnType>
    struct IteratorAdaptorTraits<Iterator, ReturnType, true>
    {
        typedef Iterator iterator_type;
        typedef ReturnType return_type;
        typedef typename std::remove_reference<ReturnType>::type value_type;
        typedef ReturnType reference;
        typedef value_type* pointer;

        static_assert(
            ! std::is_base_of<None, return_type>::value,
            "None as return type.");

        template <typename Accessor>
        static return_type iterator_value(const Accessor& accessor, const Iterator& iterator) {
            return accessor.value(iterator);
        }

        template <typename Accessor>
        static pointer iterator_pointer(const Accessor& accessor, const Iterator& iterator) {
            return &accessor.value(iterator);
        }
    };
} // namespace Detail


// RandomAccessIteratorAdaptor
// ============================================================================

/// An adaptor around a random access iterator.
/// \ATTENTION The adaptor will not fulfill the standard iterator requirements
///            if the accessor does not support references: in that case, the
///            reference and pointer types are None.
template <typename Iterator, typename Accessor>
class RandomAccessIteratorAdaptor
{
    // Types
    // =====

    private:
    static_assert(
        ! std::is_base_of<None, Accessor>::value,
        "None as accessor.");

    static_assert(
        ! std::is_base_of<None, typename Accessor::return_type>::value,
        "None as return type.");

    typedef typename Detail::IteratorAdaptorTraits<
        Iterator,
        typename Accessor::return_type,
        std::is_reference<typename Accessor::return_type>::value
    > Traits;

    public:
    typedef typename Traits::iterator_type iterator_type;
    typedef Accessor accessor_type;
    typedef typename std::random_access_iterator_tag iterator_category;
    typedef typename std::ptrdiff_t difference_type;
    typedef typename Traits::return_type return_type;
    typedef typename Traits::value_type value_type;
    typedef typename Traits::reference reference;
    typedef typename Traits::pointer pointer;

    typedef typename accessor_type::base_type accessor_base_type;
    typedef RandomAccessIteratorAdaptor<iterator_type, accessor_base_type> base_type;

    // Tag
    // ===

    public:
    struct RandomAccessIteratorAdaptorTag {};

    // Construction
    // ============

    public:
    explicit RandomAccessIteratorAdaptor(
        iterator_type iterator, const accessor_type& accessor = accessor_type())
    :   m_iterator(iterator), m_accessor(accessor)
    {}

    template <typename IteratorType, typename AccessorType>
    explicit RandomAccessIteratorAdaptor(const RandomAccessIteratorAdaptor<
        IteratorType, AccessorType>& other)
    :   m_iterator(other.iterator()), m_accessor(other.accessor())
    {}

    // Element Access
    // ==============

    public:
    /// The underlying accessor.
    const accessor_type& accessor() const { return m_accessor; }
    /// The underlying iterator.
    const iterator_type& iterator() const { return m_iterator; }
    /// The underlying iterator.
    iterator_type& iterator() { return m_iterator; }
    /// The underlying iterator.
    operator iterator_type () const { return m_iterator; }

    /// The base adaptor.
    base_type base() const {
        return base_type(m_iterator, m_accessor.base());
    }

    // Iterator
    // ========

    public:
    return_type operator * () const {
        return Traits::iterator_value(m_accessor, m_iterator);
    }
    pointer operator -> () const {
        return Traits::iterator_pointer(m_accessor, m_iterator);
    }

    RandomAccessIteratorAdaptor increment() const {
        return ++RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor increment_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator += n;
        return tmp;
    }

    RandomAccessIteratorAdaptor decrement() const {
        return --RandomAccessIteratorAdaptor(*this);
    }
    RandomAccessIteratorAdaptor decrement_n(difference_type n) const {
        RandomAccessIteratorAdaptor tmp(*this);
        tmp.m_iterator -= n;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator ++ () {
        ++m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator ++ (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        ++m_iterator;
        return tmp;

    }
    RandomAccessIteratorAdaptor& operator += (difference_type n) {
        m_iterator += n;
        return *this;
    }

    RandomAccessIteratorAdaptor& operator -- () {
        --m_iterator;
        return *this;
    }
    RandomAccessIteratorAdaptor operator -- (int) {
        RandomAccessIteratorAdaptor tmp(*this);
        --m_iterator;
        return tmp;
    }

    RandomAccessIteratorAdaptor& operator -= (difference_type n) {
        m_iterator -= n;
        return *this;
    }


    bool equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator == other.m_iterator;
    }
    bool less(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator < other.m_iterator;
    }
    bool less_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator <= other.m_iterator;
    }
    bool greater(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator > other.m_iterator;
    }
    bool greater_equal(const RandomAccessIteratorAdaptor& other) const {
        return this->m_iterator >= other.m_iterator;
    }

    private:
    iterator_type m_iterator;
    accessor_type m_accessor;
};


template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator + (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.increment_n(n);
}

template <typename Iterator, typename Accessor>
inline RandomAccessIteratorAdaptor<Iterator, Accessor> operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& i,
    typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type n) {
    return i.decrement_n(n);
}

template <typename Iterator, typename Accessor>
inline typename RandomAccessIteratorAdaptor<Iterator, Accessor>::difference_type
operator - (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.iterator() - b.iterator();
}

template <typename Iterator, typename Accessor>
inline bool operator == (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator != (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return ! a.equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less(b);
}

template <typename Iterator, typename Accessor>
inline bool operator <= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.less_equal(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >  (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater(b);
}

template <typename Iterator, typename Accessor>
inline bool operator >= (
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& a,
    const RandomAccessIteratorAdaptor<Iterator, Accessor>& b) {
    return a.greater_equal(b);
}


// ElementPair
// ============================================================================

/// A pair of references that turns into a pair of values when copied or moved.
/// \NOTE If Key is 1 or 2, the pair is less-than comparable by its first or
///       second element, respectively.
template <typename First, typename Second, unsigned Key = 0>
class ElementPair
{
    // Types
    // =====

    public:
    typedef First first_type;
    typedef Second second_type;

    // Construction
    // ============

    public:
    /// Reference
    /// \POSTCONDITION reference() returns true
    ElementPair(first_type& first, second_type& second)
    :   m_first(&first), m_second(&second)
    {}

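    /// Copy construction: the copy owns copies of the referenced values.
    /// \POSTCONDITION reference() returns false
    ElementPair(const ElementPair& other)
    :   m_first(new(m_first_storage) first_type(*other.m_first)),
        m_second(new(m_second_storage) second_type(*other.m_second))
    {}

    /// Move construction: the new pair owns the moved values.
    /// \POSTCONDITION reference() returns false
    /// \NOTE std::sort relies on this when it creates a temporary with
    ///       `value_type tmp = std::move(*it);` (as libstdc++ does). A temporary
    ///       initialized directly from the prvalue `*it` is not moved (guaranteed
    ///       copy elision since C++17) and would still refer to the elements.
    ElementPair(ElementPair&& other)
    :   m_first(new(m_first_storage) first_type(std::move(*other.m_first))),
        m_second(new(m_second_storage) second_type(std::move(*other.m_second)))
    {}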

    ~ElementPair() {
        if( ! reference()) {
            reinterpret_cast<first_type*>(m_first_storage)->~first_type();
            reinterpret_cast<second_type*>(m_second_storage)->~second_type();
        }
    }

    // Assignment
    // ==========

    public:
    /// Swap content.
    void swap(ElementPair& other) {
        std::swap(*m_first, *other.m_first);
        std::swap(*m_second, *other.m_second);
    }

    /// Assign content.
    ElementPair& operator = (const ElementPair& other) {
        if(&other != this) {
            *m_first = *other.m_first;
            *m_second = *other.m_second;
        }
        return *this;
    }

    /// Assign content.
    ElementPair& operator = (ElementPair&& other) {
        if(&other != this) {
            *m_first = std::move(*other.m_first);
            *m_second = std::move(*other.m_second);
        }
        return *this;
    }

    // Element Access
    // ==============

    public:
    /// True if the pair holds references to external elements.
    bool reference() const {
        return (m_first != reinterpret_cast<first_type*>(m_first_storage));
    }
    const first_type& first() const { return *m_first; }
    first_type& first() { return *m_first; }

    const second_type& second() const { return *m_second; }
    second_type& second() { return *m_second; }

    private:
    first_type* m_first;
    typename std::aligned_storage<
        sizeof(first_type),
        std::alignment_of<first_type>::value>::type
        m_first_storage[1];

    second_type* m_second;
    typename std::aligned_storage<
        sizeof(second_type),
        std::alignment_of<second_type>::value>::type
        m_second_storage[1];
};

// Compare
// =======

template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 1>& a,
    const ElementPair<First, Second, 1>& b)
{
    return (a.first() < b.first());
}


template <typename First, typename Second>
inline bool operator < (
    const ElementPair<First, Second, 2>& a,
    const ElementPair<First, Second, 2>& b)
{
    return (a.second() < b.second());
}

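// Swap
// ====
// Found by argument-dependent lookup (adding overloads to namespace std is not
// allowed). The rvalue overload is required because the adaptor's operator*
// returns ElementPair by value, so std::iter_swap calls swap(*a, *b) with two
// temporaries, which cannot bind to non-const lvalue references.

template <typename First, typename Second, unsigned Key>
inline void swap(
    ElementPair<First, Second, Key>& a,
    ElementPair<First, Second, Key>& b)
{
    a.swap(b);
}

template <typename First, typename Second, unsigned Key>
inline void swap(
    ElementPair<First, Second, Key>&& a,
    ElementPair<First, Second, Key>&& b)
{
    a.swap(b);
}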

// SequencePairAccessor
// ============================================================================

template <typename FirstSequence, typename SecondSequence, unsigned Keys = 0>
class SequencePairAccessor
{
    // Types
    // =====

    public:
    typedef FirstSequence first_sequence_type;
    typedef SecondSequence second_sequence_type;
    typedef typename first_sequence_type::size_type size_type;
    typedef typename first_sequence_type::value_type first_type;
    typedef typename second_sequence_type::value_type second_type;
    typedef typename first_sequence_type::iterator iterator;

    typedef None base_type;
    typedef ElementPair<first_type, second_type, Keys> return_type;

    // Construction
    // ============

    public:
    SequencePairAccessor(first_sequence_type& first, second_sequence_type& second)
    :   m_first_sequence(&first), m_second_sequence(&second)
    {}

    // Element Access
    // ==============

    public:
    base_type base() const { return base_type();    }
    return_type value(iterator pos) const {
        return return_type(*pos, (*m_second_sequence)[pos - m_first_sequence->begin()]);
    }

    // Data
    // ====

    private:
    first_sequence_type* m_first_sequence;
    second_sequence_type* m_second_sequence;
};

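On my system, this test shows a factor of 1.5 for const char* and 3.4 for std::string, where the factor is the time for sorting the two vectors divided by the time for sorting a single vector of std::pairs: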

// Test
// ============================================================================

#define SAMPLE_SIZE 1e1
#define VALUE_TYPE const char*

int main() {
    const unsigned samples = SAMPLE_SIZE;

    typedef int key_type;
    typedef VALUE_TYPE value_type;
    typedef std::vector<key_type> key_sequence_type;
    typedef std::vector<value_type> value_sequence_type;

    typedef SequencePairAccessor<key_sequence_type, value_sequence_type, 1> accessor_type;
    typedef RandomAccessIteratorAdaptor<
        key_sequence_type::iterator,
        accessor_type>
        iterator_adaptor_type;

    key_sequence_type keys;
    value_sequence_type values;
    keys.reserve(samples);
    values.reserve(samples);
    const char* words[] = { "Zero", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
    for(unsigned i = 0; i < samples; ++i) {
        key_type k = i % 10;
        keys.push_back(k);
        values.push_back(words[k]);
    }

    accessor_type accessor(keys, values);
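    // std::random_shuffle was removed in C++17; use std::shuffle with a seeded engine.
    std::mt19937 rng(42);
    std::shuffle(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor),
        rng
    );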

    if(samples <= 10) {
        std::cout << "\nRandom:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    typedef std::pair<key_type, value_type> pair_type;
    std::vector<pair_type> ref;
    for(const auto& k: keys) {
        ref.push_back(pair_type(k, words[k]));
    }

    struct Less {
        bool operator () (const pair_type& a, const pair_type& b) const {
            return a.first < b.first;
        }
    };
    // steady_clock is monotonic; duration<double> converts ticks to seconds.
    auto ref_start = std::chrono::steady_clock::now();
    std::sort(ref.begin(), ref.end(), Less());
    auto ref_end = std::chrono::steady_clock::now();
    double ref_elapsed = std::chrono::duration<double>(ref_end - ref_start).count();

    auto start = std::chrono::steady_clock::now();
    std::sort(
        iterator_adaptor_type(keys.begin(), accessor),
        iterator_adaptor_type(keys.end(), accessor)
    );
    auto end = std::chrono::steady_clock::now();
    double elapsed = std::chrono::duration<double>(end - start).count();

    if(samples <= 10) {
        std::cout << "\nSorted:\n"
                  <<   "======\n";
        for(unsigned i = 0; i < keys.size(); ++i)
            std::cout << keys[i] << ": "  << values[i] << '\n';
    }

    std::cout << "\nDuration sorting " << double(samples) << " samples:\n"
              <<   "========\n"
              << " One Vector: " << ref_elapsed << '\n'
              << "Two Vectors: " << elapsed << '\n'
              << "     Factor: " << elapsed/ref_elapsed << '\n'
              << '\n';
}

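(Adjust SAMPLE_SIZE and VALUE_TYPE as needed.)

My conclusion is that a sorted view of the unsorted data sequence may make more sense (but that violates the requirements of the question).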
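The "sorted view" idea can be sketched as a vector of indices ordered by key: the data itself is not touched, at the cost of one extra index per element, which is exactly what the question rules out. If the data must eventually be reordered, the index permutation can be applied to both vectors by following its cycles. Both `sorted_view` and `apply_permutation` are illustrative helpers, not part of the answer's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Returns indices ordered by key (stable for equal keys); keys are unchanged.
template <typename Key>
std::vector<std::size_t> sorted_view(const std::vector<Key>& keys)
{
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t(0));
    std::stable_sort(idx.begin(), idx.end(),
                     [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    return idx;
}

// Reorders both vectors so that new keys[i] == old keys[idx[i]] (same for
// values) by following the cycles of the permutation. idx is consumed: every
// entry is reset to its own position once that position has been filled.
template <typename Key, typename Value>
void apply_permutation(std::vector<Key>& keys, std::vector<Value>& values,
                       std::vector<std::size_t>& idx)
{
    assert(keys.size() == idx.size() && values.size() == idx.size());
    for (std::size_t start = 0; start < idx.size(); ++start) {
        if (idx[start] == start) continue;  // fixed point or already placed
        Key k = std::move(keys[start]);
        Value v = std::move(values[start]);
        std::size_t cur = start;
        while (idx[cur] != start) {
            std::size_t next = idx[cur];
            keys[cur] = std::move(keys[next]);
            values[cur] = std::move(values[next]);
            idx[cur] = cur;
            cur = next;
        }
        keys[cur] = std::move(k);
        values[cur] = std::move(v);
        idx[cur] = cur;
    }
}
```

For keys `{30, 10, 20, 10}`, `sorted_view` yields the indices `{1, 3, 2, 0}`; iterating `keys[idx[i]]` visits the data in sorted order without moving anything.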