使用STL容器进行中值计算时,正确的方法是什么?

时间:2009-11-12 00:32:33

标签: c++ stl containers median

假设我需要从1000000个随机数值序列中检索中位数。

如果使用任何 STL :: list,我没有(内置)方法对序列进行排序以进行中值计算。

如果使用STL :: list,我不能随机访问值来检索排序序列的中间(中位数)。

自己实施排序是否更好? STL :: vector,还是使用STL :: list并使用STL :: list :: iterator for-loop-walk到中值?后者看起来不那么开明,但也感觉更难看......

或者我有更多更好的选择吗?

10 个答案:

答案 0 :(得分:84)

任何随机访问容器(如std::vector)都可以使用std::sort标头中提供的标准<algorithm>算法进行排序。

为了找到中位数,使用std::nth_element会更快;这足以将一个选定的元素放在正确的位置,但不会对容器进行完全排序。所以你可以找到这样的中位数:

int median(vector<int> &v)
{
    size_t n = v.size() / 2;
    nth_element(v.begin(), v.begin()+n, v.end());
    return v[n];
}

答案 1 :(得分:33)

中位数比Mike Seymour的答案更复杂。中位数取决于样本中是偶数还是奇数项。如果项目数为偶数,则中位数是中间两项的平均值。这意味着整数列表的中位数可以是一小部分。最后,空列表的中位数未定义。这是通过我的基本测试用例的代码:

///Represents the exception for taking the median of an empty list
class median_of_empty_list_exception:public std::exception{
  virtual const char* what() const throw() {
    return "Attempt to take the median of an empty list of numbers.  "
      "The median of an empty list is undefined.";
  }
};

///Return the median of a sequence of numbers defined by the random
///access iterators begin and end.  The sequence must not be empty
///(median is undefined for an empty set).
///
///The numbers must be convertible to double.
template<class RandAccessIter>
double median(RandAccessIter begin, RandAccessIter end) 
  throw(median_of_empty_list_exception){
  if(begin == end){ throw median_of_empty_list_exception(); }
  std::size_t size = end - begin;
  std::size_t middleIdx = size/2;
  RandAccessIter target = begin + middleIdx;
  std::nth_element(begin, target, end);

  if(size % 2 != 0){ //Odd number of elements
    return *target;
  }else{            //Even number of elements
    double a = *target;
    RandAccessIter targetNeighbor= target-1;
    std::nth_element(begin, targetNeighbor, end);
    return (a+*targetNeighbor)/2.0;
  }
}

答案 2 :(得分:9)

这是Mike Seymour的更完整版本的答案:

// Could use pass by copy to avoid changing vector
double median(std::vector<int> &v)
{
  size_t n = v.size() / 2;
  std::nth_element(v.begin(), v.begin()+n, v.end());
  int vn = v[n];
  if(v.size()%2 == 1)
  {
    return vn;
  }else
  {
    std::nth_element(v.begin(), v.begin()+n-1, v.end());
    return 0.5*(vn+v[n-1]);
  }
}

它处理奇数或偶数长度的输入。

答案 3 :(得分:5)

该算法使用STL nth_element(分摊的O(N))算法和max_element算法(O(n))有效地处理偶数和奇数大小的输入。请注意,nth_element还有另一个保证的副作用,即n之前的所有元素都保证小于v[n],但不一定排序。

//post-condition: After returning, the elements in v may be reordered and the resulting order is implementation defined.
double median(vector<double> &v)
{
  if(v.empty()) {
    return 0.0;
  }
  auto n = v.size() / 2;
  nth_element(v.begin(), v.begin()+n, v.end());
  auto med = v[n];
  if(!(v.size() & 1)) { //If the set size is even
    auto max_it = max_element(v.begin(), v.begin()+n);
    med = (*max_it + med) / 2.0;
  }
  return med;    
}

答案 4 :(得分:4)

您可以使用库函数std::vectorstd::sort进行排序。

std::vector<int> vec;
// ... fill vector with stuff
std::sort(vec.begin(), vec.end());

答案 5 :(得分:3)

汇总了这个帖子的所有见解,我最终得到了这个例程。它适用于任何stl-container或任何提供输入迭代器的类,并处理奇数和偶数大小的容器。它也可以在容器的副本上工作,而不是修改原始内容。

template <typename T = double, typename C>
inline const T median(const C &the_container)
{
    std::vector<T> tmp_array(std::begin(the_container), 
                             std::end(the_container));
    size_t n = tmp_array.size() / 2;
    std::nth_element(tmp_array.begin(), tmp_array.begin() + n, tmp_array.end());

    if(tmp_array.size() % 2){ return tmp_array[n]; }
    else
    {
        // even sized vector -> average the two middle values
        auto max_it = std::max_element(tmp_array.begin(), tmp_array.begin() + n);
        return (*max_it + tmp_array[n]) / 2.0;
    }
}

答案 6 :(得分:2)

这是一个考虑@MatthieuM建议的答案。即不会修改输入向量。它对偶数和奇数基数的范围使用单个部分排序(在索引向量上),而空范围由向量&#39; at方法抛出的异常处理:

double median(vector<int> const& v)
{
    bool isEven = !(v.size() % 2); 
    size_t n    = v.size() / 2;

    vector<size_t> vi(v.size()); 
    iota(vi.begin(), vi.end(), 0); 

    partial_sort(begin(vi), vi.begin() + n + 1, end(vi), 
        [&](size_t lhs, size_t rhs) { return v[lhs] < v[rhs]; }); 

    return isEven ? 0.5 * (v[vi.at(n-1)] + v[vi.at(n)]) : v[vi.at(n)];
}

Demo

答案 7 :(得分:1)

存在linear-time selection algorithm。以下代码仅在容器具有随机访问迭代器时才有效,但可以修改为无需工作 - 您只需要更加小心谨慎,以避免使用end - beginiter + n等快捷方式

#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <vector>

template<class A, class C = std::less<typename A::value_type> >
class LinearTimeSelect {
public:
    LinearTimeSelect(const A &things) : things(things) {}
    typename A::value_type nth(int n) {
        return nth(n, things.begin(), things.end());
    }
private:
    static typename A::value_type nth(int n,
            typename A::iterator begin, typename A::iterator end) {
        int size = end - begin;
        if (size <= 5) {
            std::sort(begin, end, C());
            return begin[n];
        }
        typename A::iterator walk(begin), skip(begin);
#ifdef RANDOM // randomized algorithm, average linear-time
        typename A::value_type pivot = begin[std::rand() % size];
#else // guaranteed linear-time, but usually slower in practice
        while (end - skip >= 5) {
            std::sort(skip, skip + 5);
            std::iter_swap(walk++, skip + 2);
            skip += 5;
        }
        while (skip != end) std::iter_swap(walk++, skip++);
        typename A::value_type pivot = nth((walk - begin) / 2, begin, walk);
#endif
        for (walk = skip = begin, size = 0; skip != end; ++skip)
            if (C()(*skip, pivot)) std::iter_swap(walk++, skip), ++size;
        if (size <= n) return nth(n - size, walk, end);
        else return nth(n, begin, walk);
    }
    A things;
};

int main(int argc, char **argv) {
    std::vector<int> seq;
    {
        int i = 32;
        std::istringstream(argc > 1 ? argv[1] : "") >> i;
        while (i--) seq.push_back(i);
    }
    std::random_shuffle(seq.begin(), seq.end());
    std::cout << "unordered: ";
    for (std::vector<int>::iterator i = seq.begin(); i != seq.end(); ++i)
        std::cout << *i << " ";
    LinearTimeSelect<std::vector<int> > alg(seq);
    std::cout << std::endl << "linear-time medians: "
        << alg.nth((seq.size()-1) / 2) << ", " << alg.nth(seq.size() / 2);
    std::sort(seq.begin(), seq.end());
    std::cout << std::endl << "medians by sorting: "
        << seq[(seq.size()-1) / 2] << ", " << seq[seq.size() / 2] << std::endl;
    return 0;
}

答案 8 :(得分:1)

Armadillo的实现类似于https://stackoverflow.com/a/34077478的答案https://stackoverflow.com/users/2608582/matthew-fioravante中的实现

它使用一个对nth_element的调用和一个对max_element的调用,它在这里: https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/op_median_meat.hpp#L380

//! find the median value of a std::vector (contents is modified)
template<typename eT>
inline 
eT
op_median::direct_median(std::vector<eT>& X)
  {
  arma_extra_debug_sigprint();
  
  const uword n_elem = uword(X.size());
  const uword half   = n_elem/2;
  
  typename std::vector<eT>::iterator first    = X.begin();
  typename std::vector<eT>::iterator nth      = first + half;
  typename std::vector<eT>::iterator pastlast = X.end();
  
  std::nth_element(first, nth, pastlast);
  
  if((n_elem % 2) == 0)  // even number of elements
    {
    typename std::vector<eT>::iterator start   = X.begin();
    typename std::vector<eT>::iterator pastend = start + half;
    
    const eT val1 = (*nth);
    const eT val2 = (*(std::max_element(start, pastend)));
    
    return op_mean::robust_mean(val1, val2);
    }
  else  // odd number of elements
    {
    return (*nth);
    }
  }

答案 9 :(得分:0)

you can use this approch. It also takes care of sliding window.
Here days are no of trailing elements for which we want to find median and this makes sure the original container is not changed


#include<bits/stdc++.h>

using namespace std;

int findMedian(vector<int> arr, vector<int> brr, int d, int i)
{
    int x,y;
    x= i-d;
    y=d;
    brr.assign(arr.begin()+x, arr.begin()+x+y);


    sort(brr.begin(), brr.end());

    if(d%2==0)
    {
        return((brr[d/2]+brr[d/2 -1]));
    }

    else
    {
        return (2*brr[d/2]);
    }

    // for (int i = 0; i < brr.size(); ++i)
    // {
    //     cout<<brr[i]<<" ";
    // }

    return 0;

}

int main()
{
    int n;
    int days;
    int input;
    int median;
    int count=0;

    cin>>n>>days;

    vector<int> arr;
    vector<int> brr;

    for (int i = 0; i < n; ++i)
    {
        cin>>input;
        arr.push_back(input);
    }

    for (int i = days; i < n; ++i)
    {
        median=findMedian(arr,brr, days, i);

        
    }



    return 0;
}
相关问题