Efficient way to re-order a C++ map-based collection

Date: 2012-06-06 11:43:50

Tags: c++ algorithm stl map

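I have a large(ish, >100K) collection that maps a user identifier (an int) to the number of different products they have bought (also an int). I need to reorganize the data as efficiently as possible so I can find how many users have each number of products. For example: how many users have 1 product, how many users have 2 products, and so on.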

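I've done this by converting the original data from a std::map into a std::multimap with the keys and values swapped. I can then get the number of users with N products by calling count(N). I also store the distinct values in a set, so I know exactly which values to iterate over and in what order.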


// uc is a std::map<int, int> containing the original
// mapping of user identifier to the count of different
// products that they've bought.
std::set<int> uniqueCounts;
std::multimap<int, int> cu; // This maps count to user.

// Braces are required: without them only the first statement
// is inside the loop and 'it' is out of scope on the second.
for ( std::map<int, int>::const_iterator it = uc.begin();
        it != uc.end();  ++it )
{
    cu.insert( std::pair<int, int>( it->second, it->first ) );
    uniqueCounts.insert( it->second );
}

// Now write this out
for ( std::set<int>::const_iterator it = uniqueCounts.begin();
        it != uniqueCounts.end();  ++it )
    std::cout << "==> There are "
            << cu.count( *it ) << " users that have bought "
            << *it << " product(s)" << std::endl;


I'm constrained in that I can't use Boost or C++11 to do this.
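The extra set of unique counts may not be needed. A multimap already keeps entries with equal keys adjacent and in sorted order, so you can jump from one distinct count to the next with upper_bound(). The sketch below is C++03 only, to respect the constraint above. Its function name, users_per_count, is hypothetical and does not come from the question. It assumes cu has been filled exactly as in the loop above.

```cpp
#include <iterator>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): returns (count, users) pairs
// in ascending count order, built directly from the count -> user multimap,
// with no separate std::set of distinct counts.
std::vector<std::pair<int, int> >
users_per_count(const std::multimap<int, int> &cu)
{
    std::vector<std::pair<int, int> > result;
    std::multimap<int, int>::const_iterator it = cu.begin();
    while (it != cu.end()) {
        // upper_bound() returns the first entry whose key is greater than
        // it->first, i.e. the start of the next distinct count.
        std::multimap<int, int>::const_iterator next = cu.upper_bound(it->first);
        // Every entry in [it, next) is a user with exactly it->first products.
        result.push_back(std::make_pair(it->first,
                                        static_cast<int>(std::distance(it, next))));
        it = next;
    }
    return result;
}
```

Each group is walked once, so the whole pass is linear in the number of users plus one logarithmic lookup per distinct count.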


4 answers:

Answer 0 (score: 4)




// MAX_PRODUCTS_PER_USER is the largest count any user can have, so the
// vector needs MAX_PRODUCTS_PER_USER + 1 slots (counts 0..MAX).
std::vector< int > uniqueCounts( MAX_PRODUCTS_PER_USER + 1, 0 );

for ( std::map<int, int>::const_iterator it = uc.begin();
        it != uc.end();  ++it )
    uniqueCounts[ it->second ]++;

// Now write this out; the vector index is the product count
for ( std::vector< int >::size_type i = 0; i < uniqueCounts.size(); ++i )
    std::cout << "==> There are "
            << uniqueCounts[ i ] << " users that have bought "
            << i << " product(s)" << std::endl;


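All of this assumes two things. First, you don't actually need the user IDs once you've processed this data. Second, as pointed out in the comments below, the number of products each user buys falls in a relatively small and contiguous range. If not, you might be better off using a map instead of the vector. You would still avoid calling multimap::count, but you might lose some of the other benefits.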
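If there is no fixed upper bound like MAX_PRODUCTS_PER_USER, one option is to size the vector from the data itself, at the cost of one extra pass over uc. The sketch below is C++03 and still assumes the counts are small and non-negative. The helper name count_histogram is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper: hist[n] == number of users who bought n products.
// The vector is sized from the largest count actually present rather
// than from a fixed MAX_PRODUCTS_PER_USER constant.
std::vector<int> count_histogram(const std::map<int, int> &uc)
{
    int max_count = 0;
    for (std::map<int, int>::const_iterator it = uc.begin(); it != uc.end(); ++it) {
        assert(it->second >= 0);   // a negative count would index out of range
        max_count = std::max(max_count, it->second);
    }

    std::vector<int> hist(max_count + 1, 0);
    for (std::map<int, int>::const_iterator it = uc.begin(); it != uc.end(); ++it)
        ++hist[it->second];
    return hist;
}
```

If max_count turns out to be much larger than uc.size(), most slots stay zero. In that case the map-based counter in the next answer is the better fit.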

Answer 1 (score: 2)


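The only thing I see that could be improved is memory usage. If that's a concern, you can skip building the multimap and just keep a map of counters, like this (careful, my C++ is a bit rusty):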

std::map<int, int> countFrequency; // count => how many customers with that count

for ( std::map<int, int>::const_iterator it = uc.begin();
        it != uc.end();  ++it )
    // If it->second is not yet in countFrequency, operator[]
    // inserts it with a value-initialized int, i.e. 0.
    countFrequency[it->second] += 1;

// Now write this out
for ( std::map<int, int>::const_iterator it = countFrequency.begin();
        it != countFrequency.end();  ++it )
    std::cout << "==> There are "
            << it->second << " users that have bought "
            << it->first << " product(s)" << std::endl;


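If you keep countFrequency alongside uc, you can also update it incrementally instead of rebuilding it. When a new user who has bought count products is added:

countFrequency[count] += 1;

When an existing user's product count changes from oldCount to newCount:

countFrequency[oldCount] -= 1;
countFrequency[newCount] += 1;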
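One way to package that incremental update is a small class that owns both maps and keeps them in sync. This is only a C++03 sketch. The class name ProductTally and its member functions are hypothetical and do not come from the answer. It also erases buckets that drop to zero, so that iterating the frequencies lists only counts actually in use.

```cpp
#include <map>

// Hypothetical class (not from the answer): keeps the user -> count map and
// the count -> number-of-users map in sync, so the histogram is never rebuilt.
class ProductTally {
public:
    // Record that 'user' now owns 'count' different products.
    void set_count(int user, int count) {
        std::map<int, int>::iterator it = userCounts_.find(user);
        if (it != userCounts_.end()) {
            if (it->second == count) return;   // nothing changes
            decrement(it->second);             // leave the old bucket
            it->second = count;
        } else {
            userCounts_[user] = count;         // brand-new user
        }
        countFrequency_[count] += 1;           // join the new bucket
    }

    // Number of users who own exactly 'count' products (0 if none).
    int users_with(int count) const {
        std::map<int, int>::const_iterator it = countFrequency_.find(count);
        return it == countFrequency_.end() ? 0 : it->second;
    }

    const std::map<int, int> &frequencies() const { return countFrequency_; }

private:
    // Precondition: 'count' is the current bucket of some recorded user.
    void decrement(int count) {
        std::map<int, int>::iterator it = countFrequency_.find(count);
        // Erase empty buckets so frequencies() only lists counts in use.
        if (--it->second == 0) countFrequency_.erase(it);
    }

    std::map<int, int> userCounts_;      // user id -> product count
    std::map<int, int> countFrequency_;  // product count -> number of users
};
```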

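As an aside, I'd suggest using unsigned int for the counts (unless there is a legitimate reason for a count to be negative) and typedef-ing a userID type, to improve readability.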
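As a rough illustration of that suggestion, the typedef names below (UserId, ProductCount, UserProducts, CountFrequency) and the function count_frequency are hypothetical. The point is that the signatures now say which int is an ID and which is a count.

```cpp
#include <map>

// Hypothetical type names illustrating the suggestion: the signatures now
// say which int is a user id and which is a count, and counts are unsigned.
typedef int UserId;
typedef unsigned int ProductCount;
typedef std::map<UserId, ProductCount> UserProducts;          // user -> count
typedef std::map<ProductCount, unsigned int> CountFrequency;  // count -> users

CountFrequency count_frequency(const UserProducts &uc)
{
    CountFrequency freq;
    for (UserProducts::const_iterator it = uc.begin(); it != uc.end(); ++it)
        freq[it->second] += 1;   // operator[] value-initializes a new entry to 0
    return freq;
}
```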

Answer 2 (score: 1)


Answer 3 (score: 1)


#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

typedef std::map<int, int> Map;
typedef Map::const_iterator It;

// Tally the users per product count: dest[count] is incremented once per
// user. Works for a Map and for a std::vector<int> sized max_count + 1.
template <typename Container>
void get_counts(const Map &source, Container &dest) {
    for (It it = source.begin(); it != source.end(); ++it) {
        ++dest[it->second];
    }
}

// As an alternative to this overloaded contains(), you could write
// an overloaded print_counts; after all, the one below is not an
// efficient way to iterate a sparsely-populated map.
// Or you might prefer a template function that visits
// each entry in the container, calling a specified functor that
// will print the output, and passing it the key and value.
// This is just the smallest point of customization I thought of.
// (The overloads must be declared before print_counts so that the
// template can find them.)
bool contains(const Map &c, int key) {
    return c.count(key) != 0;
}
bool contains(const std::vector<int> &c, int key) {
    // also check 0 <= key < c.size() for a more general-purpose function
    return c[key] != 0;
}

template <typename Container>
void print_counts(Container &people, int max_count) {
    for (int i = 0; i <= max_count; ++i) {
        if (contains(people, i)) {
            std::cout << "==> There are "
                << people[i] << " users that have bought "
                << i << " product(s)" << std::endl;
        }
    }
}

void do_everything(const Map &uc) {
    // first get the max product count
    int max_count = 0;
    for (It it = uc.begin(); it != uc.end(); ++it) {
        max_count = std::max(max_count, it->second);
    }

    // Map::size_type is unsigned; max_count is never negative here.
    if (static_cast<Map::size_type>(max_count) > uc.size()) { // or some other threshold
        Map counts;
        get_counts(uc, counts);
        print_counts(counts, max_count);
    } else {
        std::vector<int> counts(max_count + 1);
        get_counts(uc, counts);
        print_counts(counts, max_count);
    }
}
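The comments in the code above mention an alternative: a template function that visits each entry and hands the key and value to a functor. The following is one possible C++03 shape for it. The names visit_counts, PrintCount and TotalProducts are hypothetical. The functor is taken and returned by value, as std::for_each does, so temporaries can be passed and any state the functor accumulates comes back to the caller.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <vector>

// Hypothetical visitor for the map case: calls f(count, users) for every
// entry, so a sparse map is not probed for each count from 0 to max_count.
template <typename Visitor>
Visitor visit_counts(const std::map<int, int> &counts, Visitor f) {
    for (std::map<int, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it)
        f(it->first, it->second);
    return f;
}

// Vector case: the index is the count; empty (zero) slots are skipped.
template <typename Visitor>
Visitor visit_counts(const std::vector<int> &counts, Visitor f) {
    for (std::size_t i = 0; i < counts.size(); ++i)
        if (counts[i] != 0)
            f(static_cast<int>(i), counts[i]);
    return f;
}

// Functor that prints in the same format as print_counts.
struct PrintCount {
    void operator()(int count, int users) const {
        std::cout << "==> There are " << users
                  << " users that have bought " << count
                  << " product(s)" << std::endl;
    }
};

// Functor that sums count * users, i.e. the total number of
// (user, product) pairs across all buckets.
struct TotalProducts {
    long total;
    TotalProducts() : total(0) {}
    void operator()(int count, int users) {
        total += static_cast<long>(count) * users;
    }
};
```

Calling visit_counts(counts, PrintCount()) could then replace both print_counts calls in do_everything. With this approach the contains() overloads are no longer needed.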
