Question

I have an application (in C++) in which I need a set of pairings between strings and integers, i.e.:

("david", 0)
("james", 1)
("helen", 2)
...

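Using the Java (key, value) terminology, I need to be able to (1) search to see whether a key exists in the map and (2) retrieve the value associated with a given string (the key). When working in Java, I found that the HashMap type handled everything I needed.

I would like to do the same thing in C++. I did some googling and found that the C++ 2011 library has an unordered_map type that could replicate this. I am curious whether this is the best approach.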

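In my application, the following rules hold for the collection:

The integers are always consecutive (as in the example) and start from 0.
The integer values never change.
The map is created at the start of the application and does not change afterwards, i.e. it is immutable.
There are no duplicate string keys.
Until the map has been created, I do not know how many keys (and, by extension, integer values) I will need. One of the parameters to my application is a directory of text files containing the lists of words to use.
I don't care about the start-up time cost. I need the main operations (i.e. containsKey(..) and get(key)) to be as fast as possible. They will be called A LOT. The application centres on processing a large text corpus (i.e. Wikipedia) and forming co-occurrence matrices between words/documents.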

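I have thought that, instead of storing both the integers and the strings, I could store the strings in some list type and then return the index, i.e. data = {"david", "james", "helen", ...}

Then something like find_Map(data, key) would return the index at which the key sits (the value). I think this could be sped up by first sorting in ascending order and then applying a search algorithm. But again, this is only a guess.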
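The sorted-list idea above can be sketched roughly as follows. This is only an illustration under assumptions: make_sorted_keys and find_index are hypothetical names (find_index plays the role of find_Map), and the index returned is the position in sorted order, not the order in which the words were read.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the keys once at startup and drops any duplicates.
std::vector<std::string> make_sorted_keys(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}

// Hypothetical helper: returns the index of key in sorted_keys, or -1 if absent.
// std::lower_bound performs a binary search, so a lookup costs O(log n)
// string comparisons.
int find_index(const std::vector<std::string>& sorted_keys, const std::string& key)
{
    auto it = std::lower_bound(sorted_keys.begin(), sorted_keys.end(), key);
    if (it == sorted_keys.end() || *it != key) {
        return -1;
    }
    return static_cast<int>(it - sorted_keys.begin());
}
```

A result of -1 doubles as the containsKey(..) check, which is safe here because valid indices start at 0. Whether this beats a hash table for a given vocabulary is something to measure rather than assume.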

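I appreciate that this is a common problem and that many different approaches exist. I am going to code up a few different ideas, but I thought it best to ask the group first and see what you think.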

Answer 1

You can use unordered_map<string,int>.
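As a minimal sketch of what that might look like for the rules in the question: build_vocab and lookup are hypothetical helper names, and the word list is assumed to have already been read from the text files into a vector.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: gives each distinct word the next free integer,
// so the values are consecutive and start from 0.
std::unordered_map<std::string, int> build_vocab(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> vocab;
    vocab.reserve(words.size());  // avoid rehashing while the map is filled
    for (const auto& w : words) {
        // emplace leaves an existing entry untouched, so a repeated word
        // keeps the id it was given first.
        vocab.emplace(w, static_cast<int>(vocab.size()));
    }
    return vocab;
}

// Hypothetical helper: one hash lookup answers both "is the key present?"
// and "what is its value?". Returns -1 when the key is absent, which is
// unambiguous here because valid values start at 0.
int lookup(const std::unordered_map<std::string, int>& vocab, const std::string& key)
{
    auto it = vocab.find(key);
    return it == vocab.end() ? -1 : it->second;
}
```

Using find once, rather than count followed by operator[], avoids hashing the string twice on the hot path.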

Answer 2

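Depending on the amount of data you want to store, there are two possibilities:

For moderate amounts of data, I think std::unordered_map<string, int> will do just fine.
If you want to handle large amounts of data, it may help to consider data structures that are more specialized for string storage, such as tries, in which strings with a common prefix are stored in a common subtree. This can also improve your space usage, since the data is effectively compressed. The most efficient implementation I know of is marisa-trie, which is also used by the Python pytries packages.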
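To make the shared-prefix idea concrete, here is a deliberately simple, uncompressed trie. It is only a sketch: the Trie class and its members are hypothetical names, and it does not reflect how marisa-trie is implemented, which uses a much more compact representation.

```cpp
#include <map>
#include <memory>
#include <string>

// Minimal string-to-int trie (illustrative only).
class Trie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        int value = -1;  // -1 marks "no key ends at this node"
    };
    Node root_;

public:
    // Walks the key character by character, creating nodes as needed.
    // Keys with a common prefix reuse the same path from the root.
    void insert(const std::string& key, int value)
    {
        Node* node = &root_;
        for (char c : key) {
            std::unique_ptr<Node>& child = node->children[c];
            if (!child) {
                child.reset(new Node());
            }
            node = child.get();
        }
        node->value = value;
    }

    // Returns the stored value, or -1 if the key is absent.
    int find(const std::string& key) const
    {
        const Node* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) {
                return -1;
            }
            node = it->second.get();
        }
        return node->value;
    }
};
```

With "david" and "dave" inserted, the nodes for "dav" are stored only once. A pointer-per-node layout like this one tends to use more memory than a hash map, so any space savings come from compact implementations rather than from the trie shape alone.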

Answer 3

The simple answer is of course std::unordered_map. However, for more functionality and automatic index consistency, we can turn to boost::multi_index_container.

For example:

namespace bmi = boost::multi_index;

// Define a custom container type
using my_map = boost::multi_index_container<
    // It holds StringValue objects
    StringValue,
    bmi::indexed_by<
        // first index is called by_string, is a unique hashed index with constant time lookup
        bmi::hashed_unique<bmi::tag<by_string>, bmi::member<StringValue, std::string, &StringValue::str>>,

        // second index is called by_value, is a unique hashed index with constant time lookup
        bmi::hashed_unique<bmi::tag<by_value>, bmi::member<StringValue, int, &StringValue::value>>,

        // third index is called ordered_by_value, is a unique ordered index with logarithmic time lookup
        bmi::ordered_unique<bmi::tag<ordered_by_value>, bmi::member<StringValue, int, &StringValue::value>>
    >
>;

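In this example, my_map is defined as a container that:

holds StringValue objects
maintains a hashed unique index on the objects' str member
maintains a hashed unique index on the objects' value member
maintains an ordered unique index on the objects' value member, in case we wish to enumerate by value (for example)

Complete example: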

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/indexed_by.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>

#include <boost/format.hpp>
#include <string>
#include <iostream>
#include <iomanip>
#include <cassert>
#include <type_traits>

// define a value object
struct StringValue
{
    std::string str;
    int         value;
};

// provide a way to stream the pair to an ostream
std::ostream& operator <<(std::ostream& os, StringValue const& sv)
{
    static const char fmt[] = R"__({ "str": %1%, "value": %2% })__";
    return os << boost::format(fmt) % std::quoted(sv.str) % sv.value;
}

struct by_string
{
};
struct by_value
{
};
struct ordered_by_value
{
};

namespace bmi = boost::multi_index;

// Define a custom container type
using my_map = boost::multi_index_container<
    // It holds StringValue objects
    StringValue,
    bmi::indexed_by<
        // first index is called by_string, is a unique hashed index with constant time lookup
        bmi::hashed_unique<bmi::tag<by_string>, bmi::member<StringValue, std::string, &StringValue::str>>,
        // second index is called by_value, is a unique hashed index with constant time lookup
        bmi::hashed_unique<bmi::tag<by_value>, bmi::member<StringValue, int, &StringValue::value>>,
        // third index is called ordered_by_value, is a unique ordered index with logarithmic time lookup
        bmi::ordered_unique<bmi::tag<ordered_by_value>, bmi::member<StringValue, int, &StringValue::value>>
    >
>;

template<class Array>
struct ArrayEmitter
{
    const Array& array;

    friend std::ostream& operator<<(std::ostream& os, ArrayEmitter const& em) {
        const char* sep = " ";
        os << "[";
        for (auto&& item : em.array) {
            os << sep << item;
            sep = ", ";
        }
        return os << " ]";
    }
};

template<class Array>
auto emit_as_array(Array&& arr)
{
    return ArrayEmitter<std::remove_cv_t<Array>> { arr };
}

int main()
{
    my_map mm { { "B", 3 }, { "D", 1 }, { "A", 4 }, { "C", 2 } };

    // assert that we can't violate the indices: value 1 is already taken by "D"
    auto ib = mm.insert(StringValue{"E", 1});
    assert(ib.second == false);

    // iterate by string
    std::cout << "print by value index:\n";
    std::cout << emit_as_array(mm.get<by_string>()) << std::endl;

    std::cout << "\nprint by value index unordered:\n";
    std::cout << emit_as_array(mm.get<by_value>()) << std::endl;

    std::cout << "\nprint by value index ordered:\n";
    std::cout << emit_as_array(mm.get<ordered_by_value>()) << std::endl;

    std::cout << "\nfind an element by value in constant time:\n";
    auto&& name = mm.get<by_value>().find(2)->str;
    std::cout << name << std::endl;
}

Expected output:

print by value index:
[ { "str": "B", "value": 3 }, { "str": "D", "value": 1 }, { "str": "A", "value": 4 }, { "str": "C", "value": 2 } ]

print by value index unordered:
[ { "str": "B", "value": 3 }, { "str": "D", "value": 1 }, { "str": "A", "value": 4 }, { "str": "C", "value": 2 } ]

print by value index ordered:
[ { "str": "D", "value": 1 }, { "str": "C", "value": 2 }, { "str": "B", "value": 3 }, { "str": "A", "value": 4 } ]

find an element by value in constant time:
C

Documentation:

http://www.boost.org/doc/libs/1_62_0/libs/multi_index/doc/tutorial/index.html

Equivalent of HashMap in C++
