In which situation will std::map<a,b> be faster than a sorted std::vector<std::pair<a,b>>?

Time: 2016-02-12 20:53:07

Tags: c++ performance dictionary vector

I was using a map in some code to store ordered data. I found out that for huge maps, destruction could take a while. In this code I had, replacing map by vector<pair> reduced processing time by 10000...

Finally, I was so surprised that I decided to compare map performance with a sorted vector of pair.

And I'm surprised because I could not find a situation where map was faster than a sorted vector of pair (filled randomly and later sorted)... there must be some situations where map is faster... otherwise, what's the point of providing this class?

Here is what I tested:

Test one, compare map filling and destroying vs vector filling, sorting (because I want a sorted container) and destroying:

    #include <iostream>
    #include <time.h>
    #include <cstdlib>
    #include <map>
    #include <vector>
    #include <algorithm>

    int main(void)
    {
        clock_t tStart = clock();

        {
            std::map<float,int> myMap;
            for ( int i = 0; i != 10000000; ++i )
            {
                myMap[ ((float)std::rand()) / RAND_MAX ] = i;
            }
        }

        std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        tStart = clock();

        {
            std::vector< std::pair<float,int> > myVect;
            for ( int i = 0; i != 10000000; ++i )
            {
                myVect.push_back( std::make_pair( ((float)std::rand()) / RAND_MAX, i ) );
            }

            // sort the vector, as we want a sorted container:
            std::sort( myVect.begin(), myVect.end() );
        }

        std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        return 0;
    }

Compiled with g++ main.cpp -O3 -o main and got:

    Time taken by map: 21.7142
    Time taken by vect: 7.94725

map's 3 times slower...

Then, I said, "OK, vector is faster to fill and sort, but search will be faster with the map"....so I tested:

    #include <iostream>
    #include <time.h>
    #include <cstdlib>
    #include <map>
    #include <vector>
    #include <algorithm>

    int main(void)
    {
        clock_t tStart = clock();

        {
            std::map<float,int> myMap;
            float middle = 0;
            float last;
            for ( int i = 0; i != 10000000; ++i )
            {
                last = ((float)std::rand()) / RAND_MAX;
                myMap[ last ] = i;
                if ( i == 5000000 )
                    middle = last; // element we will later search
            }

            std::cout << "Map created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

            float sum = 0;
            for ( int i = 0; i != 10; ++i )
                sum += myMap[ last ]; // search it

            std::cout << "Sum is " << sum << std::endl;
        }

        std::cout << "Time taken by map: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        tStart = clock();

        {
            std::vector< std::pair<float,int> > myVect;
            std::pair<float,int> middle;
            std::pair<float,int> last;
            for ( int i = 0; i != 10000000; ++i )
            {
                last = std::make_pair( ((float)std::rand()) / RAND_MAX, i );
                myVect.push_back( last );
                if ( i == 5000000 )
                    middle = last; // element we will later search
            }

            std::sort( myVect.begin(), myVect.end() );

            std::cout << "Vector created after " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

            float sum = 0;
            for ( int i = 0; i != 10; ++i )
                sum += (std::find( myVect.begin(), myVect.end(), last ))->second; // search it

            std::cout << "Sum is " << sum << std::endl;
        }

        std::cout << "Time taken by vect: " << ((double)(clock() - tStart)/CLOCKS_PER_SEC) << std::endl;

        return 0;
    }

Compiled with g++ main.cpp -O3 -o main and got:

    Map created after 19.5357
    Sum is 1e+08
    Time taken by map: 21.41
    Vector created after 7.96388
    Sum is 1e+08
    Time taken by vect: 8.31741

Even search is apparently faster with the vector (10 searches with the map took almost 2 sec and it took only half a second with the vector)...

So:

  • Did I miss something?
  • Are my tests not correct/accurate?
  • Is map simply a class to avoid, or are there really situations where map offers good performance?

2 answers:

Answer 0 (score: 3)

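Generally, map is the better choice when you do a lot of insertions and deletions. If you build the data structure once and then only do lookups, a sorted vector will almost certainly be faster, if only because of processor cache effects. Because insertions and deletions at arbitrary positions in a vector are O(n) rather than O(log n), those operations become the limiting factor.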
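To make the insertion cost concrete, here is a minimal sketch of keeping a vector sorted while inserting, next to the equivalent map insertion. The names `Entry`, `insert_sorted` and `insert_map` are hypothetical helpers invented for this illustration; they do not come from the question's code.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

using Entry = std::pair<float, int>;  // hypothetical alias for this sketch

// Insert into a vector that is kept sorted by key at all times.
// Locating the position is O(log n), but vector::insert has to shift
// every later element, so the operation as a whole is O(n).
void insert_sorted(std::vector<Entry>& v, float key, int value) {
    auto pos = std::lower_bound(
        v.begin(), v.end(), key,
        [](const Entry& e, float k) { return e.first < k; });
    if (pos != v.end() && pos->first == key) {
        pos->second = value;  // key already present: overwrite, like map's operator[]
    } else {
        v.insert(pos, Entry(key, value));
    }
}

// The same operation on a map is O(log n): no elements are moved,
// only tree nodes are linked in.
void insert_map(std::map<float, int>& m, float key, int value) {
    m[key] = value;
}
```

If insertions are interleaved with lookups, the O(n) shift inside `insert_sorted` is paid on every call, which is the situation where the map would be expected to win.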

Answer 1 (score: 1)

std::find has linear time complexity, whereas a map lookup has O(log N) complexity.

When you find that one algorithm is 100000 times faster than another, you should get suspicious! Your benchmark is invalid.

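You need to compare realistic variants. You probably meant to compare map against a binary search. Run each variant for at least 1 second of CPU time so that you can actually compare the results.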
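A binary-search lookup on the sorted vector could look roughly like the sketch below, which uses std::lower_bound in place of std::find. `Entry` and `find_sorted` are hypothetical names chosen for this example, and the function assumes the vector is already sorted by key.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Entry = std::pair<float, int>;  // hypothetical alias for this sketch

// Returns a pointer to the value stored under `key`, or nullptr if absent.
// Precondition: `v` is sorted by key (e.g. with std::sort).
// std::lower_bound performs a binary search, so this is O(log n),
// the same complexity class as a std::map lookup.
const int* find_sorted(const std::vector<Entry>& v, float key) {
    auto pos = std::lower_bound(
        v.begin(), v.end(), key,
        [](const Entry& e, float k) { return e.first < k; });
    if (pos != v.end() && pos->first == key) {
        return &pos->second;
    }
    return nullptr;
}
```

Comparing this against `std::map::find` on the same keys gives a like-for-like test, since both sides then do a logarithmic search.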

When a benchmark reports a time of "0.00001 seconds", you are well within clock inaccuracy. That number means nothing.
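One way to stay clear of clock inaccuracy is to repeat the measured operation until a minimum amount of time has passed and report the average. The sketch below is only an illustration: `seconds_per_call` is a hypothetical helper, and it measures wall-clock time with std::chrono::steady_clock rather than CPU time.

```cpp
#include <chrono>

// Calls `work()` repeatedly until at least `min_seconds` have elapsed,
// then returns the average wall-clock seconds per call.
// Assumption: `work` has an observable side effect (for example it adds
// to a sum that is printed later), so the compiler cannot remove it.
template <typename F>
double seconds_per_call(F work, double min_seconds = 1.0) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    long long calls = 0;
    double elapsed = 0.0;
    do {
        work();
        ++calls;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < min_seconds);
    return elapsed / calls;
}
```

Passing a lambda that performs one lookup, once for the map and once for the sorted vector, yields two per-lookup averages that are each backed by at least `min_seconds` of measured work.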