Question

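I need to optimize some legacy code, and I am still new to C++.

The code processes network packets in two threads: one thread pushes packets onto a FIFO [topupBuffer], and the other reads from the queue and sends them out over an IP socket [writeToIPOutput]. The legacy code uses std::deque to implement the FIFO.

However, running the program uses a lot of CPU, up to 50% (it needs to be more like 5%). Running gprof seems to point to std::deque as the culprit. (I am not sure I am interpreting the profile results correctly, so any help is appreciated.)

Excerpts from the profile output: topupBuffer hierarchy: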

index % time    self  children    called     name
                0.65    2.51       1/1           DvIPFilePlayback::topupBufferThreadMethod(void*) [2]
[1]     60.5    0.65    2.51       1         DvIPFilePlayback::topupBuffer() [1]
                0.27    1.15 4025575/4025575     DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
                0.03    0.56 4026668/4026668     std::deque<TTsPacket, std::allocator<TTsPacket> >::push_back(TTsPacket const&) [6]
                0.03    0.15 4046539/5749754     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::size() const [17]

and

[5]     27.2    0.27    1.15 4025575         DvIPPlaybackBC::bufferizeTsPackets(TPlaybackBuffer&, int&, int&) [5]
                0.04    0.30 4031674/4031674     std::deque<TTsPacket, std::allocator<TTsPacket> >::pop_front() [11]
                0.03    0.30 8058004/8058004     std::deque<TTsPacket, std::allocator<TTsPacket> >::size() const [12]
                0.01    0.19  576183/576183      DvPlaybackBC::insertToPlaybackBuffer(TPlaybackBuffer const&) [22]
                0.04    0.11 4029401/4029401     std::deque<TTsPacket, std::allocator<TTsPacket> >::front() [25]

writeToIPOutput Hierarchy

[3]     36.8    0.92    1.00       1         DvIPPlaybackBC::writeToIPOutput() [3]
                0.31    0.00 1129444/1129444     TPlaybackBuffer::operator=(TPlaybackBuffer const&) [13]
                0.01    0.18  579235/1155128     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::push_back(TPlaybackBuffer const&) [8]
                0.03    0.10 1135318/1135318     std::deque<TPlaybackBuffer, std::allocator<TPlaybackBuffer> >::pop_front() [27]

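I guess writeToIPOutput spends too much time in assignment. I can work on that. But topupBuffer spends its time in std::deque.

Is this a correct interpretation of the profile output?

If so, would a different container be more efficient, and if so, which one?

Thanks

Edit: The explanatory notes at the end of the call tree are as follows: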

% time  This is the percentage of the `total' time that was spent
        in this function and its children.  Note that due to
        different viewpoints, functions excluded by options, etc,
        these numbers will NOT add up to 100%.

self    This is the total amount of time spent in this function.

children    This is the total amount of time propagated into this
        function by its children.

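So looking at bufferizeTsPackets, 1.15 is spent in its children, of which 0.30 + 0.30 + 0.11 = 0.71 is spent in the various deque methods (push_back, size, etc.). Right? So 0.71 is more than half of the total time spent in the children (1.15) (??)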

Answer 1

A more efficient structure would be a circular queue (ring buffer) implemented on top of an array.

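Because the array has a fixed size, you either have to make it large enough that data never overflows, or accept that only the last N values are stored, where N is the capacity of the buffer.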
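As a rough illustration of the two choices above, here is a minimal single-threaded sketch of a fixed-capacity ring buffer. The name `FixedRing` and its member functions are hypothetical and do not come from the original code; the sketch also assumes the element type is default-constructible and copy-assignable. `try_push` refuses new data when the buffer is full, while `push_overwrite` keeps only the last N values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity FIFO; names and interface are illustrative only.
template <typename T, std::size_t N>
class FixedRing {
    static_assert(N > 0, "capacity must be positive");
public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == N; }
    std::size_t size() const { return count_; }

    // Option 1: reject the new element when the buffer is full.
    bool try_push(const T& value) {
        if (full()) return false;
        data_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    // Option 2: when full, overwrite the oldest element so that only
    // the last N values are kept.
    void push_overwrite(const T& value) {
        if (full()) {
            data_[head_] = value;        // the oldest slot becomes the newest
            head_ = (head_ + 1) % N;     // the next-oldest is now the front
        } else {
            data_[(head_ + count_) % N] = value;
            ++count_;
        }
    }

    // Oldest element; the buffer must not be empty.
    const T& front() const {
        assert(!empty());
        return data_[head_];
    }

    // Drop the oldest element; the buffer must not be empty.
    void pop_front() {
        assert(!empty());
        head_ = (head_ + 1) % N;
        --count_;
    }

private:
    std::array<T, N> data_{};   // storage is allocated once, with the object
    std::size_t head_ = 0;      // index of the oldest element
    std::size_t count_ = 0;     // number of stored elements
};
```

Here `size()`, `front()` and `pop_front()` are a few integer operations each and nothing is allocated after construction. Note that this version is not thread-safe on its own; sharing it between the two threads would still need a lock or a different design.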

Many embedded systems use arrays to reduce the memory fragmentation caused by dynamic memory allocation.

If your array is small enough, it can fit into the processor's data cache, which speeds up processing.
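Since the question describes exactly one thread pushing and one thread popping, the same fixed array could also be shared between them without a mutex. The following is only a sketch under that assumption (a single producer and a single consumer); `SpscRing`, `try_push` and `try_pop` are hypothetical names, not part of the original code. One slot is always left empty to distinguish "full" from "empty", so the usable capacity is N - 1.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer.
// Safe only if exactly one thread calls try_push and exactly one
// other thread calls try_pop.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "one slot stays empty, so N must be at least 2");
public:
    // Producer thread only. Returns false if the buffer is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        data_[tail] = value;
        // Publish the element: the consumer sees it after this store.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false if the buffer is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = data_[head];
        // Release the slot: the producer may reuse it after this store.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> data_{};
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)
};
```

Whether this actually beats std::deque in the program from the question would have to be confirmed by profiling again; the sketch only shows the shape of the approach.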

C++ container for a high-performance FIFO