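C++ set outputs more elements than it contains

Time: 2014-08-12 20:30:55

Tags: c++ ubuntu gcc set std

I have a very large collection of strings and I want to find the subset of unique strings, so I am using a set container. The code goes to a MySQL database, pulls in a new batch of strings, and tries to add them to a set. I check the return value of insert to determine whether the string was added (first occurrence) or was already present.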

#include <iostream>
#include <string>
#include <fstream>
#include <algorithm>
#include <vector>
#include <iostream>

#include "CDR3Sample.h"
#include "MySQLConnect.h"

using namespace std;

int main() {

    CDR3SetReturn ret;
    //CDR3Set is a typedef on set<string>
    CDR3Set total;

    try
    {
            MySQLConnect connection;
            cerr << "size of master " << connection.getMasterSize() << endl;

            SampleIDList list = connection.getSampleIDList();
            SampleIDList ids_seen;
            cerr << "size of raw ID list " << list.size() << endl;


            for (SampleIDListIterator it=list.begin(); it != list.end(); it++) {
                    // We're going to skip it if the table doesn't exist or if the sample has already been processed
                if (connection.checkTable(*it) && find(ids_seen.begin(), ids_seen.end(), *it)!=list.end()) {
            CDR3Sample s(*it, connection);
            int valid_number = 0;
            for (CDR3SetIterator sit=s.begin(); sit != s.end(); sit++) {
                ret = total.insert(*sit);
                if (ret.second) {
                    valid_number++;
                }
            }
            cout << *it << " " << s.getLength() << " " << valid_number << " " << total.size() << endl;
            ids_seen.push_back(*it);
                } else {
                    cerr << *it << " table not found" << endl;
                }
            }
    }
    catch (int i)
    {
            // Need to put code here to save state of calculation
        std::cerr << "Exception thrown by MySQLConnect " << i << std::endl;

        exit(-1);
    }

    // Need to put code here to save state of calculation
    cerr << "size of total " << total.size() << endl;
    ofstream ofs ("cdr3_tally.test", ifstream::out);
    int it_count=0;
    while (ofs.good()) {
        for (CDR3SetIterator it=total.begin(); it != total.end(); ++it) {
            cout << it_count << " " << *it  << endl;
            it_count++;
        }
    }
    ofs.close();
    cerr << "it_count " << it_count << endl;

    ofs_naive.close();


    return 0;
}

For brevity I have left out the supporting code, but I can provide it.

When it finishes, the set has the correct number of entries:

size of master 9243
size of raw ID list 1
~MySQLConnect
size of total 372

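But the loop that writes out the set just keeps going for millions of lines. If I run sort -u on the output, it has the correct number of entries.

I'm stumped. The code looks fine, and it isn't complicated.

Can anyone see what I'm doing wrong? Should I make CDR3Set a proper class instead of a typedef?

I'm using g++ on Ubuntu: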

  

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.1-2ubuntu1~12.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04)

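Thanks,

Mike

1 Answer:

Answer 0: (score: 2)

Your cout for loop is wrapped in while (ofs.good()). Nothing inside the for loop ever puts ofs into a bad state, so the while condition stays true and the program keeps iterating over the set, printing everything again and again.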