Question

I've always been told that file I/O is the slowest part of a program. However, I tested the following two scenarios:

Scenario 1:

fstream test;  // declaration omitted in the question
test.open("test.xml", fstream::out);
for(int i=0;i<1000;i++)
{
    test<<"<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
}
test.close();

Scenario 2:

fstream test;  // declaration omitted in the question
test.open("test.xml", fstream::out);
stringstream fileDataStr;
for(int i=0;i<1000;i++)
{
    fileDataStr<<"<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
}
test << fileDataStr.str();  // write the accumulated text, not the stream object
test.close();

I expected scenario 1 to be slower, since it performs 1000 separate writes to the file, but the results showed it runs at the same speed as scenario 2 (as measured with clock_t). Why is that? Is it related to some OS optimization of file I/O?

Edit: following @irW's suggestion, I changed

string fileDataStr;

to

stringstream fileDataStr;

Answer 1

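Because of the way std::ofstream buffers output, you end up doing the same amount of I/O in both cases. (Usually, anyway: an implementation can optimize the case where you output one very long string.) The only difference is that in the second case you've introduced an extra intermediate buffer, which means more copying and some dynamic allocation. (How many dynamic allocations depends on the implementation, but there shouldn't be many.)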

Answer 2

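Each fileDataStr += can force the string to grow, which means allocating a larger buffer and copying the previous contents into it. If you use a stringstream instead, it might be a fairer comparison.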

Answer 3

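There's no single answer to this, because the results can vary with the compiler and standard library you use. For example, I put your different attempts together with a little test/timing harness. Then, just for fun, I added a fourth attempt (test3 in the code below):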

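#include <algorithm>   // std::copy
#include <fstream>
#include <iostream>
#include <iterator>    // std::back_inserter
#include <sstream>
#include <string>
#include <string.h>    // strlen
#include <time.h>      // clock, clock_t, CLOCKS_PER_SEC
#include <vector>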

static const int limit = 1000000;

// test1: the question's scenario 1, each line goes straight into the ofstream's buffer
void test1() {
    std::ofstream test("test.xml");
    for (int i = 0; i < limit; i++)
    {
        test << "<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
    }
    test.close();
}

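// test11: the question's original scenario 2, accumulate in a std::string with +=, then write once
void test11() {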
    std::ofstream test("test.xml");
    std::string fileDataStr;
    for (int i = 0; i < limit; i++)
    {
        fileDataStr += "<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
    }
    test << fileDataStr;
    test.close();

}

// test2: the edited scenario 2, accumulate in a std::stringstream, then write its contents once
void test2() {
    std::ofstream test("test.xml");
    std::stringstream fileDataStr;
    for (int i = 0; i < limit; i++)
    {
        fileDataStr << "<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
    }
    test << fileDataStr.str();
    test.close();
}

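// test3: the extra attempt, copy into a pre-reserved std::vector<char>, then a single write() call
void test3() {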
    std::ofstream test("test.xml");
    std::vector<char> buffer;
    char line [] = "<p> attr1=\"test1\" attr2=\"test2\" attr3=\"test3\" attr4=\"test4\">test5</p>\n";
    size_t len = strlen(line);

    buffer.reserve(limit * len + 1);

    for (int i = 0; i < limit; i++)
        std::copy(line, line + len, std::back_inserter(buffer));

    test.write(&buffer[0], buffer.size());
    test.close();
}

template <class T>
void timer(T f) {
    clock_t start = clock();
    f();
    clock_t stop = clock();
    std::cout << double(stop - start) / CLOCKS_PER_SEC << " seconds\n";
}

int main() {
    timer(test1);
    timer(test11);
    timer(test2);
    timer(test3);
}

I then compiled it with VC++ and got the following results:

0.681 seconds
0.659 seconds
0.874 seconds
0.955 seconds

Then I compiled it with g++ and got these results:

1.267 seconds
0.725 seconds
0.795 seconds
0.649 seconds

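The fourth version (the one I added) performs worst with VC++ but best with g++. Meanwhile test1, which is essentially tied for fastest with VC++, is by far the slowest with g++.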

You asked why X is true. Unfortunately, X simply isn't true.

To give a truly meaningful answer, we'd probably have to do a detailed analysis of the exact compiler and standard library you're using.

Writing a file line by line vs. writing the whole text at once
