Question

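How do I read a file into a std::string, i.e., read the whole file at once?

Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocating memory while reading the string.

One way to do this would be to stat the file size, resize the std::string, and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous, which the standard does not require, although it appears to be the case for all known implementations. What is worse, if the file is read in text mode, the std::string's size may not equal the file's size.

A fully correct, standard-compliant and portable solution could be constructed by using std::ifstream's rdbuf() to fill a std::ostringstream, and building a std::string from that. However, this could copy the string data and/or needlessly reallocate memory. Are all relevant standard library implementations smart enough to avoid all unnecessary overhead? Is there another way to do it? Did I miss some hidden Boost function that already provides the desired functionality?

Please show how you would implement your suggestion.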

std::string

taking the discussion above into account.

Answer 1

The quickest (that I know of, discounting memory-mapped files):

std::string str(static_cast<std::stringstream const&>(std::stringstream() << in.rdbuf()).str());

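This requires the additional header <sstream> for the string stream. (The static_cast is necessary because operator << returns a plain ostream&, but we know that in reality it is a stringstream&, so the cast is safe.)

Split into multiple lines, with the temporary moved into a variable, we get more readable code: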

std::string slurp(std::ifstream& in) {
    std::stringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

Or, once again in a single line:

std::string slurp(std::ifstream& in) {
    return static_cast<std::stringstream const&>(std::stringstream() << in.rdbuf()).str();
}
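The question asks that the caller choose text or binary mode. A minimal sketch of how slurp() above might be wrapped to do that follows. The name slurp_file and its binary parameter are made up for illustration, slurp() is widened to take any std::istream, and returning an empty string when the file cannot be opened is an assumption, not part of the answer:

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Same body as slurp() above, but accepting any input stream.
std::string slurp(std::istream& in) {
    std::stringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

// Hypothetical wrapper: the caller picks text or binary mode.
std::string slurp_file(const std::string& path, bool binary) {
    std::ios_base::openmode mode = std::ios_base::in;
    if (binary)
        mode |= std::ios_base::binary;
    std::ifstream in(path.c_str(), mode);
    if (!in)
        return std::string();  // assumption: report failure as an empty string
    return slurp(in);
}
```

A caller would then write something like slurp_file("data.bin", true) for raw bytes, or pass false to get the platform's newline translation.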

Answer 2

See this answer to a similar question.

For your convenience, I'm reposting CTT's solution:

string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(bytes.data(), fileSize);

    return string(bytes.data(), fileSize);
}

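Run against the text of Moby Dick (1.3M), this solution executed about 20% faster than the other answers presented here. Not bad for a portable C++ solution; I would like to see the results of mmap'ing the file ;)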

Answer 3

Shortest variant: Live On Coliru

std::string str(std::istreambuf_iterator<char>{ifs}, {});

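It requires the header <iterator>.

There were some reports that this method is slower than preallocating the string and using std::istream::read. However, on a modern compiler with optimisations enabled this no longer seems to be the case, though the relative performance of the various methods seems to be highly compiler-dependent.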
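For context, here is a sketch of the one-liner inside a complete function. The name read_all and the empty-string-on-failure behaviour are assumptions for illustration; the file is opened in binary mode so that no newline translation takes place:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper wrapping the iterator-based one-liner.
std::string read_all(const std::string& path) {
    std::ifstream ifs(path.c_str(), std::ios_base::binary);
    if (!ifs)
        return std::string();  // assumption: failure yields an empty string
    // A default-constructed istreambuf_iterator is the end-of-stream iterator.
    return std::string(std::istreambuf_iterator<char>(ifs),
                       std::istreambuf_iterator<char>());
}
```

Because the string cannot know the final size in advance with input iterators, it may grow several times while reading, which is the likely source of the older performance reports.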

Answer 4

Use

#include <iostream>
#include <sstream>
#include <fstream>

int main()
{
  std::ifstream input("file.txt");
  std::stringstream sstr;

  while(input >> sstr.rdbuf());

  std::cout << sstr.str() << std::endl;
}

or something very close. I don't have a stdlib reference open to double-check myself.

Yes, I understand I didn't write the slurp function as asked.

Answer 5

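I do not have enough reputation to comment directly on the responses that use tellg().

Please be aware that tellg() can return -1 on error. If you are passing the result of tellg() as an allocation parameter, you should sanity-check the result first.

An example of the problem: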

...
std::streamsize size = file.tellg();
std::vector<char> buffer(size);
...

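In the above example, if tellg() encounters an error it will return -1. The implicit conversion from signed (i.e. the result of tellg()) to unsigned (i.e. the argument to the vector<char> constructor) will make your vector erroneously try to allocate a very large number of bytes. (4294967295 bytes, roughly 4 GB, where size_t is 32 bits; far more where it is 64 bits.)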
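A small sketch of making that check explicit before the value reaches an allocation. The helper name to_buffer_size is hypothetical, and throwing std::runtime_error is just one possible way to report the failure:

```cpp
#include <cstddef>
#include <ios>
#include <stdexcept>

// Hypothetical helper: convert a stream offset to a buffer size,
// refusing the negative value that tellg() reports on failure.
std::size_t to_buffer_size(std::streamoff offset) {
    if (offset < 0)
        throw std::runtime_error("tellg() failed");
    // Without the check above, -1 would convert to the largest
    // value std::size_t can hold.
    return static_cast<std::size_t>(offset);
}
```

With this in place, the problematic line could become std::vector<char> buffer(to_buffer_size(file.tellg()));.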

Modifying paxos1977's answer to account for the above:

string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    if (fileSize < 0)                             // <--- ADDED
        return std::string();                     // <--- ADDED

    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], fileSize);
}

Answer 6

If you have C++17 (std::filesystem), there is also this way (which gets the file's size through std::filesystem::file_size instead of seekg and tellg):

#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

std::string readFile(fs::path path)
{
    // Open the stream to 'lock' the file.
    std::ifstream f{ path };

    // Obtain the size of the file.
    const auto sz = fs::file_size(path);

    // Create a buffer.
    std::string result(sz, ' ');

    // Read the whole file into the buffer.
    f.read(result.data(), sz);

    return result;
}

Note: you may need to use <experimental/filesystem> and std::experimental::filesystem if your standard library doesn't yet fully support C++17. You might also need to replace result.data() with &result[0] if it doesn't support non-const std::basic_string data.

Answer 7

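Never write into the std::string's const char * buffer. Never ever! Doing so is a massive mistake.

reserve() space for the whole string in your std::string, read reasonably sized chunks from your file into a buffer, and append() them. How large the chunks should be depends on your input file size. I'm pretty sure all other portable and STL-compliant mechanisms will do the same (yet may look prettier).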
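The steps just described can be sketched as follows. The function name read_file_chunked, the 64 KiB chunk size, and returning an empty string when the file cannot be opened are all assumptions made for illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reserve once, then read and append fixed-size chunks.
std::string read_file_chunked(const std::string& path, bool binary) {
    const std::size_t chunk_size = 64 * 1024;  // arbitrary choice

    std::ios_base::openmode mode = std::ios_base::in | std::ios_base::ate;
    if (binary)
        mode |= std::ios_base::binary;
    std::ifstream in(path.c_str(), mode);
    std::string result;
    if (!in)
        return result;  // assumption: failure is reported as an empty string

    // Opened at the end, so tellg() serves as a size hint (-1 on failure).
    const std::streamoff size_hint = in.tellg();
    if (size_hint > 0)
        result.reserve(static_cast<std::size_t>(size_hint));
    in.seekg(0, std::ios_base::beg);

    std::vector<char> chunk(chunk_size);  // separate scratch buffer
    // read() sets failbit on a short final read, so gcount() decides
    // whether there is still something to append.
    while (in.read(chunk.data(), static_cast<std::streamsize>(chunk.size())) ||
           in.gcount() > 0) {
        result.append(chunk.data(), static_cast<std::size_t>(in.gcount()));
    }
    return result;
}
```

The string's own buffer is only ever touched through append(), which is the point of this answer; the cost is one extra copy from the scratch buffer per chunk.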

Answer 8

Something like this shouldn't be too bad:

void slurp(std::string& data, const std::string& filename, bool is_binary)
{
    std::ios_base::openmode openmode = ios::ate | ios::in;
    if (is_binary)
        openmode |= ios::binary;
    ifstream file(filename.c_str(), openmode);
    data.clear();
    data.reserve(file.tellg());
    file.seekg(0, ios::beg);
    data.append(istreambuf_iterator<char>(file.rdbuf()), 
                istreambuf_iterator<char>());
}

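The advantage here is that we do the reserve first, so we won't have to grow the string as we read things in. The disadvantage is that we do it char by char. A smarter version could grab the whole read buffer and then call underflow.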
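One way to avoid the char-by-char loop, as a sketch only: size the string from tellg(), pull the characters with a single bulk sgetn() call on the stream buffer, then shrink the string to the number of characters actually read (in text mode this can be smaller than the file size). This swaps in sgetn() rather than the underflow idea mentioned above. The name slurp_bulk and the bool return value are made up, and the code writes through &data[0], which relies on the contiguous storage that std::string guarantees since C++11:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical bulk-read variant of slurp(); returns false on failure.
bool slurp_bulk(std::string& data, const std::string& filename, bool is_binary) {
    std::ios_base::openmode openmode = std::ios_base::ate | std::ios_base::in;
    if (is_binary)
        openmode |= std::ios_base::binary;
    std::ifstream file(filename.c_str(), openmode);
    data.clear();
    if (!file)
        return false;

    const std::streamoff size = file.tellg();  // -1 on failure
    if (size < 0)
        return false;
    file.seekg(0, std::ios_base::beg);
    if (size == 0)
        return true;  // nothing to read

    data.resize(static_cast<std::size_t>(size));
    // One bulk read straight into the string's storage.
    const std::streamsize got =
        file.rdbuf()->sgetn(&data[0], static_cast<std::streamsize>(size));
    // Keep only what was actually read.
    data.resize(got > 0 ? static_cast<std::size_t>(got) : 0);
    return true;
}
```

Note that this treats the value of tellg() as a byte count, which holds on common implementations but is not something the standard promises.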

Answer 9

This solution adds error checking to the rdbuf()-based method.

std::string file_to_string(const std::string& file_name)
{
    std::ifstream file_stream{file_name};

    if (file_stream.fail())
    {
        // Error opening file.
    }

    std::ostringstream str_stream{};
    file_stream >> str_stream.rdbuf();  // NOT str_stream << file_stream.rdbuf()

    if (file_stream.fail() && !file_stream.eof())
    {
        // Error reading file.
    }

    return str_stream.str();
}

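I'm adding this answer because adding error checking to the original method is not as trivial as you'd expect. The original method uses stringstream's insertion operator (str_stream << file_stream.rdbuf()). The problem is that this sets the stringstream's failbit when no characters are inserted. That can be due to an error, or it can be due to the file being empty. If you check for failures by inspecting the failbit, you'll encounter a false positive when you read an empty file. How do you distinguish a legitimate failure to insert any characters from a "failure" to insert any characters because the file is empty?

You might think to explicitly check for an empty file, but that's more code and more associated error checking.

Checking for the failure condition str_stream.fail() && !str_stream.eof() doesn't work, because the insertion operation doesn't set the eofbit (neither on the ostringstream nor on the ifstream).

So the solution is to change the operation. Instead of using ostringstream's insertion operator (<<), use ifstream's extraction operator (>>), which does set the eofbit. Then check for the failure condition file_stream.fail() && !file_stream.eof().

Importantly, when file_stream >> str_stream.rdbuf() encounters a legitimate failure, it shouldn't ever set eofbit (according to my understanding of the specification). That means the above check is sufficient to detect legitimate failures.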
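To make the empty-input case concrete, here is a sketch of the same check written against any std::istream, so it can be tried on in-memory streams. The name slurp_checked and the bool return value are assumptions for illustration:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns false only for a genuine failure.
// An empty input sets failbit AND eofbit, so it is not treated as an error.
bool slurp_checked(std::istream& in, std::string& out) {
    std::ostringstream str_stream;
    in >> str_stream.rdbuf();  // extraction, not insertion
    if (in.fail() && !in.eof())
        return false;
    out = str_stream.str();
    return true;
}
```

Called on a std::istringstream constructed from an empty string, this is expected to return true and leave out empty, whereas the insertion form (str_stream << in.rdbuf()) would leave failbit set on str_stream with no eofbit to tell the two cases apart.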

Answer 10

You can use the 'std::getline' function and specify 'eof' as the delimiter. The resulting code is a little bit obscure, though:

std::string data;
std::ifstream in( "test.txt" );
std::getline( in, data, std::string::traits_type::to_char_type( 
                  std::string::traits_type::eof() ) );
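One caveat worth checking before relying on this: the delimiter is an ordinary char value (the byte 0xFF on common platforms, where EOF is -1), so getline() stops early if that byte occurs in the data. That byte never appears in ASCII or valid UTF-8 text, but it can appear in binary files. A sketch illustrating the behaviour on any input stream; the helper name read_until_eof_char is made up:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper wrapping the getline() call shown above.
std::string read_until_eof_char(std::istream& in) {
    typedef std::string::traits_type traits;
    std::string data;
    // Stops at end of input, or at the first char equal to the delimiter.
    std::getline(in, data, traits::to_char_type(traits::eof()));
    return data;
}
```

Feeding it a std::istringstream whose contents include the delimiter character in the middle returns only the part before that character.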

Answer 11

Here's a version using the new filesystem library with reasonably robust error checking:

#include <cstdint>
#include <exception>
#include <stdexcept>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

std::string loadFile(const char *const name);
std::string loadFile(const std::string &name);

std::string loadFile(const char *const name) {
  fs::path filepath(fs::absolute(fs::path(name)));

  std::uintmax_t fsize;

  if (fs::exists(filepath)) {
    fsize = fs::file_size(filepath);
  } else {
    throw(std::invalid_argument("File not found: " + filepath.string()));
  }

  std::ifstream infile;
  infile.exceptions(std::ifstream::failbit | std::ifstream::badbit);
  try {
    infile.open(filepath.c_str(), std::ios::in | std::ifstream::binary);
  } catch (...) {
    std::throw_with_nested(std::runtime_error("Can't open input file " + filepath.string()));
  }

  std::string fileStr;

  try {
    fileStr.resize(fsize);
  } catch (...) {
    std::stringstream err;
    err << "Can't resize to " << fsize << " bytes";
    std::throw_with_nested(std::runtime_error(err.str()));
  }

  infile.read(fileStr.data(), fsize);
  infile.close();

  return fileStr;
}

std::string loadFile(const std::string &name) { return loadFile(name.c_str()); };

Answer 12

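Since this seems like a widely used utility, my approach would be to look for an existing library and prefer it to a hand-made solution, especially if the Boost libraries are already linked into your project (linker flags -lboost_system -lboost_filesystem). Here (and in older Boost versions too), Boost provides a load_string_file utility: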

#include <iostream>
#include <string>
#include <boost/filesystem/string_file.hpp>

int main() {
    std::string result;
    boost::filesystem::load_string_file("aFileName.xyz", result);
    std::cout << result.size() << std::endl;
}

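One advantage is that this function doesn't seek through the entire file to determine its size; it uses stat() internally. A possibly negligible disadvantage, which is easy to infer from inspecting the source code, is that the string is needlessly resized with '\0' characters that are then overwritten by the file contents.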

Answer 13

Usage:

#include <fstream>
#include <iostream>
#include <string>
#include <sstream>

using namespace std;

string GetStreamAsString(const istream& in)
{
    stringstream out;
    out << in.rdbuf();
    return out.str();
}

string GetFileAsString(const string& filePath)
{
    ifstream stream;
    try
    {
        // Set to throw on failure
        stream.exceptions(fstream::failbit | fstream::badbit);
        stream.open(filePath);
    }
    catch (system_error& error)
    {
        cerr << "Failed to open '" << filePath << "'\n" << error.code().message() << endl;
        return "Open fail";
    }

    return GetStreamAsString(stream);
}

Answer 14

Updated function which builds upon CTT's solution:

#include <string>
#include <fstream>
#include <limits>
#include <string_view>
std::string readfile(const std::string_view path, bool binaryMode = true)
{
    std::ios::openmode openmode = std::ios::in;
    if(binaryMode)
    {
        openmode |= std::ios::binary;
    }
    // A string_view is not guaranteed to be null-terminated, so copy it first.
    std::ifstream ifs{std::string{path}, openmode};
    ifs.ignore(std::numeric_limits<std::streamsize>::max());
    std::string data(ifs.gcount(), 0);
    ifs.seekg(0);
    ifs.read(data.data(), data.size());
    return data;
}

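There are two significant differences:

First, tellg() is not guaranteed to return the offset in bytes from the beginning of the file. Instead, as Puzomor Croatia pointed out, it is more of a token that can be used within the fstream calls. gcount(), however, does return the number of unformatted bytes last extracted. We therefore open the file, extract and discard all of its contents with ignore() to get the size of the file, and construct the output string based on that.

Secondly, we avoid copying the file's data from a std::vector<char> to a std::string by writing to the string directly.

In terms of performance, this should be the absolute fastest: it allocates a string of the appropriate size ahead of time and calls read() once. As an interesting fact, using ignore() and gcount() instead of ate and tellg() on gcc compiles down to almost the same thing, bit by bit.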

Answer 15

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main(){
    fstream file;
    //Open a file
    file.open("test.txt");
    string copy,temp;
    //While loop to store whole document in copy string
    //Temp reads a complete line
    //Loop stops once there are no more lines to read
    while(getline(file,temp)){
        //add new line text in copy
        copy+=temp;
        //adds a new line
        copy+="\n";
    }
    //Display whole document
    cout<<copy;
    //close the document
    file.close();
}

What is the best way to read an entire file into a std::string in C++?

15 answers: