Removing comments from a file and keeping the integers

Time: 2016-09-07 15:25:53

Tags: c++ vector

I am trying to remove comments from a .txt file. My text file looks like this:

(* Sunspot data collected by Robin McQuinn from *)
(* http://sidc.oma.be/html/sunspot.html         *)

(* Month: 1749 01 *) 58
(* Month: 1749 02 *) 63
(* Month: 1749 03 *) 70
(* Month: 1749 04 *) 56

A comment is everything between (* and *). I only need to keep the 58, 63, 70, and 56 from this file.

My code removes some characters, but not the right ones. It looks like this:

#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <fstream>
#include <string>
#include <cctype>
#include <numeric>
#include <iomanip>

using namespace std;

int main() {

    int digit = 1;
    string filename;
    //cout for getting user path
    //the compiler parses string literals differently so use a double backslash or a forward slash
    cout << "Enter the path of the data file, be sure to include extension." << endl;
    cout << "You can use either of the following:" << endl;
    cout << "A forwardslash or double backslash to separate each directory." << endl;
    getline(cin, filename);

    //gets file
    ifstream infile{filename};
    istream_iterator<char> infile_begin{ infile };
    istream_iterator<char> eof{};
    vector<char> file{ infile_begin, eof };

    for(int i =0; i < file.size(); i++){
    if(!isdigit(file[i])) {
        if(file[i] != ')') {
            file.erase(file.begin(),file.begin()+i);
        }
    }
    }
    copy(begin(file), end(file), ostream_iterator<char>(cout, " "));
    }

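Should I not be using vector.erase()? I know this code is wrong. If that is the problem, what would be a better solution? I know that in C you can read the file into memory and step through each position; would that be a better approach?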

3 answers:

Answer 0 (score: 4)

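I would first read everything in as strings, clean up each string, and then safely push the result into a vector. Here I use std::regex to filter the file, although that is not the easiest way.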

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <fstream>

int main(){

    std::string file_name;
    std::cout << "Enter name/path of the txt file: ";
    std::getline(std::cin, file_name);
    std::ifstream file(file_name);

    std::vector<int> vec; //here save integers

    std::string text; //save current line here


    std::smatch match; //the comment that was found is stored here, to be removed from text later

    std::regex remove(R"(\(\*.*?\*\) *)"); //the expression to search for (raw string, so no double escaping)
    //translation
    //     _\(\*     -> (*
    //     _.*?      -> as few characters as possible
    //     _\*\)     -> *)
    //     _ *       -> any number of spaces (so std::stoi only sees the number)



    while (std::getline(file, text)){ //loop through all lines in file.txt

        if (std::regex_search(text, match, remove)){ //if a comment was found
            text.erase(match.position(0), match.length(0)); //remove the comment where it was matched
        }

        if (!text.empty()) { //an empty line was blank or held only a comment
            vec.push_back(std::stoi(text)); //otherwise add the integer to the list
        }
    }


    std::cout << "The file contains:" << std::endl;
    for (int i = 0; i < vec.size(); i++){
        std::cout << vec.at(i) << std::endl;
    }

    return 0;
}

Output:

Enter name/path of the txt file: file.txt
The file contains:
58
63
70
56

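Of course, std::stoi only works here if what is left of the line starts with the integer: it throws std::invalid_argument otherwise, and it ignores anything that follows the number. Well, this is just one idea, and it can of course be adapted freely.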
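
As one possible variation on the regex idea, the sketch below strips every comment with std::regex_replace and then reads whatever integers remain through a string stream, which avoids calling std::stoi on a line that might not hold a number. The helper name extract_integers is hypothetical and not part of the answer above; the sketch assumes comments do not nest and that only whitespace-separated integers remain once they are removed.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration, not from the answer):
// removes every (* ... *) comment from `text` and returns the integers left.
std::vector<int> extract_integers(const std::string& text) {
    // \(\*       literal "(*"
    // [\s\S]*?   shortest possible run of any characters, newlines included
    // \*\)       literal "*)"
    static const std::regex comment(R"(\(\*[\s\S]*?\*\))");

    // Replace each comment with a space so numbers on either side of a
    // comment do not get glued together.
    const std::string stripped = std::regex_replace(text, comment, " ");

    // operator>> skips whitespace and stops at the first non-integer token.
    std::istringstream in(stripped);
    std::vector<int> values;
    int value;
    while (in >> value) {
        values.push_back(value);
    }
    return values;
}
```

Because the extraction loop stops at the first token that is not an integer, stray text outside a comment would end the read early rather than throw.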

Answer 1 (score: 2)

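Well, as you noticed, the logic is wrong. Whenever the current character is neither a digit nor a ), you erase everything from the beginning of the vector up to that character.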

You presumably want to remove the comments, so why not search for the opening (* and the closing *) and erase everything in between?

std::vector<std::string> fileContent;
std::string line;
while (std::getline(infile, line))
{
    //Find starting character sequence
    auto begin = line.find("(*");
    if (begin != std::string::npos)
    {
        //Find matching ending sequence, it's not a comment otherwise
        auto end = line.find("*)", begin);
        if (end != std::string::npos)
            line.erase(line.begin() + begin, line.begin() + end + 2);
    }

    fileContent.push_back(line);
}
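
The snippet above removes at most one comment per line. A minimal sketch of the same find-and-erase idea, wrapped in a function that keeps searching until no complete comment is left, might look like this. The name strip_comments is hypothetical, and the sketch assumes comments neither nest nor span several lines.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration): returns `line`
// with every complete (* ... *) pair removed.
std::string strip_comments(std::string line) {
    std::string::size_type begin = 0;
    // Look for the next opening sequence, starting where the last one was.
    while ((begin = line.find("(*", begin)) != std::string::npos) {
        // The closing sequence must start after the two opening characters.
        const std::string::size_type end = line.find("*)", begin + 2);
        if (end == std::string::npos) {
            break; // unterminated: not treated as a comment, left untouched
        }
        // Erase from '(' through ')' inclusive; `begin` now indexes
        // whatever followed the comment, so the search resumes there.
        line.erase(begin, end + 2 - begin);
    }
    return line;
}
```

std::stoi skips leading whitespace, so std::stoi(strip_comments(line)) would parse the number on a data line, provided the stripped line is not blank.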

Answer 2 (score: 0)

You can use std::getline to read up to the closing ')' character; you then know that the next read will be your number:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::ifstream ifs("test.txt");

    std::string line;
    while(std::getline(ifs, line)) // line by line
    {
        std::string skip;
        int value;

        // skip data up to and past ')', then read number
        if(std::getline(std::istringstream(line), skip, ')') >> value)
            std::cout << "found: " << value << '\n';
    }
}

Output:

found: 58
found: 63
found: 70
found: 56
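
To make the behaviour of this approach easier to check, the same getline-then-extract step can be wrapped in a small function. This is only a sketch: the name read_value_after_paren is hypothetical, and it relies on the same assumption as the answer, namely that the first ')' on a line is the one that closes the comment.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (an assumption for illustration): reads the integer
// that follows the first ')' on `line` into `value`.
// Returns false when the line has no ')' or no integer right after it.
bool read_value_after_paren(const std::string& line, int& value) {
    std::istringstream in(line);
    std::string skip;
    // getline discards everything up to and including the first ')';
    // operator>> then skips whitespace and tries to parse an integer.
    return static_cast<bool>(std::getline(in, skip, ')') >> value);
}
```

Two limits follow from that assumption: a line holding only a number and no comment is rejected, because getline consumes the whole line while looking for ')', and a comment that itself contains a ')' makes the extraction fail.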