How can I quickly find and extract multiple items from a string in C++?

Time: 2017-12-11 12:44:05

Tags: c++ string substring

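I am quite new to C++ and I am struggling with the following problem:
I am parsing syslog messages from iptables. Each message looks like this: 11 15:20:36 SRC= DST= LEN=250
If this were a simple program, I would use std::string::find to get the index of the SRC= substring and then, in a loop, append each following character to an array until I hit whitespace. Then I would do the same for DST and LEN. For example: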

#include <iostream>
#include <string>

int main() {
    std::string x = "15:30:20 SRC= DST= LEN=255";
    std::string substr;

    std::cout << "Original string: \"" << x << "\"" << std::endl;

    // The "magic number" 4 below is the length of the "SRC=" key,
    // which is the same for "DST=" and "LEN=".

    // For SRC: the value runs from just after the key to the next space
    auto pos = x.find("SRC=");
    if (pos != std::string::npos) {
        substr = x.substr(pos + 4, x.find(' ', pos) - (pos + 4));
        std::cout << "SRC: " << substr << std::endl;
    }

    // For DST
    pos = x.find("DST=");
    if (pos != std::string::npos) {
        substr = x.substr(pos + 4, x.find(' ', pos) - (pos + 4));
        std::cout << "DST: " << substr << std::endl;
    }

    // For LEN: last field, so take everything up to the end of the string
    pos = x.find("LEN=");
    if (pos != std::string::npos) {
        substr = x.substr(pos + 4);
        std::cout << "LEN: " << substr << std::endl;
    }
}

However, in my case I need to do this very quickly, ideally in a single iteration. Could you give me some advice on this?

3 answers:

Answer 0: (score: 1)

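"Fast, ideally in a single iteration": in reality, the speed of your program does not depend on the number of loops visible in your source code. Regular expressions in particular are a very good way to hide multiple nested loops.

Your solution is actually quite good. It does not waste much time before finding "SRC", and it does not search any further than needed to retrieve the IP address. Sure, when searching for "SRC" it gets a false positive on the first "S" of "Sep", but that is resolved by the very next comparison. If you know for certain that the first occurrence of "SRC" is somewhere around column 20, you can save a tiny bit of time by skipping the first 20 characters. (Check your logs, I cannot tell from here.)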
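The point about not searching further than necessary can be taken one step further: `std::string::find` accepts a start position, so each key can be searched from where the previous value ended, and the line is walked roughly once. The following is a minimal sketch, assuming the keys always appear in the order SRC, DST, LEN. `next_field` is a hypothetical helper name, not something from the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption, not from the question): looks for `key`
// at or after `from`, returns the text between the key and the next space,
// and moves `from` to the end of that value so the next search resumes there.
// Returns an empty string if the key is missing, so a missing key and an
// empty value are not distinguished.
std::string next_field(const std::string& line, const std::string& key,
                       std::size_t& from) {
    const std::size_t key_pos = line.find(key, from);
    if (key_pos == std::string::npos) {
        return std::string();
    }
    const std::size_t value_begin = key_pos + key.size();
    std::size_t value_end = line.find(' ', value_begin);
    if (value_end == std::string::npos) {
        value_end = line.size();  // last field: the value runs to the end
    }
    from = value_end;
    return line.substr(value_begin, value_end - value_begin);
}
```

Calling `next_field(line, "SRC=", pos)`, then the same with `"DST="` and `"LEN="`, all sharing one `std::size_t pos = 0;`, extracts the three values in a single left-to-right pass. If the column of the first key is known, `pos` could simply start at that offset instead of 0.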

Answer 1: (score: 1)


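#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string x = "15:30:20 SRC= DST= LEN=255";
    // Each (\S+) captures a run of non-whitespace characters, so every key needs a non-empty value.
    std::regex const r(R"(SRC=(\S+) DST=(\S+) LEN=(\S+))");
    std::smatch matches;
    if (std::regex_search(x, matches, r)) {
        std::cout << "SRC " << matches.str(1) << '\n';
        std::cout << "DST " << matches.str(2) << '\n';
        std::cout << "LEN " << matches.str(3) << '\n';
    }
}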
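One thing worth keeping in mind with the regex approach is that constructing a `std::regex` is comparatively expensive, so when many log lines are parsed it usually pays to build the pattern once and reuse it. A minimal sketch of that idea follows. `LogFields` and `parse_line` are hypothetical names introduced only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the three captured values.
struct LogFields {
    std::string src;
    std::string dst;
    std::string len;
};

// Hypothetical helper: fills `out` and returns true when the line contains
// "SRC=<value> DST=<value> LEN=<value>" with non-empty values.
bool parse_line(const std::string& line, LogFields& out) {
    // Constructed on the first call only and reused on every later call.
    static const std::regex pattern(R"(SRC=(\S+) DST=(\S+) LEN=(\S+))");
    std::smatch matches;
    if (!std::regex_search(line, matches, pattern)) {
        return false;
    }
    out.src = matches.str(1);
    out.dst = matches.str(2);
    out.len = matches.str(3);
    return true;
}
```

Whether this is fast enough depends on the standard library implementation, so it is worth measuring against the plain `find` version before committing to it.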


Answer 2: (score: 1)


#include <cstdlib>

// buf_ptr will be updated to point to the first character after the " SRC=x.x.x.x" sequence
unsigned long GetSRC(const char*& buf_ptr)
{
    // Don't search like this unless you have a trusted input format that's guaranteed to contain " SRC="!!!
    while (*buf_ptr != ' ' ||
        *(buf_ptr + 1) != 'S' ||
        *(buf_ptr + 2) != 'R' ||
        *(buf_ptr + 3) != 'C' ||
        *(buf_ptr + 4) != '=')
    {
        ++buf_ptr;  // no match at this position, try the next character
    }
    buf_ptr += 5;   // skip over " SRC="
    char* next;

    long part = std::strtol(buf_ptr, &next, 10);
    // part is now the first number of the IP. Depending on your requirements you may want to extract the string instead
    unsigned long result = (unsigned long)part << 24;

    // Don't use 'next + 1' like this unless you have a trusted input format!!!
    part = std::strtol(next + 1, &next, 10);
    // part is now the second number of the IP. Depending on your requirements ...
    result |= (unsigned long)part << 16;

    part = std::strtol(next + 1, &next, 10);
    // part is now the third number of the IP. Depending on your requirements ...
    result |= (unsigned long)part << 8;

    part = std::strtol(next + 1, &next, 10);
    // part is now the fourth number of the IP. Depending on your requirements ...
    result |= (unsigned long)part;

    // update the buf_ptr so searching for the next information ( DST=x.x.x.x) starts at the end of the currently parsed parts
    buf_ptr = next;
    return result;
}


const char* x_str = x.c_str();
unsigned long srcIP = GetSRC(x_str);
// now x_str will point to " DST= LEN=255" for further processing

std::cout << "SRC=" << (srcIP >> 24) << "." << ((srcIP >> 16) & 0xff) << "." << ((srcIP >> 8) & 0xff) << "." << (srcIP & 0xff) << std::endl;
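The comments in the function above warn that it is only safe for trusted input. Where a line may be malformed, a checked variant is possible at some cost in speed. The following is a rough sketch of that trade-off, not the answer's own method. `parse_ipv4` is a hypothetical name, and the input is assumed to be a NUL-terminated string.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parses "a.b.c.d" starting at `p` (NUL-terminated).
// On success stores the packed address in `out`, sets `end` to the first
// character after the address and returns true. On malformed input it
// returns false and leaves `out` and `end` untouched.
bool parse_ipv4(const char* p, std::uint32_t& out, const char*& end) {
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // strtoul would skip whitespace and accept a sign, so insist on a
        // digit first. This also stops at the terminating NUL.
        if (!std::isdigit(static_cast<unsigned char>(*p))) {
            return false;
        }
        char* next = nullptr;
        const unsigned long part = std::strtoul(p, &next, 10);
        if (part > 255) {
            return false;  // not a valid octet (also catches overflow)
        }
        result = (result << 8) | static_cast<std::uint32_t>(part);
        p = next;
        if (i < 3) {
            if (*p != '.') {
                return false;  // the first three octets must end in '.'
            }
            ++p;
        }
    }
    out = result;
    end = p;
    return true;
}
```

The key itself could be located with `std::strstr(line, " SRC=")` from `<cstring>`, which stops at the terminating NUL instead of running past the end of the buffer when the key is absent.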



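Of course, I assume your std::cout << ... lines are only there for development testing, because otherwise all of this micro-optimization becomes pointless.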
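To check whether any of these variants matters in practice, the parsing can be timed on its own, with no output inside the measured loop. The sketch below is one way to do that under simple assumptions: `time_parsing` is a hypothetical helper, it extracts only the LEN value, and it sums the parsed values so the compiler cannot discard the work.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical benchmark helper: parses the LEN value of every line and
// returns the elapsed time in microseconds. The sum of all parsed values is
// written to `total_len`. Nothing is printed inside the timed loop.
long long time_parsing(const std::vector<std::string>& lines,
                       unsigned long& total_len) {
    const auto start = std::chrono::steady_clock::now();
    total_len = 0;
    for (const std::string& line : lines) {
        const std::size_t pos = line.find("LEN=");
        if (pos != std::string::npos) {
            // strtoul yields 0 when no digits follow the key.
            total_len += std::strtoul(line.c_str() + pos + 4, nullptr, 10);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Running it over a few hundred thousand real log lines, once per parsing strategy, gives a fairer comparison than counting loops in the source.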
