Question

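I've been looking for a simple way to check whether a std::string contains only alphanumeric characters. My code looks like this:

    std::string sStr("This is a test");
    for (std::string::const_iterator s = sStr.begin(); s != sStr.end(); ++s)
        if (! isalnum(*s)) return false;

Is this the right way to do it? Is there a more efficient way to handle this?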

Answer 1

I would use std::find_if:

std::string sStr("This is a test");

auto it = std::find_if(std::begin(sStr), std::end(sStr), [](unsigned char c) {
    return !std::isalnum(c);
});

return it == std::end(sStr);

std::find_if_not might make even more sense:

auto it = std::find_if_not(std::begin(sStr), std::end(sStr), [](unsigned char c) {
    return std::isalnum(c);
});
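A self-contained sketch of the find_if_not approach wrapped in a function (the name isAlnumOnly is hypothetical). As in the snippets above, the lambda takes unsigned char: passing a negative char value to std::isalnum is undefined behavior, and that can happen with non-ASCII bytes on platforms where char is signed.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper name. Returns true if every character is a letter or
// digit according to the current C locale ("C" unless std::setlocale was called).
// An empty string yields true, since there is no offending character.
bool isAlnumOnly(const std::string& s) {
    // unsigned char keeps bytes >= 0x80 from reaching std::isalnum as
    // negative values, which would be undefined behavior.
    auto it = std::find_if_not(std::begin(s), std::end(s), [](unsigned char c) {
        return std::isalnum(c) != 0;
    });
    // find_if_not returns end() when every element satisfies the predicate.
    return it == std::end(s);
}
```

With this, isAlnumOnly("This is a test") is false because of the spaces, while isAlnumOnly("Thisisatest123") is true.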

Answer 2

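Yes, actually*. The looping constructs found in <algorithm> appear to perform slightly better than raw loops under optimization, at least with gcc.

At first glance, using <algorithm> and a lambda is a nice way of doing this: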

bool onlyAlnum(const std::string& str){
    return std::all_of(
        str.begin(), 
        str.end(), 
        [loc = std::locale{}](char c){return std::isalnum(c, loc);});
}

However, this has its drawbacks.

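The locale-aware isalnum from <locale> turned out to be quite a bit slower than the <cctype> incarnation of the function when I tested it. The <cctype> version ignores the C++ std::locale object (it only consults the C locale set with std::setlocale), but testing single chars for alphanumericism only works for a tiny subset of UTF-8 characters anyway: UTF-8 is a variable-width encoding, and testing one byte of a multi-byte character will produce a false negative.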
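To illustrate the UTF-8 point: "é" is encoded in UTF-8 as the two bytes 0xC3 0xA9, and in the default "C" locale std::isalnum accepts neither byte, so a byte-wise check rejects the whole letter. A minimal sketch, assuming the program never calls std::setlocale; the helper countAlnumBytes and the constant cafeUtf8 are hypothetical names, and the bytes are spelled out explicitly so the result does not depend on the source file's encoding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many bytes std::isalnum accepts when the
// string is tested one byte at a time (assumes the "C" locale is in effect).
std::size_t countAlnumBytes(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if (std::isalnum(c)) ++n;
    }
    return n;
}

// "café" encoded as UTF-8: 'c', 'a', 'f', then 0xC3 0xA9 for 'é'.
const std::string cafeUtf8 = "caf\xC3\xA9";
```

The word has four letters but five bytes, and only the three ASCII bytes pass the test, so countAlnumBytes(cafeUtf8) is 3.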

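The lambda above is a C++14 lambda with an init-capture, which initializes a variable loc to hold the locale when the lambda is created. This lets the function work according to the current global locale, while avoiding the cost of constructing a new object to represent the locale every time the predicate is evaluated, as would happen with a lambda like:

[](char c){return std::isalnum(c, std::locale{});}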
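The two lambdas can be compared side by side. With the init-capture, the std::locale copy is made once when the lambda object is created, which here means once per call of the enclosing function rather than once per character; the second version constructs a fresh std::locale for every character it tests. Both call the two-argument std::isalnum overload from <locale>. The function names are hypothetical.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Locale copied once per call, via a C++14 init-capture.
bool onlyAlnumCached(const std::string& str) {
    return std::all_of(str.begin(), str.end(),
                       [loc = std::locale{}](char c) { return std::isalnum(c, loc); });
}

// A new copy of the global locale is constructed for every character tested.
bool onlyAlnumPerChar(const std::string& str) {
    return std::all_of(str.begin(), str.end(),
                       [](char c) { return std::isalnum(c, std::locale{}); });
}
```

Both return the same answers; they differ only in how often a std::locale object is constructed.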

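However, it is still a very slow test by comparison. If we don't need the limited advantages of the <locale> incarnation of std::isalnum, we can use the faster <cctype> version (taking unsigned char, because passing a negative char to the <cctype> function is undefined behavior):

[](unsigned char c){return std::isalnum(c);}

So we come to a second way of doing this, using the cctype version instead. Testing shows it to be much faster, on par with the raw loop you gave:

bool onlyAlnumAllOf(const std::string& str){
    return std::all_of(
        str.begin(),
        str.end(),
        [](unsigned char c){return std::isalnum(c);});
}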
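A self-contained sketch built on the same std::all_of call. Because std::all_of returns true for an empty range, this function (like the loop in the question) accepts the empty string; the hypothetical variant onlyAlnumNonEmpty shows one way to reject it, if that fits your use case.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Same check as onlyAlnumAllOf; unsigned char keeps bytes >= 0x80 from being
// passed to std::isalnum as negative values.
bool onlyAlnumAllOf(const std::string& str) {
    return std::all_of(str.begin(), str.end(),
                       [](unsigned char c) { return std::isalnum(c) != 0; });
}

// Hypothetical variant: an empty string is not considered alphanumeric.
bool onlyAlnumNonEmpty(const std::string& str) {
    return !str.empty() && onlyAlnumAllOf(str);
}
```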

all_of tests whether a condition holds for every element in a range of input iterators. The range is given by the first two arguments, here str.begin() and str.end(), which naturally define the beginning and end of the string.

A demo on coliru shows that onlyAlnum returns true for any string containing only letters or digits, but false for anything containing a space.

In the end, you can test the difference yourself. A cursory test evaluating "oyn3478qo47nqooina7o8oao7nroOL" 1000000 times gave the following results:

MinGW-64 port of gcc 5.2.0 on my machine:

g++ main.cpp -Wall -Wextra -Wpedantic --std=c++14 -o3

all_of (with locale information): 652ms for 1000000 iterations
all_of: 63ms for 1000000 iterations
find_if: 63ms for 1000000 iterations
loop: 70ms for 1000000 iterations
range-loop: 69ms for 1000000 iterations

coliru with gcc 6.1.0:

g++ main.cpp -Wall -Wextra -Wpedantic --std=c++14 -o3

all_of (with locale information): 1404ms for 1000000 iterations
all_of: 101ms for 1000000 iterations
find_if: 110ms for 1000000 iterations
loop: 108ms for 1000000 iterations
range-loop: 119ms for 1000000 iterations

clang 3.8.0 on coliru:

clang++ -std=c++14 -O3 -Wall -Wextra -Wpedantic main.cpp

all_of (with locale information): 1127ms for 1000000 iterations
all_of: 85ms for 1000000 iterations
find_if: 72ms for 1000000 iterations
loop: 128ms for 1000000 iterations
range-loop: 88ms for 1000000 iterations

As you can see, which function is fastest varies by compiler and version. Isn't optimization fun?

This is the function I used to test each option:

using StrEvaluator = bool (*)(const std::string&);
using Milliseconds = std::chrono::milliseconds;

void testStrEvaluator(StrEvaluator eval, std::string str){
    auto begin = std::chrono::steady_clock::now();
    bool result = true;
    for(unsigned int i = 0; i < 1000000; ++i){
        str.resize(str.size());
        result &= eval(str);
    }
    auto end = std::chrono::steady_clock::now();
    std::cout
        << std::chrono::duration_cast<Milliseconds>(end - begin).count()
        << "ms for 1000000 iterations\n";
}
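One caveat with a harness of this shape: result is computed but never used, which in principle leaves the optimizer free to drop work whose effects it can see through. A minimal sketch of a variant that returns the accumulated result together with the elapsed time, so the caller can print or check it; TimedResult and timeEvaluator are hypothetical names, and any timings it reports will depend on your compiler, flags and machine.

```cpp
#include <chrono>
#include <string>

using StrEvaluator = bool (*)(const std::string&);

// Hypothetical result type: elapsed time plus the combined evaluator result.
struct TimedResult {
    std::chrono::milliseconds elapsed;
    bool allTrue;
};

TimedResult timeEvaluator(StrEvaluator eval, std::string str, unsigned iterations) {
    auto begin = std::chrono::steady_clock::now();
    bool result = true;
    for (unsigned i = 0; i < iterations; ++i) {
        str.resize(str.size());       // as in the original harness
        result = eval(str) && result; // always calls eval, then accumulates
    }
    auto end = std::chrono::steady_clock::now();
    // Handing result back to the caller makes it observable, which makes it
    // harder for the optimizer to discard the calls.
    return {std::chrono::duration_cast<std::chrono::milliseconds>(end - begin), result};
}
```

Any captureless lambda converts to StrEvaluator, so a call such as timeEvaluator([](const std::string& s) { return !s.empty(); }, "abc", 1000) works without declaring a separate function.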

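*The testing is flawed: coliru gives no guarantees about consistent resources during execution, and I didn't shut down other programs on my computer, so the variations could be flukes. However, they seem consistent enough to draw some conclusions: both the looping constructs from <algorithm> and raw loops can perform quite well, and choosing between them on the basis of speed (unless you have found that the loop is a bottleneck) is more micro-optimization than anything else.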

Answer 3

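You can achieve this in several superficially different ways, but the efficient solution is still essentially what you have already written. Any solution has to examine the characters one by one until it either finds that all of them are alphanumeric or finds one that isn't.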

Actually, that's not quite true. If your strings are very long, you might benefit from parallelism. But I suspect that isn't the case.
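If the strings really are long, one way to use parallelism is to split the string into chunks and check each chunk on its own thread. A rough sketch with std::async, assuming a toolchain with thread support (some platforms need -pthread at link time); onlyAlnumParallel and its default chunk count are hypothetical. Unlike a sequential scan, it does not stop the other chunks early when one finds a non-alphanumeric character, and for short strings the thread start-up cost will dominate.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical helper: splits str into up to `chunks` pieces and checks each
// piece with std::all_of on its own asynchronous task.
bool onlyAlnumParallel(const std::string& str, std::size_t chunks = 4) {
    if (chunks == 0) chunks = 1;
    const std::size_t n = str.size();
    const std::size_t step = (n + chunks - 1) / chunks;  // ceiling division
    std::vector<std::future<bool>> parts;
    for (std::size_t first = 0; first < n; first += step) {
        const std::size_t last = std::min(n, first + step);
        parts.push_back(std::async(std::launch::async, [&str, first, last] {
            return std::all_of(str.begin() + first, str.begin() + last,
                               [](unsigned char c) { return std::isalnum(c) != 0; });
        }));
    }
    bool ok = true;
    for (auto& part : parts) {
        ok = part.get() && ok;  // wait for every task before returning
    }
    return ok;  // an empty string produces no tasks and yields true
}
```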

As for style, I'd suggest a range-based for loop (if you have C++11); otherwise, I'd write what you have.
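A sketch of the question's loop rewritten as a C++11 range-based for, keeping the early return; the function name is hypothetical. Iterating as unsigned char avoids passing negative values to std::isalnum.

```cpp
#include <cctype>
#include <string>

// Hypothetical name; same logic as the loop in the question.
bool isAlnumOnlyLoop(const std::string& sStr) {
    for (unsigned char c : sStr) {
        if (!std::isalnum(c)) return false;  // stop at the first non-alphanumeric character
    }
    return true;
}
```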

Better way to ensure that a string contains only alphanumeric characters?
