Question

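We have a tokenizer that tokenizes text files. The logic that follows is rather odd, but it is required in our context.

An email address such as xyz.zyx@gmail.com

will result in the following tokens: xyz . zyx @ gmail

I would like to know how we can recognize the field as an email if we are only allowed to use these tokens. Regular expressions are not allowed. We may only use a token and its surrounding tokens to decide whether the field is an email field.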

Answer 1

Well, try some (bad) logic like this:

    int i = 0, j = 0;
    if (str.contains(".") && str.contains("@"))
    {
        // i = index of the first ".", j = index of the first "@"
        if ((i = str.indexOf(".")) < (j = str.indexOf("@")))
        {
            if (i != 0 && i + 1 != j)   // ignore strings like ".@" and "abc.@"
                return true;
        }
    }
    return false;
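
Since the question only allows tokens, the same check can be sketched on a token list instead of a string. This is a rough adaptation, not the answerer's code: the class name `DotBeforeAtCheck`, the method name and the `List<String>` token representation are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical adaptation of the string check above to a list of tokens.
class DotBeforeAtCheck {
    static boolean looksLikeEmail(List<String> tokens) {
        int dot = tokens.indexOf(".");  // index of the first "." token, -1 if absent
        int at = tokens.indexOf("@");   // index of the first "@" token, -1 if absent
        if (dot < 0 || at < 0) {
            return false;
        }
        // "." must come before "@", must not be the first token,
        // and must not sit directly in front of "@" (rejects ". @" and "abc . @")
        return dot < at && dot != 0 && dot + 1 != at;
    }
}
```

Like the original, this only accepts input with a "." somewhere before the "@", so it is a heuristic at best.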

Answer 2

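Split the e-mail address logically into 3 parts:

The user name (or resource name); for this explanation we will call it the user name.
The @ character.
The host name, consisting of any number of "word dot" sequences plus a final top-level domain string.

Walk through the tokens like this: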

while token can be part of a user name
    fetch next token;
    if there are no more -> no e-mail;

check if the next token is @
if not -> no e-mail

while there are tokens
    while token can be part of a host name subpart (the "word" above)
        fetch next token;
        if there are no more -> might be a valid e-mail address

    check if the next token is a dot
    if not -> might be a valid e-mail address
    set a flag that you found at least one dot

    check if the next token can be part of a host name subpart
    if not -> no valid e-mail address (or maybe you ignore a trailing dot and take what was found so far)

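Add further checks if more tokens are involved. You may also have to post-process the tokens you found to make sure the e-mail address is valid, and you may have to rewind the tokenizer (or cache the fetched tokens) in case you do not find a valid e-mail address and need to feed the same input to some other recognition process.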
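
The walk above could be sketched in Java roughly as follows. Several things here are assumptions rather than part of the answer: tokens are cached in a `List<String>` (so "rewinding" is just retrying from the same start index), a "word" is a single token made of letters, digits, '-' or '_', user-name tokens are words or dots, and the names `EmailTokenWalker`, `isWord`, `isUserPart` and `matchEmail` are made up for the example.

```java
import java.util.List;

// Hypothetical sketch of the token walk; names and token rules are assumptions.
class EmailTokenWalker {
    // Assumption: a "word" token is non-empty and made of letters, digits, '-' or '_'.
    static boolean isWord(String t) {
        if (t.isEmpty()) return false;
        for (char c : t.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '-' && c != '_') return false;
        }
        return true;
    }

    // Assumption: the user name is built from word tokens and "." tokens.
    static boolean isUserPart(String t) {
        return isWord(t) || t.equals(".");
    }

    // Returns the index just past the e-mail that starts at 'start', or -1 if
    // there is none. The list is never modified, so the caller can hand the
    // same tokens to another recognizer afterwards.
    static int matchEmail(List<String> tokens, int start) {
        int i = start;
        // 1. user name
        while (i < tokens.size() && isUserPart(tokens.get(i))) i++;
        if (i == start || i == tokens.size()) return -1; // empty user name, or ran out of tokens
        // 2. the "@" token
        if (!tokens.get(i).equals("@")) return -1;
        i++;
        // 3. host name: word ( "." word )*
        if (i == tokens.size() || !isWord(tokens.get(i))) return -1;
        i++;
        // consume a "." only when a word follows it, so a trailing dot is ignored
        while (i + 1 < tokens.size()
                && tokens.get(i).equals(".")
                && isWord(tokens.get(i + 1))) {
            i += 2;
        }
        return i;
    }
}
```

This variant does not insist on a dot in the host name, so the question's tokens xyz . zyx @ gmail are accepted; if you want the "at least one dot" flag from the walk, track it inside the last loop and reject when it was never set.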

Answer 3

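Check whether the token list is an email:

The list contains exactly one @ token
The index of the @ token != 0
There is at least one . token after @, but not directly after @
The list starts and ends with a character token

Additional checks:

No two consecutive . tokens
No special characters
The length of the character tokens after @ is at least 2
The total length of all character tokens before @ is at least 3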
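
A possible Java rendering of this checklist is shown below. It rests on a few assumptions: tokens arrive as a `List<String>`, a "character token" is a non-empty run of letters or digits, "no special characters" means every token is a character token, "." or "@", and the length rule after @ is applied to each character token individually. The names `EmailTokenChecklist`, `isCharToken` and `isEmail` are made up for the example.

```java
import java.util.List;

// Hypothetical sketch of the checklist; names and token rules are assumptions.
class EmailTokenChecklist {
    // Assumption: a "character token" is a non-empty run of letters or digits.
    static boolean isCharToken(String t) {
        return !t.isEmpty() && t.chars().allMatch(Character::isLetterOrDigit);
    }

    static boolean isEmail(List<String> tokens) {
        int at = tokens.indexOf("@");
        // exactly one "@" token, and not at index 0
        if (at <= 0 || at != tokens.lastIndexOf("@")) return false;
        // the list starts and ends with a character token
        if (!isCharToken(tokens.get(0))
                || !isCharToken(tokens.get(tokens.size() - 1))) return false;

        boolean dotAfterAt = false;
        int lengthBeforeAt = 0;
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(".")) {
                // no two consecutive "." tokens, and no "." directly after "@"
                if (i > 0 && (tokens.get(i - 1).equals(".") || i == at + 1)) return false;
                if (i > at) dotAfterAt = true;
            } else if (isCharToken(t)) {
                if (i < at) lengthBeforeAt += t.length();
                else if (t.length() < 2) return false; // each token after "@" needs length >= 2
            } else if (i != at) {
                return false; // anything else counts as a special character
            }
        }
        return dotAfterAt && lengthBeforeAt >= 3;
    }
}
```

Note that with these rules the token list needs a "." after the "@", so the tokens xyz . zyx @ gmail on their own would be rejected, while xyz . zyx @ gmail . com would pass.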

Identifying an email field without using regular expressions
