Question

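I have a string consisting of tokens (without spaces) separated by periods. Each token (after a period) can start with [a-zA-Z_], with a [ (ending with a ]), or with a $ (ending with a $).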

Examples:

House.Car.[0].Flower
House.Car.$something$
House.Car2.$4.45$.[0]
House.Car2.$abc.def$.[0]

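So I need to split the string on periods, but in the last two examples I don't want to split 4.45 (or abc.def). Anything enclosed in $ should not be split.

For the last two examples, I just want an array like this:

House
Car2
$4.45$ // fixed, thanks Sabuj Hassan
[0]

or

House
Car2
$abc.def$
[0]

I tried using a regex, but I got it completely wrong.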

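I have just been told that the closing $ can be followed by another string enclosed in < and >, which can again contain dots that I must not split on:

House.Car.$abc.def$<ghi.jk>.[0].bla

I need to get this:
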
House
Car
$abc.def$<ghi.jk>

Thanks for your help.

Answer 1

It is best to collect the results by "walking" the string to match with .find():

// Note the alternation
private static final Pattern PATTERN 
    = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

//

public List<String> matchesForInput(final String input)
{
    final Matcher m = PATTERN.matcher(input);
    final List<String> matches = new ArrayList<>();

    while (m.find())
        matches.add(m.group());

    return matches;
}
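
To see what the method above returns for the inputs from the question, it can be wrapped in a small runnable class. This is only a sketch: the class name `DollarTokenSplitter` and the `main` method are assumptions added for illustration, while the pattern and the loop are taken from the answer unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, present only to make the snippet runnable
class DollarTokenSplitter {
    // Same pattern as in the answer: a whole $...$ token is tried first,
    // otherwise a run of characters that are not dots
    private static final Pattern PATTERN
        = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

    static List<String> matchesForInput(final String input) {
        final Matcher m = PATTERN.matcher(input);
        final List<String> matches = new ArrayList<>();

        // each call to find() moves on to the next token in the input
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [House, Car2, $4.45$, [0]]
        System.out.println(matchesForInput("House.Car2.$4.45$.[0]"));
        // prints [House, Car2, $abc.def$, [0]]
        System.out.println(matchesForInput("House.Car2.$abc.def$.[0]"));
    }
}
```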

Answer 2

I believe it will be easier to use Pattern / Matcher. The raw regex:

\$[^$]+\$|\[[^\]]+\]|[^.]+

In code:

String s = "House.Car2.$4.45$.[0]";
Pattern pattern = Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
   System.out.println(matcher.group());
}

Output:

House
Car2
$4.45$
[0]

ideone demo
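
The question's later requirement (a `<...>` part directly after the closing `$`) is not covered by the regex above. One possible extension, offered as a sketch rather than as part of the original answer, is to add an optional `<...>` group to the first alternative. It assumes the `<...>` part immediately follows the closing `$` and contains no `>` character; the class name `AngleSuffixTokenizer` and the method `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration
class AngleSuffixTokenizer {
    // $...$ optionally followed by <...>, or [...], or a run of non-dot characters.
    // Assumption: the <...> part directly follows the closing $ and has no '>' inside.
    private static final Pattern TOKEN = Pattern.compile(
        "\\$[^$]+\\$(?:<[^>]+>)?|\\[[^\\]]+\\]|[^.]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        // find() skips the separating dots because no alternative can match them
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [House, Car, $abc.def$<ghi.jk>, [0], bla]
        System.out.println(tokenize("House.Car.$abc.def$<ghi.jk>.[0].bla"));
    }
}
```

The earlier examples still tokenize the same way, because the added group is optional.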

Answer 3

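If not using a regex is an option, you can write your own parser that iterates once over all characters in the string, checking whether each character is inside $...$, <...> or [...].

When you find a character other than ., you just add it to the token you are building, like any ordinary character.
The same applies when you find a . while you are inside one of the "areas" mentioned above.
But if you find a . while you are outside of these areas, you need to split on it, which means adding the token built so far to the result and clearing it for the next token.

Such a parser could look like this: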

public static List<String> parse(String input){
    // list which will hold the returned tokens
    List<String> tokens = new ArrayList<>();

    // flags representing whether the currently tested character is inside one of
    // the special areas
    // (at the start we are outside of these areas, so they are set to false)
    boolean insideDolar = false;          // $...$
    boolean insideSquareBrackets = false; // [...]
    boolean insideAgleBrackets =false;    // <...>

    // we need some buffer to build tokens, StringBuilder is excellent here
    StringBuilder sb = new StringBuilder();

    // now let's iterate over all characters and decide whether to add each one
    // to the current token or to add the token to the result list
    for (char ch : input.toCharArray()){

        // let's update which area we are in:
        // finding $ means that we either start or end a `$...$` area, so
        // simple negation of the flag is enough to update its status
        if (ch == '$') insideDolar = !insideDolar;
        // updating the rest of the flags is straightforward
        else if (ch == '[') insideSquareBrackets = true;
        else if (ch == ']') insideSquareBrackets = false;
        else if (ch == '<') insideAgleBrackets = true;
        else if (ch == '>') insideAgleBrackets = false;

        // Now we know which area we are in, so let's handle the special cases:
        // if the character is not a dot,
        // OR it is a dot but we are inside one of the areas, we just
        // add it to the token (append it to the StringBuilder)
        if (ch != '.' || insideAgleBrackets|| insideDolar || insideSquareBrackets ){
            sb.append(ch);
        }else{// otherwise we are handling a dot outside of the special
              // areas (inside which dots are not separators), so it marks a place
              // to split: we don't add it to the token, but instead
              // add the buffer's value (the current token) to the results and
              // reset the buffer for the next token
            tokens.add(sb.toString());
            sb.delete(0, sb.length());
        }
    }
    // since we only add the buffer's value to the list of tokens when we
    // find a dot to split on, the last token would otherwise be missing
    // from the result (there is no dot after it), so we need to
    // add it manually after iterating over all characters
    if (sb.length()>0)// a non-empty token needs to be added to the result
        tokens.add(sb.toString());

    return tokens;
}

You can use it like this:

String  input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s: parse(input))
    System.out.println(s);

Output:

House
Car2
$abc.def$<ghi.jk>
[0]

Split a string by periods, but the string contains floating-point numbers

3 answers: