Split a string by periods, but the string contains floating-point numbers

Time: 2014-04-06 12:52:35

Tags: java regex string split

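I have a string made up of tokens (without whitespace) separated by periods. Each token (after a period) can start with [a-zA-Z_], with [ (and then end with ]), or with $ (and then end with $).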

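Examples:

  • House.Car.[0].Flower
  • House.Car.$something$
  • House.Car2.$4.45$.[0]
  • House.Car2.$abc.def$.[0]

So I need to split the string on periods, but in the last two examples I do not want to split 4.45 (or abc.def). Anything enclosed in $ should not be split.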

For the last two examples, I just want an array like this:

  • Car2
  • $4.45$ // fixed, thanks Sabuj Hassan
  • [0]

  • Car2
  • $abc.def$
  • [0]

I tried using a regex, but I got it completely wrong.
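
To make the problem concrete, here is a minimal sketch of what a plain split on every period does to the third example. The class and method names (`NaiveSplitDemo`, `naiveSplit`) are hypothetical and only serve this illustration.

```java
import java.util.Arrays;
import java.util.List;

class NaiveSplitDemo {
    // Splits on every literal period and ignores the $...$ delimiters.
    static List<String> naiveSplit(String input) {
        return Arrays.asList(input.split("\\."));
    }

    public static void main(String[] args) {
        // The float inside $...$ is cut in two: [House, Car2, $4, 45$, [0]]
        System.out.println(naiveSplit("House.Car2.$4.45$.[0]"));
    }
}
```

The token $4.45$ comes back as the two pieces $4 and 45$, which is exactly what should be avoided.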


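I have just been told that after the closing $ (for example after $abc.def$) there can be another string enclosed in <...>, which can again contain dots that I should not split on:

  • House.Car.$abc.def$<ghi.jk>.[0].bla

I need to get this:

  • House
  • Car
  • $abc.def$<ghi.jk>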

Thanks for your help.

3 answers:

Answer 0: (score: 2)

It is best to collect the results by "walking" the string and matching with .find():

// Note the alternation
private static final Pattern PATTERN 
    = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

//

public List<String> matchesForInput(final String input)
{
    final Matcher m = PATTERN.matcher(input);
    final List<String> matches = new ArrayList<>();

    while (m.find())
        matches.add(m.group());

    return matches;
}
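
As a usage sketch, the same pattern can be wrapped in a small class and run against the examples from the question. The class name `FindTokensDemo` is an assumption made for this illustration; the pattern and the loop are the ones shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTokensDemo {
    // Either a $...$ token whose inner parts are separated by dots,
    // or a run of characters that contains no dot at all.
    private static final Pattern PATTERN
        = Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");

    static List<String> matchesForInput(final String input) {
        final Matcher m = PATTERN.matcher(input);
        final List<String> matches = new ArrayList<>();
        // Each call to find() moves on to the next token; the separating
        // dots match neither alternative and are skipped.
        while (m.find())
            matches.add(m.group());
        return matches;
    }

    public static void main(String[] args) {
        // prints [House, Car2, $4.45$, [0]]
        System.out.println(matchesForInput("House.Car2.$4.45$.[0]"));
        // prints [House, Car, $abc.def$, <ghi, jk>, [0], bla]
        System.out.println(matchesForInput("House.Car.$abc.def$<ghi.jk>.[0].bla"));
    }
}
```

Note that this pattern covers the original examples but not the <...> suffix from the question's update: the second call splits <ghi.jk> at its dot.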

Answer 1: (score: 1)

I believe this is easier with Pattern/Matcher. The raw regex:

\$[^$]+\$|\[[^\]]+\]|[^.]+

In code:

String s = "House.Car2.$4.45$.[0]";
Pattern pattern = Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
   System.out.println(matcher.group());
}

Output:

House
Car2
$4.45$
[0]

ideone demo
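
The regex above does not yet cover the <...> suffix from the question's update. One possible extension, which is not part of this answer, is an optional <...> group directly after the closing $. The sketch below assumes that a <...> part only ever appears immediately after a $...$ token and contains no nested >. The names `AngleSuffixTokenizer`, `TOKEN` and `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AngleSuffixTokenizer {
    // Alternatives, tried in order at each position:
    //   1. $...$ optionally followed by <...> (dots allowed inside both)
    //   2. [...]
    //   3. any run of characters without a dot
    private static final Pattern TOKEN = Pattern.compile(
        "\\$[^$]+\\$(?:<[^>]*>)?|\\[[^\\]]+\\]|[^.]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [House, Car, $abc.def$<ghi.jk>, [0], bla]
        System.out.println(tokenize("House.Car.$abc.def$<ghi.jk>.[0].bla"));
    }
}
```

If <...> can also appear on its own or in other positions, the pattern would need another alternative, so treat this as a starting point rather than a complete grammar.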

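Answer 2: (score: 1)

If not using a regex is an option, you can write your own parser that iterates once over all characters in the string and checks whether each character is inside $...$, [...] or <...>.

  • When you find a character other than ., you just add it to the token you are building, like any normal character.
  • When you find . and you are inside one of the "areas" mentioned above, you add it to the token as well.
  • But if you find . and you are outside those areas, you need to split on it, which means adding the token built so far to the result and clearing the buffer for the next token.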

Such a parser could look like this:

import java.util.ArrayList;
import java.util.List;

public static List<String> parse(String input) {
    // list which will hold the returned tokens
    List<String> tokens = new ArrayList<>();

    // flags representing whether the currently tested character is inside
    // one of the special areas
    // (at the start we are outside of these areas, so they are set to false)
    boolean insideDollar = false;         // $...$
    boolean insideSquareBrackets = false; // [...]
    boolean insideAngleBrackets = false;  // <...>

    // we need some buffer to build tokens, StringBuilder is excellent here
    StringBuilder sb = new StringBuilder();

    // now let's iterate over all characters and decide whether to add them
    // to the token or to add the token to the result list
    for (char ch : input.toCharArray()) {

        // first update which area we are in:
        // finding $ means that we either start or end a `$...$` area, so
        // simply negating the flag is enough to update its status
        if (ch == '$') insideDollar = !insideDollar;
        // the remaining flags are set on the opening character and
        // cleared on the closing one
        else if (ch == '[') insideSquareBrackets = true;
        else if (ch == ']') insideSquareBrackets = false;
        else if (ch == '<') insideAngleBrackets = true;
        else if (ch == '>') insideAngleBrackets = false;

        // now we know which area we are in, so let's handle the character:
        // if it is not a dot,
        // OR it is a dot but we are inside one of the areas, we just
        // add it to the token (append it to the StringBuilder)
        if (ch != '.' || insideAngleBrackets || insideDollar || insideSquareBrackets) {
            sb.append(ch);
        } else {
            // otherwise we are handling a dot outside of the special areas,
            // so it is a separator: we do not add it to the token, but
            // add the buffer's content (the current token) to the results
            // and reset the buffer for the next token
            tokens.add(sb.toString());
            sb.setLength(0);
        }
    }
    // since we only move the buffer into the list of tokens when we find a
    // dot to split on, the last token is still in the buffer if no dot
    // follows it, so we need to add it manually after the loop
    if (sb.length() > 0) // a non-empty token needs to be added to the result
        tokens.add(sb.toString());

    return tokens;
}

You can use it like this:

String input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s : parse(input))
    System.out.println(s);

Output:

House
Car2
$abc.def$<ghi.jk>
[0]