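I have a string made up of tokens (without spaces) separated by periods. Each token (after a period) can start with [a-zA-Z_], or with [ (and then ends with ]), or with $ (and then ends with $).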
Examples:
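So I need to split the string on periods, but in the last two examples I don't want to split the $...$ part (or 4.45). Anything surrounded by $ should not be split.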
For the last two examples I just want an array like this:
or
I tried using regex, but I got it completely wrong.
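I was just informed that after the closing $ (as in $abc.def$) there can be another string surrounded by < and >, which again can contain dots that I must not split on:
House.Car.$abc.def$<ghi.jk>.[0].bla
I need to get:
House
Car
$abc.def$<ghi.jk>
[0]
bla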
Thanks for your help.
Answer 0 (score: 2)
It's best to "walk" the string with .find() and collect the matches:
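import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Note the alternation: a $...$ token (whose inner dots are kept) is tried first,
// otherwise any run of non-dot characters is taken
private static final Pattern PATTERN
= Pattern.compile("\\$[^.$]+(\\.[^.$]+)*\\$|[^.]+");
public List<String> matchesForInput(final String input)
{
final Matcher m = PATTERN.matcher(input);
final List<String> matches = new ArrayList<>();
// each find() resumes where the previous match ended, skipping the separating dots
while (m.find())
matches.add(m.group());
return matches;
}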
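The pattern above predates the update in the question: on House.Car.$abc.def$<ghi.jk>.[0].bla its [^.]+ branch would match <ghi and jk> separately. Below is a minimal sketch of the same find() walk with an optional <...> part after the closing $ and an explicit [...] branch. It assumes the <...> part always directly follows the closing $ and never contains >, and that [...] never contains ]; the class name TokenWalker is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenWalker {
    // Alternation order matters: $...$ (optionally followed by <...>) is tried first,
    // then [...], and only then any run of non-dot characters.
    private static final Pattern PATTERN =
            Pattern.compile("\\$[^$]*\\$(?:<[^>]*>)?|\\[[^\\]]*\\]|[^.]+");

    public static List<String> matchesForInput(final String input) {
        final Matcher m = PATTERN.matcher(input);
        final List<String> matches = new ArrayList<>();
        // find() skips the separating dots because no branch can start on a '.'
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // prints [House, Car, $abc.def$<ghi.jk>, [0], bla]
        System.out.println(matchesForInput("House.Car.$abc.def$<ghi.jk>.[0].bla"));
    }
}
```

Because the <...> group is optional, the original examples such as House.Car2.$4.45$.[0] still come out as House, Car2, $4.45$, [0].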
Answer 1 (score: 1)
I think using Pattern/Matcher is easier. The raw regex:
\$[^$]+\$|\[[^\]]+\]|[^.]+
In code:
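import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "House.Car2.$4.45$.[0]";
// try $...$ first, then [...], then any run of non-dot characters
Pattern pattern = Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");
Matcher matcher = pattern.matcher(s);
// each find() returns the next token; the dots between tokens are skipped
while (matcher.find()) {
System.out.println(matcher.group());
}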
Output:
House
Car2
$4.45$
[0]
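On Java 9 or later, the same regex can be collected into a List without an explicit loop, since Matcher.results() returns a Stream<MatchResult> of the successive find() matches. A minimal sketch; the class name DotSplitter and method name split are only illustrative. Like the regex above, it does not treat the <...> suffix from the question's update specially.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DotSplitter {
    // Same regex as above: $...$, then [...], then any run of non-dot characters.
    private static final Pattern PATTERN =
            Pattern.compile("\\$[^$]+\\$|\\[[^\\]]+\\]|[^.]+");

    public static List<String> split(String s) {
        // Matcher.results() (Java 9+) streams every match that find() would return.
        return PATTERN.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [House, Car2, $4.45$, [0]]
        System.out.println(split("House.Car2.$4.45$.[0]"));
    }
}
```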
Answer 2 (score: 1)
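If not using regex is an option, you can write your own parser that iterates once over all characters of the string and checks whether each character is inside $...$, [...] or <...>.
When the character is not a ., you just add it to the token you are building, like any ordinary character.
The same applies when the character is a . but you are inside one of the areas mentioned above.
When the character is a . and you are outside those areas, you need to split on it, which means adding the currently built token to the result and clearing the buffer for the next token.
Such a parser could look like this:
import java.util.ArrayList;
import java.util.List;
public static List<String> parse(String input){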
// list which will hold the returned tokens
List<String> tokens = new ArrayList<>();
// flags telling whether the current character is inside one of the
// special areas
// (at the start we are outside of these areas, so they are set to false)
boolean insideDolar = false; // $...$
boolean insideSquareBrackets = false; // [...]
boolean insideAgleBrackets =false; // <...>
// we need some buffer to build tokens, StringBuilder is excellent here
StringBuilder sb = new StringBuilder();
// now let's iterate over all characters and decide whether to add them
// to the current token or to add the token to the result list
for (char ch : input.toCharArray()){
// first update which area we are in
// finding $ means that we either start or end the $...$ area, so
// simply negating the flag is enough to update its status
if (ch == '$') insideDolar = !insideDolar;
//updating rest of flags seems pretty obvious
else if (ch == '[') insideSquareBrackets = true;
else if (ch == ']') insideSquareBrackets = false;
else if (ch == '<') insideAgleBrackets = true;
else if (ch == '>') insideAgleBrackets = false;
// now that we know which area we are in, let's handle the special cases:
// if the character is not a dot,
// OR it is a dot but we are inside one of the areas, we just
// add it to the token (append it to the StringBuilder)
if (ch != '.' || insideAgleBrackets || insideDolar || insideSquareBrackets) {
sb.append(ch);
} else { // otherwise we are handling a dot outside of the special
// areas, where dots are separators, so we don't add it to the token;
// instead we add the buffer (current token) to the results and reset
// the buffer for the next token
tokens.add(sb.toString());
sb.delete(0, sb.length());
}
}
// since we only add the buffered token to the list when we find a dot
// to split on, the last token would never be added (no dot follows it),
// so we add it manually after iterating over all characters
if (sb.length() > 0) // a non-empty token needs to be added to the result
tokens.add(sb.toString());
return tokens;
}
You can use it like this:
String input = "House.Car2.$abc.def$<ghi.jk>.[0]";
for (String s : parse(input))
System.out.println(s);
Output:
House
Car2
$abc.def$<ghi.jk>
[0]
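One caveat about the three independent flags in the parser above: a [ or < that appears inside $...$ still switches its own flag on, and nothing switches it off again, so for input such as a.$x[y$.b the dot before b is never treated as a separator. A minimal alternative sketch tracks only the character that closes the currently open area. It assumes areas never nest, and the class name AreaSplitter is only illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class AreaSplitter {
    // Splits on '.' only when no $...$, [...] or <...> area is open.
    // Assumes areas never nest: inside an area, other opening characters are plain text.
    public static List<String> parse(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        char closing = 0; // 0 = outside any area, otherwise the character that closes it
        for (char ch : input.toCharArray()) {
            if (closing == 0) {
                if (ch == '.') {      // a separator: emit the token built so far
                    tokens.add(sb.toString());
                    sb.setLength(0);
                    continue;
                }
                if (ch == '$') closing = '$';
                else if (ch == '[') closing = ']';
                else if (ch == '<') closing = '>';
            } else if (ch == closing) {
                closing = 0;          // area closed; following dots split again
            }
            sb.append(ch);
        }
        if (sb.length() > 0) tokens.add(sb.toString()); // last token has no trailing dot
        return tokens;
    }

    public static void main(String[] args) {
        // prints [House, Car2, $abc.def$<ghi.jk>, [0]]
        System.out.println(parse("House.Car2.$abc.def$<ghi.jk>.[0]"));
        // prints [a, $x[y$, b]
        System.out.println(parse("a.$x[y$.b"));
    }
}
```

With a single closing variable, a stray [ or < inside $...$ is just text; the trade-off is that nested areas are not supported.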