Extract string content from an RTF string in Java

Date: 2018-01-29 01:23:40

Tags: java regex rtf

I have the following RTF string: \af31507 \ltrch\fcs0 \insrsid6361256 Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}}{\rtlch\fcs1 \af31507 \ltrch\fcs0 \insrsid12283827 and I want to extract the content of the study title, i.e. (Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}). Below is my code:

    String[] arr = value.split("\\s+");
    //System.out.println(arr.length);
    for(int j=0; j<arr.length; j++) {
        if(isNumeric(arr[j])) {
            arr[j] = "\\?" + arr[j];
        }
    }

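In the code above, I split the string on whitespace and iterate over the array to check whether each token is a number. However, the isNumeric function cannot handle the 8000 that follows \u8805, because that token comes out as 8000}}{\rtlch\fcs1. How can I use a regular expression to search for the study title and its content?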
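To make the failure concrete, here is a minimal sketch of what the whitespace split produces near the end of the title. The body of `isNumeric` is not shown in the question, so the digits-only version below is an assumed stand-in, and the class name `SplitTokensDemo` is hypothetical.

```java
class SplitTokensDemo {
    // Assumed stand-in for the question's isNumeric helper: true only for all-digit tokens.
    static boolean isNumeric(String token) {
        return token.matches("\\d+");
    }

    // Same split as in the question: break the input on runs of whitespace.
    static String[] tokens(String value) {
        return value.split("\\s+");
    }

    public static void main(String[] args) {
        // Tail of the RTF string from the question (backslashes escaped for Java).
        String value = "\\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507";
        for (String token : tokens(value)) {
            // The closing braces and the next control word stay glued to "8000",
            // so that token is not recognized as a number.
            System.out.println(token + " -> " + isNumeric(token));
        }
    }
}
```

Because no whitespace separates 8000 from the braces and the following control word, splitting on whitespace alone cannot isolate that number.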

1 Answer:

Answer 0: (score: 2)

The pattern Study Title: {[^}]*} matches what you expect. In Java the braces have to be escaped, as in the code below. Demo: https://regex101.com/r/FZl1WL/1

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("Study Title: \\{[^}]*\\}");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

Output:

Study Title: {Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000}

Update based on the OP's question: to get only the text inside the braces, use lookarounds.

    String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s \\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
    Pattern p = Pattern.compile("(?<=Study Title: \\{)[^}]*(?=\\})");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group());
    }

Test for 14431 process\'27s \u8805 1000 Testing2 14432 \u8805 8000
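
As an alternative to the lookaround version, the same text can be pulled out with a capturing group. The following is a self-contained sketch rather than part of the original answer; the class name `StudyTitleExtractor` and the method `extractTitle` are hypothetical. It assumes the title contains no nested braces, since `[^}]*` stops at the first closing brace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StudyTitleExtractor {
    // Group 1 captures everything between the braces that follow "Study Title: ".
    private static final Pattern TITLE = Pattern.compile("Study Title: \\{([^}]*)\\}");

    // Returns the text inside the braces, or an empty Optional when no title is present.
    static Optional<String> extractTitle(String rtf) {
        Matcher m = TITLE.matcher(rtf);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "{\\af31507 \\ltrch\\fcs0 \\insrsid6361256 Study Title: {Test for 14431 process\\'27s "
                 + "\\u8805 1000 Testing2 14432 \\u8805 8000}}{\\rtlch\\fcs1 \\af31507 \\ltrch\\fcs0 \\insrsid12283827";
        // Prints the captured title, or a placeholder message if nothing matched.
        System.out.println(extractTitle(s).orElse("(no study title found)"));
    }
}
```

Compiling the pattern once and returning an Optional lets the caller handle a missing title explicitly. The RTF escapes such as \'27 and \u8805 are returned as-is and are not decoded.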