Escaping characters with indexOf()

Time: 2013-11-20 20:28:48

Tags: java escaping character

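I am reading a file line by line and splitting each line into smaller strings using line.indexOf('"', 1) and substring(). However, this approach does not check whether the " is preceded by a \, so it does not react to the escape character. How can I fix this?

(I cannot simply use line.split("\""), because the " appears at the beginning and end of each substring, and I cannot split on another character because my assignment does not allow it.)

The whole reading part is: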

while ((line = bufferedReader.readLine()) != null) {
        System.out.println(line);
        while(line.length()>0){
            if(line.charAt(0) == ',' || line.charAt(0) == ' '){
                line = line.substring(1);
            }
            else{
                if(line.indexOf(',') != -1){
                    if (line.charAt(0) == '"'){
                    pabaiga = line.indexOf("\"", 1);
                    zodis = line.substring(0, pabaiga+1);
                    line = line.substring(pabaiga+1);
                    duomenys.add(zodis);
                    }
                    else{
                        pabaiga = line.indexOf(',');
                        zodis = line.substring(0, pabaiga);
                        line = line.substring(pabaiga);
                        duomenys.add(zodis);
                    }
                }
                else{
                    zodis = line;
                    line = line.substring(line.length());
                    duomenys.add(zodis);
                }
            }
            for (String elem : duomenys) {
                System.out.println(elem);
            }
            duomenys.clear();
        }
}

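I am not allowed to simply split on the delimiter, because a , may appear in the middle of a string, and using \ in the text file is not an option. So I was advised to treat a string element as "text", but if it contains another " or , in the middle, my current code does not work.

If the line from the text file is "start \"title\" end", 10, 20, "text"
the string array should contain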

  • [0] "start "title" end"
  • [1] 10
  • [2] 20
  • [3] "text"

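3 Answers:

Answer 0: (score: 0)

You can first store the tokens in a List, since their number is not known in advance. To fill the list, iterate over every character of the line: append each character to tokenBuilder unless it is a comma outside quotes, and when you reach a comma outside quotes, add the current contents of tokenBuilder to your token list. Here is sample code.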

String line = "\"start \\\"title\\\" end\", 10, 20, \"text\"";

List<String> tokens = new ArrayList<>();
StringBuilder tokenBuilder = new StringBuilder();

boolean insideQuote = false;
char ch, prev = ' ';

for (int i = 0; i < line.length(); i++) {
    ch = line.charAt(i);
    if (ch == '"' && prev != '\\') {// normal " (without \ before)
        insideQuote = !insideQuote; // starts or ends quotation
    }
    // a comma outside quotes ends the current token:
    // add the non-empty builder to the list and reset it
    if (ch == ',' && !insideQuote) {
        if (tokenBuilder.length() > 0) {
            tokens.add(tokenBuilder.toString().trim());
            tokenBuilder.setLength(0);
        }
    }
    // add every character to builder except \ that are inside
    // quotes and have " after it
    else if (!(ch == '\\' && i + 1 < line.length()
            && line.charAt(i + 1) == '"' && insideQuote)) {
        tokenBuilder.append(ch);
    }
    prev = ch;//in next loop previous character should be our current one
}
// the last token has no trailing comma, so add it after the loop
if (tokenBuilder.length() > 0) {
    tokens.add(tokenBuilder.toString().trim());
}

String[] array = tokens.toArray(new String[tokens.size()]);

for (String s : array)
    System.out.println(">" + s);

Output:

>"start "title" end"
>10
>20
>"text"

Answer 1: (score: 0)

You can use this (online example at http://ideone.com/TTtlZV):

import java.util.*;
import java.lang.*;
import java.io.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
public static void main (String[] args) throws java.lang.Exception
    {
         boolean inQuoted = false;

        List<String> parts = new ArrayList<String>();
        String s = "\"start \\\"title\\\" end\", 10, 20, \"text\"";
        StringBuilder current = new StringBuilder();
        for( int i=0; i<s.length(); i++ ){
            char c = s.charAt(i);
            char cPrev = ( i == 0 ? (char)0 : s.charAt(i-1));

            if( c == '"' && cPrev != '\\' ){
                inQuoted = !inQuoted;
            }

            if( c == ',' && !inQuoted ){
                if( current.length() > 0 ){
                    parts.add(current.toString());
                    current = new StringBuilder();
                }
            }
            else {
                int length = current.length();
                if( length > 0 && c == '"' && current.charAt(length-1) == '\\' ){
                    current.deleteCharAt(length-1);
                }
                current.append(c);
            }
        }
        if( current.length() > 0 ){
            parts.add(current.toString());
        }

        System.out.println(parts);
    }
}

It does not handle double escapes, for example:

  \\"

If I run this program, the output is:

  ["start "title" end",  10,  20,  "text"]
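
On the double-escape limitation mentioned in this answer, one possible variation is to track an "escaped" flag instead of looking at the previous character, so that \\ is consumed as a literal backslash before the following " is examined. The sketch below is only an illustration under that assumption. The class name EscapeAwareSplitter and the helpers split and addIfNotEmpty are hypothetical and not part of the original answer, and it also trims tokens, which the answer's code does not.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplitter {

    // Hypothetical helper: splits on commas that are outside quotes.
    // Inside quotes a backslash escapes the next character, so \" stays
    // inside the quoted part while \\" ends it.
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuoted = false;
        boolean escaped = false; // previous char was a backslash not yet consumed

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (escaped) {
                current.append(c);      // keep the escaped character literally
                escaped = false;
            } else if (c == '\\' && inQuoted) {
                escaped = true;         // drop the backslash, remember it
            } else if (c == '"') {
                inQuoted = !inQuoted;   // unescaped quote opens or closes a field
                current.append(c);
            } else if (c == ',' && !inQuoted) {
                addIfNotEmpty(parts, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotEmpty(parts, current);  // last token has no trailing comma
        return parts;
    }

    private static void addIfNotEmpty(List<String> parts, StringBuilder sb) {
        String token = sb.toString().trim();
        if (!token.isEmpty()) {
            parts.add(token);
        }
    }

    public static void main(String[] args) {
        // the sample line from the question
        System.out.println(split("\"start \\\"title\\\" end\", 10, 20, \"text\""));
        // a quoted field that ends with an escaped backslash: "path\\", 5
        System.out.println(split("\"path\\\\\", 5"));
    }
}
```

For the question's sample line this yields the same four tokens as expected in the question. For the second line the quoted field ends after the escaped backslash, giving the tokens "path\" and 5.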

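Answer 2: (score: -1)

If you want the last index, just use lastIndexOf, without a start index, because lastIndexOf(str, fromIndex) searches backward from fromIndex:

.lastIndexOf("\"")

Just replace

pabaiga = line.indexOf("\"", 1);

with

pabaiga = line.lastIndexOf("\"");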
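
For comparison, a forward search can be kept while skipping escaped quotes. Note that the last quote on a line belongs to its last quoted field, so a backward search may not fit lines with several quoted fields. The following is a minimal sketch, assuming a quote counts as escaped whenever the character right before it is a backslash. The class ClosingQuoteFinder and the method indexOfUnescapedQuote are hypothetical names introduced for illustration; pabaiga and zodis are the variables from the question.

```java
public class ClosingQuoteFinder {

    // Hypothetical helper: index of the first quote at or after 'from' that is
    // not directly preceded by a backslash, or -1 if there is none.
    static int indexOfUnescapedQuote(String line, int from) {
        int pos = line.indexOf('"', from);
        while (pos > 0 && line.charAt(pos - 1) == '\\') {
            pos = line.indexOf('"', pos + 1); // escaped quote, keep searching
        }
        return pos;
    }

    public static void main(String[] args) {
        String line = "\"start \\\"title\\\" end\", 10, 20, \"text\"";
        // start at 1 to skip the opening quote, as in the question
        int pabaiga = indexOfUnescapedQuote(line, 1);
        String zodis = line.substring(0, pabaiga + 1);
        System.out.println(zodis);                       // first quoted field
        System.out.println(line.substring(pabaiga + 1)); // remainder of the line
    }
}
```

In this sketch the backslashes stay in zodis, so they would still need to be removed afterwards if the stored string should not contain them, and a field ending in \\" is not handled.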