Question

I have a text file (XML created with XStream) that is 63,000 lines long (3.5 MB). I'm trying to read it with a BufferedReader:

BufferedReader br = new BufferedReader(new FileReader(file));
try {
    String s = "";
    String tempString;
    int i = 0;
    while ((tempString = br.readLine()) != null) {
        s = s.concat(tempString);
        // s = s + tempString;
        i = i + 1;
        if (i % 1000 == 0) {
            System.out.println(Integer.toString(i));
        }
    }
} finally {
    br.close();
}

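As you can see, I tried to measure the reading speed. It is very slow: once 10,000 lines have been read, every further 1,000 lines take several seconds. I'm obviously doing something wrong, but I can't figure out what. Thanks in advance for your help.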

Answer 1

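@PaulGrime is right. Every time the loop reads a line, you copy the entire string. Once the string gets large (say, 10,000 lines), each of those copies does a lot of work.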
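
To put a rough number on "a lot of work": assuming the 3.5 MB file is mostly ASCII, it holds roughly 3.5 × 10^6 characters over n = 63,000 lines, so an average line has L ≈ 55.6 characters. The k-th call to concat copies the k − 1 lines accumulated so far plus the new one, so the total number of characters copied is approximately:

```latex
\text{chars copied} \approx \sum_{k=1}^{n} kL = L\,\frac{n(n+1)}{2}
  \approx 55.6 \times \frac{63\,000 \times 63\,001}{2} \approx 1.1 \times 10^{11}
```

Around line 10,000, the next 1,000 lines alone copy about 1,000 × 10,500 × 55.6 ≈ 5.8 × 10^8 characters, and that figure keeps growing with every line. A StringBuilder grows its internal buffer geometrically, so each character is copied only a small constant number of times and the total work stays proportional to the 3.5 × 10^6 characters in the file.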

Try this:

StringBuilder sb = new StringBuilder();
String tempString;
while ((tempString = br.readLine()) != null) {
   sb.append(tempString);
   sb.append('\n');  // readLine() strips the line terminator, so add it back
}

String s = sb.toString();

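Note: see Paul's answer below for why stripping the line breaks makes this a bad way to read the file. Also, as mentioned in the comments on the question, XStream can read the file itself, and even if it couldn't, Commons IO's IOUtils.toString(reader) would be a safer way to read the file.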
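
If you would rather not add Apache Commons IO just for IOUtils, the JDK (Java 7 and later) can also read a whole file into a String without any per-line copying. The sketch below swaps in Files.readAllBytes in place of IOUtils.toString; the class name ReadWholeFile and the UTF-8 encoding are assumptions for illustration. Unlike the readLine loop, it keeps the line terminators exactly as they appear in the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ReadWholeFile {
    // Reads the entire file with one bulk call; line terminators are preserved as-is.
    static String readWholeFile(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);           // no per-line string copying
        return new String(bytes, StandardCharsets.UTF_8);  // assumption: the file is UTF-8
    }

    public static void main(String[] args) throws IOException {
        String xml = readWholeFile(Paths.get(args[0]));    // file path passed on the command line
        System.out.println(xml.length() + " characters read");
    }
}
```

If the only goal is to hand the XML to XStream, passing it the URL or InputStream directly (as Answer 4 does) avoids building the String at all.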

Answer 2

There are a few improvements you can make right away:

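Use a StringBuilder instead of concat and +. Using + and concat can seriously hurt performance, especially inside a loop, because each call copies the entire accumulated string.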
Reduce the number of disk accesses. You can do this by giving the reader a larger buffer:

BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);
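
Putting the two suggestions above together, a minimal sketch might look like the following. SIZE = 64 * 1024 is an arbitrary illustrative value (BufferedReader's default is 8192 chars), the class and method names are hypothetical, and the file is assumed to be UTF-8. An InputStreamReader with an explicit charset replaces FileReader, which uses the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LargeBufferRead {
    // Hypothetical value for illustration: 64 K chars instead of the 8 K default.
    static final int SIZE = 64 * 1024;

    static String readAll(Path path) throws IOException {
        // Explicit charset (assumed UTF-8); FileReader would use the platform default.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8), SIZE)) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n'); // readLine() drops the terminator; restore it as '\n'
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String s = readAll(Paths.get(args[0])); // file path passed on the command line
        System.out.println(s.length() + " characters read");
    }
}
```

Because readLine() treats \r\n, \n and \r alike, this version normalizes every line ending to \n; if the exact bytes matter, read the file in bulk instead.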

Answer 3

You should use a StringBuilder, because repeated String concatenation is extremely slow, even with small strings.

Also, try using NIO instead of BufferedReader.

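public static void main(String[] args) throws IOException {
    final Path path = Paths.get(args[0]); // the file to read, passed on the command line
    final StringBuilder stringBuilder = new StringBuilder();
    final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
    final CharBuffer charBuffer = CharBuffer.allocate(1024);
    final CharsetDecoder charsetDecoder = StandardCharsets.UTF_8.newDecoder();
    try (final FileChannel fileChannel = FileChannel.open(path, StandardOpenOption.READ)) {
        boolean endOfInput = false;
        while (!endOfInput) {
            endOfInput = fileChannel.read(byteBuffer) == -1;
            byteBuffer.flip();
            CoderResult result;
            do { // incremental decode: a UTF-8 character split across two reads is kept, not rejected
                result = charsetDecoder.decode(byteBuffer, charBuffer, endOfInput);
                if (result.isError()) result.throwException();
                charBuffer.flip();
                stringBuilder.append(charBuffer);
                charBuffer.clear();
            } while (result.isOverflow()); // charBuffer was full: drain it and keep decoding
            byteBuffer.compact(); // carry undecoded bytes over to the next read
        }
    }
    charsetDecoder.flush(charBuffer); // completes decoding, as the CharsetDecoder contract requires
    charBuffer.flip();
    stringBuilder.append(charBuffer);
}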

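If it is still too slow, you can tune the buffer size; which size works best depends on the system. For me there was little difference between a 1K and a 4K buffer, but on other systems I have seen changing it speed things up by an order of magnitude.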
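
A rough way to check this on your own machine is to read the same file with a few buffer sizes and time each pass. The sketch below is a hypothetical harness (the class BufferSizeCheck, the method countBytes and the candidate sizes are illustrative assumptions); it only reads bytes through a FileChannel without decoding, so it isolates the I/O half of the loop above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class BufferSizeCheck {
    // Reads the whole file through a FileChannel with a buffer of the given size
    // and returns the number of bytes read; the bytes themselves are discarded.
    static long countBytes(Path path, int bufferSize) throws IOException {
        long total = 0;
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear(); // only the read pattern matters here, not the data
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);          // file path passed on the command line
        int[] sizes = {1024, 4096, 64 * 1024};   // candidate sizes; pick your own
        for (int size : sizes) {
            long start = System.nanoTime();
            long bytes = countBytes(path, size);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(size + " B buffer: " + bytes + " bytes in " + micros + " us");
        }
    }
}
```

Timings from a single pass are noisy: the first run may pay for reading from disk while later runs hit the operating system's file cache, and the JIT warms up as the loop runs, so repeat the measurement a few times before drawing conclusions.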

Answer 4

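In addition to what has already been said, depending on how you use the XML, your code may be incorrect because it discards the line endings. For example, this code: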

package temp.stackoverflow.q15849706;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

import com.thoughtworks.xstream.XStream;

public class ReadXmlLines {
    public String read1(BufferedReader br) throws IOException {
        try {
            String s = "";
            String tempString;
            int i = 0;
            while ((tempString = br.readLine()) != null) {
                s = s.concat(tempString);
                // s=s+tempString;
                i = i + 1;
                if (i % 1000 == 0) {
                    System.out.println(Integer.toString(i));
                }
            }
            return s;
        } finally {
            br.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ReadXmlLines r = new ReadXmlLines();

        URL url = ReadXmlLines.class.getResource("xml.xml");
        String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url
                .openStream())));

        Object ob = null;

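        XStream xs = new XStream();
        xs.alias("root", Root.class);
        // Newer XStream versions (1.4.18+) refuse to deserialize types that are not explicitly allowed
        xs.allowTypes(new Class<?>[] { Root.class });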

        // This is incorrectly read/parsed, as the line endings are not
        // preserved.
        System.out.println("----------1");
        System.out.println(xmlStr);
        ob = xs.fromXML(xmlStr);
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the URL directly
        ob = xs.fromXML(url);
        System.out.println("----------2");
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the InputStream
        // directly
        ob = xs.fromXML(url.openStream());
        System.out.println("----------3");
        System.out.println(ob);
    }

    public static class Root {
        public String script;

        public String toString() {
            return script;
        }
    }
}

together with this xml.xml file on the classpath (in the same package as the class):

<root>
    <script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]>
    </script>
</root>

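produces the following output. The first two lines show that the line endings have been removed, which breaks the JavaScript in the CDATA section: because the JS lines have been merged, the first JS comment now comments out the entire script.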

----------1
<root>    <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}]]>    </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}    
----------2


// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
...
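
The same effect can be reproduced without XStream. In this sketch (the class LineEndingDemo and the method joinLines are hypothetical names), a two-line script is joined with and without restoring the '\n' that readLine() strips; without it, the function definition ends up on the same line as the // comment, just as in the ----------1 output above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineEndingDemo {
    // Joins the lines returned by readLine(), optionally restoring a '\n' after each one.
    static String joinLines(BufferedReader br, boolean keepNewlines) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
            if (keepNewlines) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String script = "// a comment\nfunction f() { return 0; }\n";
        // Without newlines the function lands inside the comment line.
        System.out.println(joinLines(new BufferedReader(new StringReader(script)), false));
        // With newlines the original line structure survives.
        System.out.println(joinLines(new BufferedReader(new StringReader(script)), true));
    }
}
```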

Java: reading a long text file is very slow
