"GC overhead limit exceeded" error when reading a text file

Time: 2010-10-19 19:52:30

Tags: java garbage-collection out-of-memory

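I am getting java.lang.OutOfMemoryError: GC overhead limit exceeded while reading from a text file, and I cannot tell what is going wrong. I am running my program on a cluster with plenty of memory. The outer loop iterates 16,000 times, and for each iteration of the outer loop the inner loop iterates about 300,000 times. The error is thrown when the code tries to read a line in the inner loop. Any suggestions would be greatly appreciated. Here is my code snippet: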

//Read from the test data output file till not equals null
//Reads a single line at a time from the test data
while((line=br.readLine())!=null)
{
    //Clears the hashmap
    leastFive.clear();

    //Clears the arraylist
    fiveTrainURLs.clear();
    try
    {
        StringTokenizer st=new StringTokenizer(line," ");
        while(st.hasMoreTokens())
        {
            String currentToken=st.nextToken();

            if(currentToken.contains("File"))
            {
                testDataFileNo=st.nextToken();
                String tok="";
                while((tok=st.nextToken())!=null)
                {
                    if (tok==null) break;

                    int topic_no=Integer.parseInt(tok);
                    topic_no=Integer.parseInt(tok);
                    String prob=st.nextToken();

                    //Obtains the double value of the probability
                    double double_prob=Double.parseDouble(prob);
                    p1[topic_no]=double_prob;

                }
                break;
            }
        }
    }
    catch(Exception e)
    {
    }

    //Used to read over all the training data file
    FileReader fr1=new FileReader("/homes/output_train_2000.txt");

    BufferedReader br1=new BufferedReader(fr1);
    String line1="";

    //Reads the training data output file,one row at a time
    //This is the line on which an exception occurs!
    while((line1=br1.readLine())!=null)
    {
        try
        {
            StringTokenizer st=new StringTokenizer(line1," ");

            while(st.hasMoreTokens())
            {
                String currentToken=st.nextToken();

                if(currentToken.contains("File"))
                {
                    trainDataFileNo=st.nextToken();
                    String tok="";
                    while((tok=st.nextToken())!=null)
                    {
                        if(tok==null)
                            break;

                        int topic_no=Integer.parseInt(tok);
                        topic_no=Integer.parseInt(tok);
                        String prob=st.nextToken();

                        double double_prob=Double.parseDouble(prob);

                        //p2 will contain the probability values of each of the topics based on the indices
                        p2[topic_no]=double_prob;

                    }
                    break;
                }
            }
        }
        catch(Exception e)
        {
            double result=klDivergence(p1,p2);

            leastFive.put(trainDataFileNo,result);
        }
    }
}

1 answer:

Answer 0 (score: 3)

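16000 * 300000 = 4.8 billion. If each token takes up only 6 bytes, that alone is over 24 GB. When the garbage collector finally kicks in with 24 GB to collect, it is going to run for a very long time. It looks like you need to break the work up into smaller chunks. You could also limit your application's memory to something reasonable, such as 1 GB, so that the GC kicks in sooner and can finish its work in a reasonable amount of time.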
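
The arithmetic in this answer can be checked with a few lines of Java. This is only a rough sketch: the class and method names are made up for illustration, and the figure of 6 bytes per token and one token per inner iteration is the answer's own simplifying assumption, not a measurement. Note that the product has to be computed as a long, because 4.8 billion does not fit in an int.

```java
public class TokenEstimate {

    // Total number of inner-loop line reads across all outer iterations.
    // Math.multiplyExact on longs throws instead of silently overflowing.
    public static long totalLineReads(long outerIterations, long innerIterations) {
        return Math.multiplyExact(outerIterations, innerIterations);
    }

    // Lower-bound estimate of the bytes allocated for tokens, assuming
    // (hypothetically, as in the answer) a fixed number of bytes per read.
    public static long estimatedBytes(long lineReads, long bytesPerToken) {
        return Math.multiplyExact(lineReads, bytesPerToken);
    }

    public static void main(String[] args) {
        long reads = totalLineReads(16_000L, 300_000L);
        long bytes = estimatedBytes(reads, 6L);

        System.out.println("line reads: " + reads);
        System.out.println("bytes at 6 per token: " + bytes);
        System.out.println("approx. GB (10^9 bytes): " + bytes / 1e9);
    }
}
```

This gives 4,800,000,000 line reads and 28,800,000,000 bytes, which matches the "over 24 GB" estimate. The heap cap the answer suggests can be set with the standard -Xmx launcher option, for example java -Xmx1g MyProgram, where MyProgram is a placeholder for the actual main class.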