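Extracting strings from a file

Time: 2011-03-25 07:03:07

Tags: java

Hi, I have written a Java program that should extract the molecular function and biological process for a record when its ID matches, but I am getting a StringIndexOutOfBoundsException. Can anyone correct it? Thanks in advance. Here is my input: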

chr11   RAP3_rep    mRNA    17114958    17117968    .   +   .   ID=Os11t0448200-01;Name=Os11t0448200-01;Gene_symbols=AM14;GO=Molecular Function: protein kinase activity (GO:0004672),Molecular Function: ATP binding (GO:0005524),Biological Process: protein amino acid phosphorylation (GO:0006468),Molecular Function: protein tyrosine kinase activity (GO:0004713),Molecular Function: protein serine/threonine kinase activity (GO:0004674);ID_converter=Os11g0448200;InterPro=Protein kinase, core (IPR000719),Tyrosine protein kinase (IPR001245),Serine/threonine protein kinase (IPR002290),Serine/threonine protein kinase, active site (IPR008271),Protein kinase-like (IPR011009),Serine/threonine protein kinase-related (IPR017442);Link_to=8185 (Oryzabase),Protein kinase%2C core (Plant Gene Family Database);Locus_id=Os11g0448200;Note=Arbuscular mycorrhizal specific marker 14.;ORF_evidence=Q53JE9 (UniProt);Transcript_evidence=Inferred from reference;Sequence_download=Os11t0448200-01;References=19033527%2C 15905328;Status=manual curation (Oct 29%2C 2010)
chr11   RAP3_rep    CDS 17114958    17115039    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17115846    17115869    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17115970    17116095    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116205    17116546    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116669    17116784    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17116880    17117140    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17117589    17117786    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    CDS 17117891    17117968    .   +   .   Parent=Os11t0448200-01
chr11   RAP3_rep    mRNA    17565866    17568694    .   -   .   ID=Os11t0455500-01;Name=Os11t0455500-01;Alias=AK059712,AK060299,AK119539,AK122115;ID_converter=Os11g0455500;Link_to=S-adenosyl-L-homocysteine hydrolase (Plant Gene Family Database);Locus_id=Os11g0455500;NIAS_FLcDNA=001-032-F05;Note=Similar to Adenosylhomocysteinase-like protein.;ORF_evidence=Q84VE1 (UniProt);Transcript_evidence=AK059712 (DDBJ%2C Best hit);Sequence_download=Os11t0455500-01;InterPro=NAD(P)-binding (IPR016040),S-adenosyl-L-homocysteine hydrolase (IPR000043),S-adenosyl-L-homocysteine hydrolase%2C NAD binding (IPR015878);GO=Molecular Function: catalytic activity (GO:0003824),Molecular Function: binding (GO:0005488),Biological Process: metabolic process (GO:0008152),Molecular Function: adenosylhomocysteinase activity (GO:0004013),Biological Process: one-carbon compound metabolic process (GO:0006730);Expression=AK059712
chr11   RAP3_rep    CDS 17567891    17568694    .   -   .   Parent=Os11t0455500-01;
chr11   RAP3_rep    CDS 17566493    17567029    .   -   .   Parent=Os11t0455500-01;
chr11   RAP3_rep    CDS 17566191    17566400    .   -   .   Parent=Os11t0455500-01;

And the program:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream.GetField;
import java.util.ArrayList;
import java.util.Scanner;

public class Sample
{
 public static void main(String args[]) throws FileNotFoundException
 {
   Sample s=new Sample();
   String inputID="Os11t0120200-01";

   //System.out.println("Enter the value");
   //Scanner sc=new Scanner(System.in);
   //n=sc.nextLong();

   ArrayList<String> IDlist=new ArrayList<String>();
   ArrayList<String> InputIDlist=new ArrayList<String>();
   int n;
   try
   {
     File nf=new File("textfile1.txt");
     FileOutputStream fop1=new FileOutputStream(nf,true);
     String os ="";

     FileInputStream fis1=new FileInputStream("chr11.gb");
     FileInputStream fis2=new FileInputStream("1.txt");
     InputStreamReader in1 = new InputStreamReader(fis1, "UTF-8");
     InputStreamReader in2 = new InputStreamReader(fis2, "UTF-8");
     BufferedReader input1 = new BufferedReader(in1);
     BufferedReader input2 =  new BufferedReader(in2);

     String line1;
     String line2;

     FileInputStream fis=new FileInputStream("chr11.GB");
     InputStreamReader in = new InputStreamReader(fis, "UTF-8");
     BufferedReader input = new BufferedReader(in);
     String line;

     File f=new File("1.GB");
     FileOutputStream fop=new FileOutputStream(f);

     if(f.exists())
     {
        os="This data is written through the program\t\n";
        fop1.write(os.getBytes());

        String str1="";
        String str2="";
        os="The data has been written\t\n";
        fop1.write(os.getBytes());

        while((line=input.readLine())!=null)
        {
          String splits[]=line.split("\t");
          if(splits[2].equalsIgnoreCase("mrna"))
          {
            IDlist.add((splits[8]));
          }
        }

        while((line=input2.readLine())!=null)
        {
          String splits[]=line.split("\t");
          if(splits[0]!="")
          {
            InputIDlist.add((splits[0]));
          }
        }
        for(int j=0; j<InputIDlist.size(); j++)
        {
          for(int i=0; i<IDlist.size(); i++)
          {
            if((IDlist.get(i).substring(3, 18).toString()).equals(InputIDlist.get(j)))
            {
              if(IDlist.get(i).contains("Alias"))
              {
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))+"\t\n";
                 fop1.write(os.getBytes());
              }
              if(IDlist.get(i).contains("Biological Process"))
              {
                 //n=IDlist.get(i).lastIndexOf("Biological Process");
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Biological Process"),IDlist.get(i).lastIndexOf(";"))+"\t\n";
                 fop1.write(os.getBytes());
              }
              if(IDlist.get(i).contains("Molecular Function"))
              {
                 //n=IDlist.get(i).lastIndexOf("Molecular Function");
                 os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Molecular Function"), IDlist.get(i).lastIndexOf(","))+"\t\n";
                 fop1.write(os.getBytes());
              }
              break;
            }
            String p="\n";
            fop1.write(p.getBytes());
          }
        }
     }
     else
     {
        System.out.println("This file is not exist");
     }
   }
   catch (Exception e)
   {
      e.printStackTrace();
   }
 }
}

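2 answers:

Answer 0 (score: 2)

I agree with the comments on the question, but I'll still hazard a guess:

Given the StringIndexOutOfBoundsException, the most likely culprit is this call: IDlist.get(i).substring(3, 18). If the string is shorter than 18 characters, you will get that exception.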
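To illustrate the point about fixed offsets, here is a minimal sketch that reads the ID up to the first semicolon instead of assuming it always occupies positions 3 to 18. The class `SafeIdExtract` and the helper `extractId` are hypothetical names introduced for this example, and the sketch assumes the attribute column starts with `ID=`, as the mRNA lines in the question's sample do.

```java
class SafeIdExtract {
    // Hypothetical helper: returns the value of a leading "ID=" attribute,
    // or null when the string does not start with "ID=".
    static String extractId(String attributes) {
        if (attributes == null || !attributes.startsWith("ID=")) {
            return null;
        }
        int end = attributes.indexOf(';');
        // No semicolon: the ID runs to the end of the string.
        return end < 0 ? attributes.substring(3) : attributes.substring(3, end);
    }

    public static void main(String[] args) {
        System.out.println(extractId("ID=Os11t0448200-01;Name=Os11t0448200-01"));
        // A short value that would make substring(3, 18) throw:
        System.out.println(extractId("ID=short"));
        // A CDS-style attribute column with no ID at all:
        System.out.println(extractId("Parent=Os11t0448200-01"));
    }
}
```

Because the end index comes from the string itself, a shorter or longer ID does not cause an exception; the caller only has to handle the null case.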

One possible cause is this part of the question's code:

if(splits[0]!="")  
{
   InputIDlist.add((splits[0]));
}

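If splits[0] is an empty string, == may still be false (so != may be true). Use !splits[0].equals("") here instead, or better, !"".equals(splits[0]), which also covers the possibility that splits[0] is null. Note that == checks reference equality, that is, whether both references point to the same object (in C++ terms, whether they are the same pointer), while equals checks logical equality (which each class may implement differently).

Edit:

Another possible source of the exception is this line: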
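The difference between the two comparisons can be seen in a small sketch. The class `StringEqualityDemo` and the helper `isNonEmpty` are hypothetical names for this example; `new String("")` is used only to guarantee an empty string that is a different object from the literal `""`.

```java
class StringEqualityDemo {
    // Hypothetical helper: true only for a non-null string with at least one character.
    static boolean isNonEmpty(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        // An empty string that is deliberately a separate object from the literal "".
        String empty = new String("");

        System.out.println(empty != "");        // true: different objects, although both are empty
        System.out.println(!"".equals(empty));  // false: the contents are equal
        System.out.println(isNonEmpty(empty));  // false
        System.out.println(isNonEmpty(null));   // false, and no NullPointerException
    }
}
```

With the reference comparison, an empty first column can slip through the check and be added to the list; the content comparison rejects it.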

os=IDlist.get(i).substring(IDlist.get(i).lastIndexOf("Alias"),IDlist.get(i).lastIndexOf("ID_converter"))

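You check for "Alias", so lastIndexOf("Alias") should not return -1, but IDlist.get(i).lastIndexOf("ID_converter") might. If it does, the end index is -1 and substring throws.

Edit 2:

One more thing: even if both strings ("Alias" and "ID_converter") are present in the source string, you will also get this exception when they appear in the wrong order ("ID_converter ... Alias"), because a begin index greater than the end index is not allowed (see the JavaDoc for String.substring()).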
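Both failure modes described in the edits (a missing marker and markers in the wrong order) can be guarded with one check before calling substring. This is only a sketch: `MarkerSlice` and `between` are hypothetical names, and returning null for "cannot slice" is one possible convention among several.

```java
class MarkerSlice {
    // Hypothetical helper: returns the text from the last occurrence of startMarker
    // up to (not including) the last occurrence of endMarker, or null when either
    // marker is missing or they appear in the wrong order.
    static String between(String text, String startMarker, String endMarker) {
        int begin = text.lastIndexOf(startMarker);
        int end = text.lastIndexOf(endMarker);
        if (begin < 0 || end < 0 || begin > end) {
            return null;
        }
        return text.substring(begin, end);
    }

    public static void main(String[] args) {
        // Markers present and in order:
        System.out.println(between("ID=a;Alias=AK1,AK2;ID_converter=b", "Alias", "ID_converter"));
        // Markers in the wrong order:
        System.out.println(between("ID=a;ID_converter=b;Alias=AK1", "Alias", "ID_converter"));
        // End marker missing:
        System.out.println(between("ID=a;Alias=AK1", "Alias", "ID_converter"));
    }
}
```

In the question's loop, a null result would simply mean "write nothing for this record" rather than aborting the whole run with an exception.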

Answer 1 (score: 1)

Change:

if (IDlist.get(i).contains("Alias"))

To:

if ((IDlist.get(i).contains("Alias")) && (IDlist.get(i).contains("ID_converter")))

If execution does not enter the if statement, set a breakpoint to check why the second condition is false.
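As an alternative to checking for each marker, the attribute column could be split into key/value pairs first, so that no index arithmetic depends on which fields are present or in what order. The following is a rough sketch, not the approach either answer proposes: `AttributeParser`, `parse`, and `goTerms` are hypothetical names, and it assumes fields are separated by `;`, keys and values by `=`, and that every GO term ends with a parenthesised accession such as `(GO:0008152)`, as in the question's sample.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AttributeParser {
    // Splits "key=value;key=value;..." into a map, keeping the original field order.
    static Map<String, String> parse(String attributes) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String field : attributes.split(";")) {
            int eq = field.indexOf('=');
            if (eq > 0) { // skip fields that have no key
                map.put(field.substring(0, eq), field.substring(eq + 1));
            }
        }
        return map;
    }

    // Returns the GO terms of one category, e.g. "Biological Process".
    static List<String> goTerms(Map<String, String> attrs, String category) {
        List<String> terms = new ArrayList<>();
        String go = attrs.get("GO");
        if (go == null) {
            return terms; // record has no GO annotation
        }
        // Split only on commas that follow ")", so a comma inside a term name is kept.
        for (String term : go.split("(?<=\\)),")) {
            String trimmed = term.trim();
            if (trimmed.startsWith(category + ":")) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = parse("ID=Os11t0455500-01;"
                + "GO=Molecular Function: binding (GO:0005488),"
                + "Biological Process: metabolic process (GO:0008152);Expression=AK059712");
        System.out.println(attrs.get("ID"));
        System.out.println(goTerms(attrs, "Biological Process"));
        System.out.println(goTerms(attrs, "Molecular Function"));
    }
}
```

With this shape, a record without an Alias or GO field yields a null lookup or an empty list, and the field order in the file no longer matters.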