Parsing a text file into multiple text files

Date: 2012-07-10 20:36:15

Tags: java

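I want to produce multiple files by parsing a single input file. The input file contains several thousand protein sequences in FASTA format, and I want to generate each protein sequence in raw format (that is, without commas or semicolons, and without extra symbols such as ">", "[", "]" and so on).

A FASTA record starts with the ">" symbol, followed by a description of the protein, and then the protein sequence itself.

For example:

>lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771] [protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,13403..13639)]
MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL
PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH

The input file contains thousands of protein sequences in the format above. I have to generate thousands of raw files, each containing a single protein sequence with no special symbols or gaps.

I have already written Java code for this, but the output is "cannot open a file" followed by "cannot find file".

Please help me solve this problem.

Regards, Vijay Kumar Garg, Varanasi, Bharat (India)

The code is: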

/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.io.FileInputStream;

// java package for using regular expression
public class Arrayren
{
    public static void main(String args[]) throws IOException  
    {
        String a[]=new String[1000];
        String b[][] =new String[1000][1000];
        /*open the id file*/
        try
        {
            File f = new File ("input.txt"); 
            //opening the text document containing genbank ids
            FileInputStream fis = new FileInputStream("input.txt");
            //Reading the file contents through inputstream
            BufferedInputStream bis = new BufferedInputStream(fis);
            // Writing the contents to a buffered stream
            DataInputStream dis = new DataInputStream(bis);
            //Method for reading Java Standard data types
            String inputline;
            String line;
            String separator = System.getProperty("line.separator");
            // reads a line till next line operator is found
            int i=0;
            while ((inputline=dis.readLine()) != null) 
            {
                i++;
                a[i]=inputline;
                a[i]=a[i].replaceAll(separator,"");
                //replaces unwanted patterns like /n with space
                a[i]=a[i].trim();
                // trims out if any space is available
                a[i]=a[i]+".txt";
                //takes the file name into an array
                try
                // to handle run time error
                /*take the sequence in to an array*/
                {
                    BufferedReader in = new BufferedReader (new FileReader(a[i]));
                    String inline = null;
                    int j=0;
                    while((inline=in.readLine()) != null)
                    {
                        j++;
                        b[i][j]=inline;
                        Pattern q=Pattern.compile(">");
                        //Compiling the regular expression
                        Matcher n=q.matcher(inline);
                        //creates the matcher for the above pattern
                        if(n.find())
                        {
                            /*appending the comment line*/
                            b[i][j]=b[i][j].replaceAll(">gi","");
                            //identify the pattern and replace it with a space
                            b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
                            b[i][j]=b[i][j].replaceAll("|","");
                            b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
                            b[i][j]=b[i][j].replaceAll(".","");
                            b[i][j]=b[i][j].replaceAll("_","");
                            b[i][j]=b[i][j].replaceAll("\\(","");
                            b[i][j]=b[i][j].replaceAll("\\)","");
                        }
                        /*printing the sequence in to a text file*/
                        b[i][j]=b[i][j].replaceAll(separator,"");
                        b[i][j]=b[i][j].trim();
                        // trims out if any space is available
                        File create = new File(inputline+"R.txt");
                        try
                        {
                            if(!create.exists())
                            {
                                create.createNewFile();
                                // creates a new file
                            }
                            else
                            {
                                System.out.println("file already exists");
                            }
                        }
                        catch(IOException e)
                        // to catch the exception and print the error if cannot open a file
                        {
                            System.err.println("cannot create a file");
                        }
                        BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
                        outt.write(b[i][j]);
                        // printing the contents to a text file
                        outt.close();
                        // closing the text file
                        System.out.println(b[i][j]);
                    }
                }
                catch(Exception e)
                {
                    System.out.println("cannot open a file");
                }
            }
        }
        catch(Exception ex)
        // catch the exception and prints the error if cannot find file
        {
            System.out.println("cannot find file ");
        }
    }
}

It would be easier to understand if you provided the proper information.

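2 answers:

Answer 0: (score: 0)

This code will not win any prizes, owing to a lack of Java experience. For example, even if it were correct, I would expect an OutOfMemoryError. A rewrite would be best. Still, we all start small.

  • Provide the full path to the file. The same goes for the output: the directory is probably missing from the file name.
  • Prefer BufferedReader and related classes over DataInputStream.
  • Initialize i with -1. Better still, use for (int i = 0; i < a.length; ++i).
  • Compile the Pattern outside the loop. Better yet, drop the Matcher altogether; if (s.contains(">")) does the same job. There is also no need to create the new file explicitly, since FileWriter creates it.

Code: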

final String encoding = "Windows-1252"; // Or "UTF-8", or leave it out to use the platform default.
File f = new File("C:/input.txt");
BufferedReader dis = new BufferedReader(new InputStreamReader(
    new FileInputStream(f), encoding));

...

        int i= -1; // So i++ starts with 0.
        while ((inputline=dis.readLine()) != null) 
        {
            i++;
            a[i]=inputline.trim();
            // readLine() already strips the line terminator, so this is not needed:
            // a[i]=a[i].replaceAll(separator,"");
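
Putting the suggestions above together, a minimal sketch of the whole task might look like the following. It is not the asker's program. The class name FastaSplitter, the helper parseText, the output directory raw and the seqN.txt file names are all assumptions made for illustration. It also assumes standard FASTA input, where each header sits on its own line starting with ">", and it keeps every sequence in memory, which should be fine for a few thousand proteins.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class FastaSplitter {

    /** Reads FASTA records and returns one raw sequence per record; header lines are dropped. */
    static List<String> parse(BufferedReader in) throws IOException {
        List<String> sequences = new ArrayList<>();
        StringBuilder current = null;              // stays null until the first header is seen
        String line;
        while ((line = in.readLine()) != null) {   // readLine() strips the line terminator
            line = line.trim();
            if (line.startsWith(">")) {            // a header line starts a new record
                if (current != null) {
                    sequences.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current != null) {
                // Keep letters only, which removes spaces, digits and symbols.
                current.append(line.replaceAll("[^A-Za-z]", ""));
            }
        }
        if (current != null) {                     // flush the last record
            sequences.add(current.toString());
        }
        return sequences;
    }

    /** Convenience wrapper (hypothetical helper) for parsing FASTA text held in a String. */
    static List<String> parseText(String fasta) {
        try (BufferedReader in = new BufferedReader(new StringReader(fasta))) {
            return parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: input.txt in the working directory, output under ./raw
        Path input = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path outDir = Paths.get(args.length > 1 ? args[1] : "raw");
        Files.createDirectories(outDir);

        List<String> sequences;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            sequences = parse(in);
        }
        for (int i = 0; i < sequences.size(); i++) {
            // One file per protein: seq1.txt, seq2.txt, ...
            Path out = outDir.resolve("seq" + (i + 1) + ".txt");
            Files.write(out, sequences.get(i).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

It could be run as java FastaSplitter C:/input.txt C:/raw, passing full paths as the first bullet recommends. Any exception simply propagates out of main, so the JVM prints the real cause instead of a generic message.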

Answer 1: (score: 0)

Your code contains the following two catch blocks:

    catch(Exception e)
    {
        System.out.println("cannot open a file");
    }
    catch(Exception ex)
    // catch the exception and prints the error if cannot find file
    {
        System.out.println("cannot find file ");
    }

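Both of these swallow the exception and print a generic "it didn't work" message. That tells you the catch block was entered, but nothing more.

Exceptions usually carry useful information that helps you find out where the real problem is. By ignoring them, you make the problem much harder to diagnose. Worse still, you are catching Exception, which is the superclass of a great many exceptions, so these catch blocks will catch many different kinds of exception and ignore all of them.

The simplest way to get the information out of an exception is to call its printStackTrace() method, which prints the exception type, the exception message and the stack trace. Add a call to it in both catch blocks; that will show you much more clearly which exception is being thrown and where it is thrown from.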
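
As a small illustration of this advice, the sketch below catches the specific exception types instead of Exception and calls printStackTrace() in each block. The class name StackTraceDemo, the method firstLine and the file name are made up for the example and do not come from the question.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class StackTraceDemo {

    /** Returns the first line of the file, or null if the file is empty or cannot be read. */
    static String firstLine(String fileName) {
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            return in.readLine();
        } catch (FileNotFoundException e) {
            // Thrown by FileReader when the file does not exist or cannot be opened.
            System.err.println("cannot open file: " + fileName);
            e.printStackTrace();   // prints the exception type, message and stack trace
            return null;
        } catch (IOException e) {
            // Any other I/O failure while reading or closing the file.
            System.err.println("error while reading: " + fileName);
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical missing file, used only to trigger the first catch block.
        firstLine("no-such-file.txt");
    }
}
```

With the stack trace printed, the message includes the path that could not be opened, which makes it obvious when a file name is being built incorrectly.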