Finding the standard deviation in a CSV file

Time: 2015-02-07 02:12:27

Tags: java file csv streamwriter opencsv

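I am trying to find the standard deviation (σ = √[Σ(x − MEAN)² ÷ n]) of a single column extracted from a CSV file. The CSV file contains around 45,000 instances and 17 attributes, separated by ';'. Computing the standard deviation requires the MEAN value in every iteration of the while loop, where it is subtracted from Xi. So I think the MEAN has to be computed before the loop that computes the standard deviation, but I don't know how to do this, or whether there is any way to do it at all. I am stuck here. After that, the code replaces the old Xi with the new Xi and then writes (generates) a new CSV file.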

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.io.FileWriter;
import java.io.*;
import static java.lang.Math.sqrt;

public class Main {

    public static void main(String[] args) throws IOException {

        String filename = "ly.csv";
        File file = new File(filename);
        BufferedWriter writer = null;

        try {
            writer = new BufferedWriter(new FileWriter("bank-full_updated.csv"));
        } catch (IOException e) {
        }
        try {

            double Tuple, avg;
            double temp;
            Tuple = 0;
            double stddev = 0;

            Scanner inputStream = new Scanner(file);
            inputStream.next();
            while (inputStream.hasNext()) {
                String data1 = inputStream.next();
                String[] values = data1.split(";");
                double Xi = Double.parseDouble(values[1]);
                // now finding standard deviation

                temp1 += (Xi - MEAN);
                // temp2=(temp1*temp1);
                // temp3=(temp2/count);
                // standard deviation=Math.sqrt(temp3);
                Xi=standard deviation * Xi

                // now replace  new Xi to original values1
                values[1] = String.valueOf(Xi);

                // iterate through the values and build a string out of them for write a new file
                StringBuilder sb = new StringBuilder();
                String newData = sb.toString();

                for (int i = 0; i < values.length; i++) {
                    sb.append(values[i]);
                    if (i < values.length - 1) {
                        sb.append(";");
                    }
                }
                // get the new string
                System.out.println(sb.toString());

                writer.write(sb.toString() + "\n");
            }

            writer.close();

            inputStream.close();
        } catch (FileNotFoundException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }

    }
}

1 answer:

Answer 0 (score: 2)

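The standard deviation can be computed in a single pass. Professor Donald Knuth has an algorithm that uses the Kahan summation algorithm. Here is the paper: http://researcher.ibm.com/files/us-ytian/stability.pdf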
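To make the single-pass idea concrete, here is a minimal Java sketch of a running mean and variance update, the form usually credited to Welford. Whether this matches the algorithm in the linked paper is an assumption, and the class and method names (RunningStats, add, populationStdDev) are hypothetical. It divides by n, as the formula in the question does.

```java
public class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    /** Folds one value into the running mean and the sum of squared deviations. */
    public void add(double x) {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // updated mean
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    public double mean() {
        return mean;
    }

    /** Population standard deviation (divides by n); returns 0 for no data. */
    public double populationStdDev() {
        return count == 0 ? 0.0 : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // Sample values chosen only for illustration.
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(x);
        }
        System.out.println("mean=" + stats.mean() + " stddev=" + stats.populationStdDev());
    }
}
```

In the question's loop, add(Xi) could be called once per row, so the mean does not have to be known before the loop starts. Rewriting each row with the final standard deviation would still need a second read of the file.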

Here is another way, but it suffers from rounding errors:

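#include <cmath>

// One-pass population standard deviation from the sum and the sum of squares.
// Precision suffers when the mean is large compared with the spread of the data.
double std_dev2(double a[], int n) {
    if (n == 0)
        return 0.0;
    double sum = 0;
    double sq_sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += a[i];
        sq_sum += a[i] * a[i];
    }
    double mean = sum / n;
    // E[x^2] - (E[x])^2; rounding can push this slightly below zero
    double variance = sq_sum / n - mean * mean;
    return std::sqrt(variance > 0.0 ? variance : 0.0);
}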
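If a second pass over the data is acceptable, the mean the question asks for can simply be computed before the loop that needs it. The following Java sketch assumes the rows are already held in memory as semicolon-separated strings (about 45,000 rows should fit comfortably). The class and method names (TwoPassStdDev, column, stdDev) and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class TwoPassStdDev {

    /** Parses one column out of semicolon-separated rows. */
    static double[] column(List<String> rows, int index) {
        double[] xs = new double[rows.size()];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = Double.parseDouble(rows.get(i).split(";")[index]);
        }
        return xs;
    }

    /** Pass 1 computes the mean; pass 2 sums squared deviations from it. */
    static double stdDev(double[] xs) {
        if (xs.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;

        double sqDiff = 0.0;
        for (double x : xs) {
            sqDiff += (x - mean) * (x - mean);
        }
        // Population form (divides by n), as in the question's formula.
        return Math.sqrt(sqDiff / xs.length);
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the file's contents after the header.
        List<String> rows = Arrays.asList("a;2;x", "b;4;y", "c;6;z");
        System.out.println(stdDev(column(rows, 1)));
    }
}
```

Subtracting the mean before squaring avoids the cancellation that affects the sum-of-squares shortcut, at the cost of reading the values twice.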