Inconsistent line spacing when building a CSV file

Date: 2013-05-21 18:30:20

Tags: java html csv screen-scraping

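I am screen-scraping this raw text data from a website. I have to format it and then write it to a CSV file. The raw text data is formatted correctly, but the line breaks are not carried over.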

Servicing Option: Retained/CTOS,,,Servicing Fee Rate: 0.250,,,Remittance Type: Gold
Seller Number: 143939

Mortgage product name from the Pricing Engine: 30-Year Fixed Rate Conventional,,,Valid for: 05/17/2013 at 01:56:39 PM, EDT
Interest Rate,5-DAY Contract Expiration Date : 05/22/2013,10-DAY Contract Expiration Date : 05/28/2013,15-DAY Contract Expiration Date : 06/03/2013,30-DAY Contract Expiration Date : 06/17/2013,45-DAY Contract Expiration Date : 07/01/2013,60-DAY Contract Expiration Date : 07/16/2013,75-DAY Contract Expiration Date : 07/31/2013,90-DAY Contract Expiration Date : 08/15/2013,
2.750,94.587,94.549,94.511,94.392,94.302,94.176,94.080,93.975,
2.875,95.574,95.535,95.497,95.363,95.273,95.134,95.038,94.919,
3.000,96.549,96.510,96.472,96.323,96.234,96.082,95.986,95.854,
3.125,97.489,97.450,97.412,97.250,97.160,96.997,96.901,96.757,
3.250,99.325,99.279,99.232,99.136,99.027,98.917,98.800,98.714,
3.375,100.333,100.287,100.240,100.126,100.017,99.891,99.774,99.673,
3.500,101.201,101.154,101.107,100.980,100.871,100.734,100.617,100.504,
3.625,102.016,101.970,101.923,101.785,101.676,101.529,101.413,101.290,
3.750,102.699,102.652,102.606,102.458,102.350,102.195,102.079,101.948,
3.875,103.326,103.271,103.216,103.146,103.018,102.915,102.777,102.703,
4.000,104.095,104.040,103.985,103.910,103.782,103.672,103.535,103.453,
4.125,104.834,104.779,104.724,104.641,104.513,104.399,104.262,104.176,
4.250,105.454,105.399,105.344,105.253,105.125,105.006,104.868,104.777,
4.375,104.469,104.405,104.342,104.441,104.293,104.315,104.157,104.209,
4.500,105.196,105.133,105.070,105.158,105.010,105.027,104.869,104.919,
4.625,105.892,105.828,105.765,105.847,105.699,105.712,105.554,105.599,
4.750,106.438,106.375,106.312,106.388,106.241,106.251,106.093,106.134,

Mortgage product name from the Pricing Engine: 20-Year Fixed Rate Conventional,,,Valid for: 05/17/2013 at 01:56:41 PM, EDT
Interest Rate,5-DAY Contract Expiration Date : 05/22/2013,10-DAY Contract Expiration Date : 05/28/2013,15-DAY Contract Expiration Date : 06/03/2013,30-DAY Contract Expiration Date : 06/17/2013,45-DAY Contract Expiration Date : 07/01/2013,60-DAY Contract Expiration Date : 07/16/2013,75-DAY Contract Expiration Date : 07/31/2013,90-DAY Contract Expiration Date : 08/15/2013,
2.750,95.080,95.042,95.003,94.869,94.779,94.640,94.543,94.424,
2.875,95.934,95.896,95.857,95.713,95.623,95.474,95.378,95.249,
3.000,96.777,96.739,96.700,96.546,96.456,96.299,96.202,96.065,
3.125,97.593,97.555,97.517,97.353,97.263,97.098,97.002,96.856,
3.250,100.570,100.523,100.476,100.364,100.255,100.131,100.014,99.915,
3.375,101.473,101.427,101.380,101.256,101.147,101.013,100.896,100.786,
3.500,102.276,102.229,102.183,102.050,101.941,101.799,101.682,101.564,
3.625,103.027,102.981,102.934,102.794,102.685,102.537,102.421,102.296,
3.750,103.584,103.538,103.491,103.346,103.237,103.085,102.969,102.839,
3.875,103.952,103.897,103.842,103.763,103.635,103.525,103.388,103.307,
4.000,104.664,104.609,104.554,104.474,104.346,104.232,104.095,104.009,
4.125,105.338,105.283,105.228,105.144,105.016,104.901,104.763,104.676,
4.250,105.824,105.769,105.714,105.624,105.496,105.380,105.243,105.154,
4.375,104.700,104.637,104.574,104.663,104.515,104.526,104.368,104.411,
4.500,105.361,105.298,105.234,105.315,105.167,105.178,105.020,105.063,
4.625,105.966,105.903,105.840,105.920,105.772,105.784,105.626,105.669,
4.750,106.336,106.273,106.210,106.290,106.143,106.158,106.000,106.046,

Mortgage product name from the Pricing Engine: 15-Year Fixed Rate Conventional,,,Valid for: 05/17/2013 at 02:04:38 PM, EDT
Interest Rate,5-DAY Contract Expiration Date : 05/22/2013,10-DAY Contract Expiration Date : 05/28/2013,15-DAY Contract Expiration Date : 06/03/2013,30-DAY Contract Expiration Date : 06/17/2013,45-DAY Contract Expiration Date : 07/01/2013,60-DAY Contract Expiration Date : 07/16/2013,75-DAY Contract Expiration Date : 07/31/2013,90-DAY Contract Expiration Date : 08/15/2013,
2.250,98.764,98.734,98.704,98.649,98.542,98.478,98.351,98.274,
2.375,99.509,99.479,99.449,99.388,99.280,99.212,99.085,99.001,
2.500,100.254,100.224,100.194,100.126,100.019,99.946,99.818,99.729,
2.625,100.868,100.838,100.808,100.734,100.627,100.549,100.422,100.325,
2.750,101.649,101.611,101.573,101.496,101.411,101.325,101.214,101.163,
2.875,102.380,102.342,102.304,102.220,102.135,102.044,101.933,101.873,
3.000,103.046,103.008,102.969,102.881,102.796,102.701,102.590,102.525,
3.125,103.598,103.559,103.521,103.430,103.344,103.248,103.137,103.068,
3.250,104.053,104.015,103.976,103.880,103.795,103.697,103.586,103.514,
3.375,103.999,103.952,103.906,103.803,103.802,103.688,103.672,103.715,
3.500,104.604,104.558,104.511,104.404,104.403,104.288,104.272,104.311,
3.625,105.149,105.102,105.056,104.944,104.943,104.826,104.809,104.845,
3.750,105.597,105.551,105.504,105.389,105.388,105.268,105.252,105.282,
3.875,104.646,104.591,104.536,104.413,104.497,104.363,104.395,104.496,
4.000,105.305,105.250,105.196,105.069,105.153,105.016,105.048,105.145,
4.125,105.819,105.764,105.709,105.579,105.662,105.524,105.556,105.649,
4.250,106.253,106.198,106.143,106.009,106.093,105.953,105.984,106.074,

I have written the following code to format it:

outputfile = "c:/" + session.getName() + ".csv"; 

out = new FileWriter(outputfile, true);

_prices = session.getv("Prices");

_prices = _prices.replace("GoldSeller", "Gold\nSeller");
_prices = _prices.replace("Seller Number: 143939", "Seller Number: 143939\n\n");
_prices = _prices.replace("EDTInterest", "EDT\nInterest");
_prices = _prices.replace("Mortgage", "\n\nMortgage");

String[] words = _prices.split(",");

for(i = 0; i < words.length; i++) {
    try {
        if(words[i].length() > 1) {
            if(words[i].substring(0,2).equals("0.") ||
                 words[i].substring(0,2).equals("1.") ||
                 words[i].substring(0,2).equals("2.") ||
                 words[i].substring(0,2).equals("3.") ||
                 words[i].substring(0,2).equals("4.") ||
                 words[i].substring(0,2).equals("5.") ||
                 words[i].substring(0,2).equals("6.") ||
                 words[i].substring(0,2).equals("7.") ||
                 words[i].substring(0,2).equals("8.") ||
                 words[i].substring(0,2).equals("9.")) {    

                _prices = _prices.replace(words[i], "\n" + words[i]);
            }
        }
    } 
    catch(Exception e){
        session.log(e.toString() + " " + words[i].length());
    }       
}

out.write(_prices);

out.close();

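The problem is that the \n character adds two line breaks in some places and only one in others.

I am not trying to get any blank lines except where I explicitly add \n\n.

When I don't use \n, everything ends up on one line.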

1 answer:

Answer 0: (score: 0)

I think I just figured it out.

_prices = _prices.replace(words[i], "\n" + words[i]);

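This line replaces every occurrence of the text in words[i] across the whole string, not just the current one. A rate that appears more than once (for example 2.750, which shows up in each table) gets another newline added in front of every occurrence each time the loop reaches it.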
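
One possible way around this, as a rough sketch rather than a tested fix for the original scraper: rebuild the output token by token with a StringBuilder instead of calling replace on the whole string, so each rate gets exactly one newline no matter how often it repeats. The class name RowBreaker, the method breakRows, and the rule "one digit, a dot, then digits means an interest rate" are assumptions made for illustration. They mirror the substring(0,2) checks in the question and are not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not from the original post.
class RowBreaker {
    // Assumption: a token such as "2.750" (one digit, a dot, digits) starts a new row.
    // Prices such as "94.587" or "100.333" have more digits before the dot and do not match.
    private static final Pattern RATE = Pattern.compile("\\d\\.\\d+");

    static String breakRows(String raw) {
        // Limit -1 keeps trailing empty tokens, so the trailing comma survives.
        String[] tokens = raw.split(",", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(',');      // restore the comma removed by split
            }
            if (RATE.matcher(tokens[i]).matches()) {
                sb.append('\n');     // exactly one newline in front of this token only
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample with the same rate repeated in two tables.
        String raw = "Interest Rate,5-DAY,2.750,94.587,Interest Rate,5-DAY,2.750,95.080,";
        System.out.println(breakRows(raw));
    }
}
```

Because only the current token is touched, earlier or later occurrences of the same value are left alone. The existing replace calls for "GoldSeller", "EDTInterest" and "Mortgage" could stay as they are, since each of those is applied once.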
