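Extracting integers from a given .txt file

Time: 2019-06-05 21:34:57

Tags: java

Each column represents a different variable in a large data set. I am trying to extract each number and place it into an array, one array per row.

Underscores represent spacing.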

2 ___ 2 ___ 2 _______ 3 ___ 1 ___ 19

1 ___ 3 ___ 2 _______ 3 ___ 3 ___ 19

1 ___ 3 ___ 4 _______ 3 ___ 1 ___ 19

6 ___ 3 ___ 6 _______ 5 _______ 13

5 ___ 2 ___ 5 _______ 5 _______ 13

5 ___ 4 ___ 4 ___ 7 ___ 4 _______ 13

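spaceForNew indicates how many characters remain before the next variable is found. This is different from the current variable.

I am using the following code: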

    public static int[] remaining(String Line)throws IOException
{
    int[] data = new int[7];
    int pointer = 0;
    int spaceForNew = 0;
    for(int i = 0;i<=Line.length()-1;i++)
    {
        if(i<Line.length()-1)
        {
            if((i == spaceForNew)&&(pointer<6))
            {
                //two digit
                if((Line.charAt(i)=='1')&&(Line.charAt(i+1)=='0'))
                {
                    data[pointer] = 10;
                    spaceForNew+=3;
                    pointer++;
                //one digit
                }else if((Line.charAt(i)!= '    ')&&(Line.charAt(i+1)!='0')){
                    data[pointer] = Integer.parseInt(Character.toString(Line.charAt(i)));
                    spaceForNew+=2;
                    pointer++;
                }else if((Line.charAt(i)==' ')&&(data[pointer]==0)){
                    data[pointer]=-1;
                    spaceForNew++;
                    pointer++;
                }

            }
        }else {
            if(pointer==6)
            {
                data[pointer]=Integer.parseInt(Character.toString(Line.charAt(i)));
            }
        }
    }
    return data;
}

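The code above is ugly and not very intuitive. It seems to handle much of the data, but it fails in ways that appear random. Any suggestions at all would be appreciated.

4 answers:

Answer 0: (score: 0)

UPD: Try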

    String line = "10   8   10           1   8";
    String[] split = line.split("   ");
    int[] array = new int[7];
    for (int i = 0; i < split.length; i++) {
        array[i] = split[i].trim().isEmpty() ? -1 : Integer.parseInt(split[i].trim());
    }

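Answer 1: (score: 0)

You can use a regular expression to parse the line: (\d+| )(?:   )?
This basically says: give me either a run of digits or a single space, optionally followed by 3 spaces. You get a list of strings, each of which either parses as a number or is a single space. You can treat the single space as missing data; it still acts as a placeholder, so your columns stay aligned.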

    Integer[] parsed = new Integer[7];
    String thing = "2   2   2       3   1   19";
    Pattern pattern = Pattern.compile("(\\d+| )(?:   )?");
    Matcher m = pattern.matcher(thing);
    int index = 0;
    while (m.find()) {
        if (!" ".equals(m.group(1)))
            parsed[index] = Integer.parseInt(m.group(1));
        else
            parsed[index] = -1; // or whatever your missing-data value should be
        index++;
    }
    Arrays.asList(parsed).forEach(System.out::println);

Edit: fixed. group(0) is the entire match, followed by any capture groups. So group(1) returns the first capture group, which is just the digits or the single space.
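To make the difference between group(0) and group(1) concrete, here is a minimal sketch using the same pattern as the answer. The class name GroupDemo and the helper groups() are illustrative names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as in the answer: digits or one space,
    // optionally followed by three spaces.
    private static final Pattern CELL = Pattern.compile("(\\d+| )(?:   )?");

    /** Returns a {group(0), group(1)} pair for every match found in the line. */
    static List<String[]> groups(String line) {
        List<String[]> out = new ArrayList<>();
        Matcher m = CELL.matcher(line);
        while (m.find()) {
            out.add(new String[] { m.group(0), m.group(1) });
        }
        return out;
    }

    public static void main(String[] args) {
        // Build the input from parts so the separator width is explicit.
        String line = "19" + "   " + "3";
        for (String[] g : groups(line)) {
            System.out.println("group(0)=[" + g[0] + "] group(1)=[" + g[1] + "]");
        }
    }
}
```

For the first match, group(0) still carries the trailing three-space separator, so passing it to Integer.parseInt would throw a NumberFormatException, while group(1) holds only "19".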

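Answer 2: (score: 0)

You need to know the exact pattern of each line. I assume every "column" has a fixed width; otherwise the numbers would not line up like this.

For example, assuming each column is three characters wide (digits and/or spaces) and the column separator is one space wide, your pattern could look like this: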

[ \d]{3} |[ \d]{1,3}

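Now, using Pattern::compile, Pattern::matcher, and Matcher::find, you can search for all the numbers present in the current line. Assuming lines is a List<String> in which each element is one line: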

// Precompile pattern. This matches either a cell followed by a space, or,
// if we are at the end of the line, a variable number of spaces and/or
// digits.
Pattern pattern = Pattern.compile("[ \\d]{3} |[ \\d]{1,3}");

List<List<Integer>> matrix = lines.stream()
    .map(pattern::matcher)
    .map(matcher -> {
        List<Integer> ints = new ArrayList<>();
        while (matcher.find()) {
            String element = matcher.group().trim();
            ints.add(!element.isEmpty() ? Integer.valueOf(element) : -1);
        }
        return ints;
    })
    .collect(Collectors.toList());

Using MatcherStream, provided by dimo414:

Pattern pattern = Pattern.compile("[ \\d]{3} |[ \\d]{1,3}");
List<List<Integer>> matrix = lines.stream()
    .map(line -> MatcherStream.find(pattern, line)
        .map(String::trim)
        .map(element -> !element.isEmpty() ? Integer.valueOf(element) : -1)
        .collect(Collectors.toList()))
    .collect(Collectors.toList());
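If pulling in a helper class is not desirable, a similar stream can be obtained from the standard library, assuming Java 9 or later, where Matcher.results() returns a Stream<MatchResult>. The sketch below is an alternative under that assumption, not the answer's own code; the class name ResultsDemo and the method parse() are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ResultsDemo {
    // Same pattern as above: a 3-character cell followed by a space,
    // or a final cell of 1 to 3 characters at the end of the line.
    private static final Pattern CELL = Pattern.compile("[ \\d]{3} |[ \\d]{1,3}");

    /** Parses each line into a list of integers, using -1 for blank cells. */
    static List<List<Integer>> parse(List<String> lines) {
        return lines.stream()
            .map(line -> CELL.matcher(line).results()   // requires Java 9+
                .map(MatchResult::group)
                .map(String::trim)
                .map(e -> e.isEmpty() ? -1 : Integer.valueOf(e))
                .collect(Collectors.toList()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Four cells, each 3 characters wide and separated by one space;
        // the third cell is blank.
        String line = "  2" + " " + "  3" + " " + "   " + " " + " 19";
        System.out.println(parse(Arrays.asList(line)));
    }
}
```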

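Answer 3: (score: 0)

I suppose that, in theory, a value could be missing anywhere in the space-delimited data of any given file line (even several consecutive values). This would include:

  • at the beginning of a data line;
  • at the end of a data line;
  • anywhere between the beginning and end of a data line.

Examples might be (as in your example, underscores represent spaces):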

2___2___2_______3___1___19

1___3___2_______3___3___19

____3___4_______3___1___19

____5___7___4___3___8____

6___3___6_______5_______13

5___2___5_______________13

5___4___4___7___4_______16

10___6___10___3___8_______1

2___10___0___8___4___0___1

2___10___0___8___4________

4___12___0___9___6

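The good thing here is that the data in the file appears to be formatted in a fixed-space pattern. Knowing this, it is possible to replace missing values with a specific integer value that is far removed from the other values actually contained in each data line. I think "-1" (which is what you are using) works well for this, provided the file never contains other signed values, or that -1 is never a value of real concern in later processing because its possible presence has already been taken into account. That is, of course, something you have to decide.

Once the missing values in any given data line have been replaced with -1, you can split the line on whitespace, convert the array elements to integers, and place them into an integer array.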
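As a small illustration of that last step, the sketch below takes a line in which the missing values have already been replaced with -1 and turns it into an int array. The class name SplitDemo and the method toInts() are illustrative names only.

```java
import java.util.Arrays;

public class SplitDemo {
    /** Splits an already-filled line on whitespace and parses each token as an int. */
    static int[] toInts(String filledLine) {
        String[] parts = filledLine.trim().split("\\s+");
        int[] values = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // The line "6   3   6       5       13" after each run of four
        // spaces has been replaced with " -1".
        String filled = "6   3   6 -1   5 -1   13";
        System.out.println(Arrays.toString(toInts(filled)));
        // prints [6, 3, 6, -1, 5, -1, 13]
    }
}
```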

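If you want to place each line of file data into an integer array, then allow me to suggest a two-dimensional integer (int[][]) array. I think you will find it easier to work with, since the entire data file can be held in that one array. A Java method can then create that array, for example:

Read the whole file line by line into a String[] array: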

List<String> list = new ArrayList<>();
try (Scanner reader = new Scanner(new File("FileExample.txt"))) {
    while (reader.hasNextLine()) {
        String line = reader.nextLine();
        if (line.equals("")) { continue; }
        list.add(line);
    }
}
catch (FileNotFoundException ex) {
    Logger.getLogger("FILE NOT FOUND!").log(Level.SEVERE, null, ex);
}

// Convert list to String Array
String[] stringData = list.toArray(new String[0]);

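The FileExample.txt file contains exactly the same data as shown above, except that the underscores are spaces in the file. Once the code above has run, the String[] array variable named stringData will hold all of the file's data lines. We now pass this array to the next method, named stringDataToIntArray() (for lack of a better name), to create the 2D integer array (data[][]):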

/**
 * Creates a 2D Integer (int[][]) Array from data lines contained within the 
 * supplied String Array.<br><br>
 * 
 * @param stringData (1D String[] Array) The String array where each element 
 * contains lines of fixed space delimited numerical values, for example each 
 * line would look something like:<pre>
 * 
 *     "2   1   3   4   5   6   7" </pre>
 * 
 * @param replaceMissingWith (String) One or more numerical values could be 
 * missing from any elemental line within the supplied stringData array. What 
 * you supply as an argument to this parameter will be used in place of that 
 * missing value. <br>
 * 
 * @param desiredNumberOfColumns (Integer (int)) The number of columns desired 
 * in each row of the returned 2D Integer Array. Make sure desiredNumberOfColumns 
 * contains a value greater than 0 and less than (Integer.MAX_VALUE - 4). You 
 * will most likely run out of JVM memory if you go that big! Be reasonable, 
 * although almost any unsigned integer value can be supplied (and you're 
 * encouraged to test this) the largest number of data columns contained within 
 * the data file should suffice.<br>
 * 
 * @return (2D Integer (int[][]) Array) A two dimensional Integer Array derived 
 * from the supplied String Array of fixed space delimited line data.
 */
public int[][] stringDataToIntArray(final String[] stringData, 
        final String replaceMissingWith, final int desiredNumberOfColumns) {
    int requiredArrayLength = desiredNumberOfColumns;

    // Make sure the replaceMissingWith parameter actually contains something.
    if (replaceMissingWith == null || replaceMissingWith.trim().equals("")) {
        System.err.println("stringDataToIntArray() Method Error! The "
                + "replaceMissingWith parameter requires a valid argument!");
        return null;  
    }

    /* Make sure desiredNumberOfColumns contains a value greater than 0 and
       less than (Integer.MAX_VALUE - 4).   */
    if (desiredNumberOfColumns < 1 || desiredNumberOfColumns > (Integer.MAX_VALUE - 4)) {
        System.err.println("stringDataToIntArray() Method Error! The "
                + "desiredNumberOfColumns parameter requires any value "
                + "from 1 to " + (Integer.MAX_VALUE - 4) + "!");
        return null;
    }

    // The 2D Array to return.
    int[][] data = new int[stringData.length][requiredArrayLength];

    /* Iterate through each elemental data line contained within 
       the supplied String Array. Process each line and replace 
       any missing values...   */
    for (int i = 0; i < stringData.length; i++) {
        String line = stringData[i];
        // Replace the first numerical value with replaceMissingWith if missing:
        if (line.startsWith(" ")) {
            line = replaceMissingWith + line.substring(1);
        }

        // Replace remaining missing numerical values if missing:
        line = line.replaceAll("\\s{4}", " " + replaceMissingWith);

        // Split the string of numerical values based on whitespace:
        String[] lineParts = line.split("\\s+");

        /* Ensure we have the correct Required Array Length (ie: 7):
           If we don't then at this point we were missing values at
           the end of the input string (line). Append replaceMissingWith
           to the end of line until a split satisfies the requiredArrayLength:  */
        while (lineParts.length < requiredArrayLength) {
            line+= " " + replaceMissingWith;
            lineParts = line.split("\\s+");
        }

        /* Fill the data[][] integer array. Convert each string numerical
           value to an Integer (int) value for current line:   */
        for (int  j = 0; j < requiredArrayLength; j++) {
            data[i][j] = Integer.parseInt(lineParts[j]);
        }
    } 
    return data;
}

To use this method (once you have read the data file and placed its contents into a String array):

int[][] data = stringDataToIntArray(stringData, "-1", 7);

// Display the 2D data Array in Console...
for (int i = 0; i < data.length; i++) {
    System.out.println(Arrays.toString(data[i]));
}

If you have processed the sample file data provided above, the console output window should contain:

[2, 2, 2, -1, 3, 1, 19]
[1, 3, 2, -1, 3, 3, 19]
[-1, 3, 4, -1, 3, 1, 19]
[-1, 5, 7, 4, 3, 8, -1]
[6, 3, 6, -1, 5, -1, 13]
[5, 2, 5, -1, -1, -1, 13]
[5, 4, 4, 7, 4, -1, 16]
[10, 6, 10, 3, 8, -1, 1]
[2, 10, 0, 8, 4, 0, 1]
[2, 10, 0, 8, 4, -1, -1]
[4, 12, 0, 9, 6, -1, -1]

If you only want the first three columns of each file line, your call would be:

int[][] data = stringDataToIntArray(stringData, "-1", 3);

and the output would look like this:

[2, 2, 2]
[1, 3, 2]
[-1, 3, 4]
[-1, 5, 7]
[6, 3, 6]
[5, 2, 5]
[5, 4, 4]
[10, 6, 10]
[2, 10, 0]
[2, 10, 0]
[4, 12, 0]

And if you want 12 data columns for each file line, your call would be:

int[][] data = stringDataToIntArray(stringData, "-1", 12);

and the output would look like this:

[2, 2, 2, -1, 3, 1, 19, -1, -1, -1, -1, -1]
[1, 3, 2, -1, 3, 3, 19, -1, -1, -1, -1, -1]
[-1, 3, 4, -1, 3, 1, 19, -1, -1, -1, -1, -1]
[-1, 5, 7, 4, 3, 8, -1, -1, -1, -1, -1, -1]
[6, 3, 6, -1, 5, -1, 13, -1, -1, -1, -1, -1]
[5, 2, 5, -1, -1, -1, 13, -1, -1, -1, -1, -1]
[5, 4, 4, 7, 4, -1, 16, -1, -1, -1, -1, -1]
[10, 6, 10, 3, 8, -1, 1, -1, -1, -1, -1, -1]
[2, 10, 0, 8, 4, 0, 1, -1, -1, -1, -1, -1]
[2, 10, 0, 8, 4, -1, -1, -1, -1, -1, -1, -1]
[4, 12, 0, 9, 6, -1, -1, -1, -1, -1, -1, -1]

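The -1 values appended at the end of each array are there because the method detected that those columns do not exist in the data line; since 12 is the number of columns you asked for, the row had to be padded.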
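The same truncate-or-pad behaviour can also be applied to a row that has already been parsed. The sketch below is an alternative illustration, not the mechanism used inside stringDataToIntArray() (which pads the line as text before splitting); the class name FitColumns and the helper fit() are hypothetical.

```java
import java.util.Arrays;

public class FitColumns {
    /**
     * Hypothetical helper: returns a copy of row with exactly `columns`
     * entries, dropping extra values or padding with `filler` as needed.
     */
    static int[] fit(int[] row, int columns, int filler) {
        int[] out = new int[columns];
        Arrays.fill(out, filler);
        System.arraycopy(row, 0, out, 0, Math.min(row.length, columns));
        return out;
    }

    public static void main(String[] args) {
        int[] row = {4, 12, 0, 9, 6};
        System.out.println(Arrays.toString(fit(row, 3, -1))); // [4, 12, 0]
        System.out.println(Arrays.toString(fit(row, 7, -1))); // [4, 12, 0, 9, 6, -1, -1]
    }
}
```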