Question

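I have been stuck on this problem for a while. I'm basically trying to generate a tree hierarchy from a set of CSV data. The CSV data is not necessarily ordered. It looks something like this: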

Header: Record1,Record2,Value1,Value2
Row: A,XX,22,33
Row: A,XX,777,888
Row: A,YY,33,11
Row: B,XX,12,0
Row: A,YY,13,23
Row: B,YY,44,98

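I'm trying to make the grouping as flexible as possible. The simplest grouping would be on Record1 and Record2, with Value1 and Value2 stored under Record2, so that we get the following output: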

Record1
    Record2
        Value1 Value2

For the sample data above, this would be:

A
    XX
        22 33
        777 888
    YY
        33 11
        13 23
B
    XX
        12 0
    YY
        44 98

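I'm currently storing my group settings in a list, and I don't know whether that is holding back my thinking. This list contains the hierarchy of groups, for example: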

Record1 (SchemaGroup)
    .column = Record1
    .columns = null
    .childGroups =
        Record2 (SchemaGroup)
            .column = Record2
            .columns = Value1 (CSVColumnInformation), Value2 (CSVColumnInformation)
            .childGroups = null

The code for this looks as follows:

private class SchemaGroup {
    private SchemaGroupType type = SchemaGroupType.StaticText;  // default to text
    private String text;
    private CSVColumnInformation column = null;
    private List<SchemaGroup> childGroups = new ArrayList<SchemaGroup>();
    private List<CSVColumnInformation> columns = new ArrayList<CSVColumnInformation>();
}


private enum SchemaGroupType {
    /** Allow fixed text groups to be added */
    StaticText,
    /** Related to a column with common value */
    ColumnGroup
}

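I'm trying to come up with an algorithm for this and to decide which underlying structure to use. At the moment I'm parsing the CSV from top to bottom using my own wrapper class: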

CSVParser csv = new CSVParser(content);
String[] line;
while((line = csv.readLine()) != null ) {
    ...
}

I'm just trying to kick-start my coding brain.

Any ideas?

Answer 1

The basic idea isn't difficult: group by the first record, then by the second record, and so on, until you get something like this:

(A,XX,22,33)
(A,XX,777,888)
-------------------------
(A,YY,33,11)
(A,YY,13,23)
=============
(B,XX,12,0)
-------------------------
(B,YY,44,98)

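Then work backwards to build the trees.

However, there is a recursive component that makes it a bit hard to reason about this problem, or to show it step by step, so it's actually easier to write pseudocode.

I'll assume that each row in your csv is represented as a tuple. Each tuple has "records" and "values", using the same terms you use in your question. "Records" are the things that must be put into the hierarchy. "Values" will be the leaves of the tree. I'll use quotes when I use these terms with these specific meanings.

I also assume that all "records" come before all "values".

Without further ado, the code: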

// builds tree and returns a list of root nodes
// list_of_tuples: a list of tuples read from your csv
// curr_position: used to keep track of recursive calls
// number_of_records: assuming each csv row has n records and then m values, number_of_records equals n
function build_tree(list_of_tuples, curr_position, number_of_records) {
    // check if we have already reached the "values" (which shouldn't get converted into trees)
    if (curr_position == number_of_records) {
        return list of nodes, each containing a "value" (i.e. everything from position number_of_records on)
    }

    grouped = group tuples in list_of_tuples that have the same value in position curr_position, and store these groups indexed by such common value
    unique_values = get unique values in curr_position

    list_of_nodes = empty list

    // create the nodes and (recursively) their children
    for each val in unique_values {
        the_node = create tree node containing val
        the_children = build_tree(grouped[val], curr_position+1, number_of_records)
        the_node.set_children(the_children)

        list_of_nodes.append(the_node)
    }

    return list_of_nodes
}

// in your example, this returns a node with "A" and a node with "B"
// third parameter is 2 because you have 2 "records"
build_tree(list_parsed_from_csv, 0, 2)
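
The pseudocode above could be turned into Java roughly as follows. This is only a sketch under a few assumptions: each row is already split into a String[], the Node class and the name BuildTreeSketch are made up for illustration, and a LinkedHashMap is used for the grouping so that groups keep the order in which they first appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the build_tree pseudocode; all names are illustrative. */
final class BuildTreeSketch {

    /** A tree node: either a "record" value with children, or a leaf holding the joined "values". */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Groups rows by the column at currPosition and recurses until the "values" columns are reached. */
    static List<Node> buildTree(List<String[]> rows, int currPosition, int numberOfRecords) {
        List<Node> nodes = new ArrayList<>();

        // Base case: only "values" remain, so each row becomes one leaf.
        if (currPosition == numberOfRecords) {
            for (String[] row : rows) {
                String[] values = Arrays.copyOfRange(row, numberOfRecords, row.length);
                nodes.add(new Node(String.join(",", values)));
            }
            return nodes;
        }

        // The key set of this map doubles as the list of unique values.
        Map<String, List<String[]>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[currPosition], k -> new ArrayList<>()).add(row);
        }

        // Create one node per group and (recursively) its children.
        for (Map.Entry<String, List<String[]>> group : grouped.entrySet()) {
            Node node = new Node(group.getKey());
            node.children.addAll(buildTree(group.getValue(), currPosition + 1, numberOfRecords));
            nodes.add(node);
        }
        return nodes;
    }

    /** Prints the tree with four spaces of indentation per level. */
    static void print(List<Node> nodes, String indent) {
        for (Node node : nodes) {
            System.out.println(indent + node.label);
            print(node.children, indent + "    ");
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (String line : new String[] {
                "A,XX,22,33", "A,XX,777,888", "A,YY,33,11",
                "B,XX,12,0", "A,YY,13,23", "B,YY,44,98"}) {
            rows.add(line.split(","));
        }
        // Third argument is 2 because each row has 2 "records".
        print(buildTree(rows, 0, 2), "");
    }
}
```

With the sample rows from the question, the two root nodes are "A" and "B", and the unordered "A,YY" rows end up under the same YY node.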

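Now you have to think about the specific data structures to use, but if you understand the algorithm, that shouldn't be too hard (as you mentioned, I think deciding on the data structures early can hold back your thinking).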

Answer 2

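Here is a basic working solution in the form of a junit test (without assertions), simplified by using google-guava collections. The code is self-explanatory; instead of reading the file directly, you would use a csv library to read the csv. This should give you the basic idea.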

import java.io.File;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

import org.junit.Test;

import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Iterables;
import com.google.common.collect.Multimap;
import com.google.common.collect.Sets;
import com.google.common.io.Files;

public class MyTest
{
    @Test
    public void test1()
    {
        List<String> rows = getAllDataRows();

        Multimap<Records, Values> table = indexData(rows);

        printTree(table);

    }

    private void printTree(Multimap<Records, Values> table)
    {
        Set<String> alreadyPrintedRecord1s = Sets.newHashSet();

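        // NOTE: the key order of an ArrayListMultimap is not guaranteed, so keys
        // sharing the same r1 are not necessarily visited one after another.
        for (Records r : table.keySet())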
        {
            if (!alreadyPrintedRecord1s.contains(r.r1))
            {
                System.err.println(r.r1);
                alreadyPrintedRecord1s.add(r.r1);
            }

            System.err.println("\t" + r.r2);

            Collection<Values> allValues = table.get(r);

            for (Values v : allValues)
            {
                System.err.println("\t\t" + v.v1 + " , " + v.v2);
            }
        }
    }

    private Multimap<Records, Values> indexData(List<String> lines)
    {
        Multimap<Records, Values> table = ArrayListMultimap.create();

        for (String row : lines)
        {
            Iterable<String> split = Splitter.on(",").split(row);
            String[] data = Iterables.toArray(split, String.class);

            table.put(new Records(data[0], data[1]), new Values(data[2], data[3]));
        }
        return table;
    }

    private List<String> getAllDataRows()
    {
        List<String> lines = Collections.emptyList();

        try
        {
            lines = Files.readLines(new File("C:/test.csv"), Charsets.US_ASCII);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }

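        // Collections.emptyList() is immutable, so only remove the header
        // when the file was actually read.
        if (!lines.isEmpty())
        {
            lines.remove(0); // remove header
        }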

        return lines;
    }
}



public class Records
{
    public final String r1, r2;

    public Records(final String r1, final String r2)
    {
        this.r1 = r1;
        this.r2 = r2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((r1 == null) ? 0 : r1.hashCode());
        result = prime * result + ((r2 == null) ? 0 : r2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Records))
        {
            return false;
        }
        Records other = (Records) obj;
        if (r1 == null)
        {
            if (other.r1 != null)
            {
                return false;
            }
        }
        else if (!r1.equals(other.r1))
        {
            return false;
        }
        if (r2 == null)
        {
            if (other.r2 != null)
            {
                return false;
            }
        }
        else if (!r2.equals(other.r2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Records1and2 [r1=").append(r1).append(", r2=").append(r2).append("]");
        return builder.toString();
    }

}


public class Values
{
    public final String v1, v2;

    public Values(final String v1, final String v2)
    {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((v1 == null) ? 0 : v1.hashCode());
        result = prime * result + ((v2 == null) ? 0 : v2.hashCode());
        return result;
    }

    @Override
    public boolean equals(final Object obj)
    {
        if (this == obj)
        {
            return true;
        }
        if (obj == null)
        {
            return false;
        }
        if (!(obj instanceof Values))
        {
            return false;
        }
        Values other = (Values) obj;
        if (v1 == null)
        {
            if (other.v1 != null)
            {
                return false;
            }
        }
        else if (!v1.equals(other.v1))
        {
            return false;
        }
        if (v2 == null)
        {
            if (other.v2 != null)
            {
                return false;
            }
        }
        else if (!v2.equals(other.v2))
        {
            return false;
        }
        return true;
    }

    @Override
    public String toString()
    {
        StringBuilder builder = new StringBuilder();
        builder.append("Values1and2 [v1=").append(v1).append(", v2=").append(v2).append("]");
        return builder.toString();
    }

}

Answer 3

If you know you will only ever have two levels of Records, I would use something like

Map<String, Map<String, List<Values>>>

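When you read a new line, you look in the outer map to check whether the value of Record1 already exists; if it doesn't, you create a new empty inner Map for it.

Then check the inner map for the value of Record2. If it isn't there, create a new List.

Then read the values and add them to the list.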
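
The three steps above can be sketched in a few lines of Java. This is an illustration rather than a definitive implementation: it assumes rows of the form Record1,Record2,Value1,Value2 with no quoted commas, stores each pair of values as a String[] instead of a Values class, and the name NestedMapSketch is made up. Map.computeIfAbsent does the "check, and create if missing" part in one call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two-level map approach; names are illustrative. */
final class NestedMapSketch {

    /** Indexes rows of the form Record1,Record2,Value1,Value2 into a two-level map. */
    static Map<String, Map<String, List<String[]>>> index(List<String> rows) {
        // LinkedHashMap keeps Record1 and Record2 in order of first appearance.
        Map<String, Map<String, List<String[]>>> outer = new LinkedHashMap<>();
        for (String row : rows) {
            String[] cols = row.split(",");
            outer.computeIfAbsent(cols[0], k -> new LinkedHashMap<>()) // inner map for Record1
                 .computeIfAbsent(cols[1], k -> new ArrayList<>())     // list for Record2
                 .add(new String[] {cols[2], cols[3]});                // the values
        }
        return outer;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "A,XX,22,33", "A,XX,777,888", "A,YY,33,11",
                "B,XX,12,0", "A,YY,13,23", "B,YY,44,98");

        for (Map.Entry<String, Map<String, List<String[]>>> r1 : index(rows).entrySet()) {
            System.out.println(r1.getKey());
            for (Map.Entry<String, List<String[]>> r2 : r1.getValue().entrySet()) {
                System.out.println("    " + r2.getKey());
                for (String[] values : r2.getValue()) {
                    System.out.println("        " + values[0] + " " + values[1]);
                }
            }
        }
    }
}
```

The nesting depth is fixed by the type, so this only fits the case where the number of Record levels is known in advance.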

Answer 4

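I recently needed to do the same thing and wrote tree-builder.com to get the job done. The only difference is that, with your CSV laid out the way it is, the last two parameters would be parent and child rather than peers. Also, my version doesn't accept a header row.

The code is all JavaScript; it uses jstree to build the tree. You can use firebug, or just view the page source, to see how it's done. It would probably be easy to tweak it to escape the comma in your CSV so that the last two parameters stay a single child.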

Answer 5

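Based on how this question is posed, I would do the following:

1. Define what your final data structure will look like to contain the tree.
2. Define a representation for each row in your original text (perhaps a linked list, for flexibility).
3. Write a method that takes the represented row and inserts it into the tree data structure. For each branch that doesn't exist, create it; for each existing branch, traverse it as you step through your "row" linked list structure.
4. Start with an empty tree.
5. Read each line of the file into your row item structure and call the method defined in step 3.

Does that help?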
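
The steps listed in this answer might look like the following in Java. It is a minimal sketch under some assumptions: TreeNode, toRow and InsertIntoTreeSketch are hypothetical names, the rows contain no quoted commas, and the trailing "value" columns are kept together as a single leaf by limiting the split.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;

/** Hypothetical sketch of the insert-row-into-tree approach; names are illustrative. */
final class InsertIntoTreeSketch {

    // Step 1: the final structure; each node maps a child label to the child node.
    static final class TreeNode {
        final Map<String, TreeNode> children = new LinkedHashMap<>();
    }

    // Step 2: a row is a linked list of path elements. The limit on split keeps
    // everything after the last "record" together, e.g. [A, XX, "22,33"].
    static LinkedList<String> toRow(String line, int numberOfRecords) {
        return new LinkedList<>(Arrays.asList(line.split(",", numberOfRecords + 1)));
    }

    // Step 3: walk the row from left to right, reusing existing branches
    // and creating the ones that do not exist yet.
    static void insert(TreeNode root, LinkedList<String> row) {
        TreeNode current = root;
        for (String element : row) {
            current = current.children.computeIfAbsent(element, k -> new TreeNode());
        }
    }

    /** Prints the tree with four spaces of indentation per level. */
    static void print(TreeNode node, String indent) {
        for (Map.Entry<String, TreeNode> child : node.children.entrySet()) {
            System.out.println(indent + child.getKey());
            print(child.getValue(), indent + "    ");
        }
    }

    public static void main(String[] args) {
        // Step 4: start with an empty tree.
        TreeNode root = new TreeNode();

        // Step 5: convert each line to a row and insert it.
        for (String line : new String[] {
                "A,XX,22,33", "A,XX,777,888", "A,YY,33,11",
                "B,XX,12,0", "A,YY,13,23", "B,YY,44,98"}) {
            insert(root, toRow(line, 2));
        }
        print(root, "");
    }
}
```

Because rows are inserted one at a time, the input does not need to be sorted. One consequence of keying children by label is that two identical rows collapse into a single leaf.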

Answer 6

import java.util.ArrayList;
import java.util.Collections;

public static void main (String arg[]) throws Exception
{
    ArrayList<String> arRows = new ArrayList<String>();
    arRows.add("A,XX,22,33");
    arRows.add("A,XX,777,888");
    arRows.add("A,YY,33,11");
    arRows.add("B,XX,12,0");
    arRows.add("A,YY,13,23");
    arRows.add("B,YY,44,98");
    for(String sTreeRow:createTree(arRows,",")) // the separator is used as a regex by String.split; pass whichever one applies
        System.out.println(sTreeRow);
}
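// Sorts the rows, then emits every path element that differs from the previous row, indented by depth.
public static ArrayList<String> createTree (ArrayList<String> arRows, String sSeperator) throws Exception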
{
    ArrayList<String> arReturnNodes = new ArrayList<String>();
    Collections.sort(arRows);
    String sLastPath = "";
    int iFolderLength = 0;
    for(int iRow=0;iRow<arRows.size();iRow++)
    {
        String sRow = arRows.get(iRow);
        String[] sFolders = sRow.split(sSeperator);
        iFolderLength = sFolders.length;
        String sTab = "";
        String[] sLastFolders = sLastPath.split(sSeperator);
        for(int i=0;i<iFolderLength;i++)
        {
            if(i>0)
                sTab = sTab+"    ";
            if(!sLastPath.equals(sRow))
            {

                if(sLastFolders!=null && sLastFolders.length>i)
                {
                    if(!sLastFolders[i].equals(sFolders[i]))
                    {
                        arReturnNodes.add(sTab+sFolders[i]+"");
                        sLastFolders = null;
                    }
                }
                else
                {
                    arReturnNodes.add(sTab+sFolders[i]+"");
                }
            }
        }
        sLastPath = sRow;
    }
    return arReturnNodes;
}

Generating a tree structure from CSV
