How to use multithreading to insert data from a List into a DB table to improve performance?

Question

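I have to read Employee data from a text file (the fields of each record are separated by tabs) into an ArrayList. Then I have to insert the Employee objects from the list into the Employee table in the DB. To do this, I iterate over the list elements one by one and insert one Employee's details at a time into the DB. This approach is not recommended performance-wise, because we can have more than 100k records and inserting all of the data takes a long time.

How can we use multithreading here, while inserting data from the list into the DB, to improve performance? Also, how can we use the CountDownLatch and ExecutorService classes to optimize this scenario?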

ReadWriteTest

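import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ReadWriteTest {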

public static void main(String... args) {
    BufferedReader br = null;
    String filePath = "C:\\Documents\\EmployeeData.txt";
    try {
        String sCurrentLine;
        br = new BufferedReader(new FileReader(filePath));
        List<Employee> empList = new ArrayList<Employee>();

        while ((sCurrentLine = br.readLine()) != null) {
            String[] record = sCurrentLine.split("\t");
            Employee emp = new Employee();
            emp.setId(record[0].trim());
            emp.setName(record[1].trim());
            emp.setAge(record[2].trim());
            empList.add(emp);
        }
        System.out.println(empList);

        writeData(empList);

    } catch (IOException | SQLException e) {
        e.printStackTrace();
    }
}

public static void writeData(List<Employee> empList) throws SQLException {
    Connection con = null;
    try {
        Class.forName("oracle.jdbc.driver.OracleDriver");

        con = DriverManager.getConnection("jdbc:oracle:thin:@localhost:1521:xe", "system", "oracle");
        for (Employee emp : empList) {
            PreparedStatement stmt = con.prepareStatement("insert into Employee values(?,?,?)");
            stmt.setString(1, emp.getId());
            stmt.setString(2, emp.getName());
            stmt.setString(3, emp.getAge());
            stmt.executeUpdate();
        }
    } catch (Exception e) {
        System.out.println(e);
    } finally {
        con.close();
    }
}
}

Employee class

public class Employee {

String id;
String name;
String age;

public String getId() {
    return id;
}
public void setId(String id) {
    this.id = id;
}
public String getName() {
    return name;
}
public void setName(String name) {
    this.name = name;
}
public String getAge() {
    return age;
}
public void setAge(String age) {
    this.age = age;
}
@Override
public String toString() {
    return "Employee [id=" + id + ", name=" + name + ", age=" + age + "]";
}
}

EmployeeData.txt

1   Sachin  20
2   Sunil   30
3   Saurav  25

Answer 1

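Direct import

An alternative to the Java application approach is a database approach. All major databases have tools that can import data directly from a text file into a table.

Postgres has the COPY command. It can be run from the command line or from within SQL. See the wiki page for discussion.

Look at your database's toolset.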

Answer 2

I agree with @kuporific. Batch updates will prove to be better from a performance point of view.

Try the following edit to your code:

public static void writeData(List<Employee> empList) throws SQLException {
    final int BATCH_SIZE = 1000; // just an indicative number
    String sql = "insert into Employee values (?, ?, ?)";
    // JDBC 4.0+ drivers register themselves, so Class.forName(...) is not needed.
    // try-with-resources closes the statement and the connection even on failure.
    try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:xe", "system", "oracle");
         PreparedStatement stmt = con.prepareStatement(sql)) {
        int counter = 0;
        for (Employee emp : empList) {
            // bind parameters instead of concatenating values into the SQL string
            stmt.setString(1, emp.getId());
            stmt.setString(2, emp.getName());
            stmt.setString(3, emp.getAge());
            stmt.addBatch();
            counter++;
            // send the batch once it is full
            if (counter % BATCH_SIZE == 0) {
                stmt.executeBatch();
            }
        }
        // send whatever is left over in the last, partial batch
        if (counter % BATCH_SIZE != 0) {
            stmt.executeBatch();
        }
    }
}

Answer 3

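Depending on your application, it might make sense to put the DB update code on a thread off of the main application thread. You can do this using Executors, for example.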
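A minimal sketch of that idea, assuming a single background thread is enough. The `writer` parameter is a hypothetical stand-in for the `writeData` method from the question, and the sketch collects rows into an in-memory list instead of touching a database:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BackgroundWriteSketch {

    // Runs the writer on an executor thread and returns a Future the caller
    // can wait on. "writer" is hypothetical; it stands in for writeData.
    static <T> Future<?> writeInBackground(ExecutorService executor,
                                           List<T> rows,
                                           Consumer<List<T>> writer) {
        return executor.submit(() -> writer.accept(rows));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        List<String> stored = new ArrayList<>(); // pretend this is the table
        try {
            List<String> rows = Arrays.asList("Sachin", "Sunil", "Saurav");
            Future<?> pending = writeInBackground(executor, rows, stored::addAll);
            // the main thread is free to do other work here
            pending.get(); // block only when the result is actually needed
            System.out.println("stored " + stored.size() + " rows");
        } finally {
            executor.shutdown(); // let the JVM exit once the task is done
        }
    }
}
```

This does not make the insert itself faster; it only keeps the main thread responsive while the write is in progress.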

You could also look into using batch updates instead.

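I suspect that trying to update the DB on multiple threads won't speed things up, because the DB has to maintain atomicity, so any table can only be updated by one thread at a time anyway.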
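Whether several writer threads help is something to measure rather than assume. For anyone who wants to try it, here is a rough sketch of how the list could be split across workers with the ExecutorService and CountDownLatch classes the question asks about. The `chunkWriter` parameter is hypothetical; in a real test it would open its own connection and run one batched insert per chunk, while here it only counts rows:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ParallelChunkSketch {

    // Splits rows into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> partition(List<T> rows, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, rows.size());
            chunks.add(rows.subList(from, to));
        }
        return chunks;
    }

    // Hands each chunk to chunkWriter on a pool thread and waits for all chunks.
    static <T> void writeInParallel(List<T> rows, int chunkSize, int threads,
                                    Consumer<List<T>> chunkWriter)
            throws InterruptedException {
        List<List<T>> chunks = partition(rows, chunkSize);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(chunks.size());
        try {
            for (List<T> chunk : chunks) {
                pool.execute(() -> {
                    try {
                        chunkWriter.accept(chunk);
                    } finally {
                        done.countDown(); // count down even if the writer throws
                    }
                });
            }
            done.await(); // returns once every chunk has been handled
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        AtomicInteger written = new AtomicInteger();
        writeInParallel(rows, 3, 2, chunk -> written.addAndGet(chunk.size()));
        System.out.println(written.get()); // prints 7
    }
}
```

Timing this against the single-threaded batched version on the actual database is the only reliable way to tell whether the extra threads pay off.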

You could go really crazy and do both actions off the main thread using Java 8's CompletableFuture:

CompletableFuture.supplyAsync(new Supplier<List<Employee>>() {
    @Override
    public List<Employee> get() {
        List<Employee> employees = new ArrayList<>();
        // get employee list
        return employees;
    }
}).thenAcceptAsync(new Consumer<List<Employee>>() {
    @Override
    public void accept(List<Employee> employees) {
        // put into DB using batching
    }
});

The first call, supplyAsync, runs the given code on another thread. When it completes, the return value is passed to the Consumer in thenAcceptAsync, which is also run on another thread.

This can be written more compactly by replacing the anonymous Supplier and Consumer classes with lambda expressions.
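As a minimal sketch of the lambda form, assuming plain name strings in place of Employee objects and an in-memory list (`STORED`, a hypothetical stand-in for the table) in place of the batched insert:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class CompactFutureSketch {

    // Hypothetical stand-in for the Employee table.
    static final List<String> STORED = new CopyOnWriteArrayList<>();

    static CompletableFuture<Void> loadThenStore() {
        return CompletableFuture
                .supplyAsync(() -> {
                    List<String> employees = new ArrayList<>();
                    // get employee list (hard-coded here for illustration)
                    employees.add("Sachin");
                    employees.add("Sunil");
                    return employees;
                })
                .thenAcceptAsync(employees -> {
                    // put into DB using batching (here: copy into STORED)
                    STORED.addAll(employees);
                });
    }

    public static void main(String[] args) {
        // join() waits for both stages; without it the JVM may exit first,
        // because the default async pool uses daemon threads
        loadThenStore().join();
        System.out.println(STORED); // prints [Sachin, Sunil]
    }
}
```

Both stages run on the common ForkJoinPool by default; an Executor can be passed as a second argument to supplyAsync and thenAcceptAsync if a dedicated pool is preferred.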
