Groovy Sql.withBatch loses records in the database

Time: 2017-02-07 13:06:58

Tags: sql postgresql groovy batch-processing

I am using Groovy's Sql.withBatch to process a CSV file and load all of its data into a Postgres database.

Here is my method:

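import groovy.sql.Sql
import java.util.logging.ConsoleHandler
import java.util.logging.Level
import java.util.logging.Logger
// CSV and CSVParser come from the project's CSV library; the original post does not show their imports.

def processCSV() {
    // Log groovy.sql at FINE level so that every executed batch is reported on the console
    def logger = Logger.getLogger('groovy.sql')
    logger.level = Level.FINE
    logger.addHandler(new ConsoleHandler(level: Level.FINE))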

    def fileName = "file.csv"
    def resource = this.getClass().getResource( '/csv/' + fileName )

    File file = new File(resource.path)

    String year = '2016'

    char separator = ','

    def lines = CSV
            .separator(separator)
            .skipLines(1)
            .quote(CSVParser.DEFAULT_QUOTE_CHARACTER)
            .escape(CSVParser.DEFAULT_ESCAPE_CHARACTER)
            .charset('UTF-8')
            .create()
            .reader(file)
            .readAll()

    def totalLines = lines.size()

    Sql sql = getDatabaseInstance()

    println("Delete existing rows for " + year + " if exists")
    String dQuery = "DELETE FROM table1 WHERE year = ?"
    sql.execute(dQuery, [year])

    def statement = 'INSERT INTO table1 (column1, column2, column3, column4, year) VALUES (?, ?, ?, ?, ?)'

    println("Total lines in the CSV files: " + totalLines)

    def batches = []

    sql.withBatch(BATCH_SIZE, statement) { ps ->
        lines.each { fields ->
            String column1 = fields[0]
            String column2 = fields[1]
            String column3 = fields[2]
            String column4 = fields[3]

            def params = [column1, column2, column3, column4, year]

            def batch = ['params': params, 'error': false]
            try {
                ps.addBatch(params)
            }
            catch (all) {
                batch['error'] = true
                throw all
            }

            batches << batch
        }
    }

    // Count the rows actually stored for this year (same table as the DELETE and INSERT above)
    def recordsAddedInDB = sql.firstRow("SELECT count(*) FROM table1 WHERE year = ?", [year])[0]

    sql.close()

    println("")
    println("Processed lines: " + totalLines)
    println("Batches: " + batches.size())
    println("Batches in error: " + batches.findAll{ it.error }.size())
    println("Record in DB for " + year + " year: " + recordsAddedInDB)
}

The CSV file contains 23758 lines, excluding the header row. The method prints the following output:

Delete existing rows for 2016 if exists
Total lines in the CSV files: 23758
Processed lines: 23758
Batches: 23758
Batches in error: 0
Record in DB for 2016 year: 23580

With logging enabled and BATCH_SIZE set to 500, I can see:

  • 47 times the sentence "Successfully executed batch with 500 command(s)"
  • 1 time the sentence "Successfully executed batch with 258 command(s)"

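That means 23758 INSERT statements were processed (47 × 500 + 258 = 23758), yet the table holds only 23580 rows, so 178 are missing.

Does anyone know why the database contains fewer rows than were processed?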

1 Answer:

Answer 0: (score: 0)

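Solved. The INSERT statement contained a subquery, and whenever that subquery returned no value the INSERT statement was ignored: no row was inserted and no error was raised.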
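Silently skipped inserts like these can probably be spotted without querying the table afterwards. Sql.withBatch returns an int[] of update counts, and a statement that inserted nothing should show up there as 0, assuming the JDBC driver reports real per-statement counts. The sketch below is only an illustration: summarizeUpdateCounts is a hypothetical helper, not part of groovy.sql, and the sample array is made-up data standing in for the real return value.

```groovy
import java.sql.Statement

// Hypothetical helper (not part of groovy.sql): classifies the update counts
// returned by Sql.withBatch, one entry per statement added to the batch.
Map<String, Integer> summarizeUpdateCounts(int[] counts) {
    def summary = [inserted: 0, skipped: 0, unknown: 0, failed: 0]
    for (int c in counts) {
        if (c > 0) {
            summary.inserted += c           // rows actually written
        } else if (c == 0) {
            summary.skipped += 1            // statement ran but wrote nothing
        } else if (c == Statement.SUCCESS_NO_INFO) {
            summary.unknown += 1            // driver did not report a count
        } else {
            summary.failed += 1             // e.g. Statement.EXECUTE_FAILED
        }
    }
    summary
}

// Made-up sample standing in for the array returned by withBatch.
// In the method above it would be captured as:
//   int[] counts = sql.withBatch(BATCH_SIZE, statement) { ps -> ... }
int[] sample = [1, 1, 0, 1, 0] as int[]
def result = summarizeUpdateCounts(sample)

assert result.inserted == 3
assert result.skipped == 2
println("Inserted: " + result.inserted + ", skipped: " + result.skipped)
```

If the skipped count matched the 178 missing rows, that would point at the statement itself rather than at the batching. Drivers configured to rewrite batched inserts may report SUCCESS_NO_INFO instead of real counts, in which case this check gives no information.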