Batch insert in ScalikeJDBC is slow on a remote computer

Time: 2014-10-29 14:35:50

Tags: mysql scala jdbc hikaricp scalikejdbc

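I am trying to insert into a table in batches of 100 rows (I heard this is the best batch size to use with MySQL). I use Scala 2.10.4 with sbt 0.13.6, and the JDBC framework I am using is ScalikeJDBC with HikariCP. My connection settings are as follows: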

val dataSource: DataSource = {
  val ds = new HikariDataSource()
  ds.setDataSourceClassName("com.mysql.jdbc.jdbc2.optional.MysqlDataSource");
  ds.addDataSourceProperty("url", "jdbc:mysql://" + org.Server.GlobalSettings.DB.mySQLIP + ":3306?rewriteBatchedStatements=true")
  ds.addDataSourceProperty("autoCommit", "false")
  ds.addDataSourceProperty("user", "someUser")
  ds.addDataSourceProperty("password", "not my password")
  ds
}

ConnectionPool.add('review, new DataSourceConnectionPool(dataSource))

Insert code:

try {
  implicit val session = AutoSession
  val paramList: scala.collection.mutable.ListBuffer[Seq[(Symbol, Any)]] = scala.collection.mutable.ListBuffer[Seq[(Symbol, Any)]]()
  .
  .
  .
  // Build one named-parameter row per review
  for (rev <- reviews) {
    paramList += Seq[(Symbol, Any)](
      'review_id -> rev.review_idx,
      'text -> rev.text,
      'category_id -> rev.category_id,
      'aspect_id -> aspectId,
      'not_aspect -> noAspect /*0*/ ,
      'certainty_aspect -> rev.certainty_aspect,
      'sentiment -> rev.sentiment,
      'sentiment_grade -> rev.certainty_sentiment,
      'stars -> rev.stars
    )
  }
  .
  .
  .
  try {
    if (paramList != null && paramList.length > 0) {
        val result = NamedDB('review) localTx { implicit session =>
        sql"""INSERT INTO `MasterFlow`.`classifier_results`
        (
            `review_id`,
            `text`,
            `category_id`,
            `aspect_id`,
            `not_aspect`,
            `certainty_aspect`,
            `sentiment`,
            `sentiment_grade`,
            `stars`)
        VALUES
              ( {review_id}, {text}, {category_id}, {aspect_id},
              {not_aspect}, {certainty_aspect}, {sentiment}, {sentiment_grade}, {stars})
        """
          .batchByName(paramList.toIndexedSeq: _*)/*.__resultOfEnsuring*/
          .apply()
        }

Each batch insert takes about 15 seconds. My log:

29/10/2014 14:03:36 - DEBUG[Hikari Housekeeping Timer (pool HikariPool-0)] HikariPool - Before cleanup pool stats HikariPool-0 (total=10, inUse=1, avail=9, waiting=0)
29/10/2014 14:03:36 - DEBUG[Hikari Housekeeping Timer (pool HikariPool-0)] HikariPool - After cleanup pool stats HikariPool-0 (total=10, inUse=1, avail=9, waiting=0)
29/10/2014 14:03:46 - DEBUG[default-akka.actor.default-dispatcher-3] StatementExecutor$$anon$1 - SQL execution completed

  [SQL Execution]
   INSERT INTO `MasterFlow`.`classifier_results` ( `review_id`, `text`, `category_id`, `aspect_id`, `not_aspect`, `certainty_aspect`, `sentiment`, `sentiment_grade`, `stars`) VALUES ( ...can't show this....);
   INSERT INTO `MasterFlow`.`classifier_results` ( `review_id`, `text`, `category_id`, `aspect_id`, `not_aspect`, `certainty_aspect`, `sentiment`, `sentiment_grade`, `stars`) VALUES ( ...can't show this....);
.
.
.
   INSERT INTO `MasterFlow`.`classifier_results` ( `review_id`, `text`, `category_id`, `aspect_id`, `not_aspect`, `certainty_aspect`, `sentiment`, `sentiment_grade`, `stars`) VALUES ( ...can't show this....);
   ... (total: 100 times); (15466 ms)

  [Stack Trace]
    ...
    logic.DB.ClassifierJsonToDB$$anonfun$1.apply(ClassifierJsonToDB.scala:119)
    logic.DB.ClassifierJsonToDB$$anonfun$1.apply(ClassifierJsonToDB.scala:96)
    scalikejdbc.DBConnection$$anonfun$_localTx$1$1.apply(DBConnection.scala:252)
    scala.util.control.Exception$Catch.apply(Exception.scala:102)
    scalikejdbc.DBConnection$class._localTx$1(DBConnection.scala:250)
    scalikejdbc.DBConnection$$anonfun$localTx$1.apply(DBConnection.scala:257)
    scalikejdbc.DBConnection$$anonfun$localTx$1.apply(DBConnection.scala:257)
    scalikejdbc.LoanPattern$class.using(LoanPattern.scala:33)
    scalikejdbc.NamedDB.using(NamedDB.scala:32)
    scalikejdbc.DBConnection$class.localTx(DBConnection.scala:257)
    scalikejdbc.NamedDB.localTx(NamedDB.scala:32)
    logic.DB.ClassifierJsonToDB$.insertBulk(ClassifierJsonToDB.scala:96)
    logic.DB.ClassifierJsonToDB$$anonfun$bulkInsert$1.apply(ClassifierJsonToDB.scala:176)
    logic.DB.ClassifierJsonToDB$$anonfun$bulkInsert$1.apply(ClassifierJsonToDB.scala:167)
    scala.collection.Iterator$class.foreach(Iterator.scala:727)
    ...

When I run it on the server that hosts the MySQL database, it runs fast. What can I do to make it run faster from a remote computer?

2 answers:

Answer 0: (score: 1)

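In case anyone needs it: I had a similar problem when batch-inserting 10,000 records into MySQL with ScalikeJDBC, and I solved it by setting rewriteBatchedStatements to true in the JDBC URL ("jdbc:mysql://host:3306/DB?rewriteBatchedStatements=true"). It reduced the batch insert time from 40 seconds to 1 second!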
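As a rough illustration of the URL shape described in this answer, here is a minimal sketch. The helper name `mysqlUrl` and the object `JdbcUrlSketch` are hypothetical and not part of ScalikeJDBC or Connector/J; the host, port, and database values are placeholders. With this flag enabled, MySQL Connector/J can rewrite batched INSERT statements into multi-row INSERTs, which should mean fewer network round trips per batch.

```scala
object JdbcUrlSketch {
  // Hypothetical helper (an assumption for illustration): builds a MySQL JDBC URL
  // of the form host:port/database with the rewriteBatchedStatements flag set to true.
  def mysqlUrl(host: String, port: Int, database: String): String =
    s"jdbc:mysql://$host:$port/$database?rewriteBatchedStatements=true"
}
```

For example, `JdbcUrlSketch.mysqlUrl("host", 3306, "DB")` yields the URL quoted in this answer. Note the `/DB` path segment between the port and the `?`.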

Answer 1: (score: 0)

I don't think this is a ScalikeJDBC or HikariCP issue. You should investigate the network environment between your machine and the MySQL server.
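One way to start that investigation is sketched below, using only plain JDBC from the standard library. The log in the question reports 100 statements in 15466 ms, which works out to roughly 155 ms per statement; that would be consistent with one network round trip per row, although this is an inference and not something the log confirms. The object and method names (`RoundTripSketch`, `perStatementMillis`, `averageRoundTripMillis`) are hypothetical, and the caller is assumed to supply an already open `java.sql.Connection`.

```scala
import java.sql.Connection

object RoundTripSketch {
  // Average cost per statement, assuming the batch was sent one statement at a time.
  def perStatementMillis(totalMillis: Long, statements: Int): Double =
    totalMillis.toDouble / statements

  // Runs SELECT 1 `probes` times on the given connection and returns the
  // average round-trip time in milliseconds. The caller owns the connection.
  def averageRoundTripMillis(conn: Connection, probes: Int): Double = {
    val stmt = conn.createStatement()
    try {
      val start = System.nanoTime()
      var i = 0
      while (i < probes) {
        val rs = stmt.executeQuery("SELECT 1")
        rs.close()
        i += 1
      }
      (System.nanoTime() - start) / 1e6 / probes
    } finally {
      stmt.close()
    }
  }
}
```

If the measured round trip from the remote machine is close to the per-statement figure from the log, the time is probably going to network latency rather than to the insert itself, and reducing the number of round trips per batch would be the thing to try.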