Spark: Making multiple API calls with mapPartitions results in java.lang.IllegalStateException: Connection pool shut down

Asked: 2019-05-06 12:22:06

Tags: scala http apache-spark

Goal: For each keyword available in a DataFrame column, fetch the JSON response from a search API.

+----------------+------------------------+
|searchKeyword   |Response                |
+----------------+------------------------+
|bags            |[{"id":"4664"}.....     |
|sheet           |[{"id":"976"}.....      |
|bottles         |[{"id":"1234"}.....     |
|disposable bags |[{"id":"234"}.....      |
+----------------+------------------------+

I fetch a list of keywords and turn it into a DataFrame. I then make the API call for each keyword inside mapPartitions, so that only one HTTP connection is created per partition.
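For context, a minimal sketch of how such a DataFrame might be built (assuming a SparkSession named spark; the keyword list is purely illustrative):

import spark.implicits._

// Illustrative keyword list turned into a one-column DataFrame.
val searchTermsDf = Seq("bags", "sheet", "bottles", "disposable bags")
  .toDF("searchKeyword")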

However, when I run an action on the resulting RDD, it fails with a "Connection pool shut down" error.

Here is the code using mapPartitions:

import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.{CloseableHttpClient, HttpClients}

val solrUrl = "http://%s:XXXXX/solr/%s/select?q=%s&fl=id,score&defType=edismax&wt=json"

def getHttpClient(): CloseableHttpClient = {
  val httpClient: CloseableHttpClient = HttpClients.createDefault()
  httpClient
}

def getResults(url: String, httpClient: CloseableHttpClient): String = {
  val httpResponse = httpClient.execute(new HttpGet(url))
  val entity = httpResponse.getEntity()
  println(entity)
  var content = ""
  if (entity != null) {
    val inputStream = entity.getContent()
    content = scala.io.Source.fromInputStream(inputStream).getLines.mkString
    inputStream.close()
  }
  // shuts down the client's entire connection pool, not just this request
  httpClient.getConnectionManager().shutdown()
  content
}
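Note: the shutdown() call above closes the client's whole connection pool after the first request, which matches the "Connection pool shut down" error seen on later keywords in the same partition. A minimal sketch of a variant that releases only the response and leaves the pool open (assuming Apache HttpClient 4.x; getResultsKeepAlive is a hypothetical name):

import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.CloseableHttpClient
import org.apache.http.util.EntityUtils

// Read the body via EntityUtils, then close only the response so the
// pooled connection is returned to the (still open) pool for reuse.
def getResultsKeepAlive(url: String, httpClient: CloseableHttpClient): String = {
  val httpResponse = httpClient.execute(new HttpGet(url))
  try {
    val entity = httpResponse.getEntity()
    if (entity != null) EntityUtils.toString(entity) else ""
  } finally {
    httpResponse.close() // close the response, not the client
  }
}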




val rddResults = searchTermsDf.rdd.mapPartitions(partition => {
  val connection = getHttpClient() // one HTTP client per partition
  val newPartition = partition.map(keyword => {
    val searchTerm = keyword.getString(0)
    // NOTE: solrUrl declares three %s placeholders but only two arguments are supplied here
    val url = solrUrl.format(HOST_IP, searchTerm)
    getResults(url, connection)
  }).toList // consumes the iterator, so the requests run before the connection is closed
  connection.close()
  newPartition.iterator // hand back a fresh iterator over the materialized results
})

rddResults.foreach(println)
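With a helper like the sketch above, the intended per-partition lifecycle holds: create one client, run every keyword through it, materialize the results, and only then close the client. A sketch under those assumptions (HOST_IP and the "mycollection" placeholder are illustrative):

val rddResults = searchTermsDf.rdd.mapPartitions { partition =>
  val connection = getHttpClient() // one client for the whole partition
  val results = partition.map { row =>
    val searchTerm = row.getString(0)
    // "mycollection" stands in for the real Solr collection name
    val url = solrUrl.format(HOST_IP, "mycollection", searchTerm)
    getResultsKeepAlive(url, connection)
  }.toList // force evaluation while the client is still open
  connection.close()
  results.iterator
}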

Please help me figure out what I am doing wrong.

0 Answers:

No answers