How to send asynchronous HTTP requests one at a time in Python?

Date: 2013-04-03 07:23:34

Tags: python asynchronous gevent http-request

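We have a job queue, and workers process these jobs one at a time. Each job requires us to format some data and issue an HTTP POST request with that data as the request payload.

How can we have each worker issue these HTTP POST requests asynchronously, in a single-threaded, non-blocking way? We don't care about the response; all we want is for the request to go out as soon as possible and for the worker to move on to the next job immediately.

We have explored using the gevent and grequests libraries (see Why does gevent.spawn not execute the parameterized function until a call to Greenlet.join?). Our worker code looks something like this: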

def execute_task(worker, job):

    print "About to spawn request"
    greenlet = gevent.spawn(requests.post, url, params=params)

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()

The first print statement executes, but the second and third print statements never print, and the URL is never hit.

How can we get these asynchronous requests to execute?

4 answers:

Answer 0 (score: 1)

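1) Create a Queue.Queue object

2) Create as many "worker" threads as you like, each reading from the Queue.Queue

3) Feed the jobs into the Queue.Queue

The worker threads will read from the Queue.Queue in the order the jobs were put on it.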
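The three steps above map directly onto Python 3, where Queue.Queue is now queue.Queue. A minimal sketch, assuming a hypothetical send_post() in place of the real formatting and POST, and one None sentinel per worker to shut the pool down:

```python
import queue
import threading

NUM_WORKERS = 4  # arbitrary pool size for the sketch


def send_post(job):
    """Hypothetical stand-in for formatting the job and issuing the HTTP POST."""
    return "sent %s" % job


def worker(jobs, results):
    # Pull jobs in FIFO order until the None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(send_post(job))  # list.append is atomic under CPython's GIL


def run_jobs(job_list):
    jobs = queue.Queue()                      # 1) the shared queue
    results = []
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(NUM_WORKERS)]   # 2) the worker threads
    for t in threads:
        t.start()
    for job in job_list:                      # 3) feed the jobs
        jobs.put(job)
    for _ in threads:                         # one sentinel per worker so every thread exits
        jobs.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(sorted(run_jobs(range(10))))
```

The producer loop never waits on the network; the pool size controls how many POSTs can be in flight at once.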

Example of reading lines from a file and putting them into a Queue.Queue:
import sys
import urllib2
import urllib
from Queue import Queue
import threading

# Sentinel line that tells the workers there is no more input
THEEND = "TERMINATION-NOW-THE-END"


# read from file into Queue.Queue asynchronously
class QueueFile(threading.Thread):
    def run(self):
        if not isinstance(self.myq, Queue):
            # sys.exit() here would only end this thread, so raise instead
            raise TypeError("Queue not set to a Queue")
        with open(self.f, 'r') as h:  # closes the file when done
            for l in h:
                self.myq.put(l.strip())  # this will block if the queue is full
        self.myq.put(THEEND)

    def set_queue(self, q):
        self.myq = q

    def set_file(self, f):
        self.f = f

An idea of what a worker thread might look like (example only):

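class myWorker(threading.Thread):
    def set_queue(self, q):
        self.q = q

    def run(self):
        n = 0  # number of requests this worker has completed
        while True:
            data = self.q.get()  # read from fifo; blocks until a job is available
            if data == THEEND:
                self.q.put(THEEND)  # pass the sentinel on so the other workers stop too
                break
            try:
                req = urllib2.Request("http://192.168.1.10/url/path")
                # urlencode() needs a dict or a list of pairs; 'line' is a hypothetical field name
                req.add_data(urllib.urlencode({'line': data}))
                h1 = urllib2.urlopen(req, timeout=10)
                res = h1.read()
                assert len(res) > 80
                n += 1

            except urllib2.HTTPError, e:
                print e

            except urllib2.URLError, e:
                print "done %d reqs " % n
                print e
                return  # sys.exit() in a thread would only end this thread anyway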

These classes are based on threading.Thread. To get going, create the objects and then call start() on each instance.
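Putting the pieces together in Python 3 (urllib2 became urllib.request, Queue became queue) might look like the sketch below. It assumes a throwaway local HTTPServer in place of 192.168.1.10 so it can run anywhere, and a made-up form field named line; PostWorker, Handler and main are illustrative names, not from the answer:

```python
import queue
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

THEEND = "TERMINATION-NOW-THE-END"  # same sentinel idea as the answer
received = []  # request bodies seen by the local demo server


class Handler(BaseHTTPRequestHandler):
    """Local stand-in for http://192.168.1.10/url/path; records each POST body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode())
        self.send_response(204)  # no body: the worker ignores the response anyway
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# ProxyHandler({}) ignores any http_proxy setting so 127.0.0.1 is reached directly
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class PostWorker(threading.Thread):
    def __init__(self, q, url):
        super().__init__()
        self.q = q
        self.url = url

    def run(self):
        while True:
            line = self.q.get()  # read from the FIFO
            if line == THEEND:
                self.q.put(THEEND)  # hand the sentinel on so the other workers stop too
                break
            body = urllib.parse.urlencode({"line": line}).encode()  # "line" is a made-up field
            # Supplying data= turns the request into a POST
            with opener.open(self.url, data=body, timeout=10) as resp:
                resp.read()


def main(lines, num_workers=3):
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/url/path" % server.server_port
    q = queue.Queue()
    workers = [PostWorker(q, url) for _ in range(num_workers)]
    for w in workers:
        w.start()  # create the objects, then call start() on each instance
    for line in lines:
        q.put(line)
    q.put(THEEND)
    for w in workers:
        w.join()
    server.shutdown()
    server.server_close()


if __name__ == "__main__":
    main(["first line", "second line"])
    print(sorted(received))
```

Because all real lines are queued before THEEND, a worker only sees the sentinel once every line has been taken, so joining the workers is enough to know every POST was sent.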

Answer 1 (score: 1)

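You have to run it in a separate thread or use the built-in asyncore library. Most libraries either use threads without you knowing it, or rely on asyncore, which is a standard part of Python.

Here is a combination of threading and asyncore: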

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import asyncore, socket
from threading import Thread
from time import sleep

# The original imported log() and the settings "server"/"server_port" from
# non-standard local modules (logger, config); minimal stand-ins:
server = 'example.com'  # hypothetical target host
server_port = 80


def log(msg, category):
    print '[%s] %s' % (category, msg)


class logDispatcher(Thread, asyncore.dispatcher):
    def __init__(self, config=None):
        self.inbuffer = ''
        self.buffer = ''
        self.lockedbuffer = False
        self.is_writable = False

        self.is_connected = False

        self.exit = False
        self.initiated = False

        asyncore.dispatcher.__init__(self)
        Thread.__init__(self)

        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((server, server_port))
        except socket.error:
            log('Could not connect to ' + server, 'LOG_SOCK')
            return

        self.start()

    def handle_connect(self):
        # asyncore's own handle_connect_event() sets self.connected and calls this
        self.is_connected = True
        log('Connected to ' + str(server), 'LOG_SOCK')

    def handle_close(self):
        self.is_connected = False
        self.close()

    def handle_read(self):
        data = self.recv(8192)
        while self.lockedbuffer:
            sleep(0.01)

        self.inbuffer += data

    def writable(self):
        # Ask asyncore for write events only while connecting or while data is queued
        return self.is_writable or not self.connected

    def handle_write(self):
        # Send what the socket accepts now; asyncore calls again if data remains
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]
        if not self.buffer:
            self.is_writable = False

    def _send(self, what):
        self.buffer += what + '\r\n'
        self.is_writable = True

    def run(self):
        # HTTP/1.1 needs a Host header; "Connection: close" lets the server end the
        # connection so asyncore.loop() can return. _send() adds the final blank line.
        self._send('GET / HTTP/1.1\r\nHost: ' + server + '\r\nConnection: close\r\n')

while 1:
    logDispatcher()  # <- Initiate one for each request.
    asyncore.loop(0.1)
    log('All threads are done, next loop in 10', 'CORE')
    sleep(10)
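Note that asyncore was deprecated in Python 3.6 and removed in 3.12; asyncio is its replacement. As a swapped-in technique (not this answer's code), here is a minimal single-threaded sketch that writes raw HTTP/1.1 POSTs with asyncio.open_connection and closes the connection without reading the response. The local demo server, the /url/path path and the bodies are assumptions for illustration:

```python
import asyncio

received = []  # bodies collected by the local demo server


async def handle(reader, writer):
    # Minimal demo server: read the header block, then Content-Length bytes of body.
    # It never replies, because the client below never reads a reply.
    head = await reader.readuntil(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n"):
        if line.lower().startswith(b"content-length:"):
            length = int(line.split(b":", 1)[1])
    received.append(await reader.readexactly(length))
    writer.close()
    await writer.wait_closed()


async def post(host, port, path, body):
    """Write one raw HTTP/1.1 POST and close without reading the response."""
    _, writer = await asyncio.open_connection(host, port)
    request = (
        b"POST %s HTTP/1.1\r\n"
        b"Host: %s\r\n"
        b"Content-Type: application/x-www-form-urlencoded\r\n"
        b"Content-Length: %d\r\n"
        b"Connection: close\r\n\r\n" % (path.encode(), host.encode(), len(body))
    ) + body
    writer.write(request)
    await writer.drain()  # flow control only; returns quickly for small writes
    writer.close()        # buffered bytes are flushed before the socket closes
    await writer.wait_closed()


async def main(bodies):
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: OS picks one
    port = server.sockets[0].getsockname()[1]
    # All POSTs run concurrently on one thread; none waits for a response.
    await asyncio.gather(*(post("127.0.0.1", port, "/url/path", b) for b in bodies))

    async def wait_for_server():
        while len(received) < len(bodies):
            await asyncio.sleep(0.01)

    await asyncio.wait_for(wait_for_server(), timeout=5)  # demo only: let the server catch up
    server.close()
    await server.wait_closed()


if __name__ == "__main__":
    asyncio.run(main([b"a=1", b"b=2"]))
    print(sorted(received))
```

Inside a long-running worker loop you could schedule each post() with asyncio.create_task() instead of awaiting it, keeping a reference to the task so it is not garbage-collected before it finishes.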

Or you can simply create a thread that does the work and then dies.

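import socket
from threading import *

class worker(Thread):
    def __init__(self, host, postdata):
        Thread.__init__(self)
        self.host = host
        self.postdata = postdata  # assumed to be a complete raw HTTP request string
        self.start()

    def run(self):
        # One short-lived connection per job; the thread dies when run() returns
        sock = socket.create_connection((self.host, 80), timeout=10)
        try:
            sock.sendall(self.postdata)
        finally:
            sock.close()

for data in postDataObjects:  # postDataObjects: your own list of request strings
    worker('example.com', data)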

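If you need to limit the number of threads (for example if you send more than 5k posts, which could otherwise strain the system), just do while len(enumerate()) > 1000: sleep(0.1) in the loop that creates the workers, so it waits for some threads to die. Here enumerate is threading.enumerate, which from threading import * brings in and which shadows the built-in enumerate.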

Answer 2 (score: 1)

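You may want to use the join method instead of sleep, and then check the status. If you want to execute them one at a time, that will solve the problem. I modified your code slightly to test it, and it seems to work fine: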

import gevent
import requests

def execute_task(worker, job):

    print "About to spawn request"
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()
    print greenlet.get()

execute_task(None, None)

Output:

About to spawn request
Request spawned, about to call sleep
Greenlet status:  True
<Response [200]>

Is there something else in this Python process that might be preventing gevent from running this greenlet?

Answer 3 (score: 0)

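Wrap your url and params pairs in a list, then pop them one pair at a time into a task pool (here the task pool holds either one task or nothing). Create threads that read tasks from the task pool; when a thread takes a task and sends the request, pop the next pair from the list. In other words, the list is effectively being used as a queue.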
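A simplified sketch of this idea, assuming each thread pops the next (url, params) pair straight from the shared list under a lock rather than going through a separate one-slot pool; send_request() is a hypothetical stand-in for the real POST. In practice queue.Queue does this locking for you:

```python
import threading


def send_request(url, params, sent):
    """Hypothetical stand-in for requests.post(url, params=params)."""
    sent.append((url, params))


def run(pairs, num_threads=2):
    tasks = list(pairs)      # the shared list of (url, params) pairs
    lock = threading.Lock()
    sent = []

    def worker():
        while True:
            with lock:       # pop under a lock so no two threads take the same pair
                if not tasks:
                    return   # list exhausted: the thread finishes
                url, params = tasks.pop(0)
            send_request(url, params, sent)  # the request itself runs outside the lock

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sent


if __name__ == "__main__":
    print(run([("http://example.com/a", {"id": 1}), ("http://example.com/b", {"id": 2})]))
```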