Stripped prohibited headers from URLFetch request: ['Host']

Asked: 2015-01-27 22:36:20

Tags: python google-app-engine header rss web-scraping

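I am trying to run a simple scraping script that generates an RSS feed on GAE, but I get the error below. I have searched all the related Stack Overflow answers, but none of them helped.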

urlfetch_stub.py:504] Stripped prohibited headers from URLFetch request: ['Host']

I am using this script:

# -*- coding: cp1254 -*-
import sys
sys.path.append('libs/')  # third-party libraries bundled with the app
reload(sys); sys.setdefaultencoding('utf-8')
from bs4 import BeautifulSoup
import urllib
import urllib2  # parse() below calls urllib2.urlopen
from google.appengine.api import urlfetch

from datetime import datetime
import locale

import PyRSS2Gen
locale.setlocale(locale.LC_ALL, '')
import requests
import codecs

class yenitem:
    # One scraped entry: title, link, description, timestamp
    baslik = ""
    link = ""
    aciklama = ""
    zaman = datetime.now()

def parse(url):
    # Fetch the page and hand the raw HTML to BeautifulSoup
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

##     request=urllib.Request(url)
##     request.add_header('Accept-Encoding','utf-8')
##     response=urllib2.urlopen(request)
##     response=urlfetch.fetch( url, headers={ "User-Agent" : user_agent } ).content

##     soup = BeautifulSoup(response.read().decode('utf-8', 'ignore'))

    items = []
##     print soup

    # Each <article class="item-list"> block becomes one feed item
    for link in soup.find_all('article', {'class': 'item-list'}):
        item = yenitem()
        item.baslik = link.find_all('h2')[0].get_text()
        item.link = link.find_all('a')[0].get('href')
        item.aciklama = link.find_all('div')[1].get_text()
        item.zaman = link.find_all('span')[0].get_text()
        items.append(item)
    return items

Error log:

2015-01-28 00:10:23 Running command: "['C:\\Python27\\pythonw.exe', 'C:\\Program Files (x86)\\Google\\google_appengine\\dev_appserver.py', '--skip_sdk_update_check=yes', '--port=8080', '--admin_port=8000', 'C:\\Users\\bigM\\Desktop\\gomhg\\denemer']"
INFO     2015-01-28 00:10:28,510 devappserver2.py:745] Skipping SDK update check.
INFO     2015-01-28 00:10:28,585 api_server.py:172] Starting API server at: http://localhost:61808
INFO     2015-01-28 00:10:28,589 dispatcher.py:186] Starting module "default" running at: http://localhost:8080
INFO     2015-01-28 00:10:28,592 admin_server.py:118] Starting admin server at: http://localhost:8000
WARNING  2015-01-28 00:10:32,867 urlfetch_stub.py:504] Stripped prohibited headers from URLFetch request: ['Host']
Content-Type: text/xml
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>ff</title><link></link><description></description><lastBuildDate></lastBuildDate><generator>PyRSS2Gen-1.0.0</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Destan yazanlar, yazd&#305;ranlar</title><link>http://www.taraf.com.tr/yazarlar/destan-yazanlar-yazdiranlar/</link><description>
.....
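
Note that the log above reports the message at WARNING level and the RSS output follows it, so the request itself appears to go through. As a rough illustration of what the message describes, here is a minimal sketch of a header filter of this kind. It is not the SDK's code: the function name `strip_prohibited_headers` and the contents of `PROHIBITED_HEADERS` are assumptions made for the example. Presumably the HTTP library adds a `Host` header on its own, and the URLFetch layer drops it before sending.

```python
import logging

# Assumption: an illustrative set of header names that a fetch service might
# refuse to forward. It is not copied from the App Engine SDK.
PROHIBITED_HEADERS = frozenset([
    'content-length', 'host', 'vary', 'via', 'x-forwarded-for',
])


def strip_prohibited_headers(headers):
    """Split headers into (allowed dict, sorted list of stripped names).

    Header names are compared case-insensitively.
    """
    allowed = {}
    stripped = []
    for name, value in headers.items():
        if name.lower() in PROHIBITED_HEADERS:
            stripped.append(name)
        else:
            allowed[name] = value
    stripped.sort()
    if stripped:
        # Mirrors the wording of the warning quoted in the question
        logging.warning(
            'Stripped prohibited headers from URLFetch request: %s', stripped)
    return allowed, stripped
```

For example, calling it with `{'Host': 'www.taraf.com.tr', 'User-Agent': 'rss-bot'}` (the user agent is a made-up value) keeps only `User-Agent` and reports `Host` as stripped, which matches the shape of the warning in the log.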

0 answers:

No answers yet.