How to get web-scraped data into a csv file

Asked: 2018-06-06 04:38:45

Tags: python csv

I scraped data from a website (basically train details in columns such as No, Name, Type, Zone, etc.) using the following code in a Jupyter notebook:

How do I put the results obtained in 'output' into a DataFrame and then into a csv file?

import requests
from bs4 import BeautifulSoup   
import pandas as pd

r=requests.get("https://indiarailinfo.com/arrivals/kanpur-central-cnb/452")
print(r.text[0:200000])

soup=BeautifulSoup(r.text,'html.parser')
results=soup.find_all('div',attrs={'class':'tdborder'})
results1=soup.find_all('div',attrs={'class':'tdborderhighlight'})  # for 'To' and 'Sch'
lresult=results[11:570]
lresult

for i in range(11,550):
    output=lresult[i].text
print(output)

2 Answers:

Answer 0 (score: 0)

You need to dump everything into a single object (the easiest way) and then export it from that object. Example:

import numpy

# collect the rows into a single array
a = numpy.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# write the array out as comma-separated values
numpy.savetxt("foo.csv", a, delimiter=",")
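Note that savetxt defaults to a float format, so dumping the scraped text cells this way needs a string format instead. A minimal sketch, assuming the cells have already been collected into rows of strings (the sample rows and file name here are made up for illustration):

import numpy

# hypothetical rows of scraped text cells, not the real scraped data
rows = [["12303", "Poorva Express", "SF"],
        ["11015", "Kushinagar Express", "Exp"]]

a = numpy.asarray(rows)
# fmt="%s" writes each cell as a string instead of a float
numpy.savetxt("trains_numpy.csv", a, delimiter=",", fmt="%s")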

Answer 1 (score: 0)

I'm not entirely sure what you want the output csv to look like, but you can try something like this to get the data into a dataframe and then out to a csv:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://indiarailinfo.com/arrivals/kanpur-central-cnb/452'

html = requests.get(url).text

soup = BeautifulSoup(html, 'lxml')
res = soup.find_all('div',attrs={'class':'tdborder'})

# the first 11 cells are the column headers
headers = [header.text.strip() for header in res[:11]]

# group the remaining cells into rows of 11 columns each
lines = [[x.text.strip() for x in res[11:][i:i+11]] for i in range(0, len(res[11:]), 11)]

df = pd.DataFrame(lines, columns=headers)

df.to_csv('trains.csv', encoding='utf-8', index=False)

print(open('trains.csv', 'r').read())

This gives this csv:

No.,Name,Type,Zone,PF,Arrival Days,From,Sch,Delay,ETA,LKL
12303,Poorva Express (via Patna) (PT),SF,ER,1,S TW  S,HWH,08:05,3h 53m late,03:58,DER/Dadri
12381,Poorva Express (via Gaya) (PT),SF,ER,1,M  TF,HWH,08:15,no arr today,no arr today,n/a
11015,Kushinagar Express (PT),Exp,CR,6,SMTWTFS,LTT,22:45,57m late,01:07,GKP/Gorakhpur Junction
...
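If you'd rather not depend on pandas for the final write, the same headers and lines lists could be written with the standard library's csv module. A minimal sketch under that assumption (the file name is arbitrary):

import csv

# assumes `headers` and `lines` were built as in the code above
with open('trains.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(headers)   # header row
    writer.writerows(lines)    # one row per train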