How to scrape website content with Python

Date: 2018-02-01 06:31:57

Tags: python beautifulsoup

How can I scrape content from a website using Python?

1 answer:

Answer 0 (score: 2)

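If you inspect the table in the browser, you can see that it sits inside the <div data-curpg="1" class="dataContainer"> ... </div> tag. If you view the page source, however, that same element contains only this: <div data-curpg="1" class="dataContainer"><data_table></data_table></div>

The <data_table> content is generated dynamically with JavaScript. The requests module only downloads the raw HTML and does not execute JS, so you need Selenium, which drives a real browser, for this. For installation instructions and a demo, check this link.

You can use Selenium like this: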
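Before reaching for a browser, it can help to confirm that the element really is an empty placeholder in the raw HTML. The following is a minimal sketch of such a check, using only the standard library's HTMLParser so it runs without extra packages. The names TagContentProbe and is_placeholder are hypothetical helpers introduced for illustration, and the "rendered" string is a made-up stand-in for what a browser might produce, not real page content.

```python
from html.parser import HTMLParser


class TagContentProbe(HTMLParser):
    """Records whether a target tag exists and whether anything is inside it."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0       # > 0 while the parser is inside the target tag
        self.found = False   # True once the target tag has been seen
        self.children = 0    # number of start tags seen inside the target
        self.text = []       # non-blank text seen inside the target

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.children += 1
        if tag == self.target:
            self.found = True
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def is_placeholder(html, tag):
    """Return True if `tag` is present in `html` but has no children or text."""
    probe = TagContentProbe(tag)
    probe.feed(html)
    probe.close()
    return probe.found and probe.children == 0 and not probe.text


# The snippet quoted above from the page source.
static_html = '<div data-curpg="1" class="dataContainer"><data_table></data_table></div>'
# Hypothetical example of the same element after JavaScript has filled it in.
rendered_html = '<div class="dataContainer"><data_table><ul><li>x</li></ul></data_table></div>'

print(is_placeholder(static_html, 'data_table'))    # True
print(is_placeholder(rendered_html, 'data_table'))  # False
```

If the check returns True for the HTML that requests downloaded, the data is most likely filled in by a script after the page loads, which is the situation described here.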

from bs4 import BeautifulSoup
from selenium import webdriver

URL = 'https://economictimes.indiatimes.com/marketstats/pageno-1,pid-58,sortby-CurrentYearRank,sortorder-asc,year-2017.cms'
driver = webdriver.Chrome()
driver.get(URL)
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'html.parser')
for li in soup.find_all('li', class_='w170 alignL'):
    a = li.find('a')
    company_name = a.text
    company_url = a['href']  # This is the link that you were looking for.
    # You can save or print these values however you want.
    print(company_name, company_url)

Output:

Indian Oil Corporation Ltd. /indian-oil-corporation-ltd/stocks/companyid-11924.cms
Reliance Industries Ltd. /reliance-industries-ltd/stocks/companyid-13215.cms
State Bank of India /state-bank-of-india/stocks/companyid-11984.cms
Tata Motors Ltd. /tata-motors-ltd/stocks/companyid-12934.cms
Rajesh Exports Ltd. /rajesh-exports-ltd/stocks/companyid-6650.cms
Bharat Petroleum Corporation Ltd. /bharat-petroleum-corporation-ltd/stocks/companyid-11941.cms
Hindustan Petroleum Corporation Ltd. /hindustan-petroleum-corporation-ltd/stocks/companyid-12078.cms
Oil And Natural Gas Corporation Ltd. /oil-and-natural-gas-corporation-ltd/stocks/companyid-11599.cms
Coal India Ltd. /coal-india-ltd/stocks/companyid-11822.cms
Tata Consultancy Services Ltd. /tata-consultancy-services-ltd/stocks/companyid-8345.cms
ICICI Bank Ltd. /icici-bank-ltd/stocks/companyid-9194.cms
Tata Steel Ltd. /tata-steel-ltd/stocks/companyid-12902.cms
Larsen & Toubro Ltd. /larsen-&-toubro-ltd/stocks/companyid-13447.cms
Hindalco Industries Ltd. /hindalco-industries-ltd/stocks/companyid-13637.cms
Bharti Airtel Ltd. /bharti-airtel-ltd/stocks/companyid-2718.cms
HDFC Bank Ltd. /hdfc-bank-ltd/stocks/companyid-9195.cms
Mahindra & Mahindra Ltd. /mahindra-&-mahindra-ltd/stocks/companyid-11898.cms
NTPC Ltd. /ntpc-ltd/stocks/companyid-12316.cms
Vedanta Ltd. /vedanta-ltd/stocks/companyid-13111.cms
Infosys Ltd. /infosys-ltd/stocks/companyid-10960.cms
Maruti Suzuki India Ltd. /maruti-suzuki-india-ltd/stocks/companyid-11890.cms
Housing Development Finance Corporation Ltd. /housing-development-finance-corporation-ltd/stocks/companyid-13640.cms
Wipro Ltd. /wipro-ltd/stocks/companyid-12799.cms
Axis Bank Ltd. /axis-bank-ltd/stocks/companyid-9175.cms
Punjab National Bank /punjab-national-bank/stocks/companyid-11585.cms
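
The href values printed above are relative paths. If absolute links are needed, one possible approach is to resolve each path against the URL of the page it was scraped from with urllib.parse.urljoin. This is a small sketch; the helper name to_absolute is an illustrative assumption, not part of the original answer.

```python
from urllib.parse import urljoin

# The page that was scraped in the answer above.
URL = 'https://economictimes.indiatimes.com/marketstats/pageno-1,pid-58,sortby-CurrentYearRank,sortorder-asc,year-2017.cms'


def to_absolute(href, base=URL):
    """Resolve a relative href against the URL of the page it came from."""
    return urljoin(base, href)


print(to_absolute('/indian-oil-corporation-ltd/stocks/companyid-11924.cms'))
# https://economictimes.indiatimes.com/indian-oil-corporation-ltd/stocks/companyid-11924.cms
```

Because the scraped paths start with a slash, urljoin keeps only the scheme and host of the base URL and replaces the rest with the scraped path.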