Does order matter in Flask?

Date: 2019-05-19 03:20:33

Tags: javascript python web flask

So I'm using Splinter to loop through job postings and scrape their descriptions. I'm using Flask to build the full web app. The same code runs fine in a Jupyter notebook, but when I move it into a Python file in VS Code it gives me this error: AttributeError: 'ElementList' object has no attribute 'fill'
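As far as I can tell, one common way to get exactly this error is when find_by_id matches nothing: Splinter then returns an empty ElementList, and calling .fill on it is reported as the AttributeError above rather than a clearer "element not found" message. A standalone sketch (separate from my app, but using the same URL and element id; it assumes chromedriver is on the PATH):

from splinter import Browser

# Standalone sketch: if the id matches nothing, find_by_id returns an empty
# ElementList, and calling .fill on it raises
# "AttributeError: 'ElementList' object has no attribute 'fill'".
browser = Browser('chrome', headless=True)
browser.visit("https://www.glassdoor.ca/index.htm")

matches = browser.find_by_id("KeywordSearch")
print(len(matches))              # 0 means the field was not found on this page
if matches:
    matches.first.fill("Data Scientist")
else:
    print("KeywordSearch not found - wrong page or not loaded yet")
browser.quit()

The full app code is below: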

from flask import Flask, render_template
from splinter import Browser
from bs4 import BeautifulSoup

app = Flask(__name__)

# list to store scraped data
company = []
location = []
job_desc = []
position = []

# Initialize browser to use chrome and show its process.
executable_path = {'executable_path': "chromedriver.exe"}
browser = Browser('chrome', **executable_path, headless=False)
url = "https://www.glassdoor.ca/index.htm"
browser.visit(url)

def scrape_current_page():
    # Getting html of first page
    html = browser.html
    soup = BeautifulSoup(html, "html.parser")
    jobs = soup.find_all("li", class_="jl")

    for job in jobs:
        # Store all info into a list         
        position.append(job.find("div", class_="jobTitle").a.text)
        # ex: Tommy - Singapore
        comp_loc = job.find("div", class_="empLoc").div.text
        comp, loc = comp_loc.split("–")
        # print(comp)
        company.append(comp.strip())
        location.append(loc.strip())

        # ------------- Scrape Job descriptions within a page -----------
        # job description is in another html, therefore retrieve it once again after
        # clicking.
        browser.click_link_by_href(job.find("a", class_="jobLink")["href"])
        html = browser.html
        soup = BeautifulSoup(html, "html.parser")
        job_desc.append(soup.find("div", class_="desc").text)

def scrape_all():
    # grab new html, grab page control elements
    html = browser.html
    soup = BeautifulSoup(html, "html.parser")
    result = soup.find("div", class_="pagingControls").ul
    pages = result.find_all("li")

    # Scrape first page before going to next
    scrape_current_page()
    for page in pages:
        # only proceed if <a> exists: un-clickable items (the "<" arrow and page 1) have no <a> tag
        if page.a:
            # click every page link except the Next button
            if not page.find("li", class_="Next"):
                try:
                    # Click to goto next page, then scrape it.
                    browser.click_link_by_href(page.a['href'])
                    # --------- call scrape data function here ---------
                    scrape_current_page()
                except:
                    print("This is the last page")

@app.route("/")
def home():
    return render_template("index.html")

@app.route("/scrape/<input>")
def test(input):
    title, loc = input.split("!")
    print(title, f'location = {loc}')

    # Find where we should fill using splinter then fill it up
    job_type = browser.find_by_id("KeywordSearch")
    job_type.fill(title)

    location = browser.find_by_id("LocationSearch")
    location.fill(loc)

    # Clicking button
    browser.find_by_id("HeroSearchButton").click()

    scrape_all()

This is my code. I believe the error occurs when the request reaches @app.route("/scrape/<input>") and hits job_type.fill(title). I don't understand why it doesn't work: I instantiate the browser at the beginning of the code and use it to find the search input fields and simply fill them in.
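For debugging, something like the snippet below (just a sketch; wait_time=10 is an arbitrary number) could go inside test() right before the job_type lines, to show which page the browser is actually on at that point and whether the search field can even be found:

# Debugging sketch, meant to sit inside test() before the fill() calls;
# browser is the module-level Browser created at the top of the file.
print("current url:", browser.url)
print("KeywordSearch present:",
      browser.is_element_present_by_id("KeywordSearch", wait_time=10))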

Note: the value comes from JavaScript, in the form: Data Scientist!Paris
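Just to illustrate that format (this snippet is not part of the app), the route splits the value on "!":

# Illustration only: how the /scrape/<input> route parses the value sent
# from JavaScript. If the value contains no "!", the unpacking below raises
# ValueError.
raw = "Data Scientist!Paris"
title, loc = raw.split("!")
print(title)   # Data Scientist
print(loc)     # Paris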

0 Answers:

No answers yet.