Removing all characters after the URL extension?

Time: 2017-08-07 18:05:55

Tags: python beautifulsoup python-requests

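Basically, I'm trying to remove all the characters that come after the URL extension in a URL, but it's proving difficult. The application works from a list of various URLs with various extensions.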
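One way to read "everything after the URL extension" is "everything after the host name" (for example, after `.com`). Below is a minimal sketch under that assumption, using the standard library's `urllib.parse`. The helper name `site_root` is hypothetical and not part of my code.

```python
from urllib.parse import urlsplit


def site_root(url):
    """Return only the scheme and host of url, dropping path, query, and fragment.

    Hypothetical helper: assumes "URL extension" means the end of the host name.
    """
    # strip() removes the trailing newline left over from reading a file line
    parts = urlsplit(url.strip())
    return f"{parts.scheme}://{parts.netloc}/"


print(site_root("http://example.com/panel/login.php?next=home"))
```

This keeps any port number that is part of the host, since `netloc` includes it.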

Here is my source:

import requests
from bs4 import BeautifulSoup
from time import sleep

# takes user input for the path of the file listing the panels to test
import_file_path = input('Enter the path of the websites to be tested: ')

# takes user input for the path of the exported file
export_file_path = input('Enter the path of where we should export the panels to: ')

# reads imported panels, stripping trailing newlines and skipping blank lines
with open(import_file_path, 'r') as panels:
    panel_list = []
    for line in panels:
        if line.strip():
            panel_list.append(line.strip())

x = 0

for panel in panel_list:
    url = requests.get(panel)
    soup = BeautifulSoup(url.content, "html.parser")
    forms = soup.find_all("form")
    action = soup.find('form').get('action')

    values = { 
    soup.find_all("input")[0].get("name") : "user",
    soup.find_all("input")[1].get("name") : "pass"
    }


    print(values)

    r = requests.post(action, data=values)
    print(r.headers)
    print(r.status_code)
    print(action)
    sleep(10)
    x += 1

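What I'm trying to achieve is an application that automatically tests your username/password against a list of URLs supplied in a text document. However, when scraping the form's action attribute, BeautifulSoup returns an incomplete URL: instead of the full http://example.com/action.php, it returns just action.php, exactly as it is written in the page source. The only way around this I can think of is to rebuild the 'action' variable as 'panel' with all characters after the URL extension removed, followed by 'action'.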
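Instead of cutting the string by hand, the relative action could be resolved against the page URL with `urllib.parse.urljoin` from the standard library. This is a minimal sketch, not a tested fix for the script above. The helper name `resolve_action` is hypothetical, and the example URLs are placeholders.

```python
from urllib.parse import urljoin


def resolve_action(page_url, action):
    """Resolve a form's action attribute against the URL of the page it came from."""
    # A missing or empty action means the form submits back to the page itself.
    if not action:
        return page_url
    # urljoin handles relative paths, root-relative paths, and absolute URLs.
    return urljoin(page_url, action)


print(resolve_action("http://example.com/login.php", "action.php"))
print(resolve_action("http://example.com/admin/login.php", "/action.php"))
```

In the loop above, this would presumably be called as `resolve_action(url.url, soup.find('form').get('action'))`, where `url.url` is the final address of the `requests` response after any redirects.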

Thanks!

0 answers:

No answers yet