Remove specific strings from a file

Time: 2019-05-24 20:26:56

Tags: python beautifulsoup

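I'm trying to write a Python program that strips all unused classes from a stylesheet. In other words, I want to remove from the stylesheet every class that is not used in the DOM.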

I managed to get all the classes used in the DOM with the code below.

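Right now I have it set up so that the classes it finds are removed from the file, just so I can see that it is working. I plan to switch it around so that it keeps those classes and removes everything that is not in the DOM, as described above.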

from flask import Flask, render_template

import requests
import cssutils
from bs4 import BeautifulSoup

'''

Scrape the given website's html for all class and id use cases within the tags.
Append all classes and ids to a dictionary for later use cases.

Remove all items in stylesheet that aren't in the dictionary / being used in the html.

@author Francesco Hayes
@date May 24, 2019


TODO:
Maybe use the join method to concatenate the rules in between the styles.

'''

WEB_URL = 'http://127.0.0.1:5500/website/index.html'

def get_page_classes(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return [value for element in soup.find_all(class_=True) for value in element["class"]]


def get_file_classes(file):
    with open(file) as fp:
        return fp.read()


def convert_classes(classes, file_classes):
    new_lines = []    
    new_classes = []

    # loop over existing lines, do your changes, and build up a list of new_line

    for i in range(len(classes)):
        classes[i] = '.' + classes[i]
        new_classes.append(classes[i])
        print(new_classes)


    i = 0
    while i < len(file_classes):

        if file_classes[i] in new_classes:
            new_lines.append(file_classes[i])
            i += 1

            while file_classes[i][0] != '.':
                print(file_classes[i])
                new_lines.append(file_classes[i])
                i += 1

        else:
            i += 1


    return new_lines


def write_lines(file, lines):
    with open(file, 'w') as fp:
        for line in lines:
            fp.writelines(line)


page_classes = get_page_classes(WEB_URL)
print('Classes from Website: ', page_classes)

file_classes = get_file_classes("./website/style.css")
file_classes = file_classes.split()
print('\nClasses from Stylesheet: ', file_classes)

new_lines = convert_classes(page_classes, file_classes)
print('\nThe new stylesheet: ', new_lines)

write_lines("test.css", new_lines) 

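I tried looping over it as a list and splitting the classes on their "." marker. But then I ran into the problem of having to write the newly filtered styles back to a file: every class needs its "." again.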
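The dot problem described above can be avoided by never splitting on "." in the first place: read the class names out of each selector with a regular expression and leave the original selector text untouched. The sketch below is only an illustration under assumptions; `CLASS_IN_SELECTOR`, `classes_in_selector`, and the sample selector are hypothetical names and data, not part of the original script.

```python
import re

# Matches a class token such as ".btn" inside a selector and
# captures the name without the leading dot.
CLASS_IN_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")


def classes_in_selector(selector):
    """Return the class names mentioned in a CSS selector, without dots."""
    return CLASS_IN_SELECTOR.findall(selector)


# Hypothetical data: classes found in the DOM, and one selector from the stylesheet.
used = {"btn", "nav-item"}
selector = "ul.nav-item > a.btn:hover"

names = classes_in_selector(selector)
keep = any(name in used for name in names)

# The selector string itself is never modified, so no dot has to be re-added.
print(selector, names, keep)
```

Because the comparison is done on the extracted names only, the rule can be written back exactly as it appeared in the stylesheet.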

Essentially, I'm trying to automate a process so that I no longer have to spend time doing it by hand.

I hope this makes sense; if not, I can try to explain it further. Thanks!

1 Answer:

Answer 0: (score: 0)

The simplest approach is to build up a new list of lines and then write that list to a file.

Note that while you are experimenting, it is better to write to a different file than the one you read from.

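It is also good practice to have functions return values and to chain them together elsewhere. This keeps your code tidy and makes the functions easier to read, easier to debug, and reusable.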

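I'm not clear on exactly what you are doing with the classes you read from the DOM and from the file, so here I will only outline the structure: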

from bs4 import BeautifulSoup
import requests

def get_page_classes(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    return [value for element in soup.find_all(class_=True) for value in element["class"]]

def get_file_classes(file):
    with open(file) as fp:
        return fp.readlines()

def convert_classes(classes, file_classes):
    new_lines = []
    # here you should loop over the lines, do your changes, and build up a list of new_line
    # for line in file_classes:
    #   . .. whatever... 
    #    new_lines.append(...) 
    return new_lines

def write_lines(file, lines):
    # Write the new lines to the given file (use a separate file while testing)
    with open(file, 'w') as fp:
        for line in lines:
            fp.write(line)

# WEB_URL is the constant defined in the question's script
page_classes = get_page_classes(WEB_URL)
file_classes = get_file_classes("./website/bootstrap.css")
new_lines = convert_classes(page_classes, file_classes)
write_lines("output.css", new_lines) 

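The benefit of breaking it down into functions is that you can print things such as page_classes and comment out the lines after them to see what each function gives you. Note that the write_lines function does not actually need to return anything.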
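As a rough illustration of what the placeholder body of convert_classes could look like, here is a minimal sketch that uses only the standard library. It rests on several assumptions that are not in the original: the stylesheet contains only flat rules (no nested blocks such as @media, no braces inside comments or strings), rules with no class in their selector are kept, and a grouped selector is kept if any one of its classes is used. The `RULE` and `CLASS_NAME` patterns and the sample CSS are hypothetical.

```python
import re

# One flat rule: selector text followed by a brace-delimited declaration block.
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")
# A class token inside a selector, captured without the leading dot.
CLASS_NAME = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")


def convert_classes(classes, file_classes):
    """Return the stylesheet rules worth keeping, one string per rule."""
    used = set(classes)
    # Accepts the list returned by readlines() (or a single string).
    css_text = "".join(file_classes)
    new_lines = []
    for match in RULE.finditer(css_text):
        selector = match.group(1).strip()
        names = CLASS_NAME.findall(selector)
        # Keep rules without any class, and rules that use a class seen in the DOM.
        if not names or any(name in used for name in names):
            new_lines.append(match.group(0).strip() + "\n")
    return new_lines


# Hypothetical sample data standing in for the page classes and the stylesheet.
sample_css = [
    ".btn { color: red; }\n",
    ".unused { color: blue; }\n",
    "body { margin: 0; }\n",
]
result = convert_classes(["btn"], sample_css)
print("".join(result))
```

The returned list can be passed straight to write_lines. For real-world stylesheets with media queries or comments, a proper CSS parser would be a safer choice than regular expressions.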