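Scraping into CSV (Python)

Date: 2018-05-24 15:42:18

Tags: python python-3.x beautifulsoup

I have a very small scraper (Python, bs4). But if the text I want to scrape contains two or more line breaks (new lines) in a row, the content gets written into multiple cells.

For example: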

AAA
BBB
CCC

The result in the CSV cell is "AAA BBB CCC".

The bad case:

AAA
BBB

CCC

The result is:
Cell 1: AAA BBB
Cell 2 (second row): CCC

The code is:

...
        beschreibung_container = container.find_all("pre", {"class":"is24qa-objektbeschreibung text-content short-text"}) or ""
        beschreibung = beschreibung_container[0].get_text().strip() if beschreibung_container else ""

        ausstattung_container = container.find_all("pre", {"class":"is24qa-ausstattung text-content short-text"}) or ""
        ausstattung = ausstattung_container[0].get_text().strip() if ausstattung_container else ""

        lage_container = container.find_all("pre", {"class":"is24qa-lage text-content short-text"}) or ""
        lage = lage_container[0].get_text().strip() if lage_container else ""
    except:
        print("Es gab einen Fehler")

    f.write(objektid + "##" + titel + "##" + adresse + "##" + criteria.replace("    ", ";") + "##" + preis.replace("    ", ";") + "##" + energie.replace("    ", ";") + "##" + beschreibung.replace("\n", " ") + "##" + ausstattung.replace("\n", " ") + "##" + lage.replace("\n", " ") + "\n")
...

Is it possible to replace all line breaks?

1 answer:

Answer 0: (score: 1)

You can use re.sub to replace anything that matches one or more newline characters (\n) with a space in the desired string:

re.sub(r'\n+', ' ', str)

If you need to replace carriage returns (\r) as well as newlines, you can use:

re.sub(r'[\r\n]+', ' ', str)

In your code, this means adding import re at the top and using the re.sub call in place of each .replace("\n", " ") on beschreibung, ausstattung, and lage.
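
A minimal sketch of how the substitution might be applied to the three text fields from the question. The helper name flatten_newlines and the sample values are assumptions for illustration only; in the real scraper the fields come from get_text().strip() and the line is passed to f.write.

```python
import re


def flatten_newlines(text):
    """Collapse every run of carriage returns and line feeds into one space."""
    return re.sub(r'[\r\n]+', ' ', text)


# Hypothetical sample values standing in for the scraped fields.
beschreibung = "AAA\nBBB\n\nCCC"        # blank line in the middle (the bad case)
ausstattung = "AAA\r\nBBB\r\n\r\nCCC"   # same text with Windows-style line endings
lage = ""                               # field missing on the page

# Build one output row with the "##" separator used in the question.
line = "##".join(
    flatten_newlines(field) for field in (beschreibung, ausstattung, lage)
) + "\n"

print(repr(line))  # 'AAA BBB CCC##AAA BBB CCC##\n'
```

Because the pattern matches a whole run of \r and \n characters at once, a blank line produces a single space rather than two, and the only newline left in the row is the one that ends it.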