Using regular expressions to extract information from HTML

Time: 2019-07-26 06:52:44

Tags: python regex

I need to indent the start of each paragraph, remove the image text, and put one blank line between paragraphs.

I have tried:

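import re

x = re.findall(r'<p(.*?)/p>', text)  # captures everything between "<p" and "/p>", tag attributes included
fullStr = ' '.join(x)                # joins the matches with spaces, so paragraph breaks are lost
y = re.sub(r'<.*?>', "", fullStr)    # strips the remaining complete tags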

text = """"""<span aria-hidden="true" class="css-8i9d0s e13ogyst0">We  asked people on Oahu to give their ethnicity. Many had long answers.</span><span itemProp="copyrightHolder" class="emkp2hg2 css-1nwzsjy e1z0qqy90"><span class="css-1ly73wi e1tej78p0">Credit</span><span><span class="css-1dv1kvn">Credit</span><span>Photographs by Damon Winter/The New York Times; Illustration by Katie Scott</span></span></span></figcaption></figure></div></div></header></div><section name="articleBody" itemProp="articleBody" class="meteredContent css-1i2y565"><div class="css-1fanzo5 StoryBodyCompanionColumn"><div class="css-53u6y8"><p class="css-exrw3m evys1bk0">HONOLULU — Kristin Pauker still remembers her uncle’s warning about Dartmouth. “It’s a white institution,” he said. “You’re going to feel out of place.”</p><p class="css-exrw3m evys1bk0">Dr. Pauker, who is now a psychology professor, is of mixed ancestry, her mother of Japanese descent and her father white from an Italian-Irish background. Applying to colleges, she was keen to leave Hawaii for the East Coast, eager to see something new and different. But almost immediately after she arrived on campus in 1998, she understood what her uncle had meant.</p><p class="css-exrw3m evys1bk0">She encountered a barrage of questions from fellow students. <em class="css-2fg4z9 e1gzwzxm0">What was her ethnicity? Where was she from? Was she Native Hawaiian?</em> The questions seemed innocent on the surface, but she sensed that the students were really asking what box to put her in. And that categorization would determine how they treated her. “It opened my eyes to the fact that not everyone sees race the same way,” she told me.</p></div><aside class="css-o6xoe7"></aside></div><div class="css-1fanzo5 StoryBodyCompanionColumn"><div class="css-53u6y8"><p class="css-exrw3m evys1bk0">Back in Hawaii, being mixed was so common as to be nearly unremarkable. 
Many of her friends were some mixture of East Asian ethnicities, white, Filipino, Hawaiian and more, and for the most part, everyone hung out with everyone else. The Dartmouth student body, on the other hand, seemed self-segregated. The nonwhite students primarily stuck with their own race — blacks sat with blacks in the cafeteria, Asians with Asians, Native Americans with Native Americans. (Dartmouth, <a class="css-1g7m0tk" href="https://www.dartmouth.edu/~oir/data-reporting/factbook/1999_factbook.pdf" title="" rel="noopener noreferrer" target="_blank">which was around 75 percent white then</a>, has <!-- -->since doubled<!-- --> its share of nonwhite students.)</p></div><aside class="css-o6xoe7"></aside></div><div class="css-a7yk8a e73j0it0"><figure class="css-kyszhr e1g7ppur0" aria-label="media" role="group" itemProp="associatedMedia" itemscope="" itemID="https://static01.nyt.com/images/2019/06/30/opinion/30win1/merlin_156981537_23d689a4-fea5-4a1c-8aad-8ea49c9a4b4f-articleLarge.jpg?quality=90&amp;auto=webp" itemType="http://schema.org/ImageObject"><div class="css-1xdhyk6 erfvjey0"><span class="css-1ly73wi e1tej78p0">Image</span><div class="css-zjzyr8"><div data-testid="lazyimage-container" style="height:580px"></div></div></div><figcaption itemProp="caption description" class="css-1l6g02d e1xdpqjp0"><span aria-hidden="true" class="css-8i9d0s e13ogyst0">Shanti Grimmer, left (Indian/English/Swedish/French/German), and her mother Meredith Grimmer (English/French/German/Swedish)</span></figcaption></figure><figure class="css-kyszhr e1g7ppur0" aria-label="media" role="group" itemProp="associatedMedia" itemscope="" itemID="https://static01.nyt.com/images/2019/06/30/opinion/30win2/merlin_156981420_362b225a-1967-4343-be90-058e88628fa3-articleLarge.jpg?quality=90&amp;auto=webp" itemType="http://schema.org/ImageObject"><div class="css-1xdhyk6 erfvjey0"><span class="css-1ly73wi e1tej78p0">Image</span><div class="css-zjzyr8"><div data-testid="lazyimage-container" 
style="height:580px"></div></div></div><figcaption itemProp="caption description" class="css-1l6g02d e1xdpqjp0"><span aria-hidden="true" class="css-8i9d0s e13ogyst0">Philip Martin (African-American/Italian)</span></figcaption></figure></div><div class="css-1fanzo5 StoryBodyCompanionColumn"><div class="css-53u6y8"><p class="css-exrw3m evys1bk0">For the first time in her life, she wasn’t sure where she belonged, and she found herself wondering: <em class="css-2fg4z9 e1gzwzxm0">Does it have to be like this?</em></p><p class="css-exrw3m evys1bk0">A few years later, as a graduate student in psychology at Tufts, she began her first study probing that question. Psychologists argue that “essentialist” thinking — ideas about human beings’ unchangeable essence, their inherent inferiority or the threat they supposedly pose — makes racism possible. Dr. Pauker wanted to know when children started expressing essentialist views of race. </p><p class="css-exrw3m evys1bk0">She found that between ages 4 and 11, upper-middle-class children from mostly white neighborhoods around Boston increasingly viewed race as a permanent condition and expressed stereotypes <!-- -->about other racial groups<!-- -->: that blacks were aggressive or, on the flip side, good at basketball; that Asians were submissive and good at math. These children came from public schools in liberal areas. They probably weren’t deliberately taught these stereotypes at home. But they absorbed them from the American ether nonetheless.</p><p class="css-exrw3m evys1bk0">Would children in Hawaii express the same views? Dr. Pauker <!-- -->repeated the study <!-- -->with <!-- -->middle- and upper-middle-class grade-school students in and around Honolulu, and was not entirely surprised to find that in Hawaii, the children, including those who were white, tended not to express the same essentialist ideas about race. They were not race-blind. 
They recognized skin color, hair texture and other features commonly associated with race. But they did not<!-- --> attribute to race the inherent qualities <!-- -->— aggression or book smarts — that their mainland brethren did. “They didn’t believe that race was biological,” Dr. Pauker told me. </p></div><aside class="css-o6xoe7"></aside></div><div class="css-a7yk8a e73j0it0"><figure class="css-kyszhr e1g7ppur0" aria-label="media" role="group" itemProp="associatedMedia" itemscope="" itemID="https://static01.nyt.com/images/2019/06/30/opinion/30win3/merlin_156981471_c3e6a314-1249-4fa0-82b9-092a2c6f57b2-articleLarge.jpg?quality=90&amp;auto=webp" itemType="http://schema.org/ImageObject"><div class="css-1xdhyk6 erfvjey0"><span class="css-1ly73wi e1tej78p0">Image</span><div class="css-zjzyr8"><div data-testid="lazyimage-container" style="height:580px"></div></div></div><figcaption itemProp="caption description" class="css-1l6g02d e1xdpqjp0"><span aria-hidden="true" class="css-8i9d0s e13ogyst0">Chainton Saldebar (Hawaiian/Filipino/Spanish/Chinese/Italian)</span></figcaption></figure><figure class="css-kyszhr e1g7ppur0" aria-label="media" role="group" itemProp="associatedMedia" itemscope="" itemID="https://static01.nyt.com/images/2019/06/27/opinion/00sub2/merlin_157095930_9dd9a31f-419d-4738-be64-c34cf92dfc20-articleLarge.jpg?quality=90&amp;auto=webp" itemType="http://schema.org/ImageObject"><div class="css-1xdhyk6 erfvjey0"><span class="css-1ly73wi e1tej78p0">Image</span><div class="css-zjzyr8"><div data-testid="lazyimage-container" style="height:580px"></div></div></div><figcaption itemProp="caption description" class="css-1l6g02d e1xdpqjp0"><span aria-hidden="true" class="css-8i9d0s e13ogyst0">Imani Altemus-Williams (Black/Jewish/Native American)</span></figcaption></figure></div><div class="css-1fanzo5 StoryBodyCompanionColumn"><div class="css-53u6y8"><p class="css-exrw3m evys1bk0">She had a hypothesis to explain the difference. 
Whites dominated in the Boston area schools, but were a minority in Hawaii, and always had been. Hawaii also had the highest percentage of mixed-race people by a long shot in the country. (Among them was our first mixed-race president, Barack Obama, who was born there.) Mixed-race people, <a class="css-1g7m0tk"</a>

The result has no line breaks and no indentation, and it still contains leftovers from the opening p tags.

I know there are better libraries for this, but I would like to solve it with regular expressions.
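One way the goal described above might be approached with `re` alone is sketched below. It is a minimal sketch, not a definitive solution: the names `paragraphs_to_text`, `P_TAG`, `COMMENT` and `ANY_TAG` are hypothetical, and it assumes that `<p>` elements are never nested and that the image captions sit outside `<p>` (inside `<figcaption>`, as in the sample), so keeping only `<p>` contents drops them.

```python
import re
from html import unescape

# One <p ...>...</p> element; \b keeps the pattern from matching <pre> or <param>.
P_TAG = re.compile(r'<p\b[^>]*>(.*?)</p>', re.DOTALL | re.IGNORECASE)
COMMENT = re.compile(r'<!--.*?-->', re.DOTALL)
ANY_TAG = re.compile(r'<[^>]+>')

def paragraphs_to_text(html, indent="    "):
    """Return the <p> contents as indented paragraphs separated by one blank line."""
    paragraphs = []
    for inner in P_TAG.findall(html):
        inner = COMMENT.sub("", inner)    # drop <!-- --> markers
        inner = ANY_TAG.sub("", inner)    # drop inline tags such as <em> and <a>
        inner = unescape(inner)           # turn entities such as &amp; back into characters
        inner = " ".join(inner.split())   # collapse newlines and repeated spaces
        if inner:
            paragraphs.append(indent + inner)
    return "\n\n".join(paragraphs)

# Hypothetical miniature input, shaped like the sample in the question.
sample = (
    '<figcaption><span>Image caption</span></figcaption>'
    '<p class="a">First <em class="b">one</em>.</p>'
    '<div><p class="a">Second\nline <!-- -->here.</p></div>'
)
print(paragraphs_to_text(sample))
```

Capturing only what follows the closing `>` of the opening tag is what removes the `class="..."` leftovers, and joining with `"\n\n"` instead of a space is what restores the paragraph breaks. A paragraph whose closing `</p>` is missing, such as the truncated last one in the sample, is not matched by this pattern and is skipped.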

0 answers:

No answers