Remove `\n` from a list

Time: 2017-04-09 00:38:27

Tags: python python-3.x web-scraping beautifulsoup

I have a list containing data scraped from a website. The list looks like this:

list1 = ['\nJob Description\n\nDESCRIPTION: Interacts with users and technical team members to analyze requirements and develop
technical design specifications.  Troubleshoot complex issues and make recommendations to improve efficiency and accurac
y. Interpret complex data, analyze results using statistical techniques and provide ongoing reports. Identify, analyze,
and interpret trends or patterns in complex data sets. Filter and "clean data, review reports, and performance indicator
s to locate and correct code problems. Work closely with management to prioritize business and information needs. Locate
 and define new process improvement opportunities. Employ excellent interpersonal and verbal communication skills necess
ary to effectively coordinate interrelated activities with coworkers, end-users, and management. Works autonomously with
 minimal supervision. Provides technical guidance and mentoring to other team members. Multi tasks and balances multiple
 assignments and priorities. Provides timely status updates.\nQUALIFICATIONS: Proven 5 years working experience as a dat
a analyst Technical expertise regarding data models, database design development, data mining and segmentation technique
s Knowledge of and experience with reporting packages (preferably Microsoft BI Stack), databases (SQL, DB2 etc.), and qu
ery language (SQL) Knowledge of statistics and experience using statistical packages for analyzing large datasets Strong
 analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information wi
th attention to detail and accuracy Adept at queries, report writing and presenting findings\nNTT DATA is a leading IT s.............]
  

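How do I remove the "\n"?

Keep in mind that this has to be done in a loop while scraping, so that the data is scraped, the "\n" and unwanted spaces are removed, and the data is pushed into a CSV.

2 answers:

Answer 0: (score: 1)

Try this: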

list2 = [x.replace('\n', '') for x in list1]

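It uses a list comprehension to iterate over list1 and build a new list from the original members, calling str.replace on each item to replace \n with an empty string.

See the Python documentation for more about list comprehensions.

To remove spaces as well, change the code above to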

list2 = [x.replace('\n', '').replace(' ', '') for x in list1]
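One thing to watch for: `replace(' ', '')` removes every space, including the ones between words. A minimal sketch, assuming a made-up `sample` list (not the asker's real data), compares it with replacing newlines by a space and trimming only the ends with `str.strip`:

```python
# Hypothetical sample data, shortened from the kind of strings in the question.
sample = ['\nJob Description\n\nDESCRIPTION: Interacts with users\n']

# Removing every space also glues the words together.
no_spaces = [x.replace('\n', '').replace(' ', '') for x in sample]

# Replacing newlines with a space and stripping the ends keeps words apart.
trimmed = [x.replace('\n', ' ').strip() for x in sample]

print(no_spaces)
print(trimmed)
```

If the words need to stay readable, the second form is probably closer to what is wanted, although it still leaves doubled spaces where two newlines were adjacent.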

Answer 1: (score: 1)

Removing \n from a single string is straightforward.

line = '\nJob Description\n\nDESCRIPTION:'
line.replace('\n', ' ')

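You were not very specific about what counts as an 'unwanted space', but on the simple assumption that it means two consecutive spaces, an easy approach is .replace('  ', ' ') to remove doubled spaces. Putting the two together, you end up with: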

line.replace('\n', ' ').replace('  ', ' ')

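This is simple and fast. However, it does not remove all extra spaces. For example, a run of 3 or 4 spaces becomes 2 spaces. Instead, you can use a combination of split and join to remove all extra whitespace.

' '.join(line.split())

This splits the string on all whitespace (including newlines, tabs, and other whitespace characters) and rejoins the pieces with a single space. If that does not meet your needs, you can use a regular expression; regex processing is less efficient but more powerful.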

import re
re.sub(r'\s{2,}', ' ', line)  # raw string avoids the invalid-escape warning for \s

This replaces every run of 2 or more whitespace characters with a single space.
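To make the differences between the three approaches concrete, here is a small sketch that runs all of them on the same input. The `line` value is a hypothetical sample containing a tab and runs of spaces, not the asker's data:

```python
import re

# Hypothetical sample with newlines, a tab, and runs of spaces.
line = '\nJob   Description\n\nDESCRIPTION:\tdata    analyst\n'

# Chained replace: a single pass, so long runs of spaces are only shortened.
replaced = line.replace('\n', ' ').replace('  ', ' ')

# split/join: collapses every whitespace run and trims both ends.
joined = ' '.join(line.split())

# Regex: collapses runs of 2+ whitespace characters; single ones are kept.
subbed = re.sub(r'\s{2,}', ' ', line)

print(repr(replaced))
print(repr(joined))
print(repr(subbed))
```

Note that the regex version leaves a lone `\n` or `\t` untouched, because `\s{2,}` only matches runs of two or more whitespace characters.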

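Whichever method you use to clean a single string, you still need to apply it to every element in the list. If the method you choose is more involved, you should wrap it in a function: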

def process(line):
    return line.replace('\n', ' ').replace('  ',  ' ')

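A naive approach is to rebuild the list, processing each element as you go. For example, using a list comprehension:

processed_results = [process(line) for line in list1]

With a very large list, this can be inefficient, because the whole new list is held in memory at once. A better approach is to use a generator, which processes only one element at a time without rebuilding the entire list.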

generated_results = (process(line) for line in list1)

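Note that it looks almost identical to the list comprehension approach. You can iterate over it just as you would a list:

for result in generated_results:
    print(result)  # do something with each cleaned string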

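Keep in mind that a generator is consumed as it is used, so if you need to iterate over the results more than once, you may want a list instead. You can convert the generator to a list simply by doing: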

processed_results = list(generated_results)
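A quick sketch of what "consumed" means in practice. The `list1` contents here are hypothetical, and `process` uses the split/join cleanup instead of the chained replace shown earlier:

```python
def process(line):
    # Collapse all whitespace runs (split/join variant of the cleanup).
    return ' '.join(line.split())

# Hypothetical scraped strings standing in for the real list.
list1 = ['\nJob Description\n', 'QUALIFICATIONS:\n  Proven experience']

generated_results = (process(line) for line in list1)

first_pass = list(generated_results)   # consumes the generator
second_pass = list(generated_results)  # nothing is left to yield

print(first_pass)
print(second_pass)
```

The second pass comes back empty, which is why a list is the safer choice when the results are needed more than once.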

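TL;DR

The simplest and most effective approach is to use split and join to remove the extra whitespace, and to use a generator so that the whole list is not rebuilt: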

generated_results = (' '.join(line.split()) for line in list1)
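Since the question also mentions pushing the data into a CSV, here is a rough sketch of how the generator could feed the standard library `csv` module. The sample strings, the `description` column name, and the `jobs.csv` filename mentioned in the comment are all assumptions for illustration:

```python
import csv
import io

# Hypothetical scraped strings; in the question these come from the scraper.
list1 = ['\nJob Description\n\nDESCRIPTION: Interacts with users\n',
         'QUALIFICATIONS:\n  Proven 5 years experience']

generated_results = (' '.join(line.split()) for line in list1)

# io.StringIO stands in for a real file here; with a file, something like
# open('jobs.csv', 'w', newline='', encoding='utf-8') would be used instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['description'])  # hypothetical column name
for result in generated_results:
    writer.writerow([result])     # one cleaned string per row

csv_text = buffer.getvalue()
print(csv_text)
```

Because the rows are written one at a time, the cleaned strings never need to be collected into a second list first.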