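I'm wondering whether there is a better way to implement this algorithm. I need to do this kind of operation often, and my current approach takes hours, because I believe it counts as an n^2 algorithm. I've attached it below.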
import csv

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []

# Compare every row of the first file with every row of the second: O(n^2).
for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            # Start a fresh list for each match so rows do not accumulate
            # (deleting the loop variable does not empty a shared list).
            tempList = list(full)
            tempList.append(annot[3])
            tempList.append(annot[4])
            tempList.append(annot[5])
            tempList.append(annot[6])
            output.append(tempList)

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)
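I have two csv files, each containing 15,000 strings. I want to compare one column from each file and, where the values match, append the trailing columns of the second csv's row to the end of the first csv's row. Any help would be greatly appreciated!
Thanks!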
Answer 0 (score: 2)
This approach should be more efficient:
import csv
from collections import defaultdict

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

for annot in annoList:
    tempList = annot[3:]  # adapt this to the needed columns
    # store these columns in the map under the lower-cased column of interest
    annoMap[annot[2].lower()].append(tempList)

for full in mainList:
    geneName = full[2].lower()
    if geneName in annoMap:  # check if a matching key exists
        for extra in annoMap[geneName]:
            # write the full row followed by the matching annotation columns
            output.append(full + extra)

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)
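It is more efficient because you only iterate over each list once. A dictionary lookup is O(1) on average, so the whole thing is essentially linear in the size of the two files (plus the number of matching rows written out).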
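To make the one-pass-per-list idea concrete, here is a minimal sketch of the same lookup on in-memory rows. The function name `join_rows`, its parameters, and the sample rows are made up for illustration; the column positions (key in column 2, extra columns from 3 onward) are assumed from the question's code.

```python
from collections import defaultdict


def join_rows(main_rows, anno_rows, key_col=2, first_extra=3):
    """Append the trailing columns of each matching annotation row to each main row."""
    # Pass 1: index the annotation rows by lower-cased key.
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[key_col].lower()].append(annot[first_extra:])

    # Pass 2: one average O(1) lookup per main row.
    joined = []
    for full in main_rows:
        # .get() avoids creating empty entries in the defaultdict on a miss.
        for extra in index.get(full[key_col].lower(), []):
            joined.append(full + extra)
    return joined


# Hypothetical sample data, not taken from the asker's files.
main_rows = [["1", "x", "GeneA"], ["2", "y", "GeneB"], ["3", "z", "GeneC"]]
anno_rows = [
    ["a", "b", "genea", "chr1", "first"],
    ["c", "d", "geneb", "chr2", "second"],
    ["e", "f", "GENEA", "chr1", "third"],
]
result = join_rows(main_rows, anno_rows)
```

A key that appears more than once in the annotation file yields one output row per match, which mirrors what the nested-loop version in the question would produce.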
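Answer 1 (score: 1)
A simple way is to use a library such as Pandas. Its built-in functions are very efficient.
You can load each csv into a data frame with pandas.read_csv() and then work on it with pandas functions. For example, you can use pandas.merge() to merge the two data frames (that is, the two csv files) on a specific column, and then drop whatever you don't need.
If you have some database knowledge, the logic is very similar to a join.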
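As a rough sketch of that approach, assuming pandas is installed: the column names (`gene`, `chrom`, `note`, and so on) and the inline sample data are invented for illustration, and the real files may have no header row, in which case `header=None` and positional column labels would be needed instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two csv files.
main_csv = io.StringIO("id,sample,gene\n1,x,GeneA\n2,y,GeneB\n3,z,GeneC\n")
anno_csv = io.StringIO(
    "src,ver,gene,chrom,note\na,b,genea,chr1,first\nc,d,geneb,chr2,second\n"
)

main_df = pd.read_csv(main_csv)
anno_df = pd.read_csv(anno_csv)

# Lower-case the key on both sides so the match is case-insensitive.
main_df["key"] = main_df["gene"].str.lower()
anno_df["key"] = anno_df["gene"].str.lower()

# Inner join on the key, keeping only the annotation columns to append.
merged = pd.merge(main_df, anno_df[["key", "chrom", "note"]], on="key", how="inner")

# Drop the helper column once the join is done.
merged = merged.drop(columns="key")
```

Writing the result back out would then be a call to `merged.to_csv(...)` with `index=False`.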
Answer 2 (score: 0)
Thanks for the help, @limes. This is the final script I used; I thought I'd post it in case it helps someone else. Thanks again!
<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://xmlns.jcp.org/xml/ns/javaee"
xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_3_1.xsd"
id="WebApp_ID" version="3.1">
<display-name>SPY</display-name>
<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>/WEB-INF/spring.xml, WEB-INF/security.xml</param-value>
</context-param>
<listener>
<listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>
<servlet>
<servlet-name>dispatcher</servlet-name>
<servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
<load-on-startup>1</load-on-startup>
</servlet>
<welcome-file-list>
<welcome-file>index.jsp</welcome-file>
</welcome-file-list>
<servlet-mapping>
<servlet-name>dispatcher</servlet-name>
<url-pattern>/</url-pattern>
</servlet-mapping>
<filter>
<filter-name>encoding-filter</filter-name>
<filter-class>
org.springframework.web.filter.CharacterEncodingFilter
</filter-class>
<init-param>
<param-name>encoding</param-name>
<param-value>UTF-8</param-value>
</init-param>
<init-param>
<param-name>forceEncoding</param-name>
<param-value>true</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>encoding-filter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
<filter>
<filter-name>springSecurityFilterChain</filter-name>
<filter-class>org.springframework.web.filter.DelegatingFilterProxy</filter-class>
</filter>
<filter-mapping>
<filter-name>springSecurityFilterChain</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
</web-app>