Question

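I'm wondering whether there is a better way to perform this algorithm. I need to do this kind of operation often, and my current approach takes hours, because I believe it counts as an O(n^2) algorithm. The code is below.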

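import csv

# newline='' is what the csv module expects for files it reads or writes.
with open("location1", 'r', newline='') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r', newline='') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []

# Nested loop: every row of mainList is compared with every row of annoList,
# so the number of comparisons is len(mainList) * len(annoList).
for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            # Build a new list for every match: the full main row followed by
            # columns 3-6 of the matching annotation row. Reusing one list
            # object across iterations would append the same, ever-growing
            # list to output again and again.
            tempList = full + annot[3:7]
            output.append(tempList)

with open("location3", 'w', newline='') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)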

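I have two CSV files, each with 15,000 lines. I want to compare one column of each file (the gene name in column 2) and, where the values match, append the end of the second file's row to the end of the first file's row. Any help would be greatly appreciated!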

Thanks!
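To make the O(n^2) estimate concrete, assuming both files have exactly 15,000 rows and counting only key comparisons and dictionary operations (constant factors differ, so this is an order-of-magnitude comparison, not a timing): the nested loop compares every main row with every annotation row, whereas indexing the annotation rows in a dictionary once and then looking each main row up once touches each row roughly one time.

```latex
\begin{aligned}
\text{nested loop:}\quad & 15\,000 \times 15\,000 = 2.25 \times 10^{8} \ \text{key comparisons (each also calling } \texttt{.lower()}\text{)} \\
\text{dictionary:}\quad & 15\,000 + 15\,000 = 3 \times 10^{4} \ \text{inserts and lookups, each } O(1) \text{ on average} \\
\text{ratio:}\quad & \frac{2.25 \times 10^{8}}{3 \times 10^{4}} = 7\,500
\end{aligned}
```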

Answer 1

This approach should be more efficient:

import csv
from collections import defaultdict

with open("location1", 'r', newline='') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r', newline='') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

# One pass over the annotation rows: index the extra columns by the
# lower-cased value of the column of interest.
for annot in annoList:
    extraCols = annot[3:7]  # columns 3-6, as in the question; adapt to the needed columns
    annoMap[annot[2].lower()].append(extraCols)  # several rows may share the same key

# One pass over the main rows: each lookup is an average O(1) dictionary access.
for full in mainList:
    geneName = full[2].lower()
    if geneName in annoMap:  # check if a matching annotation exists
        for extraCols in annoMap[geneName]:
            output.append(full + extraCols)  # main row followed by the annotation columns

with open("location3", 'w', newline='') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)

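It is more efficient because you only iterate over each list once. Dictionary lookups are O(1) on average, so you essentially get a linear algorithm.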
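One way to check that the dictionary version is a drop-in replacement for the nested loop is to run both on a few rows and compare the results. The function names and sample rows below are hypothetical; like the question's code, both functions join on column 2 case-insensitively and append columns 3-6 of the annotation row.

```python
from collections import defaultdict

# Hypothetical sample data: column 2 holds the gene name, columns 3-6 of the
# annotation rows hold the fields to append.
MAIN_ROWS = [
    ["id1", "chr1", "BRCA1", "x"],
    ["id2", "chr2", "tp53", "y"],
    ["id3", "chr3", "NOPE", "z"],
]
ANNO_ROWS = [
    ["a", "b", "brca1", "f1", "f2", "f3", "f4"],
    ["a", "b", "TP53", "g1", "g2", "g3", "g4"],
    ["a", "b", "Tp53", "h1", "h2", "h3", "h4"],
]


def nested_loop_join(main_rows, anno_rows):
    """Compare every pair of rows: len(main_rows) * len(anno_rows) comparisons."""
    out = []
    for full in main_rows:
        key = full[2].lower()
        for annot in anno_rows:
            if key == annot[2].lower():
                out.append(full + annot[3:7])
    return out


def dict_join(main_rows, anno_rows):
    """One pass to build the index, one pass to look rows up."""
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[2].lower()].append(annot[3:7])
    out = []
    for full in main_rows:
        # .get() does not insert missing keys into the defaultdict.
        for extra in index.get(full[2].lower(), []):
            out.append(full + extra)
    return out


if __name__ == "__main__":
    slow = nested_loop_join(MAIN_ROWS, ANNO_ROWS)
    fast = dict_join(MAIN_ROWS, ANNO_ROWS)
    assert slow == fast  # same rows, in the same order
    for row in fast:
        print(row)
```

Both functions emit matches in annotation-file order, so even rows with duplicate gene names (the two `tp53` annotations above) come out in the same order.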

Answer 2

A simple approach is to use a library such as Pandas. Its built-in functions are very efficient.

You can use pandas.read_csv() to load each CSV file into a DataFrame and then manipulate it with pandas functions.

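For example, you can use pandas.merge() to merge the two DataFrames (that is, the two CSV files) on a specific column, and then drop the columns you don't need.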
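A minimal sketch of the read_csv + merge approach, assuming the files have no header row, that column 2 holds the gene name, and that columns 3-6 of the second file are the ones to append (as in the question's code). The function name `merge_on_gene`, the `anno3`-`anno6` column labels, and the sample data are hypothetical, and `io.StringIO` stands in for the real file paths.

```python
import io

import pandas as pd

# Hypothetical stand-ins for "location1" and "location2".
MAIN_CSV = "id1,chr1,BRCA1,x\nid2,chr2,tp53,y\nid3,chr3,NOPE,z\n"
ANNO_CSV = "a,b,brca1,f1,f2,f3,f4\na,b,TP53,g1,g2,g3,g4\n"


def merge_on_gene(main_src, anno_src):
    # header=None: no header row, so the columns are labelled 0, 1, 2, ...
    main = pd.read_csv(main_src, header=None, dtype=str)
    anno = pd.read_csv(anno_src, header=None, dtype=str)

    # Case-insensitive join key built from column 2 of each file.
    main["key"] = main[2].str.lower()
    anno["key"] = anno[2].str.lower()

    # Keep only the key and columns 3-6 of the annotation file, renamed so
    # they do not clash with the main file's own column labels.
    anno = anno[["key", 3, 4, 5, 6]].rename(
        columns={3: "anno3", 4: "anno4", 5: "anno5", 6: "anno6"}
    )

    # Inner join: keep only main rows that have a matching annotation.
    merged = pd.merge(main, anno, on="key", how="inner")

    # Drop the helper column that is no longer needed.
    return merged.drop(columns="key")
```

With real files you would pass the paths to `pd.read_csv` directly and write the result with `result.to_csv("location3", header=False, index=False)`.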

If you have some database knowledge, the logic is very similar: this is essentially a join on the shared column.
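To illustrate the database analogy, here is a sketch of the same inner join using only Python's standard-library `sqlite3` module with an in-memory database. The table names, column names, and sample rows are hypothetical, and gene names are assumed to be ASCII, because SQLite's `NOCASE` collation only folds ASCII letters (unlike Python's `.lower()`).

```python
import sqlite3

# Hypothetical sample rows: (id, chrom, gene) for the main file and
# (gene, c3, c4, c5, c6) for the annotation file.
MAIN_ROWS = [("id1", "chr1", "BRCA1"), ("id2", "chr2", "tp53"), ("id3", "chr3", "NOPE")]
ANNO_ROWS = [("brca1", "f1", "f2", "f3", "f4"), ("TP53", "g1", "g2", "g3", "g4")]


def sql_join(main_rows, anno_rows):
    """Inner-join the two row sets on a case-insensitive gene name."""
    con = sqlite3.connect(":memory:")
    try:
        # COLLATE NOCASE makes "=" on these columns ignore ASCII case,
        # mirroring the .lower() calls in the Python versions.
        con.execute("CREATE TABLE genes (id TEXT, chrom TEXT, gene TEXT COLLATE NOCASE)")
        con.execute(
            "CREATE TABLE annotations "
            "(gene TEXT COLLATE NOCASE, c3 TEXT, c4 TEXT, c5 TEXT, c6 TEXT)"
        )
        con.executemany("INSERT INTO genes VALUES (?, ?, ?)", main_rows)
        con.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?)", anno_rows)
        # An index on the join key gives the query planner a way to look up
        # matches instead of scanning the whole table for every row.
        con.execute("CREATE INDEX annotations_gene ON annotations (gene)")
        return con.execute(
            "SELECT g.id, g.chrom, g.gene, a.c3, a.c4, a.c5, a.c6 "
            "FROM genes AS g JOIN annotations AS a ON g.gene = a.gene "
            "ORDER BY g.rowid, a.rowid"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in sql_join(MAIN_ROWS, ANNO_ROWS):
        print(row)
```

The `ORDER BY` keeps the output in input order; without it, SQL makes no ordering guarantee for join results.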

Answer 3

Thanks @limes for the help. Here's the final script I used; I thought I'd post it to help others. Thanks again!

A more efficient way to perform this search algorithm?