Question

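I need to write a program that outputs the rows whose value in column 2 is 'Kashiwa'. The rows are provided in csv format on standard input. In addition, I need to remove commas (","), double quotes ('"'), newlines and other special characters whenever they appear in a value of the "Name" column. Here is a sample input: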

2
Kashiwa
Name,Campus,LabName
Shin MORISHIA,Kashiwa,Laboratory of Omics
Kioshi ASAy,Kashiwa,Laboratory of Genome Informatics
Yukihido Tomari,Yayoi,Laboratory of RNA Function
Masao Kanobe ,Kashiwa,Laboratory of Large-Scale Bioinformatics

Here is my code:

 #!usr/bin/env python3

 import sys
 import csv

 data = sys.stdin.readlines()

 chars = ('$','%','^','*', '\n', '"', "," )
 for line in data:
     for c in chars:
         line = ''.join(line.split(c))

 reader = csv.reader(data)
 next(reader)
 next(reader)
 print(",".join(next(reader)))

 for row in reader:

      if row[1] == 'Kashiwa':

         print(",".join(row))

It seems that my program does not remove the special characters from the values of the "Name" column. What should I do?

Answer 1

Well, after data = sys.stdin.readlines(), data is a list of strings.

You process it this way:

 for line in data:                      # ok line is a variable pointing to a string from data
     for c in chars:                    # ok you process all of your special characters
         line = ''.join(line.split(c))  # line is now a brand new clean string...
                                        #  that you forget at once without changing data!

In any case, a Python string is an immutable object, so you have to change the list itself so that it holds the new line:

 for i, line in enumerate(data):        # ok line is a variable pointing to a string from data
     for c in chars:                    # ok you process all of your special characters
         line = ''.join(line.split(c))  # line is now a brand new clean string...
     data[i] = line                     #  and data uses this new line
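
The difference between rebinding the loop variable and writing back into the list can be checked in isolation. This is only a minimal sketch: the two sample strings are made up for illustration, and str.replace is used instead of the split/join idiom for brevity.

```python
data = ["a$b\n", "c%d\n"]  # hypothetical sample lines

# Rebinding the loop variable creates new strings but leaves the list alone.
for line in data:
    line = line.replace("$", "")
unchanged = list(data)

# Writing back through the index stores the cleaned string in the list.
for i, line in enumerate(data):
    data[i] = line.replace("$", "").replace("%", "")

print(unchanged)  # ['a$b\n', 'c%d\n']
print(data)       # ['ab\n', 'cd\n']
```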

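Note, however, that this loop also strips the commas that separate the fields, so csv.reader can no longer split the columns afterwards. If you only want to clean the first column, parse each row first and clean the field afterwards; that way you do not need to load everything into memory either: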

 #!/usr/bin/env python3

 import sys
 import csv

 next(sys.stdin)
 next(sys.stdin)
 print(next(sys.stdin), end='')   # the header line already ends with a newline
 reader = csv.reader(sys.stdin)

 chars = ('$','%','^','*', '\n', '"', "," )

 for row in reader:
     line = row[0]
     for c in chars:
         line = ''.join(line.split(c))
     row[0] = line

     if row[1] == 'Kashiwa':
         print(",".join(row))
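
A variant of the same idea, offered only as a sketch under a few assumptions: it assumes the first two input lines hold the 1-based column number and the value to match (as the sample input suggests), it deletes the characters with str.translate, and it writes through csv.writer so that any field needing quotes is quoted again. The name filter_rows and the sample data are hypothetical.

```python
import csv
import io

# Characters to drop from the Name column (same set as chars above).
DELETE = str.maketrans("", "", '$%^*\n",')


def filter_rows(src, dst):
    """Copy the header and the matching rows from src to dst, cleaning column 0."""
    # Assumption: line 1 is a 1-based column number, line 2 the value to match.
    col = int(next(src)) - 1
    wanted = next(src).strip()
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(next(reader))            # header row
    for row in reader:
        row[0] = row[0].translate(DELETE)    # strip special characters from Name
        if row[col] == wanted:
            writer.writerow(row)


# Small in-memory demonstration with made-up rows.
sample = io.StringIO(
    "2\nKashiwa\nName,Campus,LabName\n"
    "Shi$n MORISHIA,Kashiwa,Laboratory of Omics\n"
    "Yukihido Tomari,Yayoi,Laboratory of RNA Function\n"
)
out = io.StringIO()
filter_rows(sample, out)
print(out.getvalue(), end="")
```

To run it on real input, call filter_rows(sys.stdin, sys.stdout) after importing sys.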

Answer 2

I looked at a few Todai pages for inspiration, and this is what I came up with. I put the data you gave us into a csv file to make it easier to read.

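import pandas

chars = ['$', '%', '^', '*', '\n', '"', ',']
dataframe = pandas.read_csv("data.csv")
# Keep only the Kashiwa rows; .copy() gives an independent frame to modify.
dataframe = dataframe[dataframe.Campus == 'Kashiwa'].copy()

for c in chars:
    # regex=False makes the literal match explicit ('$', '^', '*' are regex metacharacters).
    dataframe["Name"] = dataframe["Name"].str.replace(c, '', regex=False)
print(dataframe)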

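I use pandas here. It is a convenient way to read csv files quickly, and it has handy methods for changing every row in which one of the characters from the chars list occurs. The filter on the Campus column makes it easy to drop all rows whose lab is not on the Kashiwa campus. I tried it and it works. Hope this helps!

The csv file looks like this: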

Name,Campus,LabName
Shi$n MORISHIA,Kashiwa,Laboratory of Omics
Kio%s$hi ASAy,Kashiwa,Laboratory of Genome Informatics
Yuki%hi**do Tomari,Kashiwa,Laboratory of RNA Function
Masao Kanobe ,Kashiwa,Laboratory of Large-Scale Bioinformatics

Here is the output:

              Name   Campus                                  LabName
0    Shin MORISHIA  Kashiwa                       Laboratory of Omics
1      Kioshi ASAy  Kashiwa          Laboratory of Genome Informatics
2  Yukihido Tomari  Kashiwa                Laboratory of RNA Function
3    Masao Kanobe   Kashiwa  Laboratory of Large-Scale Bioinformatics

Replacing special characters in a csv read from stdin
