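Extracting information from a CSV with Python

Time: 2017-04-02 12:01:52

Tags: python csv

I am trying to extract the number of songs released per year from a CSV file. My data looks like this: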

no,artist,name,year
"1","Bing Crosby","White Christmas","1942"
"2","Bill Haley & his Comets","Rock Around the Clock","1955"
"3","Sinead O'Connor","Nothing Compares 2 U","1990","35.554"
"4","Celine Dion","My Heart Will Go On","1998","35.405"
"5","Bryan Adams","(Everything I Do) I Do it For You","1991"
"6","The Beatles","Hey Jude","1968"
"7","Whitney Houston","I Will Always Love You","1992","34.560"
"8","Pink Floyd","Another Brick in the Wall (part 2)","1980"
"9","Irene Cara","Flashdance... What a Feeling","1983"
"10","Elton John","Candle in the Wind '97","1992"

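My file contains 3000 rows and some other fields, but I am only interested in the number of songs released each year.

I tried to extract the year and the song. My code is below, but I am new to Python and don't know how to solve the problem: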

from itertools import islice
import csv


filename = '/home/rob/traintask/top3000songs.csv'
data = csv.reader(open(filename))
# Read the column names from the first line of the file
fields = data.next()[3]  # I tried to read the year column
print fields
count = 0
for row in data:
    # Zip together the field names and values
    items = zip(fields, row)
    item = {}  # here I am lost: I think I should build a dict with the year as key and the number of songs as value, but I don't know how
    # Add the value to our dictionary
    for (name, value) in items:
        item[name] = value.strip()
        print 'item: ', item

I know this is wrong. I would appreciate any hints or help on how to count the songs released in each year.

2 answers:

Answer 0: (score: 2)

Two very simple lines of code:

import pandas as pd
my_csv = pd.read_csv(filename)  # filename as defined in the question

And to get the number of songs per year:

songs_per_year = my_csv.groupby('year')['name'].count()

Answer 1: (score: 1)

You can use the Counter object from the collections module:

>>> from collections import Counter
>>> from csv import reader
>>> 
>>> YEAR = 3
>>> with open('file.txt') as f:
...     next(f, None) # discard header
...     year2rel = Counter(int(line[YEAR]) for line in reader(f))
... 
>>> year2rel
Counter({1992: 2, 1942: 2, 1955: 1, 1990: 1, 1991: 1, 1968: 1, 1980: 1, 1983: 1})
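
For reference, the same Counter idea can be written as a small Python 3 script that looks up the year by column name instead of by index. This is only a sketch under some assumptions: the function name count_songs_per_year and the inline SAMPLE text are made up for illustration (the sample copies a few rows from the question), and every row is assumed to have a numeric year.

```python
import csv
import io
from collections import Counter

# Hypothetical sample copied from a few rows of the question's data.
# Some rows carry an extra trailing field that the header does not name.
SAMPLE = '''no,artist,name,year
"1","Bing Crosby","White Christmas","1942"
"3","Sinead O'Connor","Nothing Compares 2 U","1990","35.554"
"7","Whitney Houston","I Will Always Love You","1992","34.560"
"10","Elton John","Candle in the Wind '97","1992"
'''


def count_songs_per_year(csv_file):
    """Return a Counter mapping year (int) to the number of songs."""
    # DictReader takes the field names from the header row; unnamed
    # extra fields are collected under the key None and ignored here.
    reader = csv.DictReader(csv_file)
    return Counter(int(row["year"]) for row in reader)


songs_per_year = count_songs_per_year(io.StringIO(SAMPLE))

# Print the counts in chronological order.
for year, n_songs in sorted(songs_per_year.items()):
    print(year, n_songs)
```

For a real file, the io.StringIO(SAMPLE) argument would be replaced by a file object opened with open(filename, newline=""), as the csv module documentation recommends. A Counter is just a dict subclass, so this is the "year as key, number of songs as value" dictionary the question was reaching for.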