How do you find the average of specific numbers in a CSV file?

Asked: 2018-12-05 00:09:42

Tags: python loops csv math

import csv

with open('sortedsimpsons_episodes.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    print("Season 1")
    for idx,row in enumerate(csv_reader):
        if idx>=1 and idx<=13:
            print(f'"{row[1]}" is an episode in season {row[4]}, that has {row[7]} million views and an imdb rating of {row[9]}')

viewsAverage = round((30.3 + 30.4 + 27.6 + 33.5 + 31.2 + 27.1 + 26.7 + 25.4 + 20.2 + 27.4 + 28 + 27.1 + 27.5) / 13,2)
imdbAverage = round((7.4 + 8.3 + 7.9 + 7.5 + 7.8 + 7.9 + 8.2 + 7.8 + 7.8 + 7.6 + 7.7 + 8.1 + 7.5) / 13,2)
print("The average amount of views in season 1 is: "+str(viewsAverage)+ " million.")
print("The average imdb rating of season 1 is: " +str(imdbAverage))
csv_file.close()

CSV file:

"Krusty Gets Busted" is an episode in season 1, that has 30.4 million views and an imdb rating of 8.3.
"The Call of the Simpsons" is an episode in season 1, that has 27.6 million views and an imdb rating of 7.9.
"Life on the Fast Lane" is an episode in season 1, that has 33.5 million views and an imdb rating of 7.5.
"The Crepes of Wrath" is an episode in season 1, that has 31.2 million views and an imdb rating of 7.8.
"Some Enchanted Evening" is an episode in season 1, that has 27.1 million views and an imdb rating of 7.9.
"Simpsons Roasting on an Open Fire" is an episode in season 1, that has 26.7 million views and an imdb rating of 8.2.
"Bart the Genius" is an episode in season 1, that has 24.5 million views and an imdb rating of 7.8.
"There's No Disgrace Like Home" is an episode in season 1, that has 26.2 million views and an imdb rating of 7.8.
"Moaning Lisa" is an episode in season 1, that has 27.4 million views and an imdb rating of 7.6.
"The Telltale Head" is an episode in season 1, that has 28 million views and an imdb rating of 7.7.
"Bart the General" is an episode in season 1, that has 27.1 million views and an imdb rating of 8.1.
"Homer's Odyssey" is an episode in season 1, that has 27.5 million views and an imdb rating of 7.5.
"Bart Gets an "F"" is an episode in season 2, that has 33.6 million views and an imdb rating of 8.2.
"Two Cars in Every Garage and Three Eyes on Every Fish" is an episode in season 2, that has 26.1 million views and an imdb rating of 8.1.
"Dead Putting Society" is an episode in season 2, that has 25.4 million views and an imdb rating of 8.
"Bart the Daredevil" is an episode in season 2, that has 26.2 million views and an imdb rating of 8.4.

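When the whole file is printed in Python it is very long; it covers 27 seasons. I want to find the average views and the average rating for each season, but I only know how to do it manually, as in the code above. The code works and prints what I want, but doing it this way would take me forever. How can I find a season's average views without typing in all the numbers by hand?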

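5 answers:

Answer 0: (score: 0)

You can use a dictionary to store a list of IMDb ratings or a list of view counts for each season.

Python has a handy defaultdict that you can use to create an empty list for each season automatically: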

from collections import defaultdict

ratings = defaultdict(list)
viewings = defaultdict(list)

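next(csv_reader)  # skip the header row (assumed, as in the question's code)

for row in csv_reader:
    # csv.reader yields strings, so convert the numbers before storing them
    season, viewing, rating = row[4], float(row[7]), float(row[9])

    ratings[season].append(rating)
    viewings[season].append(viewing)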

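You can then take the list of ratings for a season, for example, and compute its mean (the key is whatever row[4] contains, assumed here to be the string '1'):

>>> from statistics import mean
>>> mean(ratings['1'])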
7.807692307692307
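
Putting the pieces together, a minimal end-to-end sketch might look like the following. The inline sample data and its column names are made up for illustration (only the column positions 4, 7 and 9 come from the question), and io.StringIO stands in for the real file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample standing in for sortedsimpsons_episodes.csv.
# Only the positions of columns 4 (season), 7 (views) and 9 (rating)
# are taken from the question; every other column is a placeholder.
SAMPLE = """id,title,c2,c3,season,c5,c6,views,c8,rating
1,Episode A,x,x,1,x,x,30.0,x,8.0
2,Episode B,x,x,1,x,x,28.0,x,7.0
3,Episode C,x,x,2,x,x,26.0,x,9.0
"""

ratings = defaultdict(list)
viewings = defaultdict(list)

csv_reader = csv.reader(io.StringIO(SAMPLE))
next(csv_reader)  # skip the header row
for row in csv_reader:
    season = row[4]
    viewings[season].append(float(row[7]))
    ratings[season].append(float(row[9]))

# Map each season to (average views, average rating), rounded to 2 places.
averages = {
    season: (round(mean(viewings[season]), 2), round(mean(ratings[season]), 2))
    for season in viewings
}
for season, (avg_views, avg_rating) in averages.items():
    print(f"Season {season}: {avg_views} million views, IMDb rating {avg_rating}")
```

To run it against the real file, the StringIO object would be replaced by the file handle from `with open(...)`.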

Answer 1: (score: 0)

Why not accumulate the totals as you loop and then divide by the count?

import csv

viewsTotal = 0
imdbTotal = 0
total = 0
with open('sortedsimpsons_episodes.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    print("Season 1")
    for idx, row in enumerate(csv_reader):
        if idx >= 1 and idx <= 13:
            viewsTotal += float(row[7])
            imdbTotal += float(row[9])
            total += 1  # count the rows actually summed, independent of idx
            print(f'"{row[1]}" is an episode in season {row[4]}, that has {row[7]} million views and an imdb rating of {row[9]}')
viewsAverage = round(viewsTotal / total,2)
imdbAverage = round(imdbTotal / total,2)
print("The average amount of views in season 1 is: "+str(viewsAverage)+ " million.")
print("The average imdb rating of season 1 is: " +str(imdbAverage))

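I'm not sure whether the final prints and the averages were meant to be computed after the csv_file loop. Also, you don't need .close(), because "with open()" closes the file once the block finishes.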
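
The same running-total idea can be extended to every season at once by keeping one total and one count per season. This is only a sketch: the sample rows and column names are invented, and the column positions (4, 7, 9) are the ones used in the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the real file; only column positions
# 4 (season), 7 (views) and 9 (rating) follow the question.
SAMPLE = """id,title,c2,c3,season,c5,c6,views,c8,rating
1,Episode A,x,x,1,x,x,30.0,x,8.0
2,Episode B,x,x,1,x,x,20.0,x,7.0
3,Episode C,x,x,2,x,x,26.0,x,9.0
"""

views_total = defaultdict(float)   # season -> sum of views
imdb_total = defaultdict(float)    # season -> sum of ratings
episode_count = defaultdict(int)   # season -> number of episodes

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
for row in reader:
    season = int(row[4])
    views_total[season] += float(row[7])
    imdb_total[season] += float(row[9])
    episode_count[season] += 1

season_averages = {}
for season in sorted(episode_count):
    n = episode_count[season]
    season_averages[season] = (
        round(views_total[season] / n, 2),
        round(imdb_total[season] / n, 2),
    )
    views_avg, imdb_avg = season_averages[season]
    print(f"Season {season}: {views_avg} million views, IMDb rating {imdb_avg}")
```

Because nothing but three numbers per season is kept in memory, this version does not need to store the individual rows.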

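Answer 2: (score: 0)

To find the average views and the average rating for each season, you first need to group the rows by season.

I assume that:

  • row[1] is the title,
  • row[4] is the season,
  • row[7] is the number of views,
  • row[9] is the rating.

So I suppose you have something like this (I replaced the unknown values with None):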

rows = [
    (None, 'title1', None, None, 1, None, None, 30.4, None, 8.5),
    (None, 'title2', None, None, 2, None, None, 27.5, None, 6.5),
    (None, 'title3', None, None, 1, None, None, 40.2, None, 4.0),
    (None, 'title4', None, None, 1, None, None, 21.9, None, 2.6),
]

To sort and group the rows and to extract values from them, you can use operator.itemgetter, like this:

import operator

get_season = operator.itemgetter(4)
get_views = operator.itemgetter(7)
get_rate = operator.itemgetter(9)

With these in place, you can compute the averages:

import itertools

rows.sort(key=get_season)
for season, group in itertools.groupby(rows, key=get_season):
    group = list(group)
    count = len(group)
    total_views = sum(get_views(row) for row in group)
    total_rate = sum(get_rate(row) for row in group)
    mean_views = total_views / count
    mean_rate = total_rate / count
    print(f"season {season} - views: {mean_views:.2f}, rate: {mean_rate:.2f}")

You get:

season 1 - views: 30.83, rate: 5.03
season 2 - views: 27.50, rate: 6.50

As mentioned in another answer, you can also use the statistics module:

import itertools
import statistics

rows.sort(key=get_season)
for season, group in itertools.groupby(rows, key=get_season):
    group = list(group)
    mean_views = statistics.mean(get_views(row) for row in group)
    mean_rate = statistics.mean(get_rate(row) for row in group)
    print(
        f"season {season} - views: {mean_views:.2f}, rate: {mean_rate:.2f}")
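
The sort before itertools.groupby matters because groupby only merges consecutive items that share a key. A small sketch with made-up (season, views) pairs shows the difference:

```python
import itertools
import operator

# Made-up (season, views) pairs, deliberately not ordered by season.
pairs = [(1, 30.4), (2, 27.5), (1, 40.2)]
get_season = operator.itemgetter(0)

# Without sorting, season 1 shows up as two separate groups.
unsorted_keys = [season for season, _ in itertools.groupby(pairs, key=get_season)]
print(unsorted_keys)  # [1, 2, 1]

# After sorting by the same key, each season forms exactly one group.
ordered = sorted(pairs, key=get_season)
sorted_keys = [season for season, _ in itertools.groupby(ordered, key=get_season)]
print(sorted_keys)  # [1, 2]
```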

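Answer 3: (score: 0)

Your input file is not in a valid CSV format. However, with a little scaffolding you can work around the problems that would otherwise trip up the built-in csv module. The main trick is to treat the space character as the delimiter. That means csv.reader will return something like the following for each line of the file. The first line shows the index of each item: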

     0        1     2      3         4      5       6       7      8       9       10        11       12    13     14       15      16     17
['"title"', 'is', 'an', 'episode', 'in', 'season', '1,', 'that', 'has', '30.4', 'million', 'views', 'and', 'an', 'imdb', 'rating', 'of', '8.3.']
['"title"', 'is', 'an', 'episode', 'in', 'season', '1,', 'that', 'has', '27.6', 'million', 'views', 'and', 'an', 'imdb', 'rating', 'of', '7.9.']

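This is usable, except that two of the fields end with an extra character (a "," or a ".") that prevents them from being valid Python literals. This is handled by removing the last character from those fields' strings.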
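
As a quick check of that idea, the sketch below feeds one line copied from the question to csv.reader with a space delimiter and then trims the two fields. Passing a list of strings instead of a file is just a convenience for the example.

```python
import csv
from ast import literal_eval

# One line copied from the question's sample data.
line = ('"Krusty Gets Busted" is an episode in season 1, that has 30.4 '
        'million views and an imdb rating of 8.3.')

# csv.reader accepts any iterable of strings, so a one-item list works here.
row = next(csv.reader([line], delimiter=' '))
print(len(row), row[0], row[6], row[9], row[17])

season = literal_eval(row[6][:-1])    # '1,'   -> 1
views = literal_eval(row[9])          # '30.4' -> 30.4
rating = literal_eval(row[17][:-1])   # '8.3.' -> 8.3
print(season, views, rating)
```

The quoted title comes back as a single field because the csv module treats the double quotes as quoting characters, which is why the spaces inside it do not split it.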

Code:

from ast import literal_eval
from collections import namedtuple
import csv
from itertools import groupby
from operator import attrgetter, itemgetter


# Data definition of fields.
FIELDS =     'title', 'season', 'views', 'rating' # Field names.
INDICES =       0,       6,        9,       17    # Index of each field.
TRUNCATE =               6,                 17    # Last char removal.

def parse(field):
    try:
        return literal_eval(field)  # Interpret as a Python literal.
    except Exception:
        return field # Assume it's an unquoted string.

# Create list of records made from fields of interest in csv file.
Record = namedtuple('Record', FIELDS)
records = []
with open('sortedsimpsons_episodes.csv', newline='') as csv_file:
    for row in csv.reader(csv_file, delimiter=' '):
        for index in TRUNCATE:  # Strip trailing char from designated fields.
            row[index] = row[index][:-1]
        raw_data = itemgetter(*INDICES)(row)  # Raw string data in each field.
        # Convert raw field string data to desired types.
        values = (parse(field) for field in raw_data)
        records.append(Record(*values))

# Calculate and print statistics.
grouper = attrgetter('season')
records.sort(key=grouper)
for season, group in groupby(records, key=grouper):
    group = list(group)  # Materialize the group; avoids shadowing `records`.
    views_avg = sum(rec.views for rec in group) / len(group)
    imdb_avg = sum(rec.rating for rec in group) / len(group)
    print("Season {}:".format(season))
    print("  Average number of views: {:.2f} million".format(views_avg))
    print("  Average imdb rating: {:.2f}".format(imdb_avg))

Output:

Season 1:
  Average number of views: 28.10 million
  Average imdb rating: 7.84
Season 2:
  Average number of views: 27.82 million
  Average imdb rating: 8.17

Answer 4: (score: -1)

Here you go, this should work:

import numpy as np
import pandas as pd

def get_avg_rating_and_views(df):
    avg_dict = {}
    for i, data in enumerate(df["Field_name_to_scan"]):
        numbers = []
        for d in data.split():
            try:
                # Strip trailing punctuation such as "1," or "8.3." first.
                numbers.append(float(d.rstrip(".,")))
            except ValueError:
                pass  # Not a number; skip this word.
        # Assumes the last two numbers in each sentence are views and rating.
        avg_dict[i] = numbers[-2:]

    views, imdb_ratings = zip(*avg_dict.values())
    avg_view = np.average(views)
    print("Average View: ", avg_view)
    avg_imdb_ratings = np.average(imdb_ratings)
    print("Imdb average Rating", avg_imdb_ratings)

df = pd.read_csv("your_csv.csv")
get_avg_rating_and_views(df)

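Basically, you iterate over every row in that particular column and pull out those two values. The values are stored keyed by row number, so they can be manipulated later to get only the ratings or only the views. You can use numpy or any other library to compute the averages of the views and ratings lists.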
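
If the data really is a column of sentences like the ones shown in the question, a regular expression is one way to pull out the season, views and rating without relying on word positions. Note that this swaps in the standard re and statistics modules rather than numpy or pandas, and the pattern assumes every sentence follows exactly the wording shown in the question.

```python
import re
from collections import defaultdict
from statistics import mean

# Three sentences copied from the question's sample data.
lines = [
    '"Krusty Gets Busted" is an episode in season 1, that has 30.4 million views and an imdb rating of 8.3.',
    '"The Call of the Simpsons" is an episode in season 1, that has 27.6 million views and an imdb rating of 7.9.',
    '"Dead Putting Society" is an episode in season 2, that has 25.4 million views and an imdb rating of 8.',
]

# Captures season, views and rating; the fractional part is optional
# so that values such as "28" or "8." are still matched.
PATTERN = re.compile(
    r"season (\d+), that has (\d+(?:\.\d+)?) million views "
    r"and an imdb rating of (\d+(?:\.\d+)?)"
)

by_season = defaultdict(list)
for line in lines:
    match = PATTERN.search(line)
    if match is None:
        continue  # skip sentences that do not follow the expected wording
    season, views, rating = match.groups()
    by_season[int(season)].append((float(views), float(rating)))

season_means = {}
for season, pairs in sorted(by_season.items()):
    views, ratings = zip(*pairs)
    season_means[season] = (mean(views), mean(ratings))
    print(f"Season {season}: {mean(views):.2f} million views, rating {mean(ratings):.2f}")
```

Unlike the whole-file average above, this groups the values by season, which is what the question asks for.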