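As someone suggested, I've put together a verifiable example. If you take pandas out of it and use raw values instead of DataFrame values, it works perfectly.
If you bring pandas back in, as shown below, the program runs and print(true_age) outputs 0.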
import pandas as pd
import numpy as np
from datetime import datetime
data = np.array([['','bornYear','bornMonth', 'bornDay','diedYear','diedMonth','diedDay'],
                 ['Record1',1932,8,17,1980,3,22],
                 ['Record2',1950,4,12,1980,3,22]])
df = pd.DataFrame(data=data[1:,1:],
                  index=data[1:,0],
                  columns=data[0,1:])
byear = int(df.iloc[1]['bornYear'])
bmonth = int(df.iloc[1]['bornMonth'])
bday = int(df.iloc[1]['bornDay'])
died_year = df.iloc[1]['diedYear']
died_month = df.iloc[1]['diedMonth']
died_day = df.iloc[1]['diedDay']
now_year = datetime.now().year
now_month = datetime.now().month
now_day = datetime.now().day
age_raw = now_year - byear
true_age = 0
if died_year is not None:
    died_year = int(died_year)
    died_month = int(died_month)
    died_day = int(died_day)
    age_raw = float(died_year) - float(byear)
    if bmonth > died_month:
        if bday > died_day:
            true_age = age_raw - 1
        elif bday < died_day:
            true_age = age_raw
    elif bmonth < died_month:
        true_age = age_raw
print(true_age)
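So, I have a pandas DataFrame that is the result of a MySQL query which searches for a person's name and returns some information about them. One such piece of information is their age. The table contains both living and deceased people. I'm trying to make it so that if the person has died, it uses their actual age (at death) rather than the age they would be if they were still alive. If they are alive, the death-date fields are empty; if they have died, those fields of course hold values. Here are the relevant variables I've already declared: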
bmonth = int(storage.iloc[0]['birthMonth'])
bday = int(storage.iloc[0]['birthDay'])
byear = int(storage.iloc[0]['birthYear'])
died_year = storage.iloc[0]['deathYear']
died_month = storage.iloc[0]['deathMonth']
died_day = storage.iloc[0]['deathDay']
now_year = datetime.now().year
now_month = datetime.now().month
now_day = datetime.now().day
age_raw = now_year - byear
true_age = 0
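Now, I've designed this as nested if statements, but I'm going wrong somewhere. If the person is alive, everything works fine: when I print the age, it outputs the correct age. However, if the person is deceased, the printed age is always zero. Here are the nested if statements along with the relevant print statement: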
#Here are the nested if statements:
if died_year is None:
    if bmonth > now_month:
        if bday > now_day:
            true_age = age_raw - 1
        elif bday < now_day:
            true_age = age_raw
    elif bmonth < now_month:
        true_age = age_raw
elif died_year is not None:
    died_year = int(died_year)
    died_month = int(died_month)
    died_day = int(died_day)
    age_raw = died_year - byear
    if bmonth > died_month:
        if bday > died_day:
            true_age = age_raw - 1
        elif bday < died_day:
            true_age = age_raw
    elif bmonth < died_month:
        true_age = age_raw
#And now the print statement:
print("DOB: "+str(bmonth)+"/"+str(bday)+"/"+str(byear)+" ("+str(true_age)+" years old)")
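In addition, I have the following so that the date of death is included in the output if the person has died. It works fine and returns the correct date, so I know the values are all correct:
if died_year is not None:
    print("*DECEASED: "+str(died_month)+"/"+str(died_day)+"/"+str(died_year))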
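Note that I don't convert the variables died_year, died_month, and died_day to integers until the appropriate condition is met. Doing so outside the if statement would trigger an error, because a null value can't be passed to int(). I feel like I'm missing something really obvious here, but maybe not. Also, if anyone has a better way to do all of this, I'm always open to learning how to be more efficient.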
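Answer 0 (score: 2)
Pandas has excellent support for time series, so it's best to use the appropriate tools. Once the columns are converted into a single datetime column, you can do time arithmetic on it: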
import pandas as pd
# demo dataframe
df = pd.DataFrame({
    'birthMonth': [5, 2],
    'birthDay': [4, 24],
    'birthYear': [1924, 1997],
    'deathMonth': [3, None],
    'deathDay': [1, None],
    'deathYear': [2008, None]
})
# convert birth dates to datetimes
birth = pd.to_datetime(df[['birthMonth', 'birthDay', 'birthYear']]
                       .rename(columns={'birthMonth': 'month', 'birthDay': 'day', 'birthYear': 'year'}))
# convert death dates to datetimes
death = pd.to_datetime(df[['deathMonth', 'deathDay', 'deathYear']]
                       .rename(columns={'deathMonth': 'month', 'deathDay': 'day', 'deathYear': 'year'}))
# calculate age as a Timedelta (in days), normalizing 'now' to midnight of today;
# rows with a death date use death - birth instead
age = (pd.Timestamp.now().normalize() - birth).where(death.isnull(), other=death-birth)
Edit: see the discussion below from @ALollz about normalizing the timestamp.
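The `age` above is a Timedelta in days, while the question asks for an age in whole years. A minimal sketch of one way to get there, assuming `birth` and `death` are datetime Series like the ones built above (recreated here directly so the block runs on its own). The function name `age_in_years` and its `today` parameter are hypothetical, introduced only for illustration:

```python
import pandas as pd

# Stand-ins for the `birth` and `death` Series from the answer above.
birth = pd.Series(pd.to_datetime(['1924-05-04', '1997-02-24']))
death = pd.Series(pd.to_datetime(['2008-03-01', None]))  # None becomes NaT


def age_in_years(birth, death, today=None):
    """Whole years from birth to death, or to `today` where death is NaT."""
    if today is None:
        today = pd.Timestamp.now().normalize()
    else:
        today = pd.Timestamp(today)
    # Use the death date where present, otherwise today's date.
    end = death.fillna(today)
    # True where the birthday has not yet occurred in the end year.
    before_birthday = (end.dt.month < birth.dt.month) | (
        (end.dt.month == birth.dt.month) & (end.dt.day < birth.dt.day)
    )
    return end.dt.year - birth.dt.year - before_birthday.astype(int)


# A fixed `today` keeps the example reproducible.
print(age_in_years(birth, death, today='2018-11-01').tolist())
```

Passing `today` explicitly is only there to make the result repeatable; leaving it out falls back to the normalized current timestamp, as in the answer.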
Answer 1 (score: 0)
It's much easier to convert the values to datetime objects and then do the if/elif filtering.
import datetime
bmonth = int(storage.iloc[0]['birthMonth'])
bday = int(storage.iloc[0]['birthDay'])
byear = int(storage.iloc[0]['birthYear'])
died_year = storage.iloc[0]['deathYear']
died_month = storage.iloc[0]['deathMonth']
died_day = storage.iloc[0]['deathDay']
start = datetime.datetime(month=bmonth, day=bday, year=byear)
# the death fields must hold values here; int() fails on a missing date
end = datetime.datetime(month=int(died_month), day=int(died_day), year=int(died_year))
(end - start).days  # number of days between birth and death
You can also work datetime.now() into this.
Hope this helps and improves your process.
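A rough sketch of how datetime.now() might be worked in for people who are still alive. It assumes a row-like object with the question's column names, and it uses `pd.isna` for the missing-date check on the assumption that, depending on how the DataFrame was loaded, an empty field may arrive as NaN rather than None (in which case `is not None` would still be true). The helper name `days_lived` and its `now` parameter are hypothetical:

```python
import datetime

import pandas as pd


def days_lived(row, now=None):
    """Days from birth to death, or to `now` when no death date is recorded."""
    if now is None:
        now = datetime.datetime.now()
    start = datetime.datetime(
        int(row['birthYear']), int(row['birthMonth']), int(row['birthDay'])
    )
    # pd.isna is True for None and for NaN alike.
    if pd.isna(row['deathYear']):
        end = now
    else:
        end = datetime.datetime(
            int(row['deathYear']), int(row['deathMonth']), int(row['deathDay'])
        )
    return (end - start).days


# Plain dicts stand in for storage.iloc[0]; the values are made up.
deceased = {'birthYear': 2000, 'birthMonth': 1, 'birthDay': 1,
            'deathYear': 2000, 'deathMonth': 1, 'deathDay': 31}
living = {'birthYear': 2000, 'birthMonth': 1, 'birthDay': 1,
          'deathYear': float('nan'), 'deathMonth': float('nan'),
          'deathDay': float('nan')}

print(days_lived(deceased))
print(days_lived(living, now=datetime.datetime(2000, 3, 1)))
```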
Answer 2 (score: 0)
You can define a function that calculates a person's age:
from datetime import date
def calc_age(row):
    bm = row['bornMonth']
    bd = row['bornDay']
    by = row['bornYear']
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    birth_date = date(*[int(i) for i in (by, bm, bd)])  # assumes none of the birth fields is None
    try:
        end_date = date(*[int(i) for i in (dy, dm, dd)])
    except (TypeError, ValueError):  # the death date is missing
        end_date = date.today()
    # True (counts as 1) if the birthday has not yet occurred in the end year, else False (0)
    is_next_year = ((end_date.month, end_date.day) < (birth_date.month, birth_date.day))
    age = end_date.year - birth_date.year - is_next_year
    return age
Apply this function along the rows of the DataFrame:
df.apply(calc_age, axis=1)
Provided no birth data is missing, this returns a pd.Series holding each person's age. You can attach it to the DataFrame:
df['personsAge'] = df.apply(calc_age, axis=1)
Then add another column for the status and print the results:
def is_dead(row):
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    try:
        died = date(*[int(i) for i in (dy, dm, dd)])
        return True
    except (TypeError, ValueError):  # int(None) raises TypeError, so catch both
        return False
df['is_dead'] = df.apply(is_dead, axis=1)
def print_status(row):
    bm = row['bornMonth']
    bd = row['bornDay']
    by = row['bornYear']
    dm = row['diedMonth']
    dd = row['diedDay']
    dy = row['diedYear']
    age = row['personsAge']
    print("DOB: "+str(bm)+"/"+str(bd)+"/"+str(by)+" ("+str(age)+" years old)")
    if row['is_dead']:
        print("*DECEASED: "+str(dm)+"/"+str(dd)+"/"+str(dy))
df.apply(print_status, axis=1)
stdout:
DOB: 8/17/1932 (47 years old)
*DECEASED: 3/22/1980
DOB: 4/12/1950 (68 years old)
If you don't like the copy-pasted date selection, replace it with the datetime approach from Andrey Portnoy's solution.
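For comparison, here is a small sketch that puts the question's branch structure (as reconstructed from the post, with the indentation assumed) next to the tuple comparison used in calc_age. The nested version has no branch for an equal month or an equal day, so `true_age` can stay at its initial 0, and it compares days even when the months differ. The function names `nested_if_age` and `tuple_age` are hypothetical:

```python
def nested_if_age(by, bm, bd, ey, em, ed):
    """Branch structure from the question (indentation assumed), unchanged."""
    age_raw = ey - by
    true_age = 0
    if bm > em:
        if bd > ed:
            true_age = age_raw - 1
        elif bd < ed:
            true_age = age_raw
    elif bm < em:
        true_age = age_raw
    # No branch runs when bm == em, so true_age is still 0 here.
    return true_age


def tuple_age(by, bm, bd, ey, em, ed):
    """Subtract one year if (month, day) of the end date precedes the birthday."""
    return ey - by - ((em, ed) < (bm, bd))


# Same month and day: the nested version falls through every branch.
print(nested_if_age(1950, 3, 22, 1980, 3, 22), tuple_age(1950, 3, 22, 1980, 3, 22))
# Record1 from the question: born 8/17/1932, died 3/22/1980.
print(nested_if_age(1932, 8, 17, 1980, 3, 22), tuple_age(1932, 8, 17, 1980, 3, 22))
```

Comparing `(month, day)` tuples handles every ordering in one expression, which is why calc_age needs no nested branches.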