SQLAlchemy计算列

时间:2013-06-26 17:05:40

标签: python sqlalchemy calculated-columns

(新的SQLAlchemy用户提醒)我有三个表:一个人,从特定日期开始的人每小时费率和每日时间报告。我正在寻找正确的方法,以便从当天的人员小时费率中扣除时间费用。

是的,我可以在创建时计算该值,并将其作为模型的一部分,但请将此视为总结幕后更复杂数据的示例。我如何计算Time.cost?它是hybrid_propery,column_property还是完全不同的东西?

class Person(Base):
    __tablename__ = 'person'
    personID = Column(Integer, primary_key=True)
    name = Column(String(30), unique=True)

class Payrate(Base):
    __tablename__ = 'payrate'
    payrateID = Column(Integer, primary_key=True)
    personID  = Column(Integer, ForeignKey('person.personID'))
    hourly    = Column(Integer)
    starting  = Column(Date)
    __tableargs__ =(UniqueConstraint('personID', 'starting',
                                     name='uc_peron_starting'))

class Time(Base):
    __tablename__ = 'entry'
    entryID  = Column(Integer, primary_key=True)
    personID = Column(Integer, ForeignKey('person.personID'))
    workedon = Column(Date)
    hours    = Column(Integer)

    person = relationship("Person")

    def __repr__(self):
        return "<{date} {hours}hrs ${0.cost:.02f}>".format(self, 
                      date=self.workedon.isoformat(), hours=to_hours(self.hours))

    @property
    def cost(self):
        '''Cost of entry
        '''
        ## This is where I am stuck in propery query creation
        return self.hours * query(Payrate).filter(
                             and_(Payrate.personID==personID,
                                  Payrate.starting<=workedon
                             ).order_by(
                               Payrate.starting.desc())

2 个答案:

答案 0 :(得分:91)

你在这里遇到的问题,尽可能优雅地解决,使用非常高级SQLAlchemy技术,所以我知道你是一个初学者,但这个答案将向你展示一路走来到最后。但是,解决这样的问题需要一步一步走,你可以在我们经历的过程中以不同的方式得到你想要的答案。

在进入如何混合这个或其他什么之前,你需要考虑SQL。我们如何在任意一系列行中查询Time.cost?我们可以干净地链接Time to Person,因为我们有一个简单的外键。但是要将Time to Payrate链接起来,这个特定的模式很棘手,因为Time不仅通过person_id链接到Payrate,而且还通过workingon链接 - 在SQL中我们最容易使用“time.person_id = person.id AND time”加入到这个中。 workon BETWEEN payrate.start_date AND payrate.end_date“。但是你在这里没有“end_date”,这意味着我们也必须得到它。这个推导是最棘手的部分,所以我想出的就是这样(我已经小写了你的列名):

SELECT payrate.person_id, payrate.hourly, payrate.starting, ending.ending
FROM payrate LEFT OUTER JOIN
(SELECT pa1.payrate_id, MIN(pa2.starting) as ending FROM payrate AS pa1
JOIN payrate AS pa2 ON pa1.person_id = pa2.person_id AND pa2.starting > pa1.starting
GROUP BY pa1.payrate_id
) AS ending ON payrate.payrate_id=ending.payrate_id

可能有其他方法可以实现这一点,但这就是我想出的 - 其他方式几乎肯定会有类似的事情发生(即子查询,连接)。

因此,在付费开始/结束时,我们可以弄清楚查询的外观。我们希望使用BETWEEN将时间条目与日期范围匹配,但最新的payrate条目对于“结束”日期将为NULL,因此解决这个问题的一种方法是在非常高的日期使用COALESCE(另一种方法是使用条件):

SELECT *, entry.hours * payrate_derived.hourly
FROM entry
JOIN
    (SELECT payrate.person_id, payrate.hourly, payrate.starting, ending.ending
    FROM payrate LEFT OUTER JOIN
    (SELECT pa1.payrate_id, MIN(pa2.starting) as ending FROM payrate AS pa1
    JOIN payrate AS pa2 ON pa1.person_id = pa2.person_id AND pa2.starting > pa1.starting
    GROUP BY pa1.payrate_id
    ) AS ending ON payrate.payrate_id=ending.payrate_id) as payrate_derived
ON entry.workedon BETWEEN payrate_derived.starting AND COALESCE(payrate_derived.ending, "9999-12-31")
AND entry.person_id=payrate_derived.person_id
ORDER BY entry.person_id, entry.workedon

现在@hybrid可以在SQLAlchemy中为你做什么,当在SQL表达式级别运行时,就是“entry.hours * payrate_derived.hourly”部分,就是这样。所有JOIN等等,你需要在混合动力车外部提供。

所以我们需要将这个大子查询加入到这个中:

class Time(...):
    @hybrid_property
    def cost(self):
        # ....

    @cost.expression
    def cost(cls):
        return cls.hours * <SOMETHING>.hourly

让我们弄清楚<SOMETHING>是什么。将SELECT构建为对象:

from sqlalchemy.orm import aliased, join, outerjoin
from sqlalchemy import and_, func

pa1 = aliased(Payrate)
pa2 = aliased(Payrate)
ending = select([pa1.payrate_id, func.min(pa2.starting).label('ending')]).\
            select_from(join(pa1, pa2, and_(pa1.person_id == pa2.person_id, pa2.starting > pa1.starting))).\
            group_by(pa1.payrate_id).alias()

payrate_derived = select([Payrate.person_id, Payrate.hourly, Payrate.starting, ending.c.ending]).\
    select_from(outerjoin(Payrate, ending, Payrate.payrate_id == ending.c.payrate_id)).alias()

cost()混合,在表达方面,需要引用payrate_derived(我们将在一分钟内完成python方面):

class Time(...):
    @hybrid_property
    def cost(self):
        # ....

    @cost.expression
    def cost(cls):
        return cls.hours * payrate_derived.c.hourly

然后,为了使用我们的cost()混合,它必须位于具有该连接的查询的上下文中。请注意,我们使用Python的datetime.date.max来获取最大日期(方便!):

print session.query(Person.name, Time.workedon, Time.hours, Time.cost).\
                    select_from(Time).\
                    join(Time.person).\
                    join(payrate_derived,
                            and_(
                                payrate_derived.c.person_id == Time.person_id,
                                Time.workedon.between(
                                    payrate_derived.c.starting,
                                    func.coalesce(
                                        payrate_derived.c.ending,
                                        datetime.date.max
                                    )
                                )
                            )
                    ).\
                    all()

因此,连接很大,而且很笨,我们需要经常这样做,更不用说当我们使用Python内混合时,我们需要在Python中加载相同的集合。我们可以使用relationship()映射到它,这意味着我们必须设置自定义连接条件,但我们还需要使用一种称为非主映射器的鲜为人知的技术实际映射到该子查询。非主映射器为您提供了一种将类映射到某个任意表或SELECT构造的方法,仅用于选择行。我们通常永远不需要使用它,因为Query已经允许我们查询任意列和子查询,但要从relationship()中获取它需要映射。映射需要定义主键,并且关系还需要知道关系的哪一侧是“外来的”。这是这里最先进的部分,在这种情况下它可以像这样:

from sqlalchemy.orm import mapper, relationship, foreign

payrate_derived_mapping = mapper(Payrate, payrate_derived, non_primary=True,
                                        primary_key=[
                                            payrate_derived.c.person_id,
                                            payrate_derived.c.starting
                                        ])
Time.payrate = relationship(
                    payrate_derived_mapping,
                    viewonly=True,
                    uselist=False,
                    primaryjoin=and_(
                            payrate_derived.c.person_id == foreign(Time.person_id),
                            Time.workedon.between(
                                payrate_derived.c.starting,
                                func.coalesce(
                                    payrate_derived.c.ending,
                                    datetime.date.max
                                )
                            )
                        )
                    )

所以这是我们必须看到的最后一次加入。我们现在可以更早地进行查询:

print session.query(Person.name, Time.workedon, Time.hours, Time.cost).\
                    select_from(Time).\
                    join(Time.person).\
                    join(Time.payrate).\
                    all()

最后我们可以将新的payrate关系连接到Python级混合中:

class Time(Base):
    # ...

    @hybrid_property
    def cost(self):
        return self.hours * self.payrate.hourly

    @cost.expression
    def cost(cls):
        return cls.hours * payrate_derived.c.hourly

我们这里的解决方案付出了很多努力,但至少最复杂的部分,付费映射,完全只在一个地方,我们再也不需要再看一遍。

这是一个完整的工作示例:

from sqlalchemy import create_engine, Column, Integer, ForeignKey, Date, \
                    UniqueConstraint, select, func, and_, String
from sqlalchemy.orm import join, outerjoin, relationship, Session, \
                    aliased, mapper, foreign
from sqlalchemy.ext.declarative import declarative_base
import datetime
from sqlalchemy.ext.hybrid import hybrid_property


Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    person_id = Column(Integer, primary_key=True)
    name = Column(String(30), unique=True)

class Payrate(Base):
    __tablename__ = 'payrate'
    payrate_id = Column(Integer, primary_key=True)
    person_id  = Column(Integer, ForeignKey('person.person_id'))
    hourly    = Column(Integer)
    starting  = Column(Date)

    person = relationship("Person")
    __tableargs__ =(UniqueConstraint('person_id', 'starting',
                                     name='uc_peron_starting'))

class Time(Base):
    __tablename__ = 'entry'
    entry_id  = Column(Integer, primary_key=True)
    person_id = Column(Integer, ForeignKey('person.person_id'))
    workedon = Column(Date)
    hours    = Column(Integer)

    person = relationship("Person")

    @hybrid_property
    def cost(self):
        return self.hours * self.payrate.hourly

    @cost.expression
    def cost(cls):
        return cls.hours * payrate_derived.c.hourly

pa1 = aliased(Payrate)
pa2 = aliased(Payrate)
ending = select([pa1.payrate_id, func.min(pa2.starting).label('ending')]).\
            select_from(join(pa1, pa2, and_(
                                        pa1.person_id == pa2.person_id,
                                        pa2.starting > pa1.starting))).\
            group_by(pa1.payrate_id).alias()

payrate_derived = select([Payrate.person_id, Payrate.hourly, Payrate.starting, ending.c.ending]).\
    select_from(outerjoin(Payrate, ending, Payrate.payrate_id == ending.c.payrate_id)).alias()

payrate_derived_mapping = mapper(Payrate, payrate_derived, non_primary=True,
                                        primary_key=[
                                            payrate_derived.c.person_id,
                                            payrate_derived.c.starting
                                        ])
Time.payrate = relationship(
                    payrate_derived_mapping,
                    viewonly=True,
                    uselist=False,
                    primaryjoin=and_(
                            payrate_derived.c.person_id == foreign(Time.person_id),
                            Time.workedon.between(
                                payrate_derived.c.starting,
                                func.coalesce(
                                    payrate_derived.c.ending,
                                    datetime.date.max
                                )
                            )
                        )
                    )



e = create_engine("postgresql://scott:tiger@localhost/test", echo=False)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

session = Session(e)
p1 = Person(name='p1')
session.add(p1)

session.add_all([
    Payrate(hourly=10, starting=datetime.date(2013, 5, 17), person=p1),
    Payrate(hourly=15, starting=datetime.date(2013, 5, 25), person=p1),
    Payrate(hourly=20, starting=datetime.date(2013, 6, 10), person=p1),
])

session.add_all([
    Time(person=p1, workedon=datetime.date(2013, 5, 19), hours=10),
    Time(person=p1, workedon=datetime.date(2013, 5, 27), hours=5),
    Time(person=p1, workedon=datetime.date(2013, 5, 30), hours=5),
    Time(person=p1, workedon=datetime.date(2013, 6, 18), hours=12),
])
session.commit()

print session.query(Person.name, Time.workedon, Time.hours, Time.cost).\
                    select_from(Time).\
                    join(Time.person).\
                    join(Time.payrate).\
                    all()

for time in session.query(Time):
    print time.person.name, time.workedon, time.hours, time.payrate.hourly, time.cost

输出(第一行是聚合版本,余数是每个对象):

[(u'p1', datetime.date(2013, 5, 19), 10, 100), (u'p1', datetime.date(2013, 5, 27), 5, 75), (u'p1', datetime.date(2013, 5, 30), 5, 75), (u'p1', datetime.date(2013, 6, 18), 12, 240)]
p1 2013-05-19 10 10 100
p1 2013-05-27 5 15 75
p1 2013-05-30 5 15 75
p1 2013-06-18 12 20 240

答案 1 :(得分:0)

很多时候,我能给出的最好的建议就是做不同的事情。像这样的多表计算列是数据库views的用途。根据时间表(或您想要的任何其他内容)构建视图,并在其中包含计算列,根据视图构建模型,然后重新设置。这也可能减轻数据库的压力。这也是为什么将设计限制为通过自动化migrations可以实现的危险的一个很好的例子。