GroupBy and Sum in SQLAlchemy?

34,194

There are a few issues here; you don't seem to be querying the right things. It's meaningless to select an Expense object when grouping by Expense.date. There needs to be some join condition between CostCenter and Expense, otherwise the rows will be duplicated, each count for each cost center but with no relation between the two.

Your query should look like this:

session.query(
    Expense.date,
    func.sum(Expense.value).label('total')
).join(Expense.cost_center
).filter(CostCenter.id.in_([2, 3])
).group_by(Expense.date
).all()

producing this sql:

SELECT expense.date AS expense_date, sum(expense.value) AS total 
FROM expense JOIN cost_center ON cost_center.id = expense.cost_center_id 
WHERE cost_center.id IN (?, ?) GROUP BY expense.date

Here is a simple runnable example:

from datetime import datetime
from sqlalchemy import create_engine, Column, Integer, ForeignKey, Numeric, DateTime, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, relationship

engine = create_engine('sqlite://', echo=True)
session = Session(bind=engine)
Base = declarative_base(bind=engine)


class CostCenter(Base):
    __tablename__ = 'cost_center'

    id = Column(Integer, primary_key=True)


class Expense(Base):
    __tablename__ = 'expense'

    id = Column(Integer, primary_key=True)
    cost_center_id = Column(Integer, ForeignKey(CostCenter.id), nullable=False)
    value = Column(Numeric(8, 2), nullable=False, default=0)
    date = Column(DateTime, nullable=False)

    cost_center = relationship(CostCenter, backref='expenses')


Base.metadata.create_all()

session.add_all([
    CostCenter(expenses=[
        Expense(value=10, date=datetime(2014, 8, 1)),
        Expense(value=20, date=datetime(2014, 8, 1)),
        Expense(value=15, date=datetime(2014, 9, 1)),
    ]),
    CostCenter(expenses=[
        Expense(value=45, date=datetime(2014, 8, 1)),
        Expense(value=40, date=datetime(2014, 9, 1)),
        Expense(value=40, date=datetime(2014, 9, 1)),
    ]),
    CostCenter(expenses=[
        Expense(value=42, date=datetime(2014, 7, 1)),
    ]),
])
session.commit()

base_query = session.query(
    Expense.date,
    func.sum(Expense.value).label('total')
).join(Expense.cost_center
).group_by(Expense.date)

# first query considers center 1, output:
# 2014-08-01: 30.00
# 2014-09-01: 15.00
for row in base_query.filter(CostCenter.id.in_([1])).all():
    print('{}: {}'.format(row.date.date(), row.total))

# second query considers centers 1, 2, and 3, output:
# 2014-07-01: 42.00
# 2014-08-01: 75.00
# 2014-09-01: 95.00
for row in base_query.filter(CostCenter.id.in_([1, 2, 3])).all():
    print('{}: {}'.format(row.date.date(), row.total))
Share:
34,194
TH22
Author by

TH22

Updated on July 09, 2022

Comments

  • TH22
    TH22 almost 2 years

    I am trying to group a few fields in a table, and then sum the groups, but they are getting double counted.

    My models are as follows:

    class CostCenter(db.Model):
         __tablename__ = 'costcenter'
         id = db.Column(db.Integer, primary_key=True, autoincrement=True)
         name = db.Column(db.String)
         number = db.Column(db.Integer)
    
    class Expense(db.Model):
    
        __tablename__ = 'expense'
        id = db.Column(db.Integer, primary_key=True, autoincrement=True)
        glitem_id = db.Column(db.Integer, db.ForeignKey('glitem.id'))
        glitem = db.relationship('GlItem')
        costcenter_id = db.Column(db.Integer, db.ForeignKey('costcenter.id'))
        costcenter = db.relationship('CostCenter')
        value = db.Column(db.Float)
        date = db.Column(db.Date)
    

    I have been using:

    expenses=db.session.query(Expense,func.sum(Expense.value)).group_by(Expense.date).filter(CostCenter.id.in_([1,2,3]))
    

    When I print expenses it shows the SQL statement that follows. It looks correct to me, but I am not that familiar with SQL. The problem is that the values it outputs as sum_1 are being counted multiple times. If I have [1] item in the "in statment" it will sum all three. If I have [1,2], it will sum all three and then double it, and if i have [1,2,3], it will sum all three and triple it. I am not sure why it is counting multiple times. How do I fix this?

    SELECT expense.id AS expense_id, expense.glitem_id AS expense_glitem_id, expense.costcenter_id AS         expense_costcenter_id, expense.value AS expense_value, expense.date AS expense_date, sum(expense.value) AS sum_1 
    FROM expense, costcenter 
    WHERE costcenter.id IN (:id_1, :id_2, :id_3) GROUP BY expense.date
    

    Thanks!