Calculating a running total/rolling sum in SQL is a useful skill to have.
It can often come in handy for reporting and even when developing applications. Sometimes your users might want to see a running total of the points they have gained or perhaps the money they have earned. Like many problems in SQL, there are multiple ways you can solve this problem.
You can use analytic functions, self joins or an aggregate table that tracks the running sum. Here are a few examples(Also, skip down to the bottom if you just want to watch these explanations in video form).
Using An Analytic Function
Using an analytic function is the easiest way to calculate a running total. An analytic function lets you partition data by specific field. For instance, in this case, we can break down the rolling sum by driver_id and month. This will give us the running total by customer and month. So every month will start again at 0.
See the query below to see how to use an analytic function:
--We are assuming a month and year field are in the f_daily rides ----table which is abnormal.
--More likely you would have a datekey and a dimension table for all --your dates. However, we didn't want to add an extra join that ------might complicate the the concept we are explaining
SUM(Profit) over (partition by driver_id,year,month order by day)
This might be useful to compare how much profit a driver brings in day-over-day per month. You might be able to spot some sort of trend. Perhaps this can help you spot bad drivers or other trends. Also, it is just good for reporting out to the driver.
They could easily switch between different days to see how much they have made as of x day of the month.
However, using an analytic function in an interview can sometimes lead the interviewer to ask if you know a different way to solve the problem. It is not that your answer is wrong, it is just they might be interested in seeing your thinking.
Using A Self Join
Another option to solve the running total problem is to use a self-join. A self join refers to joining a table to itself. Unlike your typical join where you will use the “=” for all values, we will be joining on multiple values. First, we will join on the driver id and month/year normally and then we will join on the date of the trip. When we join on the date of the trip we will use the >= sign. This will join all the dates from one table that are greater than the dates from the other table.
So if we look at the query below we will see this basic structure. It is a little trickier than just using an analytic function. If you were able to come up with this on your own, great job!
FROM f_daily_rides t1
JOIN f_daily_rides t2
On driver_id =driver_id
And t1.Month = t2.Month
And t1.Year = t2.Year
And t1.day >=t2.day
Group by day
Here is what the data would look like in the subquery. If you notice the t1.day field has repeat values for dates where the t1.day is larger than the t2.day. By doing this you can aggregate on t1.day and get a running total.
Using A Subquery in the Select Clause
One of the last ways we have seen people calculate a running total is using a subquery in the select statement. This has some similarities to a self-join except the self-join occurs inside of the select statement. This is usually unadvised because using this method typically forces the query to scan the table for every row that exists in the original table all over again. This tends to be very inefficient but, we feel it is always good to have a general understanding of all the possible options.
--For simplicity we are not going to focus on the month and year
,(Select sum(profit) from f_daily_rides t2 where t1.day >=t2.day and t1.driver_id = t2.driver_id)
FROM f_daily_rides t1
Designing A Table To Track A Rolling Sum
Besides using direct ad-hoc queries, another option is to design a table to track the running total. What this design looks like will vary on the goal. More than likely the goal will be to either populate an application or dashboard. Based on the requirements changes the overall design. If we were to keep it simple like the example above where we have a rolling sum for a user per month then you just need to insert a new row that appends the previous value if it is for the same month and year.
This would limit how easy it is to be dynamic. For instance, what if you decide that you want to see the running total for every three months?
There are a couple ways you could approach this. Including some of the methods above. We will let you mull over that one!
Please feel free to reach out with any questions or comments. If you have any problems you would like solved, or new solutions then reach out today.
We are focusing on a problem solving series and would love to know what problems you are trying to solve or you would like to see solved.