Why WINDOW functions are changing the game for MySQL


Back when we were deciding on an SQL flavor at Zervant, the fact that MySQL didn't have WINDOW functions was a deal breaker. We had plenty of MySQL experience and not that much in PostgreSQL but I had to go with PostgreSQL anyway - just because of WINDOW functions. But that's about to change, as MySQL is getting this support soon, too.

Imagine you need to calculate the number of active users you have on any given day. Now imagine that the way you have defined an active user is as someone who has done something on your service within e.g. 30 days. The something might be e.g. uploaded something, downloaded something, made a purchase, edited their profile - doesn't matter. And imagine you want to split by dimensions.

Without WINDOW functions you don't have good options. You can accomplish this by either sub queries, which gets very expensive:
SELECT t.id,
         t.count,
         (SELECT SUM(t2.count) FROM t as t2 WHERE t2.id  <= t.id) AS cumulative_sum
    FROM TABLE t
ORDER BY t.id
or MySQL variables, which cannot be used in a VIEW:
SELECT t.id,
         t.count,
         (@running_total := @running_total + t.count) AS cumulative_sum
    FROM TABLE t
    JOIN (SELECT @running_total := 0) r
ORDER BY t.id
There might be a way to use a stored procedure and a table, but this gets complicated to implement.
Code credits: MagePsycho.

Window functions to the rescue

With WINDOW functions, MySQL takes a step towards being a viable alternative for an analytics database. With a WINDOW function you can easily calculate a running total over any dimensions. Placing this logic in a VIEW makes it even easier to do report
SELECT date, SUM(COUNT(*)) OVER w
FROM active
GROUP BY date
WINDOW w AS (PARTITION BY 1 ORDER BY date)
Simple and effective - and enables the kind of calculations you have to do daily for analytics.

If I had to make the choice between MySQL or something like PostgreSQL for an analytics database, this change would make me reconsider. How about you?

Comments

Popular posts from this blog

How to access AWS S3 with pyspark locally using AWS profiles tutorial