ActiveRecord is great for providing developers with an easy-to-use abstraction to the data layer, allowing us to treat database tables in an object oriented fashion and simplifying code required to interface with a database at the application level. However, there are times when the convenience of this abstraction does not outweigh the performance gained by pushing some calculations down to the database level.
A SQL view is a mechanism for storing complicated query logic in the database with an interface similar to what you would expect from a regular table. A view uses the result set of the stored query as an ad-hoc, read-only table, abstracting away the complex nature of the underlying query. View’s are created with a name and a query that defines the “schema” and populates the rows of the view.
Generating reports within a Rails application that present a picture of the database encompassing multiple models may be such a time when aggregate queries are better made at the database level before being passed back to the application. Thankfully, SQL views can be used as a quick and easy method of aggregating data at the database level and ActiveRecord can treat a view as it would with a table.
Moving on with an example, lets say we have a simplified bank accounting system containing Customers, Accounts, and Transactions as follows:
One of the most obvious and commonly accessed data points we have in this example system is the balance of all of the accounts for a Customer. In Ruby terms, ActiveRecord makes this easy:
Looking at the database requests generated by this small block we can see the beginnings of a potental performance problem down the track:
This is an example of the commonly named N+1 queries problem, where we are required to fetch a set of records from the database and then make one or more additional database queries per record. In the example above, we are fetching a set of 3 accounts for a customer in the first query, but then making an additional
SUM query per account to calculate the balance in dollars.
Now imagine that we want to generate a report for our bank that can show us the balance of all accounts for all customers of the system. Following the pattern from above, this translates to the following Ruby code:
We are now making 1 query to fetch all of the
Customer records, a query per customer to fetch all of the related accounts, and finally an additional
SUM query for each account to calculate the balance. If we have 100,000 customers each with 3 accounts then ActiveRecord will generate 1 + 100,000 + (100,000 * 3) queries to provide back this data!
Fortunately, ActiveRecord does have some mechanisms to avoid the N+1 query problem. Eager loading is a way that allows us to tell ActiveRecord that we will be needing to load a set of record and some given associations for each record in a single query. Following on from the above example, we know that when we generate our report we will need to retrieve every account for all customers and we can specify this to ActiveRecord using
ActiveRecord will now know that we are planning on accessing all of the accounts associated with our set of customers and it can save making multiple requests to the database by loading all of these accounts in a single query like so:
As you will notice from the log above, this only solves half of our problem. We are still making N+1 queries per each account to calculate the balance. In this trivial example, the easiest solution is to eager load all transactions and refactor the
#balance method to calculate the sum of the transaction amounts in Ruby rather than making a SQL
SUM query, but keeping this calculation at the database level is much more efficient. A better solution is to encapsulate these calculations within the database and provide ActiveRecord with an interface to the results - this is where PostgreSQL View’s can help us.
View Backed Models
Continuing with our example above - our goal is to make a view that calculates “all of the account balances for all of the accounts for all of our customers”. In SQL, this can be achieved as follows:
In this statement we are creating our view named
account_balances and providing a query that INNER JOINS
transactions and uses the
SUM aggregate grouped by
accounts.id to calculate the balance for each account. PostgreSQL will now abstract this query logic away for us and we can treat the view as we would any other table for read-only queries:
Now because we can treat our view as we would any other table when querying, we can now create an ActiveRecord model that uses this view and it will behave just like any other ActiveRecord model!
We can query our
AccountBalance model just like we would with an
In fact, because our
AccountBalance model has a
customer_id attribute that it inherits from the
accounts table, we can even use our new view model in associations:
Now back to our original example. Because we now have a named association
Customer, we can combine our original optimisation of eager loading the accounts with our
AccountBalance model to efficiently retrieve all the account balances in only two queries:
And now we have completely removed both of our N+1 query occurrences, while still maintaining an ActiveRecord style usage of our models. Now if your requirements are signficantly more complicated that this then you are more likely to need something a bit more robust that the solution I have described here, but view backed models can handle small, expensive queries and has the advantage of being quick to implement.
In our example above I introduced a namespace for our view-backed models as
ViewModels. This is a suggestion to help anyone reading the code that while the
AccountBalance class may look like an ActiveRecord model, it is only a read-only interface to our database view and therefore operations like
AccountBalance.create are not permitted and will cause an
ActiveRecord::StatementInvalid exception to be raised.
I recommend using an additional rake tasks to manage creating and updating views in your database rather than migrations, see this gist for an example.