ActiveRecord is great for providing developers with an easy-to-use abstraction to the data layer, allowing us to treat database tables in an object oriented fashion and simplifying code required to interface with a database at the application level. However, there are times when the convenience of this abstraction does not outweigh the performance gained by pushing some calculations down to the database level.

A SQL view is a mechanism for storing complicated query logic in the database with an interface similar to what you would expect from a regular table. A view uses the result set of the stored query as an ad-hoc, read-only table, abstracting away the complex nature of the underlying query. View’s are created with a name and a query that defines the “schema” and populates the rows of the view.

Generating reports within a Rails application that present a picture of the database encompassing multiple models may be such a time when aggregate queries are better made at the database level before being passed back to the application. Thankfully, SQL views can be used as a quick and easy method of aggregating data at the database level and ActiveRecord can treat a view as it would with a table.

Moving on with an example, lets say we have a simplified bank accounting system containing Customers, Accounts, and Transactions as follows:

class Customer < ActiveRecord::Base
  has_many :accounts
  has_many :transactions, through: :accounts
end

class Account < ActiveRecord::Base
  belongs_to :customer
  has_many :transactions

  def balance
	transactions.sum('amount_in_cents / 100')
  end
end

class Transaction < ActiveRecord::Base
  belongs_to :account
end

One of the most obvious and commonly accessed data points we have in this example system is the balance of all of the accounts for a Customer. In Ruby terms, ActiveRecord makes this easy:

customer = Customer.find(1)
customer.accounts.each do |account|
  puts "Account #{account.name}, balance: #{account.balance}"
end

Looking at the database requests generated by this small block we can see the beginnings of a potental performance problem down the track:

Account Load (0.4ms)  SELECT "accounts".* FROM "accounts"  WHERE "accounts"."customer_id" = $1  [["customer_id", 1]]
   (5.0ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 3]]
Account Visa, balance: 1148
   (0.2ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 2]]
Account Savings, balance: 813
   (0.1ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 1]]
Account Cheque, balance: 790

This is an example of the commonly named N+1 queries problem, where we are required to fetch a set of records from the database and then make one or more additional database queries per record. In the example above, we are fetching a set of 3 accounts for a customer in the first query, but then making an additional SUM query per account to calculate the balance in dollars.

Now imagine that we want to generate a report for our bank that can show us the balance of all accounts for all customers of the system. Following the pattern from above, this translates to the following Ruby code:

Customer.all.each do |customer|
  customer.accounts.each do |account|
    puts "Account #{account.name}, balance: #{account.balance}"
  end
end

We are now making 1 query to fetch all of the Customer records, a query per customer to fetch all of the related accounts, and finally an additional SUM query for each account to calculate the balance. If we have 100,000 customers each with 3 accounts then ActiveRecord will generate 1 + 100,000 + (100,000 * 3) queries to provide back this data!

Eager Loading

Fortunately, ActiveRecord does have some mechanisms to avoid the N+1 query problem. Eager loading is a way that allows us to tell ActiveRecord that we will be needing to load a set of record and some given associations for each record in a single query. Following on from the above example, we know that when we generate our report we will need to retrieve every account for all customers and we can specify this to ActiveRecord using includes:

Customer.all.includes(:accounts).each do |customer|
  customer.accounts.each do |account|
    puts "Account #{account.name}, balance: #{account.balance}"
  end
end

ActiveRecord will now know that we are planning on accessing all of the accounts associated with our set of customers and it can save making multiple requests to the database by loading all of these accounts in a single query like so:

Customer Load (0.5ms)  SELECT "customers".* FROM "customers"
  Account Load (18.1ms)  SELECT "accounts".* FROM "accounts"  WHERE "accounts"."customer_id" IN (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
   (26.3ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 4]]
Account Account #0, balance: 884
   (0.2ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 5]]
Account Account #1, balance: 1086
   (0.2ms)  SELECT SUM(amount_in_cents / 100) AS sum_id FROM "transactions"  WHERE "transactions"."account_id" = $1  [["account_id", 6]]
Account Account #2, balance: 926
...

As you will notice from the log above, this only solves half of our problem. We are still making N+1 queries per each account to calculate the balance. In this trivial example, the easiest solution is to eager load all transactions and refactor the #balance method to calculate the sum of the transaction amounts in Ruby rather than making a SQL SUM query, but keeping this calculation at the database level is much more efficient. A better solution is to encapsulate these calculations within the database and provide ActiveRecord with an interface to the results - this is where PostgreSQL View’s can help us.

View Backed Models

Continuing with our example above - our goal is to make a view that calculates “all of the account balances for all of the accounts for all of our customers”. In SQL, this can be achieved as follows:

CREATE VIEW account_balances AS (
  SELECT
    accounts.*,
    SUM(transactions.amount_in_cents / 100) as balance
  FROM accounts
  INNER JOIN transactions
  ON transactions.account_id = accounts.id
  GROUP BY accounts.id
);

In this statement we are creating our view named account_balances and providing a query that INNER JOINS accounts and transactions and uses the SUM aggregate grouped by accounts.id to calculate the balance for each account. PostgreSQL will now abstract this query logic away for us and we can treat the view as we would any other table for read-only queries:

SELECT * FROM account_balances;

id |    name    | customer_id |         created_at         |         updated_at         | balance
----+------------+-------------+----------------------------+----------------------------+---------
  8 | Account #1 |           3 | 2014-08-06 22:56:55.215964 | 2014-08-06 22:56:55.215964 |    1125
 16 | Account #0 |           6 | 2014-08-06 22:56:55.230804 | 2014-08-06 22:56:55.230804 |     893
 28 | Account #0 |          10 | 2014-08-06 22:56:55.25922  | 2014-08-06 22:56:55.25922  |     939
 15 | Account #2 |           5 | 2014-08-06 22:56:55.228934 | 2014-08-06 22:56:55.228934 |     979
 ... (30 rows)

Now because we can treat our view as we would any other table when querying, we can now create an ActiveRecord model that uses this view and it will behave just like any other ActiveRecord model!

class ViewModels::AccountBalance < ActiveRecord::Base
  self.primary_key = :id
  self.table_name = :account_balances
end

2.1.1 :003 > ViewModels::AccountBalance
 => ViewModels::AccountBalance(id: integer, name: string, customer_id: integer, created_at: datetime, updated_at: datetime, balance: integer)
 
 2.1.1 :004 > ViewModels::AccountBalance.all
  ViewModels::AccountBalance Load (1.8ms)  SELECT "account_balances".* FROM "account_balances"
 => #<ActiveRecord::Relation [#<ViewModels::AccountBalance id: 8, name: "Account #1", customer_id: 3, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 1125>, #<ViewModels::AccountBalance id: 16, name: "Account #0", customer_id: 6, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 893>, ...]>

We can query our AccountBalance model just like we would with an Account:

ViewModels::AccountBalance.find(4)
  ViewModels::AccountBalance Load (4.5ms)  SELECT  "account_balances".* FROM "account_balances"  WHERE "account_balances"."id" = $1 LIMIT 1  [["id", 4]]
 => #<ViewModels::AccountBalance id: 4, name: "Account #0", customer_id: 2, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 884>

In fact, because our AccountBalance model has a customer_id attribute that it inherits from the accounts table, we can even use our new view model in associations:

class Customer < ActiveRecord::Base
  has_many :accounts
  has_many :accounts_with_balance, class_name: 'ViewModels::AccountBalance'
end

class ViewModels::AccountBalance < ActiveRecord::Base
  self.primary_key = :id
  self.table_name = :account_balances

  belongs_to :customer
end

Customer.first.accounts_with_balance
2.1.1 :001 > Customer.first.account_balances
  Customer Load (1.5ms)  SELECT  "customers".* FROM "customers"   ORDER BY "customers"."id" ASC LIMIT 1
  ViewModels::AccountBalance Load (1.5ms)  SELECT "account_balances".* FROM "account_balances"  WHERE "account_balances"."customer_id" = $1  [["customer_id", 2]]
 => #<ActiveRecord::Associations::CollectionProxy [#<ViewModels::AccountBalance id: 6, name: "Account #2", customer_id: 2, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 926>, #<ViewModels::AccountBalance id: 5, name: "Account #1", customer_id: 2, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 1086>, #<ViewModels::AccountBalance id: 4, name: "Account #0", customer_id: 2, created_at: "2014-08-06 22:56:55", updated_at: "2014-08-06 22:56:55", balance: 884>]>
 

Now back to our original example. Because we now have a named association accounts_with_balance on Customer, we can combine our original optimisation of eager loading the accounts with our AccountBalance model to efficiently retrieve all the account balances in only two queries:

Customer.all.includes(:accounts_with_balance).each do |customer|
  customer.accounts_with_balance.each do |account|
    puts "Account #{account.name}, balance: #{account.balance}"
  end
end

Customer Load (0.7ms)  SELECT "customers".* FROM "customers"
  ViewModels::AccountBalance Load (2.1ms)  SELECT "account_balances".* FROM "account_balances"  WHERE "account_balances"."customer_id" IN (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)

Account #0, balance: 884
Account #2, balance: 926
Account #1, balance: 1086
Account #1, balance: 1125
...

And now we have completely removed both of our N+1 query occurrences, while still maintaining an ActiveRecord style usage of our models. Now if your requirements are signficantly more complicated that this then you are more likely to need something a bit more robust that the solution I have described here, but view backed models can handle small, expensive queries and has the advantage of being quick to implement.

In our example above I introduced a namespace for our view-backed models as ViewModels. This is a suggestion to help anyone reading the code that while the AccountBalance class may look like an ActiveRecord model, it is only a read-only interface to our database view and therefore operations like AccountBalance.create are not permitted and will cause an ActiveRecord::StatementInvalid exception to be raised.

I recommend using an additional rake tasks to manage creating and updating views in your database rather than migrations, see this gist for an example.