0

I am using Ruby on Rails 3.1.10 in developing a web application.

Objective is to find all users that a user is following.

Let there be two models User and Following

In User model:

has_many :following_users, :through => :followings

When calling user.following_users, rails help generates a query that INNER JOIN between users and followings table by its magical default.

When users table has over 50,000 records while followings table has over 10,000,000 records, the inner join generated is resource demanding.

Any thoughts on how to optimize the performance by avoiding inner joining two big tables?

David Ng
  • 470
  • 4
  • 13
  • Joining tables is how associations work in Rails. You're not going to avoid doing that (short of switching to a noSQL database, I suppose). Is your `followings` table indexed by `user_id` (or whatever the foreign key is that's used in this query)? Proper indexing can make a big difference in query times. – MrTheWalrus Sep 05 '13 at 05:42

1 Answers1

1

To avoid a single query with inner join, you can do 2 select queries by using the following method

# User.rb
# assuming that Following has a followed_id column for user that is being followed

def following_users_nojoin
    @following_users_nojoin ||= User.where("id IN (?)", followings.map(&:followed_id)) 
end

This will not create a join table but would make two sql queries. One to get all the followings that belong to the user (unless it is already in the cache) and second query to find all the followed users. A user_id index on following, as suggested in the comment, would speed up the first query where we get all the followings for the user.

The above method would be faster than a single join query if the followings of a user have already been retrieved.

Read this for details on whether it is faster to make multiple select queries over a single query with join. The best way to find out which one is faster is to benchmark both methods on your production database.

Community
  • 1
  • 1
tihom
  • 7,923
  • 1
  • 25
  • 29
  • Thanks tihom, I have a same thought through, but there are existing method calls `user.following_users.order("followings.id desc")` In this case, a join would be unavoidable? – David Ng Sep 05 '13 at 08:31
  • But when we try to order by `followings.id` , note that `followings` is not there yet. :( – David Ng Sep 05 '13 at 09:45
  • 1
    If this order is the only one you need then we can do this `user.following_users_nojoin.order("field(id, #{followings.reverse.map(&:followed_id).join(',')})")` . This is using the already cached `followings` and generating an ordered list of followed user ids. We can similarly use `followings` outside sql to filter the records. – tihom Sep 05 '13 at 09:55
  • Updated the earlier comment, as I didn't realize you were ordered by a column on followings. – tihom Sep 05 '13 at 10:10
  • Thanks! Though I was going for a more general handling method :P – David Ng Sep 05 '13 at 10:17
  • @David if you still need this, look in Rails eager loading. Sthing like `User.find(params[:id]).includes(:following_users)`. This would eager load `following_users` and `followings` without join. Also `user.following_users` is an Array now so need to user Array methods for sort or filtering e.g. `user.following_users.sort{|x| -x.following.id}` – tihom Sep 19 '13 at 00:19