Tuesday, November 9, 2010

NHibernate performance issues #1: evil List (non-inverse relationhip)

Lists are evil, at least when using NHibernate. You should re-consider if you really need to use specific List implementation, because it has unsuitale index column owned by parent, not children. List can't be used in inverse relationship which implies few (but major) inclusions:
  • extra sql UPDATE to persist mentioned index value
  • unscalable addition to the list - NHibernate needs to fetch all items and add new item after
  • inability to use fast cascade deletion by foreign keys
  • inability to use IStatelessSession for fast data insertion

Basic theorem: you don't need Lists! Furthermore, we'll discuss each bullet in details.

What is inverse relation?

First of all, it's necessary to clarify what inverse relation means. Reference between parent and child isn't hold by parent but child! See following picture:


Here is NHibernate mapping definition for inverse relation with using excellent Fluent NHibernate:
HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  LazyLoad().
  Cascade.AllDeleteOrphan();

Or hbm mapping:
<bag access="field.camelcase" cascade="all-delete-orphan" inverse="true" lazy="true" name="Books" mutable="true">
  <key foreign-key="FK_BOOK_LIBRARY_ID">
    <column name="LIBRARY_ID" />
  </key>
  <one-to-many class="Book" />
</bag>

Unscalable addition to List

Each item stored in standard (non-inverse) List holds index in special column defining collection order. It's simple smallint index.

Imagine the situation the you want to insert any new item to middle of mentioned list. Indexes of all items stored within rest of list need to be incremented. NHibernate doesn't execute any update query incrementing indexes but fetch all items from database.

Lets imagine described process for huge amount of children. Lets say that parent has 50 thousand of children. Object graph is stored in database, no object is cached. Parent is loaded from database, children list is marked as lazy. If you need to insert (or add) new item to collection, all 50 thousand items are loaded too. Again, again and again if you perform the operation.

Addition to List is totally unscalable.

Inability to use fast cascade deletion by foreign keys

Best unified approach to delete huge (various) object graph is to use cascade deletion hanged on foreign keys. I'll described this approach in separated articles in near future.

Cascade deletion is based on automatic removal of orphaned item. If relation between parent and child disappears, orphaned child is removed too. According to described process, cascade deletion is intended to be used along with inverse relation.

Following code snippet displays example of SQL foreign keys with cascade deletion:
alter table [Book] 
  add constraint FK_BOOK_LIBRARY_ID 
  foreign key (LIBRARY_ID) 
  references [Library] 
  on delete cascade

Extra sql update to store item's index

In the case of non-inverse relation, NHibernate inserts parent first and than all children. After these operations, extra update is produced to insert index column. I'll use already described situation counting with 50 thousand children, NHibernate produces 50 thousand of sql inserts and than sends additional 50 thousand of updates to SQL server - it means twice more sql statements!

See the following two sql queries extracted from log:
INSERT INTO [Book] (Isbn, Name, Pages, LIBRARY_ID, Id) VALUES (...);

UPDATE [Book] SET LIBRARY_ID = ... WHERE Id = ...;

What to pick instead of List?

It's simple to write that lists are evil. And what now, what to pick instead? There are three options.

1. At almost all cases, you really don't need List. See next chapter to find out which instance of collection to use.

List's index declares and defines list's order. At almost all cases, the list can be sorted according to any item's property. Does it make any sense to sort the list according to when items were added? I think it doesn't. I think that list needs to be sorted according to item's creation date, item's size, count's of item's replies etc., simply according to any item's property.

As our example defines, list of rentals should be sorted according to rental's start date, see mapping:
HasMany(b => b.Rentals).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("BOOK_ID").
  ForeignKeyConstraintName("FK_RENTAL_BOOK_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  OrderBy("StartOfRental").
  Cascade.AllDeleteOrphan();

2. You need to sort list according to index and you can move index to child entit. Map it as a standard private property and than sort the whole collection according the property - see previous example.

See example of index property placed at children:
private int OrderIndex {
    get { return Book.Rentals.IndexOf(this); }
    set { }
}

The drawback is also that you need to expose parent's children list (instead of ICollection or IEnumerable) because child has to be able to find out it's position within list.

3. You need to sort list according to index, you can't change domain model and it's impossible to sort items according to any item's property. There is no other choice than simply live with non-inverse collection along with it's all disadvantages described above.

Recommended solution how to use collection when using NHibernate persistence
  1. Parent entity contains Add and Remove method for each collection
  2. Use ICollection as collection interface
  3. If you need to expose the whole colletion use IEnumerable - it's just read-only collection, support sequential fetching, etc.
Here is code snippet of Book entity.
public class Book {

    private ICollection<Rental> rentals = new List<Rental>();
    ...
    public void AddRental(String rentee) {
        ...
        rentals.Add(new Rental(this, rentee));
    }
    ...
    public IEnumerable<Rental> Rentals {
        get { return rentals; }
    }
}

Summary
You should be aware of List usage along with NHibernate persistence. It can bring you serious performance issues. Try to re-design you collections to be index independent and provide sorting approach by another than simple index way.

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

You can also see Series of .NET NHibernate performance issues to read all series articles.

2 comments:

Kevin said...

One example where you need "arbitrary" ordering of items is a simple to-do list example, where a user needs to be able to reorder his to-dos arbitrarily. This is something I am dealing with now. A non-inverse relationship for me is not a good solution because we may have a user with hundreds of to-dos in a list. So, I am trying to come up with an efficient solution using a bag with ordering on a property column that is calculated based on the child's position in the list. So far, this is the only way I can see of letting the children maintain their own order. Still, when the user re-orders his items, all to-dos in the list need to be updated with regards to their position or order. So, still, we have potentially hundreds of updates (if the user has hundreds of to-dos.)

Anonymous said...

What you fail to mention is the drawback of using "order-by"

From the NHibernate reference documentation:

"Setting the order-by attribute tells NHibernate to use ListDictionary or ListSet class internally for dictionaries and sets, maintaining the order of the elements. Note that lookup operations on these collections are very slow if they contain more than a few elements."

/Johan H