Skip to main content

NHibernate performance issues #1: evil List (non-inverse relationhip)

Lists are evil, at least when using NHibernate. You should re-consider if you really need to use specific List implementation, because it has unsuitale index column owned by parent, not children. List can't be used in inverse relationship which implies few (but major) inclusions:
  • extra sql UPDATE to persist mentioned index value
  • unscalable addition to the list - NHibernate needs to fetch all items and add new item after
  • inability to use fast cascade deletion by foreign keys
  • inability to use IStatelessSession for fast data insertion

Basic theorem: you don't need Lists! Furthermore, we'll discuss each bullet in details.

What is inverse relation?

First of all, it's necessary to clarify what inverse relation means. Reference between parent and child isn't hold by parent but child! See following picture:


Here is NHibernate mapping definition for inverse relation with using excellent Fluent NHibernate:
HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  LazyLoad().
  Cascade.AllDeleteOrphan();

Or hbm mapping:
<bag access="field.camelcase" cascade="all-delete-orphan" inverse="true" lazy="true" name="Books" mutable="true">
  <key foreign-key="FK_BOOK_LIBRARY_ID">
    <column name="LIBRARY_ID" />
  </key>
  <one-to-many class="Book" />
</bag>

Unscalable addition to List

Each item stored in standard (non-inverse) List holds index in special column defining collection order. It's simple smallint index.

Imagine the situation the you want to insert any new item to middle of mentioned list. Indexes of all items stored within rest of list need to be incremented. NHibernate doesn't execute any update query incrementing indexes but fetch all items from database.

Lets imagine described process for huge amount of children. Lets say that parent has 50 thousand of children. Object graph is stored in database, no object is cached. Parent is loaded from database, children list is marked as lazy. If you need to insert (or add) new item to collection, all 50 thousand items are loaded too. Again, again and again if you perform the operation.

Addition to List is totally unscalable.

Inability to use fast cascade deletion by foreign keys

Best unified approach to delete huge (various) object graph is to use cascade deletion hanged on foreign keys. I'll described this approach in separated articles in near future.

Cascade deletion is based on automatic removal of orphaned item. If relation between parent and child disappears, orphaned child is removed too. According to described process, cascade deletion is intended to be used along with inverse relation.

Following code snippet displays example of SQL foreign keys with cascade deletion:
alter table [Book] 
  add constraint FK_BOOK_LIBRARY_ID 
  foreign key (LIBRARY_ID) 
  references [Library] 
  on delete cascade

Extra sql update to store item's index

In the case of non-inverse relation, NHibernate inserts parent first and than all children. After these operations, extra update is produced to insert index column. I'll use already described situation counting with 50 thousand children, NHibernate produces 50 thousand of sql inserts and than sends additional 50 thousand of updates to SQL server - it means twice more sql statements!

See the following two sql queries extracted from log:
INSERT INTO [Book] (Isbn, Name, Pages, LIBRARY_ID, Id) VALUES (...);

UPDATE [Book] SET LIBRARY_ID = ... WHERE Id = ...;

What to pick instead of List?

It's simple to write that lists are evil. And what now, what to pick instead? There are three options.

1. At almost all cases, you really don't need List. See next chapter to find out which instance of collection to use.

List's index declares and defines list's order. At almost all cases, the list can be sorted according to any item's property. Does it make any sense to sort the list according to when items were added? I think it doesn't. I think that list needs to be sorted according to item's creation date, item's size, count's of item's replies etc., simply according to any item's property.

As our example defines, list of rentals should be sorted according to rental's start date, see mapping:
HasMany(b => b.Rentals).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("BOOK_ID").
  ForeignKeyConstraintName("FK_RENTAL_BOOK_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  OrderBy("StartOfRental").
  Cascade.AllDeleteOrphan();

2. You need to sort list according to index and you can move index to child entit. Map it as a standard private property and than sort the whole collection according the property - see previous example.

See example of index property placed at children:
private int OrderIndex {
    get { return Book.Rentals.IndexOf(this); }
    set { }
}

The drawback is also that you need to expose parent's children list (instead of ICollection or IEnumerable) because child has to be able to find out it's position within list.

3. You need to sort list according to index, you can't change domain model and it's impossible to sort items according to any item's property. There is no other choice than simply live with non-inverse collection along with it's all disadvantages described above.

Recommended solution how to use collection when using NHibernate persistence
  1. Parent entity contains Add and Remove method for each collection
  2. Use ICollection as collection interface
  3. If you need to expose the whole colletion use IEnumerable - it's just read-only collection, support sequential fetching, etc.
Here is code snippet of Book entity.
public class Book {

    private ICollection<Rental> rentals = new List<Rental>();
    ...
    public void AddRental(String rentee) {
        ...
        rentals.Add(new Rental(this, rentee));
    }
    ...
    public IEnumerable<Rental> Rentals {
        get { return rentals; }
    }
}

Summary
You should be aware of List usage along with NHibernate persistence. It can bring you serious performance issues. Try to re-design you collections to be index independent and provide sorting approach by another than simple index way.

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

You can also see Series of .NET NHibernate performance issues to read all series articles.

Comments

Kevin said…
One example where you need "arbitrary" ordering of items is a simple to-do list example, where a user needs to be able to reorder his to-dos arbitrarily. This is something I am dealing with now. A non-inverse relationship for me is not a good solution because we may have a user with hundreds of to-dos in a list. So, I am trying to come up with an efficient solution using a bag with ordering on a property column that is calculated based on the child's position in the list. So far, this is the only way I can see of letting the children maintain their own order. Still, when the user re-orders his items, all to-dos in the list need to be updated with regards to their position or order. So, still, we have potentially hundreds of updates (if the user has hundreds of to-dos.)
Anonymous said…
What you fail to mention is the drawback of using "order-by"

From the NHibernate reference documentation:

"Setting the order-by attribute tells NHibernate to use ListDictionary or ListSet class internally for dictionaries and sets, maintaining the order of the elements. Note that lookup operations on these collections are very slow if they contain more than a few elements."

/Johan H

Popular posts from this blog

Performance Battle of NoSQL blob storages #1: Cassandra

Preface We spend last five years on HP Service Virtualization using MsSQL database . Non-clustered server. Our app utilizes this system for all kinds of persistence. No polyglot so far. As we tuned the performance of the response time - we started at 700ms/call and we achieved couple milliseconds per call at the end when DB involved - we had to learn a lot of stuff. Transactions, lock escalation , isolation levels , clustered and non clustered indexes, buffered reading, index structure and it's persistence, GUID ids in clustered indexes , bulk importing , omit slow joins, sparse indexes, and so on. We also rewrite part of NHibernate to support multiple tables for one entity type which allows use scaling up without lock escalation. It was good time. The end also showed us that famous Oracle has half of our favorite features once we decided to support this database. Well, as I'm thinking about all issues which we encountered during the development, unpredictive behavio

NHibernate performance issues #3: slow inserts (stateless session)

The whole series of NHibernate performance issues isn't about simple use-cases. If you develop small app, such as simple website, you don't need to care about performance. But if you design and develop huge application and once you have decided to use NHibernate you'll solve various sort of issue. For today the use-case is obvious: how to insert many entities into the database as fast as possible? Why I'm taking about previous stuff? The are a lot of articles how the original NHibernate's purpose isn't to support batch operations , like inserts. Once you have decided to NHibernate, you have to solve this issue. Slow insertion The basic way how to insert mapped entity into database is: SessionFactory.GetCurrentSession().Save(object); But what happen when I try to insert many entities? Lets say, I want to persist 1000 libraries each library has 100 books = 100k of books each book has 5 rentals - there are 500k of rentals  It's really slow! The inser

Java, Docker, Spring boot ... and signals

I spend last couple weeks working on java apps running within docker containers deployed on clustered CoreOS machines . It's pretty simple to run java app within a docker container. You just have to choose a base image for your app and write a docker file. Note that docker registry contains many java distributions usually based on open jdk. We use our internal image for Oracle's Java 8 , build on top of something like this docker file . Once you make a decision whether oracle or openjdk, you can start to write your own docker file. FROM dockerfile/java:oracle-java8 ADD your.jar /opt/your-app ADD /dependencies /opt/your-app/dependency WORKDIR /opt/your-app CMD ["java -jar /opt/your-app/your.jar"] However, your app would probably require some parameters. Therefore, last line usually calls your shell script. Such script than validates number and format of those parameters among other things. This is also useful during the development phase because none of us