Skip to main content

NHibernate performance issues #1: evil List (non-inverse relationhip)

Lists are evil, at least when using NHibernate. You should re-consider if you really need to use specific List implementation, because it has unsuitale index column owned by parent, not children. List can't be used in inverse relationship which implies few (but major) inclusions:
  • extra sql UPDATE to persist mentioned index value
  • unscalable addition to the list - NHibernate needs to fetch all items and add new item after
  • inability to use fast cascade deletion by foreign keys
  • inability to use IStatelessSession for fast data insertion

Basic theorem: you don't need Lists! Furthermore, we'll discuss each bullet in details.

What is inverse relation?

First of all, it's necessary to clarify what inverse relation means. Reference between parent and child isn't hold by parent but child! See following picture:


Here is NHibernate mapping definition for inverse relation with using excellent Fluent NHibernate:
HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  LazyLoad().
  Cascade.AllDeleteOrphan();

Or hbm mapping:
<bag access="field.camelcase" cascade="all-delete-orphan" inverse="true" lazy="true" name="Books" mutable="true">
  <key foreign-key="FK_BOOK_LIBRARY_ID">
    <column name="LIBRARY_ID" />
  </key>
  <one-to-many class="Book" />
</bag>

Unscalable addition to List

Each item stored in standard (non-inverse) List holds index in special column defining collection order. It's simple smallint index.

Imagine the situation the you want to insert any new item to middle of mentioned list. Indexes of all items stored within rest of list need to be incremented. NHibernate doesn't execute any update query incrementing indexes but fetch all items from database.

Lets imagine described process for huge amount of children. Lets say that parent has 50 thousand of children. Object graph is stored in database, no object is cached. Parent is loaded from database, children list is marked as lazy. If you need to insert (or add) new item to collection, all 50 thousand items are loaded too. Again, again and again if you perform the operation.

Addition to List is totally unscalable.

Inability to use fast cascade deletion by foreign keys

Best unified approach to delete huge (various) object graph is to use cascade deletion hanged on foreign keys. I'll described this approach in separated articles in near future.

Cascade deletion is based on automatic removal of orphaned item. If relation between parent and child disappears, orphaned child is removed too. According to described process, cascade deletion is intended to be used along with inverse relation.

Following code snippet displays example of SQL foreign keys with cascade deletion:
alter table [Book] 
  add constraint FK_BOOK_LIBRARY_ID 
  foreign key (LIBRARY_ID) 
  references [Library] 
  on delete cascade

Extra sql update to store item's index

In the case of non-inverse relation, NHibernate inserts parent first and than all children. After these operations, extra update is produced to insert index column. I'll use already described situation counting with 50 thousand children, NHibernate produces 50 thousand of sql inserts and than sends additional 50 thousand of updates to SQL server - it means twice more sql statements!

See the following two sql queries extracted from log:
INSERT INTO [Book] (Isbn, Name, Pages, LIBRARY_ID, Id) VALUES (...);

UPDATE [Book] SET LIBRARY_ID = ... WHERE Id = ...;

What to pick instead of List?

It's simple to write that lists are evil. And what now, what to pick instead? There are three options.

1. At almost all cases, you really don't need List. See next chapter to find out which instance of collection to use.

List's index declares and defines list's order. At almost all cases, the list can be sorted according to any item's property. Does it make any sense to sort the list according to when items were added? I think it doesn't. I think that list needs to be sorted according to item's creation date, item's size, count's of item's replies etc., simply according to any item's property.

As our example defines, list of rentals should be sorted according to rental's start date, see mapping:
HasMany(b => b.Rentals).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("BOOK_ID").
  ForeignKeyConstraintName("FK_RENTAL_BOOK_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  OrderBy("StartOfRental").
  Cascade.AllDeleteOrphan();

2. You need to sort list according to index and you can move index to child entit. Map it as a standard private property and than sort the whole collection according the property - see previous example.

See example of index property placed at children:
private int OrderIndex {
    get { return Book.Rentals.IndexOf(this); }
    set { }
}

The drawback is also that you need to expose parent's children list (instead of ICollection or IEnumerable) because child has to be able to find out it's position within list.

3. You need to sort list according to index, you can't change domain model and it's impossible to sort items according to any item's property. There is no other choice than simply live with non-inverse collection along with it's all disadvantages described above.

Recommended solution how to use collection when using NHibernate persistence
  1. Parent entity contains Add and Remove method for each collection
  2. Use ICollection as collection interface
  3. If you need to expose the whole colletion use IEnumerable - it's just read-only collection, support sequential fetching, etc.
Here is code snippet of Book entity.
public class Book {

    private ICollection<Rental> rentals = new List<Rental>();
    ...
    public void AddRental(String rentee) {
        ...
        rentals.Add(new Rental(this, rentee));
    }
    ...
    public IEnumerable<Rental> Rentals {
        get { return rentals; }
    }
}

Summary
You should be aware of List usage along with NHibernate persistence. It can bring you serious performance issues. Try to re-design you collections to be index independent and provide sorting approach by another than simple index way.

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

You can also see Series of .NET NHibernate performance issues to read all series articles.
3 comments

Popular posts from this blog

Http and TCP Load Balancing with Kubernetes

Kubernetes allows to manage large clusters which are composed of docker containers. And where is large computation power there is large amount of data throughput. However, there is still a need to spread the data throughput so it can reach and utilize particular docker container(s). This is called load balancingKubernetes supports two basic forms of load balancing. Internal among docker containers themselves and external for clients which call you application from outside environment - just users.


Internal Load Balancing with Kubernetes Usual approach during the modeling of an application in kubernetes is to provide domain models for pods, replications controllers and services. Look at the great documentation if you are not aware of all these principles.

For simplification, pod is a set of docker containers which are always located on the same node. Replication controller allows and guarantees scalability. The last but not least is the service which hides all instances of pods - cre…

Validating nginx config file using docker approach

I try to setup nginx as a load balancer. The configuration is just a file with a lot of constrains so I need a validation. There is no online validation service, as e.g. CoreOS has, and I don't want to install nginx on my laptop as I work on a distributed app.

Docker is right approach for me. Let say I have following config:


In short, I'm going to pass nginx config to running nginx instance and look to the logs.

Put you nginx.config to the temp and start the docker image:

sudo docker run --name nginx -v /tmp/nginx.config:/etc/nginx/nginx.conf:ro -d nginx It uses volume mapping so the command just starts a new docker container and mounts a local /tmp/nginx.config to the given in-container path. You can obviously change the volume path to your personal path. Is it working or not? Look at logs.

sudo docker logs nginx If there is no entry, your file is fine. In the case of an error, you can see something like this:

2016/01/08 11:37:31 [emerg] 1#1: unexpected "}" in /etc/…