Wednesday, November 17, 2010

NHibernate performance issues #2: slow cascade save and update (flushing)

What is NHibernate's most powerful feature, apart from object mapping? Cascade operations such as insert, update and delete. And what is one of NHibernate's biggest performance issues? Cascade saving.

Cascade insert, save and update

If you let NHibernate manage your entities (e.g. you load them from the database), it can perform all persistence operations for you, including automatic:
  • insert
  • update
  • delete
It all depends on your cascade settings. What does the documentation say?
cascade (optional): Specifies which operations should be cascaded from the parent object to the associated object.
The attribute declares which operations are propagated in a particular case. It can be tuned for the traditional parent-child pattern.

The following Fluent NHibernate mapping declares what happens to the children (Books) when the parent entity (Library) is affected by any of the operations mentioned above.
HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  Cascade.AllDeleteOrphan();
As you can see, every child is affected by every kind of operation performed on the parent object. There are of course other options: cascade nothing, cascade only updates, and so on.
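To illustrate what such a mapping means in practice, here is a minimal sketch. The `Library` and `Book` classes, their members and the `CascadeExample` helper are assumptions made for illustration (the article only shows the mapping), and `Book` is assumed to have its own mapping with a reference back to `Library`.

```csharp
using System.Collections.Generic;
using NHibernate;

// Hypothetical entities matching the mapping above (member names are assumptions).
public class Book
{
    public virtual int Id { get; protected set; }
    public virtual string Title { get; set; }
    public virtual Library Library { get; set; }
}

public class Library
{
    // Backing field that Access.CamelCaseField() reads and writes.
    private IList<Book> books = new List<Book>();

    public virtual int Id { get; protected set; }
    public virtual IEnumerable<Book> Books { get { return books; } }

    public virtual void AddBook(Book book)
    {
        // The collection is Inverse(), so the child side owns the foreign key.
        book.Library = this;
        books.Add(book);
    }
}

public static class CascadeExample
{
    public static void AddBookToLibrary(ISessionFactory sessionFactory, int libraryId)
    {
        using (ISession session = sessionFactory.OpenSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            Library library = session.Get<Library>(libraryId);
            if (library != null)
            {
                library.AddBook(new Book { Title = "New arrival" });
            }

            // No explicit session.Save(book): the flush at commit cascades
            // the save from the Library to the new Book.
            tx.Commit();
        }
    }
}
```

The point of the sketch is that the new `Book` is never passed to the session directly; it is picked up only because the flush walks the `Books` collection of the managed `Library`.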

That's all about cascade operations. It's a nice NHibernate feature and it's well described in the manual.

Cascade update issue

We have defined cascade operations. What's the problem?

When NHibernate finds it appropriate, it goes through all relations and objects stored in the first-level cache (the session), checks each of them for changes and performs the required operations. This can be very expensive.

What does "finds it appropriate" mean? It means whenever NHibernate flushes the session to the database (within the open transaction). This can happen in the following situations:
  • before execution of any query - HQL or criteria 
  • before commit 

You can see it in the log. It looks like the following snippet:
done cascade NHibernate.Engine.CascadingAction+SaveUpdateCascadingAction for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
NHibernate.Engine.Cascade CascadeCollectionElements:0 deleting orphans for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
Let's show the problem with an example. Assume that you let NHibernate manage a large number of objects, e.g. you fetched them from the database. NHibernate stores all of them in the first-level cache (the session). Then you execute a few HQL queries and finish with a commit. NHibernate calls flush before each of these actions. That's a serious performance hit!

NHibernate can spend a huge amount of time checking whether any of the loaded data has changed, even if you only intended to read it.
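The scenario described above can be sketched roughly as follows. The class and method names are hypothetical, and the HQL assumes that `Library` and `Book` are mapped entities with an `Id` property; the untyped `List()` is used only to keep the sketch independent of the entity classes.

```csharp
using System.Collections;
using NHibernate;

public static class FlushCostExample
{
    // Hypothetical read-mostly unit of work running inside one transaction.
    public static void ReadMostlyWork(ISessionFactory sessionFactory)
    {
        using (ISession session = sessionFactory.OpenSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            // Loads every Library into the first-level cache.
            IList libraries = session.CreateQuery("from Library").List();

            // With the default automatic flush mode, each of these queries is
            // preceded by a check of the entities already held by the session.
            long bookCount = session
                .CreateQuery("select count(*) from Book")
                .UniqueResult<long>();
            IList newestBooks = session
                .CreateQuery("from Book b order by b.Id desc")
                .SetMaxResults(10)
                .List();

            // The commit flushes once more, although nothing was modified.
            tx.Commit();
        }
    }
}
```

Nothing in this method changes an entity, yet the session is examined three times: before each of the two later queries and again at commit.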

Avoiding this situation can significantly speed up your application.

Avoid needless cascade save or update

1. Read-only transaction
First of all, you can tell NHibernate that it needn't perform the check because you don't intend to write any changes. Simply mark your transaction as read-only. I'm using Spring.NET for transaction management.
[Transaction(TransactionPropagation.RequiresNew, ReadOnly = true)]
public void FindAllNewBooks(IEnumerable<Book> books);
NHibernate won't perform any cascade checking because you have told it that you don't intend to update data. But what if you want to update the data?

You can use a separate write transaction in every place where you need to write data, but then the business operation as a whole loses the A (atomicity) of ACID.

2. Evict entity from session
You can tell NHibernate to stop caring about an entity: the entity is removed from the session and NHibernate forgets about it.
SessionFactory.GetCurrentSession().Evict(library);
But be careful when evicting an entity from the NHibernate session. Once NHibernate forgets the entity, lazy loading no longer works for it, so you can't use this solution if its children or properties are lazy loaded.
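One possible way around the lazy-loading limitation is to initialize what you need before evicting. This is only a sketch under assumptions: the `EvictExample` class and its parameters are hypothetical, and it only helps when you know up front which lazy associations will be read later.

```csharp
using NHibernate;

public static class EvictExample
{
    // 'entity' is an object currently attached to the session and
    // 'lazyAssociation' is one of its lazy collections or proxies
    // that will still be read after the eviction.
    public static void DetachForReading(ISession session, object entity, object lazyAssociation)
    {
        // Force-load the association while the entity is still attached.
        NHibernateUtil.Initialize(lazyAssociation);

        // From now on the session no longer tracks the entity,
        // so it is skipped when the session is flushed.
        session.Evict(entity);
    }
}
```

With the model from this article it could be called as `EvictExample.DetachForReading(session, library, library.Books);`. Any change made to the evicted object afterwards is not written to the database.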

3. Set Status.ReadOnly on entity when it's loaded
There is a rather tricky way to set the read-only status that NHibernate keeps internally for a loaded entity; see the article "Ensuring updates on Flush".

I don't use this approach because it's really low-level.

4. Use IStatelessSession for bulk operations
NHibernate provides the IStatelessSession interface for performing bulk operations. I'd like to write a whole article about the stateless session later.
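As a short preview, a bulk insert through a stateless session might look like the sketch below. The `BulkImportExample` class and its parameters are hypothetical; the entities are assumed to be mapped and ready to be inserted as they are.

```csharp
using System.Collections.Generic;
using NHibernate;

public static class BulkImportExample
{
    // Inserts the given mapped entities without a first-level cache.
    public static void InsertAll(ISessionFactory sessionFactory, IEnumerable<object> entities)
    {
        using (IStatelessSession session = sessionFactory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (object entity in entities)
            {
                // The stateless session keeps no reference to the entity:
                // nothing is cached and nothing is dirty-checked later.
                session.Insert(entity);
            }

            tx.Commit();
        }
    }
}
```

Keep in mind that operations on a stateless session do not cascade to associated objects and collections are ignored, so children have to be inserted explicitly.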

5. Isn't there a really simple way to fetch an entity as read-only?
No, there isn't. Although a basic factory is often declared with a method such as FindById({id}, bool forUpdate);, NHibernate is not able to provide this kind of functionality, so you must use one of the workarounds described above.

Summary

If you want to develop a fast and scalable application, you need to deal with cascade save and update. It's the first issue you'll find in the log.

The most useful and safest way is to decide which transactions can be marked as read-only. Mark as many as possible, because most use cases of an average application only read data.

If you really need to write data, you should reduce the number of entities held in the first-level cache (the session). That way you avoid the situation where your code has loaded a thousand entities, you add one new entity, and all thousand and one are checked for dirty data.

If you insert thousands of entities into the database, e.g. in an import use case, just use a stateless session.

See also the Series of .NET NHibernate performance issues for all articles in the series.
