Skip to main content

NHibernate performance issues #2: slow cascade save and update (flushing)

What's the most powerful NHibernate's feature, except object mapping? Cascade operations, like insert, update or save. What's the best NHibernate's performance issue: cascade saving.

Cascade insert, save and update

If you let NHibernate to manage your entities (e.g. you load them from persistence), NHibernate can provide all persistence operations for you, it includes automatic:
  • insert
  • update
  • delete
All depends only on your cascade settings. What says documentation?
cascade (optional): Specifies which operations should be cascaded from the parent object to the associated object.
Attribute declares which kind of operation would be performed for particular case. All this stuff can be adjusted for traditional pattern parent - child.

Following code declares specific behavior what happen to children (books) when parent entity (Library) will be affected any of mentioned operations.
HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  Cascade.AllDeleteOrphan();
As you can see, each children will be affected by all kinds of operation performed on parent object. Obviously there is more options: it needn't do anything, it can only update etc.

That's all about cascade operation, it's pretty and fine NHibernate's stuff and it's well described at manual.

Cascade update issue

We have defined cascade operations. What's the problem?

When NHibernate finds it appropriate to go through all  relations and objects stored in first level cache - session, it checks all dirty flags and performs proper operations which can be very expensive operation.

What means "finds it appropriate"? It means when NHibernate flush the session to database (to opened transaction scope). It can be performed in following situations:
  • before execution of any query - HQL or criteria 
  • before commit 

You can see it in log. It looks like following log snippet:
done cascade NHibernate.Engine.CascadingAction+SaveUpdateCascadingAction for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
NHibernate.Engine.Cascade CascadeCollectionElements:0 deleting orphans for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
Lets show the problem in example. Assume that you let NHibernate manage pretty huge amount of objects, e.g. you fetched them from database. NHibernate stores all of them at first level cache - session. Than you execute a few HQL queries and finish it by commit. NHibernate call flush before each action. It's serious performance hit!

NHibernate can spent huge amount of time checking all loaded data if anyone has changed. Even if you suppose to read them only.

Avoiding described situation can significantly faster your application.

Avoid needless cascade save or update

1. Read only transaction
First of all, you can tell NHibernate that it needn't perform checking because you doesn't suppose to write any change. Simple set you transaction read-only. I'm using spring.net for transaction management.
[Transaction(TransactionPropagation.RequiresNew, ReadOnly = true)]
public void FindAllNewBooks(IEnumerable<Book> books);
NHibernate won't perform any cascade checking because you tell him that you aren't suppose to update data. But what if you want to update the data?

You can provide write transaction for all places, where you need to write the data - but it looses A as atomicity from ACID for the whole - business - operation.

2. Evict entity from session
You can tell NHibernate to do not care about entity in the future, means do not store entity in session and NHibernate will forget about the entity.
SessionFactory.GetCurrentSession().Evict(library);
But be aware to evict entity from NHibernate session. If NHibernate forget the entity, there is no way how to perform lazy loading so you can't use this solution if children or properties are lazy loaded.

3. Set Status.ReadOnly on entity when it's loaded
There is really tricky way how to set read-only attribute placed in NHibernate properties, see this article Ensuring updates on Flush. 

I'm not using this type because it's really low-level approach.

4. Use IStatelessSession for bulk operation
NHibernate provides IStatelessSession interface to perform bulk operation. I'd like to write the whole article about stateless session later.

5. There isn't really simple way how to fetch entity as read-only?
No there isn't. Despite that basic Factory pattern is often declared with method FindById({id}, bool forUpdate);, NHibernate is not able to provide such kind of functionality, you must use one of described work-arounds.

Summary

If you want to develop fast and scalable application, you need to deal with cascade save and update. It's the first issue you'll find in log.

The most useful and secure way is to decide which transactions can be marked as read-only. Mark as many as possible because the most use-cases of average application read the data only.

If you really need to write data, you should lower the amount of entities placed in first level cache - session. You avoid the situation when your code loaded thousand of entities, you add one new and all thousand + one are check for dirty data.

If you insert thousand entities into the dabase, e.g. in import use-case, just use stateless session.

You can also see Series of .NET NHibernate performance issues to read all series articles.

Comments

Popular posts from this blog

Performance Battle of NoSQL blob storages #1: Cassandra

Preface We spend last five years on HP Service Virtualization using MsSQL database . Non-clustered server. Our app utilizes this system for all kinds of persistence. No polyglot so far. As we tuned the performance of the response time - we started at 700ms/call and we achieved couple milliseconds per call at the end when DB involved - we had to learn a lot of stuff. Transactions, lock escalation , isolation levels , clustered and non clustered indexes, buffered reading, index structure and it's persistence, GUID ids in clustered indexes , bulk importing , omit slow joins, sparse indexes, and so on. We also rewrite part of NHibernate to support multiple tables for one entity type which allows use scaling up without lock escalation. It was good time. The end also showed us that famous Oracle has half of our favorite features once we decided to support this database. Well, as I'm thinking about all issues which we encountered during the development, unpredictive behavio

NHibernate performance issues #3: slow inserts (stateless session)

The whole series of NHibernate performance issues isn't about simple use-cases. If you develop small app, such as simple website, you don't need to care about performance. But if you design and develop huge application and once you have decided to use NHibernate you'll solve various sort of issue. For today the use-case is obvious: how to insert many entities into the database as fast as possible? Why I'm taking about previous stuff? The are a lot of articles how the original NHibernate's purpose isn't to support batch operations , like inserts. Once you have decided to NHibernate, you have to solve this issue. Slow insertion The basic way how to insert mapped entity into database is: SessionFactory.GetCurrentSession().Save(object); But what happen when I try to insert many entities? Lets say, I want to persist 1000 libraries each library has 100 books = 100k of books each book has 5 rentals - there are 500k of rentals  It's really slow! The inser

Jenkins + git revision in all build names

Jenkins by default assigns version of a build using local counter within each type of a build. An example is better. When you look at this overview, you definitely do not know which code revision was used in Compile build and which in Integration Tests . I've followed nice article regarding real CI pipeline using jenkin s. It uses Build Name Setter Plugin. Unfortunately this article uses SVN revision number. So I said I'll just use git revision as git is my source control. But it's not so easy as how it could seem for first look. My Jenkins setup comprised of first compile build step which clones git server and performs an compilation. Second build steps clones the repository from first step and executes integration tests . The problem here is that the second step does not know which git revision compile step cloned. Here is list of steps how to do that. 1. You obviously need Git Plugin , Build Name Setter Plugin and Parameterized Trigger Plugin 2. Compile