Time to say goodbye to NHibernate #2 – performance

In the previous part I went through a few shortcomings of NHibernate which, from my point of view, fundamentally influence the decision whether or not to deploy this excellent framework. In this part I'd like to look at performance.

Performance can be tuned in various ways, as I discussed a few years ago. There are plenty of options: caching, stateless sessions and so on. Once you hit the limits of the database's own speed, you have to start dealing with less common things that usually aren't tuned at all.

Entity construction

On our project we always honored DDD, so we wanted rich entities, not naked ones. By that I mean that all attributes/fields are encapsulated by get/set methods. If you don't want to expose private fields, you have to start using reflection, which at large scale, used for most mapped entities, becomes a problem for .NET. It is slow.

There are several frameworks (e.g. FastReflect) that can cache reflection access; in our case this sped up raw access to private fields about 20x. Of course, integrating a custom reflection framework into NHibernate is not easy. You have to develop your own component that constructs entities and plug it in.
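The caching trick can be sketched with a compiled delegate: instead of calling FieldInfo.GetValue on every access, build the accessor once with expression trees and reuse it. This is an illustrative sketch, not the actual component we used; the Person class and its field name are made up.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public class Person {
    private string name = "Alice";   // private field, no public accessor
}

public static class FieldReader {
    // Build a compiled getter once; subsequent calls avoid slow reflection.
    public static Func<T, TField> CompileGetter<T, TField>(string fieldName) {
        FieldInfo field = typeof(T).GetField(
            fieldName, BindingFlags.Instance | BindingFlags.NonPublic);
        ParameterExpression instance = Expression.Parameter(typeof(T), "obj");
        Expression body = Expression.Field(instance, field);
        return Expression.Lambda<Func<T, TField>>(body, instance).Compile();
    }
}

// Usage: compile once, call many times.
// var getName = FieldReader.CompileGetter<Person, string>("name");
// string n = getName(new Person());   // fast private field access
```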

Constructing entity hierarchies

The described problem of slow reflection flows straight into the next issue you have to solve. You usually don't fetch a single class from persistence but a whole hierarchy of classes. That means you ask the persistence framework for one object which has further dependencies. In DDD this is typically an aggregate, separated from other aggregates – they have no direct relation between them – which contains collections of other kinds of entities, and so on recursively. In our case one aggregate type swelled to some twenty kinds of classes.

Naturally, NHibernate constructs the whole aggregate using the mapping, i.e. maps of keys, entities and class types. In other words, NHibernate stores the materialized entities in the session and, for the given keys and types, links them together into the resulting aggregate until only the root instance pops out. Once there are more than a few types and instances, you quickly find that constructing such an aggregate is quite slow.

NHibernate is a generic framework that tries to construct at runtime something you could write yourself. The performance difference between the generic approach and your own compiled code is of course large – tens of percent of time saved.

All a programmer has to do is write their own linker and their own session capable of absorbing entities and connecting them via identifiers.
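A minimal sketch of such a session might look as follows: it indexes materialized entities by (type, id) and a hand-written linker resolves references through that index. All names here are hypothetical; a real component obviously has to cover collections, proxies and so on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal "session": stores materialized entities by (type, id)
// so a custom linker can wire them together without reflection.
public class LinkingSession {
    private readonly Dictionary<Tuple<Type, object>, object> entities =
        new Dictionary<Tuple<Type, object>, object>();

    public void Absorb(Type type, object id, object entity) {
        entities[Tuple.Create(type, id)] = entity;
    }

    public T Resolve<T>(object id) where T : class {
        object entity;
        entities.TryGetValue(Tuple.Create(typeof(T), id), out entity);
        return (T)entity;   // null when the entity wasn't materialized
    }
}

// A hand-written linker then connects the aggregate, e.g.:
// foreach (var row in bookRows) {
//     Book book = session.Resolve<Book>(row.BookId);
//     book.SetLibrary(session.Resolve<Library>(row.LibraryId));
// }
```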

Mapping

Even though it may not seem so, the mapping itself can also affect the performance of your application. Although NHibernate is one of the most flexible frameworks I have come across, the mapping can't do everything by far. If you have more complex things, e.g. collections of collections or dictionaries of collections, you have to come up with a workaround in the mapping, because this cannot be mapped out of the box. In our case we always created a new (needless) entity for the given record which wrapped this up, and mapped that instead.

Instead of the natural solution we had to create an extra entity because of a subtle limitation. Naturally, such an intervention affects performance – creating entities with subsequent garbage collecting and so on. In server applications you sometimes have to watch every needless entity.

Garbage Collecting

The last topic I'd like to touch on is the production of entities and the subsequent garbage collecting. A scalable server-side application shouldn't waste resources. If it does, it will soon hit a wall during performance optimization. This metric is always terribly hard to measure, because every profiler or measurement method gives different results.

One thing is certain, though. Any ORM will produce more garbage than your custom persistence logic, which is simply tailor-made, because an ORM will always be more generic – meaning it will produce wrapper entities and more generic code – as I already described in the previous chapter.

Deploying NHibernate to production will always raise the time spent in GC.

The verdict

After several years of using NHibernate we reached the point where, to speed up the persistence layer, we had to either rewrite certain components or replace NHibernate entirely. We started with the former. How we ended up, we'll see in the next chapter.

NHibernate performance issues #4: slow query compilation – named queries

NHibernate provides several ways to query the database:

The first three of these querying methods define the query body in something other than native SQL. This implies that NHibernate must transform these queries into native SQL according to the configured dialect, e.g. into a native MS SQL query.

If you want your application to be fast at all times, the described process can cause unpleasant behavior. How can you avoid query compilation?

Compiled named queries

It may sound surprising, but everyone has already met a compiled (and named) query. If you have ever browsed the log NHibernate produces, you must have seen a set of lines similar to this:

2010-12-13 21:26:42,056 DEBUG [7] NHibernate.Loader.Entity.AbstractEntityLoader .ctor:0 Static select for entity eu.podval.NHibernatePerformanceIssues.Model.Library: SELECT library0_.Id as Id1_0_, library0_.version as version1_0_, library0_.Address as Address1_0_, library0_.Director as Director1_0_, library0_.Name as Name1_0_ FROM [Library] library0_ WHERE library0_.Id=?

When NHibernate starts (a SessionFactory defined as a spring.net bean in my case), it compiles the queries it is certain to need. In the displayed case, NHibernate knows it will probably search for the Library entity in the database by its id, so it generates a native SQL query into an internal cache, and when the developer's code calls Find for the Library entity, it uses exactly this pre-compiled native SQL select.

What's the main benefit? If you called the library's Find a thousand times, NHibernate would otherwise have to compile the query every time. Instead of this inefficient behavior, NHibernate compiles it only once and uses the pre-compiled version afterwards.

How to speed up your queries? Use pre-compiled named queries

As I've already written, NHibernate generates pre-compiled queries, stores them in a cache and uses them when necessary. NHibernate is a great framework, so it makes this functionality available to you as well 🙂

You can simply declare a list of HQL (or SQL) named queries within any mapping (hbm.xml) file. NHibernate loads the file, parses the queries and keeps the pre-compiled versions in its internal cache, so queries called at runtime don't have to be laboriously parsed to native SQL – the already prepared ones are used instead.

How to use named queries?

  1. Define a new hbm.xml file which will contain your named queries, e.g. named-queries.hbm.xml
  2. Place this file in an assembly which NHibernate searches for hbm mapping files. You may have to update your Fluent NHibernate configuration to search for these files. The following code will do it for you, or see FluentNHibernateSessionFactory.
m.HbmMappings.AddFromAssembly(Assembly.Load(assemblyName));
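In context, the line above sits inside the Fluent NHibernate Mappings callback. A fuller, illustrative configuration might look like this – the database choice, connection string and assembly name are placeholders, not taken from the original project:

```csharp
using System.Reflection;
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;

var sessionFactory = Fluently.Configure()
    .Database(MsSqlConfiguration.MsSql2008
        .ConnectionString("...your connection string..."))
    .Mappings(m => {
        // fluent class maps from the model assembly
        m.FluentMappings.AddFromAssembly(Assembly.Load("Model.Assembly"));
        // plus plain hbm.xml files, including named-queries.hbm.xml
        m.HbmMappings.AddFromAssembly(Assembly.Load("Model.Assembly"));
    })
    .BuildSessionFactory();
```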

Now it's time to write your named-queries.hbm.xml file. It has the following structure.

<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
  <query name="Book.with.any.Rental">
    <![CDATA[
    select b from Book b
    where exists elements(b.Rentals)
    ]]>
  </query>
</hibernate-mapping>
How to use it?

IList books = SessionFactory.GetCurrentSession().GetNamedQuery("Book.with.any.Rental").List();

How big is the speed-up from named queries?

Let's use the Book.with.any.Rental query for some measurements so we can see how skipping query compilation improves the response time.

I've executed the test for both the named query and plain HQL. According to the debug logs, the plain HQL case spent 40ms parsing HQL to native SQL.

Note that everything written so far applies only to the first call. NHibernate is a tricky framework: it caches queries for you automatically once it has compiled them for the first time. Let's call the method to get books with any rental two times (the first level cache is cleared between the calls):

  • first call took 190ms
  • second one only 26ms 

It's fair to admit that the database has its own query cache as well 🙂 The result is clear anyway.

What are the real advantages of named queries?

It doesn't seem such a brilliant thing to toil away writing queries in an xml file. What are the real benefits?

  1. Speed (of the first call) – the described example saves 40ms per method call. That doesn't seem like much. Now imagine you are developing a huge project with almost a hundred queries. It can save a lot of time! Note also that the chosen query was very simple; in my experience, compiling a more complicated query takes at least 200ms. That's not a small amount of time when you are developing a very fast application
  2. HQL parsed for errors at startup – you'll find out whether your query is correct or wrong at application startup, because NHibernate does these things when it starts. You don't have to wait until the desired query is called
  3. Clean code – you aren't mixing C# code together with SQL (HQL) code
  4. Possibility to change query code after the application is compiled – consider that you can change your HQL or SQL even after the application has been compiled. You can simply expose the named query hbm.xml file as an ordinary xml file and tune your queries at runtime – i.e. without recompilation

You can also see Series of .NET NHibernate performance issues to read all series articles.

NHibernate performance issues #3: slow inserts (stateless session)

The whole series of NHibernate performance issues isn't about simple use-cases. If you develop a small app, such as a simple website, you don't need to care about performance. But if you design and develop a huge application and have decided to use NHibernate, you'll be solving various sorts of issues. Today's use-case is obvious: how to insert many entities into the database as fast as possible?

Why am I talking about this? There are a lot of articles about how NHibernate's original purpose wasn't to support batch operations, such as inserts. Once you have decided on NHibernate, you have to solve this issue.

Slow insertion
The basic way to insert a mapped entity into the database is:

SessionFactory.GetCurrentSession().Save(object);

But what happens when I try to insert many entities? Let's say I want to persist

  • 1000 libraries
  • each library has 100 books = 100k of books
  • each book has 5 rentals – there are 500k of rentals 

It’s really slow! The insertion took exactly 276 seconds! What’s the problem?

Each SQL insert is sent to the server within its own server request.

Batch processing using adonet.batch_size
You can set the property adonet.batch_size in your hibernate configuration to tell NHibernate that it can send multiple queries to the SQL server within one statement. I'm going to set this value to 100. What's the improvement? The insertion now takes 171 seconds. Better than 276! But isn't that still a lot of time? Yes it is!
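For reference, the property lives in the standard hibernate-configuration section; a minimal fragment matching the experiment might look like this:

```xml
<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <!-- send up to 100 inserts to SQL Server in one round trip -->
    <property name="adonet.batch_size">100</property>
  </session-factory>
</hibernate-configuration>
```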

The major problem is that NHibernate's standard insertion via Session.Save is not intended for batch processing. NHibernate generates events, walks through the mapping, and doesn't group insert statements together properly by default. Obviously, all that takes time. Now it's time to introduce …

Stateless session
NHibernate's developers are smart guys, so this significant functionality couldn't stay in a "not intended for batch processing" state. The stateless session is a tool intended for batch processing.

A stateless session is a lightweight alternative to Session.Save: it doesn't fire so many events, it just generates one insert per given object according to the mapping. Being that fast, it naturally has some drawbacks.
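The snippets below assume a stateless session opened roughly like this (a sketch; SessionFactory stands for the usual ISessionFactory instance):

```csharp
using NHibernate;

// Stateless session must be opened and disposed explicitly;
// there is no spring.net transaction template support.
using (IStatelessSession session = SessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction()) {
    session.Insert(library);   // no cascading: children must be pushed manually
    tx.Commit();
}
```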

Stateless session’s drawbacks

  • the stateless session isn't compatible with the standard NHibernate session! It has a different interface because it has a completely different purpose. Spring.net support is missing, so you can't use the transaction template. You must handle all the plumbing yourself.
  • because of the intended fast behavior, a stateless session doesn't handle any cascade operations on children. You must manually push all objects to the session: all children, their children, etc.

The last point seems to be a very unpleasant drawback, but if you look at the previous picture showing the NHibernate profiler, you can see the major benefit of this approach.

Although I've set adonet.batch_size to 100, only 5 inserts are sent to the SQL server within one statement. NHibernate groups inserts only for the same type of entity. You aren't able to achieve an optimized query count using the standard way.

As I've said, you must call the Insert method for each entity, so your code can group all inserts of each specific entity type itself. Here are the results of the insertion:

  • 149 seconds – no advanced grouping of inserts sent to the SQL server – insertion of the first library followed by the insertion of its books, the insertion of all the books' rentals, then the insertion of another library – we still aren't fully utilizing the power of adonet.batch_size, because only 5 inserts are sent in one statement

foreach (Library library in libraries) {
    session.Insert(library);
    foreach (Book book in library.Books) {
        session.Insert(book);
        foreach (Rental rental in book.Rentals) {
            session.Insert(rental);
        }
    }
}
  • 86 seconds – first all libraries are processed by the session's Insert, followed by all books and then all rentals – this approach uses the batch size efficiently, because for 100k books it sends only 1000 statements to the SQL server, each containing 100 inserts, followed by a set of 5k insert statements for the rentals

foreach (Library library in libraries) {
    session.Insert(library);
}

foreach (Library library in libraries) {
    foreach (Book book in library.Books) {
        session.Insert(book);
    }
}

foreach (Library library in libraries) {
    foreach (Book book in library.Books) {
        foreach (Rental rental in book.Rentals) {
            session.Insert(rental);
        }
    }
}
  • 80 seconds – adonet.batch_size = 1000

Stateless session is efficient!
The small summary of measured times below shows the main benefit of the stateless session best. The example persists (1k + 100k + 500k) 601k entities.

session type        adonet.batch_size   additional grouping   time [s]
standard            no                  no                    276
standard            100                 no                    171
stateless session   100                 no                    149
stateless session   100                 yes                   86
stateless session   1000                yes                   80

If you need to improve your application's insertion time, just use the stateless session.

You can also see Series of .NET NHibernate performance issues to read all series articles.

NHibernate performance issues #2: slow cascade save and update (flushing)

What's NHibernate's most powerful feature, apart from object mapping? Cascade operations, such as insert, update or save. And what's NHibernate's biggest performance issue? Cascade saving.

Cascade insert, save and update

If you let NHibernate manage your entities (e.g. you load them from persistence), NHibernate can perform all persistence operations for you, including automatic:

  • insert
  • update
  • delete

It all depends only on your cascade settings. What does the documentation say?

cascade (optional): Specifies which operations should be cascaded from the parent object to the associated object.

The attribute declares which kind of operation will be performed in a particular case. All of this can be adjusted for the traditional parent – child pattern.

The following code declares what happens to the children (books) when the parent entity (Library) is affected by any of the mentioned operations.

HasMany(l => l.Books).
    Access.CamelCaseField().
    AsBag().
    Inverse().
    KeyColumn("LIBRARY_ID").
    ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
    ForeignKeyCascadeOnDelete().
    LazyLoad().
    Cascade.AllDeleteOrphan();

As you can see, each child will be affected by every kind of operation performed on the parent object. Obviously there are more options: it needn't do anything, it can only update, etc.

That's all for cascade operations; it's a pretty and fine piece of NHibernate and it's well described in the manual.

Cascade update issue

We have defined cascade operations. What’s the problem?

When NHibernate finds it appropriate, it goes through all relations and objects stored in the first level cache – the session – checks all dirty flags and performs the proper operations, which can be very expensive.

What does "finds it appropriate" mean? It means whenever NHibernate flushes the session to the database (to the open transaction scope). This can happen in the following situations:

  • before execution of any query – HQL or criteria 
  • before commit 

You can see it in the log. It looks like the following snippet:

done cascade NHibernate.Engine.CascadingAction+SaveUpdateCascadingAction for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
NHibernate.Engine.Cascade CascadeCollectionElements:0 deleting orphans for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books

Let's show the problem with an example. Assume that you let NHibernate manage a pretty huge amount of objects, e.g. you fetched them from the database. NHibernate stores all of them in the first level cache – the session. Then you execute a few HQL queries and finish with a commit. NHibernate calls flush before each of these actions. It's a serious performance hit!

NHibernate can spend a huge amount of time checking all loaded data for changes, even if you only intend to read it.

Avoiding the described situation can significantly speed up your application.

Avoid needless cascade save or update

1. Read-only transactions
First of all, you can tell NHibernate that it needn't perform the checking because you don't intend to write any changes. Simply make your transaction read-only. I'm using spring.net for transaction management.

[Transaction(TransactionPropagation.RequiresNew, ReadOnly = true)]
public void FindAllNewBooks(IEnumerable<Book> books);

NHibernate won't perform any cascade checking because you have told it that you aren't going to update the data. But what if you do want to update the data?

You can use a write transaction in all the places where you need to write data – but then you lose the A (atomicity) from ACID for the whole business operation.

2. Evict the entity from the session
You can tell NHibernate not to care about an entity in the future, i.e. not to keep the entity in the session; NHibernate will simply forget about it.

SessionFactory.GetCurrentSession().Evict(library);

But be careful about evicting an entity from the NHibernate session. If NHibernate forgets the entity, there is no way to perform lazy loading, so you can't use this solution if children or properties are lazy loaded.

3. Set Status.ReadOnly on the entity when it's loaded
There is a really tricky way to set the read-only attribute kept in NHibernate's internal properties, see the article Ensuring updates on Flush.

I'm not using this approach because it's really low-level.
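For completeness: later NHibernate versions (3.x onwards) expose read-only loading less painfully. If your version has these members, you can mark entities read-only per query or per session – a sketch, not something used in the measurements above:

```csharp
// per query: entities loaded by this query are not dirty-checked
var books = session.CreateQuery("from Book")
    .SetReadOnly(true)
    .List<Book>();

// per session: everything loaded from now on defaults to read-only
session.DefaultReadOnly = true;
```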

4. Use IStatelessSession for bulk operations
NHibernate provides the IStatelessSession interface to perform bulk operations. I'd like to write a whole article about the stateless session later.

5. Isn't there a really simple way to fetch an entity as read-only?
No, there isn't. Despite the fact that the basic Factory pattern is often declared with the method FindById({id}, bool forUpdate);, NHibernate is not able to provide this kind of functionality; you must use one of the described work-arounds.

Summary

If you want to develop a fast and scalable application, you need to deal with cascade save and update. It's the first issue you'll find in the log.

The most useful and safest way is to decide which transactions can be marked as read-only. Mark as many as possible, because most use-cases of an average application only read data.

If you really need to write data, you should lower the number of entities kept in the first level cache – the session. You'll avoid the situation where your code loads a thousand entities, you add one new one, and all thousand + one are checked for dirty data.
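One common way to keep the first level cache small during a large write is to flush and clear the session periodically. This is a sketch; the batch size of 100 is chosen arbitrarily and newBooks is a hypothetical collection:

```csharp
int i = 0;
foreach (Book book in newBooks) {
    session.Save(book);
    if (++i % 100 == 0) {
        session.Flush();   // push the pending inserts to the database
        session.Clear();   // evict everything from the first level cache
    }
}
```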

If you insert thousands of entities into the database, e.g. in an import use-case, just use the stateless session.

You can also see Series of .NET NHibernate performance issues to read all series articles.

Series of .NET NHibernate performance issues articles

I've spent the last four months implementing NHibernate persistence in our product. I'd like to provide a set of articles regarding performance issues in NHibernate usage.

The NHibernate team has released a huge manual of 189 pages. It contains the basic description allowing a developer to write persistence code and not totally mess it up. But if you want to develop a fast application, you need to read discussions (such as those at stackoverflow.com) and solve the problems one by one. Based on my experience, I've decided to write a series of articles about NHibernate, especially about performance.

Prerequisites
All the following articles will contain examples. I love spring.net, so it would be nonsense not to use it, because it integrates nicely, as does log4net.

All examples described or used in the whole series will use three classes as the domain model. Following Domain Driven Design, there is the root aggregate Library, its child Book and its child Rental. Don't linger on various details, like whether a book's author should be a separate aggregate or whether a library is identified by its name. It's already certain the domain model will need to change, because there is still no identifier at the database level.

The following picture defines the relations between all domain classes, in standard UML:

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

All series parts

  1. NHibernate performance issues #1: evil List (non-inverse relationship)
  2. NHibernate performance issues #2: slow cascade save and update (flushing)
  3. NHibernate performance issues #3: slow inserts (stateless session)
  4. NHibernate performance issues #4: slow query compilation – named queries

NHibernate performance issues #1: evil List (non-inverse relationship)

Lists are evil, at least when using NHibernate. You should reconsider whether you really need a List, because it has an unsuitable index column owned by the parent, not the children. A List can't be used in an inverse relationship, which implies a few (but major) consequences:

  • an extra SQL UPDATE to persist the mentioned index value
  • unscalable addition to the list – NHibernate needs to fetch all items before adding a new one
  • inability to use fast cascade deletion by foreign keys
  • inability to use IStatelessSession for fast data insertion

Basic theorem: you don't need Lists! Below, we'll discuss each bullet in detail.

What is an inverse relation?

First of all, it's necessary to clarify what an inverse relation means. The reference between parent and child isn't held by the parent but by the child! See the following picture:

Here is the NHibernate mapping definition for an inverse relation using the excellent Fluent NHibernate:

HasMany(l => l.Books).
    Access.CamelCaseField().
    AsBag().
    Inverse().
    KeyColumn("LIBRARY_ID").
    ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
    LazyLoad().
    Cascade.AllDeleteOrphan();

Or the hbm mapping:

<bag name="books" access="field.camelcase" inverse="true" lazy="true"
     cascade="all-delete-orphan">
  <key column="LIBRARY_ID" foreign-key="FK_BOOK_LIBRARY_ID"/>
  <one-to-many class="Book"/>
</bag>

Unscalable addition to List

Each item stored in a standard (non-inverse) List holds its index in a special column defining the collection order. It's a simple smallint index.
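For comparison with the inverse bag mapping shown earlier, this is roughly how a true List would be mapped with Fluent NHibernate; the index column name is illustrative:

```csharp
HasMany(l => l.Books).
    Access.CamelCaseField().
    AsList(index => index.Column("BOOK_INDEX")).  // the troublesome index column
    KeyColumn("LIBRARY_ID").
    LazyLoad().
    Cascade.AllDeleteOrphan();
// note: no Inverse() here – a List cannot be mapped as inverse
```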

Imagine the situation where you want to insert a new item into the middle of the list. The indexes of all items in the rest of the list need to be incremented. NHibernate doesn't execute an update query to increment the indexes – it fetches all the items from the database.

Let's imagine the described process for a huge number of children. Say the parent has 50 thousand children. The object graph is stored in the database, no object is cached. The parent is loaded from the database, the children list is marked as lazy. If you need to insert (or add) a new item into the collection, all 50 thousand items are loaded too. Again, again and again, every time you perform the operation.

Addition to a List is totally unscalable.

Inability to use fast cascade deletion by foreign keys

The best unified approach to deleting a huge (varied) object graph is to use cascade deletion hung on foreign keys. I'll describe this approach in a separate article in the near future.

Cascade deletion is based on the automatic removal of orphaned items. If the relation between parent and child disappears, the orphaned child is removed too. It follows that cascade deletion is intended to be used along with an inverse relation.

The following code snippet shows an example of a SQL foreign key with cascade deletion:

alter table [Book] 
add constraint FK_BOOK_LIBRARY_ID
foreign key (LIBRARY_ID)
references [Library]
on delete cascade

Extra sql update to store item’s index

In the case of a non-inverse relation, NHibernate inserts the parent first and then all children. After these operations, an extra update is produced to fill the index column. Using the already described situation with 50 thousand children, NHibernate produces 50 thousand SQL inserts and then sends an additional 50 thousand updates to the SQL server – twice as many SQL statements!

See the following two SQL queries extracted from the log:

INSERT INTO [Book] (Isbn, Name, Pages, LIBRARY_ID, Id) VALUES (...);

UPDATE [Book] SET LIBRARY_ID = ... WHERE Id = ...;

What to pick instead of List?

It's easy to write that lists are evil. So what now, what should you pick instead? There are three options.

1. In almost all cases, you really don't need a List. See the next chapter to find out which collection to use instead.

The List's index declares and defines the list's order. In almost all cases, the list can be sorted by some property of its items. Does it make sense to sort the list by when the items were added? I don't think so. A list usually needs to be sorted by the item's creation date, the item's size, the count of the item's replies etc. – simply by some property of the item.

As our example shows, the list of rentals should be sorted by the rental's start date, see the mapping:

HasMany(b => b.Rentals).
    Access.CamelCaseField().
    AsBag().
    Inverse().
    KeyColumn("BOOK_ID").
    ForeignKeyConstraintName("FK_RENTAL_BOOK_ID").
    ForeignKeyCascadeOnDelete().
    LazyLoad().
    OrderBy("StartOfRental").
    Cascade.AllDeleteOrphan();

2. You need to sort the list by an index and you can move the index to the child entity. Map it as a standard private property and then sort the whole collection by that property – see the previous example.

See an example of an index property placed on the child:

private int OrderIndex {
    get { return Book.Rentals.IndexOf(this); }
    set { }
}

The drawback is that you need to expose the parent's children as a List (instead of ICollection or IEnumerable), because the child has to be able to find its position within the list.

3. You need to sort the list by an index, you can't change the domain model, and it's impossible to sort the items by any item property. Then there is no other choice than to simply live with a non-inverse collection along with all the disadvantages described above.

Recommended way to use collections with NHibernate persistence

  1. The parent entity contains Add and Remove methods for each collection
  2. Use ICollection as the collection interface
  3. If you need to expose the whole collection, use IEnumerable – it's effectively a read-only collection, supports sequential fetching, etc.

Here is a code snippet of the Book entity.

public class Book {

    private ICollection<Rental> rentals = new List<Rental>();
    ...
    public void AddRental(String rentee) {
        ...
        rentals.Add(new Rental(this, rentee));
    }
    ...
    public IEnumerable<Rental> Rentals {
        get { return rentals; }
    }
}

Summary
You should be wary of using List with NHibernate persistence. It can cause serious performance issues. Try to re-design your collections to be index-independent and provide sorting by something other than a simple index.

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

You can also see Series of .NET NHibernate performance issues to read all series articles.