Wednesday, November 17, 2010

NHibernate performance issues #2: slow cascade save and update (flushing)

What is NHibernate's most powerful feature, apart from object mapping? Cascade operations such as insert, update or save. And what is NHibernate's biggest performance issue? Cascade saving.

Cascade insert, save and update

If you let NHibernate manage your entities (e.g. you load them from the database), NHibernate can perform all persistence operations for you automatically, including:

insert
update
delete

Everything depends on your cascade settings. What does the documentation say?

cascade (optional): Specifies which operations should be cascaded from the parent object to the associated object.

The attribute declares which operations are propagated in a particular case. All of this can be tuned for the traditional parent-child pattern.

The following Fluent NHibernate mapping declares what happens to the children (books) when the parent entity (Library) is affected by any of the operations mentioned above.

HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  Cascade.AllDeleteOrphan();

As you can see, every child is affected by all kinds of operations performed on the parent object. There are, of course, more options: cascade can do nothing at all (Cascade.None()), only save and update (Cascade.SaveUpdate()), and so on.

That's all about cascade operations. It's a neat NHibernate feature and it's well described in the manual.

Cascade update issue

We have defined cascade operations. What's the problem?

Whenever NHibernate finds it appropriate, it walks through all relations and objects stored in the first-level cache (the session), checks whether each of them is dirty and performs the corresponding operations. This can be very expensive.

What does "finds it appropriate" mean? It means whenever NHibernate flushes the session to the database (within the open transaction). With the default flush mode, this happens in the following situations:

before execution of any query - HQL or criteria
before commit

You can see it in the log, which looks like the following snippet:

done cascade NHibernate.Engine.CascadingAction+SaveUpdateCascadingAction for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books
NHibernate.Engine.Cascade CascadeCollectionElements:0 deleting orphans for collection: eu.podval.NHibernatePerformanceIssues.Model.Library.Books

Let's show the problem with an example. Assume that you let NHibernate manage a pretty huge number of objects, e.g. you fetched them from the database. NHibernate stores all of them in the first-level cache (the session). Then you execute a few HQL queries and finish with a commit. NHibernate calls flush before each of these actions. That's a serious performance hit!

NHibernate can spend a huge amount of time checking whether any of the loaded data has changed, even if you only intend to read it.
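To make the cost concrete, here is a minimal C# sketch of snapshot-based dirty checking. `ToySession` and its members are hypothetical and only model the idea; this is not NHibernate's API or its implementation. Every flush compares each entity held in the session with the copy taken at load time, so the work grows with the number of loaded entities times the number of flushes, even when nothing has changed. The `ReadOnly` flag and the `Evict` method loosely stand in for options 1 and 2 described below.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity used only for this illustration.
public class Book
{
    public string Name;
    public int Pages;
}

// Hypothetical toy session: NOT NHibernate's implementation, just a model of
// snapshot-based dirty checking performed on every flush.
public class ToySession
{
    // State captured when each entity was loaded (the "snapshot").
    private readonly Dictionary<Book, (string Name, int Pages)> snapshots =
        new Dictionary<Book, (string Name, int Pages)>();

    // How many entity comparisons all flushes have performed so far.
    public int Comparisons { get; private set; }

    // Stand-in for a read-only transaction: flushing is skipped entirely.
    public bool ReadOnly { get; set; }

    public void Load(Book book) => snapshots[book] = (book.Name, book.Pages);

    // The session forgets the entity, so later flushes no longer check it.
    public void Evict(Book book) => snapshots.Remove(book);

    // Modelled as running before each query and before commit; returns the
    // entities that would need an UPDATE.
    public List<Book> Flush()
    {
        var dirty = new List<Book>();
        if (ReadOnly)
            return dirty;

        // Every entity in the session is compared, changed or not.
        foreach (var entry in snapshots)
        {
            Comparisons++;
            if (entry.Key.Name != entry.Value.Name || entry.Key.Pages != entry.Value.Pages)
                dirty.Add(entry.Key);
        }

        // After the simulated UPDATEs, the current state becomes the new snapshot.
        foreach (var book in dirty)
            snapshots[book] = (book.Name, book.Pages);

        return dirty;
    }
}
```

For example, loading 1,000 books, changing one of them, then running two queries and a commit performs three flushes and 3,000 comparisons in this model, although only the first flush finds a dirty entity.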

Avoiding this situation can make your application significantly faster.

Avoid needless cascade save or update

1. Read-only transaction
First of all, you can tell NHibernate that it needn't perform the checks because you don't intend to write any changes. Simply mark your transaction as read-only. I'm using Spring.NET for transaction management:

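// Spring.NET's [Transaction] attribute on a service interface method;
// ReadOnly = true marks the transaction as read-only.
[Transaction(TransactionPropagation.RequiresNew, ReadOnly = true)]
IList<Book> FindAllNewBooks();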

NHibernate won't perform any cascade checks because you've told it that you aren't going to update data. But what if you do want to update the data?

You can open a separate write transaction in each place where you need to write data, but then the whole business operation loses the A (atomicity) of ACID.

2. Evict entity from session
You can tell NHibernate to stop caring about an entity: the entity is removed from the session and NHibernate forgets about it.

SessionFactory.GetCurrentSession().Evict(library);

Be careful when evicting entities from the NHibernate session, though. Once NHibernate forgets the entity, lazy loading is no longer possible, so you can't use this solution if children or properties are lazy loaded.

3. Set Status.ReadOnly on entity when it's loaded
There is a really tricky way to set the read-only status (Status.ReadOnly) that NHibernate keeps for each loaded entity; see the article Ensuring updates on Flush.

I'm not using this approach because it's really low-level.

4. Use IStatelessSession for bulk operation
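NHibernate provides the IStatelessSession interface for bulk operations. A stateless session has no first-level cache and does not cascade, so there is nothing to dirty-check. I'd like to write a whole article about the stateless session later.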

5. Isn't there a really simple way to fetch an entity as read-only?
No, there isn't. Although the basic Factory pattern is often declared with a method like FindById({id}, bool forUpdate), NHibernate can't provide this kind of functionality; you must use one of the workarounds described above.

Summary

If you want to develop a fast and scalable application, you need to deal with cascade save and update. It's the first issue you'll find in the log.

The most useful and safest way is to decide which transactions can be marked as read-only. Mark as many as possible, because most use cases of an average application only read data.

If you really need to write data, reduce the number of entities held in the first-level cache (the session). That way you avoid the situation where your code loads a thousand entities, adds one new entity, and all thousand and one are checked for dirty data.

If you insert thousands of entities into the database, e.g. in an import use case, just use a stateless session.

You can also see Series of .NET NHibernate performance issues for all articles in the series.

HD2 Energy rom + CHT editor 2.0 = still unstable

I was really curious about the new CHT (Cookie Home Tab) Editor 2.0. It's the UI heart of everything you do with this device.

Without much hesitation, yesterday I installed the famous Energy ROM, which includes the final CHT 2.0, on my HD2. My experience is really bad; I fear it's still a really unstable piece of software.

CHT Editor brings really great new graphics, no doubt about that; see the following screenshots.

The downside of the new graphics is an unstable device: I had to restart my HD2 four times today alone. Why?

The first restart followed launching the Google Maps app: I wasn't able to return to the Today screen. An hour later, I tried to make a call. I chose a person and clicked the photo, and the call started, but I was unable to get to the "call screen". The last case involved the music player and the Today screen; again I was unable to return.

I'm quietly waiting for the next Energy ROM release and hope it will work. You should wait too, because restarting your HD2 every hour is annoying.

Series of .NET NHibernate performance issues articles

I've spent the last four months implementing NHibernate persistence in our product. I'd like to provide a set of articles about performance issues in NHibernate usage.

The NHibernate team publishes a huge 189-page manual. It contains the basic description that allows a developer to write persistence without totally messing it up. If you want to develop a fast application, you need to read discussions (such as those at stackoverflow.com) and solve the problems one by one. Based on my experience, I've decided to write a series of articles about NHibernate, especially its performance.

Prerequisites
All of the following articles will contain examples. I love Spring.NET, so it would be nonsense not to use it, since it integrates nicely with NHibernate, as does log4net.

All examples described or used in this series share a domain model of three classes. In Domain-Driven Design terms, there is the aggregate root Library, its child Book, and Book's child Rental. Don't dwell on details such as whether the book's author should be a separate aggregate or whether a library is identified by its name. The domain model will certainly need to change anyway, because there are still no identifiers at the database level.

The following picture (standard UML) shows the relations between all domain classes:

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

All parts of the series

NHibernate performance issues #1: evil List (non-inverse relationship)

Lists are evil, at least when using NHibernate. You should reconsider whether you really need a List, because its index column is owned by the parent, not by the children, which is unsuitable. A List can't be used in an inverse relationship, which has a few (but major) implications:

an extra SQL UPDATE to persist the index value
unscalable additions to the list: NHibernate needs to fetch all items before adding a new one
inability to use fast cascade deletion by foreign keys
inability to use IStatelessSession for fast data insertion

Basic theorem: you don't need Lists! Below, we'll discuss each bullet in detail.

What is inverse relation?

First of all, it's necessary to clarify what an inverse relation means: the reference between parent and child is held by the child, not by the parent! See the following picture:

Here is the NHibernate mapping definition for an inverse relation, using the excellent Fluent NHibernate:

HasMany(l => l.Books).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("LIBRARY_ID").
  ForeignKeyConstraintName("FK_BOOK_LIBRARY_ID").
  LazyLoad().
  Cascade.AllDeleteOrphan();

Or as an hbm mapping:

<bag access="field.camelcase" cascade="all-delete-orphan" inverse="true" lazy="true" name="Books" mutable="true">
  <key foreign-key="FK_BOOK_LIBRARY_ID">
    <column name="LIBRARY_ID" />
  </key>
  <one-to-many class="Book" />
</bag>

Unscalable addition to List

Each item stored in a standard (non-inverse) List holds its index in a special column that defines the collection order. It's a simple smallint index.

Imagine that you want to insert a new item into the middle of such a list. The indexes of all items in the rest of the list need to be incremented. NHibernate doesn't execute a single UPDATE query incrementing the indexes; instead, it fetches all items from the database.

Now imagine this process for a huge number of children. Let's say the parent has 50 thousand children. The object graph is stored in the database and no object is cached. The parent is loaded from the database, and the children list is marked as lazy. If you need to insert (or add) a new item to the collection, all 50 thousand items are loaded too, again and again, every time you perform the operation.

Adding to a List is totally unscalable.
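The renumbering itself can be illustrated with a tiny C# model. `IndexColumnDemo` is hypothetical: it treats each child's position as the value of its index column and counts how many stored index values change when one item is inserted. It does not model NHibernate's loading of the collection, only the renumbering described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a non-inverse, indexed List mapping: the position of
// each child in the list is the value stored in its index column.
public static class IndexColumnDemo
{
    // Inserts 'child' at 'position' and returns how many already stored children
    // now have a different index value (every child at or after 'position').
    public static int InsertAt(List<string> children, int position, string child)
    {
        if (position < 0 || position > children.Count)
            throw new ArgumentOutOfRangeException(nameof(position));

        int shifted = children.Count - position;
        children.Insert(position, child);
        return shifted;
    }

    // Builds a list of 'count' children named "Book 0", "Book 1", ...
    public static List<string> Create(int count)
    {
        var children = new List<string>(count);
        for (int i = 0; i < count; i++)
            children.Add("Book " + i);
        return children;
    }
}
```

With 50,000 children, inserting one item in the middle changes 25,000 index values in this model. An append changes none, yet, as described above, NHibernate still loads the whole collection to perform it.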

Inability to use fast cascade deletion by foreign keys

The best general approach to deleting a huge (and varied) object graph is to use cascade deletion attached to foreign keys. I'll describe this approach in a separate article in the near future.

Cascade deletion is based on the automatic removal of orphaned items: if the relation between parent and child disappears, the orphaned child is removed too. Given this process, cascade deletion is intended to be used together with an inverse relation (NHibernate only allows on-delete="cascade" on inverse one-to-many associations).

The following snippet shows an example of an SQL foreign key with cascade deletion:

alter table [Book] 
  add constraint FK_BOOK_LIBRARY_ID 
  foreign key (LIBRARY_ID) 
  references [Library] 
  on delete cascade

Extra SQL UPDATE to store the item's index

In the case of a non-inverse relation, NHibernate inserts the parent first and then all the children. After these operations, an extra UPDATE is issued for each child to write the foreign key and the index column. Taking the situation described above with 50 thousand children, NHibernate produces 50 thousand SQL INSERTs and then sends an additional 50 thousand UPDATEs to SQL Server, which means twice as many SQL statements!
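The statement count can be written out for a parent with n children, assuming one INSERT for the parent, one INSERT per child and, for the non-inverse List only, one extra UPDATE per child:

```latex
\begin{aligned}
N_{\text{non-inverse}} &= 1 + n + n = 1 + 2n, & N_{\text{inverse}} &= 1 + n,\\
n = 50\,000:\quad N_{\text{non-inverse}} &= 100\,001, & N_{\text{inverse}} &= 50\,001.
\end{aligned}
```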

See the following two SQL statements extracted from the log:

INSERT INTO [Book] (Isbn, Name, Pages, LIBRARY_ID, Id) VALUES (...);

UPDATE [Book] SET LIBRARY_ID = ... WHERE Id = ...;

What to pick instead of List?

It's easy to write that lists are evil. But what should you pick instead? There are three options.

1. In almost all cases, you really don't need a List. See the next section to find out which collection type to use.

A List's index declares and defines the list's order. In almost all cases, the list can be sorted by one of the item's properties. Does it make any sense to sort the list by the order in which items were added? I don't think so. A list should be sorted by the item's creation date, the item's size, the number of the item's replies etc., simply by some property of the item.

In our example, the list of rentals should be sorted by the rental's start date; see the mapping:

HasMany(b => b.Rentals).
  Access.CamelCaseField().
  AsBag().
  Inverse().
  KeyColumn("BOOK_ID").
  ForeignKeyConstraintName("FK_RENTAL_BOOK_ID").
  ForeignKeyCascadeOnDelete().
  LazyLoad().
  OrderBy("StartOfRental").
  Cascade.AllDeleteOrphan();
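With OrderBy("StartOfRental") in the mapping, NHibernate adds an ORDER BY on that column to the SQL that loads the collection, so no index column is needed for a stable order. For comparison, here is a small C# sketch of the same ordering done in memory with LINQ; the `Rental` class is a simplified, hypothetical stand-in, and only the `StartOfRental` name comes from the mapping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical Rental; only StartOfRental mirrors the mapping above.
public class Rental
{
    public string Rentee { get; }
    public DateTime StartOfRental { get; }

    public Rental(string rentee, DateTime startOfRental)
    {
        Rentee = rentee;
        StartOfRental = startOfRental;
    }
}

public static class RentalOrdering
{
    // The order comes from a property of each item, not from its position,
    // so adding a rental never requires renumbering the others.
    public static List<Rental> ByStart(IEnumerable<Rental> rentals) =>
        rentals.OrderBy(r => r.StartOfRental).ToList();
}
```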

2. You need to sort the list by index, and you can move the index to the child entity. Map it as a standard private property and then sort the whole collection by that property (see the previous example).

Here is an example of an index property placed on the child:

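// Mapped as a private property of Rental; the value is computed from the
// rental's position in the parent's list (Book.Rentals must be an IList).
private int OrderIndex {
    get { return Book.Rentals.IndexOf(this); }
    // Intentionally empty: the value loaded from the database is ignored,
    // because the position is always derived from the list.
    set { }
}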

The drawback is that you need to expose the parent's children as an IList (instead of ICollection or IEnumerable), because the child has to be able to find out its position within the list.
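A runnable sketch of option 2 might look like this. `Book`, `Rental`, `AddRental` and the public `OrderIndex` getter are simplified, hypothetical versions of the example's entities; in the mapped entity the property would be private and persisted by NHibernate. The child derives its index from its position in the parent's IList, which is why the parent has to expose an IList at all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified parent that exposes its children as an IList.
public class Book
{
    private readonly List<Rental> rentals = new List<Rental>();

    public IList<Rental> Rentals => rentals;

    public Rental AddRental(string rentee)
    {
        var rental = new Rental(this, rentee);
        rentals.Add(rental);
        return rental;
    }
}

// Hypothetical child that computes its own index from the parent's list.
public class Rental
{
    public Book Book { get; }
    public string Rentee { get; }

    public Rental(Book book, string rentee)
    {
        Book = book;
        Rentee = rentee;
    }

    // Public here only so the demo can read it; in the mapping it would be a
    // private property whose value NHibernate writes to the index column.
    public int OrderIndex => Book.Rentals.IndexOf(this);
}
```

Because callers receive a mutable IList, they can also reorder or remove items directly, and every child's OrderIndex follows automatically; that openness is the price of this option.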

3. You need to sort the list by index, you can't change the domain model, and it's impossible to sort items by any of their properties. Then there is no choice but to live with a non-inverse collection and all of its disadvantages described above.

Recommended way to use collections with NHibernate persistence

The parent entity contains Add and Remove methods for each collection
Use ICollection as the collection interface
If you need to expose the whole collection, use IEnumerable: it's read-only, supports sequential iteration, etc.

Here is a code snippet of the Book entity:

public class Book {

    private ICollection<Rental> rentals = new List<Rental>();
    ...
    public void AddRental(String rentee) {
        ...
        rentals.Add(new Rental(this, rentee));
    }
    ...
    public IEnumerable<Rental> Rentals {
        get { return rentals; }
    }
}

Summary
Be careful about using Lists with NHibernate persistence; they can cause serious performance issues. Try to redesign your collections to be index-independent and sort them by something other than a simple index.

Examples at github.com
All examples are placed at github.com, see: https://github.com/MartinPodval/eu.podval/tree/master/NHibernatePerformanceIssues

You can also see Series of .NET NHibernate performance issues for all articles in the series.

Git on Windows: MSysGit

I started using Git today. I had read a lot of discussions saying there is no good tool for the Windows platform. After some thought, I decided to use TortoiseGit, because I'm used to working with TortoiseSVN. I was also afraid that working with Git would be difficult, since a lot of articles list many instructions. For a start, though, MSysGit is enough, so this article is about MSysGit; the next one will be about TortoiseGit.

How to start with MSysGit on a local machine?

Download and install Git for Windows
Create a source code directory for your Git app
Right-click the directory in your favorite file browser. The menu should contain a "Git init here" item. It initializes the chosen directory as a Git repository :-)

That was your first use of Git.

Commit data to local Git repository

Now you can add files, such as your first source code, to the created directory. When you are ready to commit changes to your local Git repository, follow these instructions:

Right-click the directory.
Choose "Git Add All file now". The command puts all files under Git control. Nothing is committed yet; it's similar to Subversion's "Add file".
Right-click again and choose the "Git gui" item to start the graphical Git client.
You should see your changes staged for commit to the local Git repository. Write your commit message and commit the changes.

Push data to remote server - github.com

As I said, your commit has so far affected only the local repository. If you decide to share (or back up) your sources on a real server, that's another story :-)

To do that, I signed up for the github.com hosting service, which is free for public projects. It seems really good and has a very nice web UI. After registering, you obviously need to create a new repository; just follow the website's instructions.

How to push data?

First of all, you need to create an SSH public key. That's a simple way for the server to trust you. MSysGit can generate the key for you: Help -> Show SSH Key -> Generate Key.
Add the key to your profile at github.com: click "Account settings" and choose "SSH Public Keys". Now your key is registered.
Switch back to the MSysGit GUI to upload the repository to the server.
Select the "Remote" menu and choose "Add remote". Fill in the location you got when you created the repository; you can find the URL on the repository overview page. The URL has the following pattern: git@github.com:{your-user-name}/{project-name}.git. Once this is filled in, the remote github.com repository is paired with your local Git repository.
Now you are able to push the data to the remote server. Choose "Remote" -> "Push" and confirm the dialog.

I'm really curious how I will like working with Git. Right now I don't understand how potential conflicts are resolved; I'll probably see later :-)

Martin Podval' Log


Wednesday, November 17, 2010

NHibernate performance issues #2: slow cascade save and update (flushing)

Tuesday, November 16, 2010

HD2 Energy rom + CHT editor 2.0 = still unstable

Tuesday, November 9, 2010

Series of .NET NHibernate performance issues articles

NHibernate performance issues #1: evil List (non-inverse relationship)

Friday, November 5, 2010

Git on Windows: MSysGit
