Sunday, October 2, 2011

Book: The Art of Unit Testing: With Examples in .Net

I finished the reading of The Art of Unit Testing: With Examples in .Net book before a week. The book was categorized as an agile book so you can forget as the book is something like the bible of unit testing. It just summarizes the agile and pragmatic point of view to unit testing more then programmer's manual how to write correctly unit tests.

First chapters define unit testing basis, point to differences between unit and integration testing. The problem is that such text doesn't highlight those differences enough and the reader couldn't understand why the isolation is so much necessary. It's followed by the expression of stubs and mocks and the list of mock frameworks.

The rest of the content is sort of novel about importance of unit testing, the purpose of testing in the company. The most valuable chapter is almost last one which describes how to implement unit testing as a new feature in your company. Such text is applicable for any new technology you want to use in your business.

As I've already said, this book is sort of novel about unit testing, one text which can help you to form your point of view why and how to unit testing, that's all. There is lack of facts important to understand how and why to use isolation frameworks, what can happen when you would write w/o that - which happen in my company anyway :-) - how you would "unit test" any class etc.

Just read the book to complete your glance of unit testing but not suitable for unit testing novices.

Sunday, June 5, 2011

Using of GUID as primary key in MS SQL is antipattern

We've decided to persist our domain model into database probably a year ago so beside the xml persistence our product uses any database system as a primary storage. The next step was key failure because we've designed the database schema according to current state of our domain model. Why not as NHibernate is so powerful that you are able to map almost every relation type, every scope of accessibility etc.

Beside slow performance, I've already written those articles before, which we've fixed yet the another major issue appeared before a few months. The performance of MS SQL database during continuous insertion of medium-large rows slows down until the system appears almost dead. Why?

As I've already written above, we've decided to not adjust the model according common database recommendations and just persist it. The lifecycle of our domain model needn't be usually coupled with database persistence. So all primary keys in all domain entities were absolutely unique GUIDs.

It doesn't sound so dramatic, we've hoped that it would only occupied more space than simple integer and maybe slightly slows the performance but now I know that choose GUID as primary key is terrible decision.

GUID as primary key decreases performance a lot

I've made PoC how to increase the performance of a few complicated selects statements. They were no "join hell" but they were slow according to our acceptance criteria. I've discover clustered index as very powerful feature which can improve performance itself a lot. I've made further inquiry and found that simple select above sorted table using clustered index is 35% faster with using int as indexed key with respect to previous GUID. Very nice!

Notice that the index was already ordered, for one milion of rows it took almost four minutes so it was useless for us anyway.

GUID as primary key kills the insertion performance

I believe that you can tune the performance for the all time so 35% up was great but expected when you start to play with all those nice profiler apps. The worst issue we've met was that already referenced above.

The primary key was unique GUID generated during the model creation. The consequent inserts just use those IDs and use them in the particular insert, no big deal. The blocker problem was absolutely heavy decreasing performance of those inserts. Lets look at real numbers: first thousand of inserts took approximately one second, when the database table contains almost one million of rows, one thousand of additional inserts took almost 20 seconds!

The database was dead, processor was not doing anything, hard drive as well. No dead-lock was found, it just slept. The first suspicious was cluster primary key, it's cut-off only postponed the problem. After a few tests we've discovered that the clue is primary key. Cut-off the primary key notation returns the performance into original and requested numbers. Unfortunately our system also reads during these insertions so we are unable to switch all keys off.

The current solution is to rewrite model to use auto-generated integer primary keys which also clarified our view to domain driven design's value object to avoid the meaningless use of entity type of object everywhere it's possible.

Do not use GUID within database

I know that there are certainly places where GUID approach is usable but be aware of use them as any key. It:
  • takes significantly more space (4 bytes vs. 38 bytes), it can be serious issue when you use database with space limit, e.g. MS SQL Express
  • kills the database during the insertion
  • slows the performance of select clause, 35% down with respect to int
  • slows joins to tables 
  • almost exclude your application from the use of clustered index