Using a GUID as a primary key in MS SQL is an antipattern

We decided to persist our domain model into a database about a year ago, so besides XML persistence our product can now use any database system as its primary storage. The next step was the key failure: we designed the database schema according to the current state of our domain model. Why not, when NHibernate is so powerful that you can map almost every relation type, every scope of accessibility, etc.?

Besides slow performance, which I have already written articles about and which we have since fixed, another major issue appeared a few months ago. The performance of the MS SQL database during continuous insertion of medium-to-large rows slows down until the system appears almost dead. Why?

As I wrote above, we decided not to adjust the model to common database recommendations and just persisted it. The lifecycle of our domain model usually does not need to be coupled with database persistence. So the primary keys of all domain entities were globally unique GUIDs.

It didn't sound so dramatic. We hoped it would only occupy more space than a simple integer and maybe slow performance slightly, but now I know that choosing a GUID as the primary key is a terrible decision.

GUID as primary key decreases performance a lot

I made a PoC of how to increase the performance of a few complicated select statements. They were no "join hell", but they were slow according to our acceptance criteria. I discovered that a clustered index is a very powerful feature which can improve performance a lot by itself. I investigated further and found that a simple select over a sorted table using a clustered index is 35% faster with an int as the indexed key than with the previous GUID. Very nice!

Note that the index was already ordered; for one million rows the select still took almost four minutes, so it was useless for us anyway.

GUID as primary key kills the insertion performance

I believe you can keep tuning performance forever, so a 35% gain was great but expected once you start to play with all those nice profiler apps. The worst issue we met was the one already mentioned above.

The primary key was a unique GUID generated when the model was created. The subsequent inserts simply use those IDs, no big deal. The blocking problem was the heavily decreasing performance of those inserts. Let's look at real numbers: the first thousand inserts took approximately one second; when the database table contained almost one million rows, one thousand additional inserts took almost 20 seconds!

The database seemed dead: the processor was not doing anything, and neither was the hard drive. No deadlock was found; it just slept. The first suspect was the clustered primary key, but removing the clustering only postponed the problem. After a few tests we discovered that the culprit is the primary key itself. Dropping the primary key constraint returns performance to the original, required numbers. Unfortunately our system also reads during these insertions, so we are unable to switch all keys off.
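
A commonly cited explanation for this kind of slowdown (not something verified in the tests above) is that random GUIDs arrive in no particular order, so each insert lands somewhere in the middle of the ordered index instead of at its end. The following Python sketch is only an illustration under that assumption: a sorted list stands in for the index, and the function name count_mid_inserts is made up for this example. It does not model SQL Server pages or timings.

```python
import bisect
import uuid


def count_mid_inserts(keys):
    """Insert keys one by one into a sorted list (a stand-in for an
    ordered index) and count how many land before the current end."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos < len(ordered):
            # The key belongs between existing keys, not after them.
            mid_inserts += 1
        ordered.insert(pos, key)
    return mid_inserts


N = 2000

# Auto-generated integers: every new key is larger than all previous ones.
sequential_keys = list(range(N))

# Random GUIDs: compared here by their raw 16 bytes (an assumption; SQL
# Server uses its own ordering for uniqueidentifier values).
random_keys = [uuid.uuid4().bytes for _ in range(N)]

sequential_mid = count_mid_inserts(sequential_keys)
random_mid = count_mid_inserts(random_keys)

print("sequential keys inserted mid-index:", sequential_mid)
print("random GUID keys inserted mid-index:", random_mid)
```

With sequential integers no insert ever lands in the middle, while with random GUIDs almost every insert does, which is the kind of access pattern that forces an ordered index to keep reshuffling existing data.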

The current solution is to rewrite the model to use auto-generated integer primary keys. This also clarified our view of domain-driven design's value objects: we now avoid the meaningless use of entity objects wherever possible.

Do not use GUIDs within a database

I know there are certainly places where the GUID approach is usable, but beware of using GUIDs as any kind of key. A GUID:
  • takes significantly more space (16 bytes as a uniqueidentifier, or 36 to 38 bytes when stored as text, vs. 4 bytes for an int), which can be a serious issue when you use a database with a space limit, e.g. MS SQL Express
  • kills the database during insertion
  • slows select performance, 35% down with respect to int
  • slows joins between tables
  • almost excludes your application from using a clustered index
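
The space figures from the list above can be checked quickly. This is a small Python sketch, not a measurement of SQL Server storage: it only shows the size of a GUID in binary and text form next to the 4 bytes of a SQL Server int, and the row count of one million is borrowed from the insert test earlier in the post.

```python
import uuid

guid = uuid.uuid4()

binary_size = len(guid.bytes)              # raw GUID, as in uniqueidentifier
text_size = len(str(guid))                 # hyphenated text form
braced_size = len("{" + str(guid) + "}")   # text form wrapped in braces
int_size = 4                               # SQL Server int

ROWS = 1_000_000
# Extra key bytes per table for the binary form, ignoring any indexes
# and foreign keys that repeat the key.
extra_bytes = (binary_size - int_size) * ROWS

print("binary GUID:", binary_size, "bytes")
print("text GUID:", text_size, "characters")
print("braced text GUID:", braced_size, "characters")
print("extra key bytes for", ROWS, "rows:", extra_bytes)
```

The difference grows further once the key is repeated in every non-clustered index and in every foreign key column that references it.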

Comments

Nikola Malovic said…
If we speak about *normal* GUIDs, you are right, but if we speak about sequential ones (generated by SQL Server with NEWSEQUENTIALID(), or guid.comb ones, used by NHibernate or generated manually), then there is no real performance hit from using GUIDs, neither in seek nor in insert operations.

That being said, even without any perf hit, GUIDs have a disadvantage in the amount of space they occupy and in the fact that they are not human readable, which is why HiLo was introduced. They make sense in scenarios (like the one I have) where you merge in the cloud the data of multiple different databases of disconnected clients (kind of like the Git approach), in which case they are really great.

Very old (perf measured on SQL 2000) but still very valid article on the subject of guid performance http://www.informit.com/articles/article.aspx?p=25862&seqNum=7
