Preface
We spent the last five years on HP Service Virtualization using an MS SQL database on a non-clustered server. Our app uses this system for all kinds of persistence; no polyglot persistence so far. As we tuned response times - we started at 700 ms per call and ended up at a couple of milliseconds per call when the DB was involved - we had to learn a lot of stuff.
Transactions, lock escalation, isolation levels, clustered and non-clustered indexes, buffered reading, index structure and its persistence, GUID ids in clustered indexes, bulk importing, avoiding slow joins, sparse indexes, and so on. We also rewrote part of NHibernate to support multiple tables for one entity type, which allowed us to scale up without lock escalation. It was a good time. In the end we also found that the famous Oracle had only half of our favorite features, once we decided to support that database as well.
Well, as I think back over all the issues we encountered during development, unpredictable behavior is the feeling left from working through all of this with my DB specialist colleague. As we cared about performance more and more, we had to understand those features, and we usually replaced the default behavior with a specialized one - we could do that because we understood the essentials of the system. This is the way.
As for the features described above, a lot of NoSQL stores let you implement them on your own. You can easily build an inverted index on top of a key-value store even if the database itself does not support such functionality. What you need to do is write error-free code and maintain the indexes with your own code. Look at the Twitter example for the famous key-value store Redis. It is not as simple as three or four tables in MySQL with built-in indexing, is it?
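To make the "maintain indexes by your own code" point concrete, here is a minimal sketch of a hand-maintained inverted index on top of a key-value store. The store is simulated with a plain `HashMap` (no Redis client involved), and the `msg:` key prefix is a hypothetical naming convention, not anything from the article:

```java
import java.util.*;

// A hand-rolled inverted index over a key-value store.
// The KV store is simulated with a HashMap; in a real deployment both maps
// would live in the store itself (e.g. Redis strings plus sets).
public class InvertedIndexSketch {
    private final Map<String, String> kv = new HashMap<>();          // primary records
    private final Map<String, Set<String>> index = new HashMap<>();  // word -> ids

    // Every write has to update the record AND the index - the store gives
    // you no transaction, so "error-free code" is on you.
    public void put(String id, String value) {
        kv.put("msg:" + id, value);
        for (String word : value.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(word, k -> new HashSet<>()).add(id);
        }
    }

    // Lookups by word go through the index we maintain ourselves.
    public Set<String> findByWord(String word) {
        return index.getOrDefault(word.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch store = new InvertedIndexSketch();
        store.put("1", "hello cassandra world");
        store.put("2", "hello redis");
        System.out.println(store.findByWord("hello")); // ids of both messages
    }
}
```

Compare this with a relational database, where a single `CREATE INDEX` plus the engine's transactional guarantees gives you the same thing for free.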
There is nothing bad about SQL databases. I still believe they fit the purpose of 80% of programs, as they are best for complex querying and/or less-than-massive scaling. But who needs really massive scaling? Everyone?
Why Are We Going to Use NoSQL?
Here in this article, and in the whole series, I'm going to evaluate certain attributes of storages for persisting binary data of variable length. The goal is to persist network traffic in a binary format.
No relational decomposition, no querying of particular attributes of particular messages, no implicit ACID transactions, no indexes, and so on. You can find the reasons in the preface; we just hope we know what we are doing :-)
Cassandra
The first storage evaluated is Cassandra. It is a famous NoSQL database, and its commercial support gives reasonable confidence in fast help when needed.
Cassandra sits somewhere in the middle between schemaless and schema-aware (SQL-like) databases, as it supports a schema and the CQL query language. I was pretty surprised when I saw features like time series with dynamic columns and built-in lists and sets. Another good thing is the OpsCenter monitoring tool; I always show some screenshots from that application when I want to demonstrate an easy-to-use monitor for a distributed app. Pretty nice.
Hardware
Hardware used within the tests:
- DB cluster:
- HP ProLiant BL460c Gen8, 32 cores @ 2.6 GHz, 32 GB RAM, W12k server
- HP ProLiant BL460c Gen8, 32 cores @ 2.6 GHz, 192 GB RAM, Ubuntu 12 server
- Tests executor:
- Xeon, 16 cores, 32 GB RAM, W8k server
- 10Gbps network
- Java 7
Performance Measurement of Cassandra Cluster
The setup used one seed and one replica; the Java tests use the DataStax driver. The replication factor was two when not explicitly defined. After some issues:
#GC and #cassandra ...ufff :-( pic.twitter.com/ohTciftP1e
— Martin Podval (@MartinPodval) February 17, 2014
... my final CQL create script was:
CREATE COLUMNFAMILY recording ( PARTKEY int, ID uuid, VS_ID uuid, URL ascii, MESSAGE_TIME bigint, DATA blob, PRIMARY KEY((PARTKEY, VS_ID), ID) );
Batch Size
The DataStax driver allows executing queries in batches. How does it scale?
256 rows seems to be the best batch size.
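A minimal sketch of the batching idea: group rows into fixed-size chunks before handing each chunk to the driver (e.g. as one `BatchStatement` per chunk). The chunking helper below is self-contained and hypothetical - it is not the article's test code:

```java
import java.util.*;

// Group rows into fixed-size batches before sending them to the cluster.
public class Batching {
    static final int BATCH_SIZE = 256; // the sweet spot measured in the tests

    // Split rows into chunks of at most batchSize elements each.
    static <T> List<List<T>> toBatches(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(i);
        // 1000 rows -> batches of 256, 256, 256 and 232
        System.out.println(toBatches(rows, BATCH_SIZE).size()); // 4
    }
}
```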
Number of Connections
The tests utilize a variable number of connections to the Cassandra cluster. Note that the batch size was 256 and the consistency level was set to ONE.
You can see that 2 or 4 connections per client are enough, while a single connection slows down the overall performance.
Number of Partitions
The table contains a partition key, see the CQL code snippet. The test generates variable values for the partition key. The batch size was again 256 and consistency ONE.
The partition key determines where the data will reside; it defines the target machine. As two nodes are involved in my tests, it is better to use more than one partition.
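One possible way to spread writes across partitions is to derive the `PARTKEY` value from the row id. The modulo scheme below is an assumption for illustration, not something taken from the article's test code:

```java
import java.util.UUID;

// Derive a partition key from a row id so inserts spread across nodes.
public class Partitioning {
    // Map an id onto one of `partitions` buckets; with 8 partitions and
    // 2 nodes, both machines receive a share of the writes.
    static int partitionKey(UUID id, int partitions) {
        // modulo first, then abs: the result is always in [0, partitions)
        return Math.abs(id.hashCode() % partitions);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(partitionKey(id, 8)); // always in [0, 7]
    }
}
```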
Variable Consistency
Tunable consistency is the most beautiful attribute of Cassandra. You can define the consistency level for every interaction with the Cassandra driver, in any language. There are various levels; see the documentation.
The test used 8 partitions, 8 connections and 256 batch size.
Well, I'm surprised. There is almost no difference between consistency levels for such a blob size, despite their totally different purposes and implementations. This could be due to the almost ideal environment: a very fast network and fast machines.
Variable Replicas
I used two replicas by default, but this test tries to reveal the price of a second copy of the data.
There is an apparent penalty for the second replica.
Long Running
The previous tests used a few million inserts/rows, but the current one uses hundreds of millions. This is the way to definitely eliminate the advantage of various caches and clever algorithms. It simulates a heavyweight load test against the server.
8 partitions, 8 connections, 256 batch size.
Large Messages
The largest message in these tests was 20 kB so far, but what is the behavior when our app meets some large file as an attachment?
Consistency ONE, 2 partitions, 2 connections, batch size 2.
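One common way to cope with large attachments is to split the blob into fixed-size chunks so that each row stays small; the chunk size and the whole splitting scheme below are assumptions for illustration, not part of the article's schema:

```java
import java.util.*;

// Split a large attachment into fixed-size chunks, one row per chunk.
public class BlobChunker {
    static final int CHUNK_SIZE = 64 * 1024; // 64 kB per row, hypothetical

    static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(data, off,
                    Math.min(off + chunkSize, data.length)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] attachment = new byte[200 * 1024]; // a 200 kB "file"
        // 200 kB split into 64 kB chunks -> 64 + 64 + 64 + 8 kB
        System.out.println(split(attachment, CHUNK_SIZE).size()); // 4
    }
}
```

Reassembly then means reading all chunk rows of one message in order, which fits the clustering-by-ID layout of the table.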
Occupied Space
Apache Cassandra has enabled compression automatically since version 1.1.
Here is the structure of the serialized entity.
CREATE COLUMNFAMILY recording (
PARTKEY INT,
ID uuid,
VS_ID uuid,
URL ascii,
MESSAGE_TIME bigint,
DATA blob,
PRIMARY KEY((PARTKEY, VS_ID), ID)
);
Conclusion
This kind of testing is almost the opposite of the Netflix tests. They analyzed scalability, while we tested a simple cluster with only two machines. We needed to figure out the performance of a single node (within a cluster deployment, obviously).
I was impressed, as the numbers above are good; the overall network throughput coming out of one client reached hundreds of Mbps.
What next? The same kinds of tests against Kafka, Redis, Couchbase, and Aerospike.