Performance Battle of NoSQL blob storages #1: Cassandra

Preface

We spent the last five years on HP Service Virtualization using an MS SQL database. A non-clustered server. Our app uses this system for all kinds of persistence. No polyglot persistence so far. As we tuned the response time – we started at 700 ms per call and ended up at a couple of milliseconds per call when the DB was involved – we had to learn a lot of stuff.

Transactions, lock escalation, isolation levels, clustered and non-clustered indexes, buffered reading, index structure and its persistence, GUID ids in clustered indexes, bulk importing, omitting slow joins, sparse indexes, and so on. We also rewrote part of NHibernate to support multiple tables for one entity type, which allowed us to scale up without lock escalation. It was a good time. The end also showed us that the famous Oracle has only half of our favorite features, which we discovered once we decided to support that database.

Well, as I think back over all the issues we encountered during development, unpredictable behavior is the feeling that remains after doing all this with my DB specialist colleague. As we started to care about performance more and more, we had to understand those features, and we usually threw out the default behavior and plugged in a special one – we could do that because we understood the essentials of the system. This is the way.

A lot of NoSQL storages let you build the features described above on your own. You can easily provide an inverted index in a key-value store even if the database itself does not support such functionality. What you need to do is write error-free code and maintain the indexes yourself. Look at the Twitter example for the famous key-value store Redis. It is not as simple as three or four tables in MySQL with built-in indexing, is it?
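
To make that concrete, here is a minimal sketch of a hand-maintained inverted index over a key-value store. It assumes the Jedis client for Redis; the key names and the user entity are made up for illustration:

import redis.clients.jedis.Jedis;

public class ManualInvertedIndex {
    public static void main(String[] args) {
        Jedis redis = new Jedis("localhost");
        // Store the entity itself under its primary key...
        String userId = "1000";
        redis.hset("user:" + userId, "name", "john");
        // ...and maintain the "index" by hand: one set per indexed value.
        // Every write path must keep it in sync, and nothing spans both keys
        // atomically unless you add MULTI/EXEC yourself.
        redis.sadd("index:name:john", userId);
        // A "query" reads the index set, then fetches each entity.
        for (String id : redis.smembers("index:name:john")) {
            System.out.println(redis.hgetAll("user:" + id));
        }
        redis.close();
    }
}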

There is nothing bad about SQL databases. I still believe that they fit the purpose of 80% of programs, as they are best for complex querying and/or less-than-massive scaling. But who needs real massive scaling? Everyone?

Why are we going to use NoSQL?

In this article and the whole series, I'm going to evaluate certain attributes of storages for persisting binary data of variable length. The goal is to persist network traffic in a binary format.
No relational decomposition, no querying of particular attributes of particular messages. No implicit transactions complying with ACID. No indexes, and so on. You can find the reasons in the preface; we just hope that we know what we are doing 🙂

Cassandra

The first evaluated storage is Cassandra. It's a famous NoSQL database, and the available commercial support provides some confidence of fast help when needed.
Cassandra sits somewhere in the middle between schemaless and schema-aware (SQL-like) databases, as it supports a schema and the CQL query language. I was pretty surprised when I saw features like time series with dynamic columns and built-in lists and sets. Another nice piece is the OpsCenter monitoring tool; I always show some screenshots from it when I want to demonstrate an easy-to-use monitor for a distributed app. Pretty nice.

Hardware

Hardware used within the tests:
  • DB cluster:
    • HP ProLiant BL460c Gen8, 32 cores, 2.6 GHz, 32 GB RAM, W12k server
    • HP ProLiant BL460c Gen8, 32 cores, 2.6 GHz, 192 GB RAM, Ubuntu 12 server
  • Tests executor:
    • Xeon, 16 cores, 32 GB RAM, W8k server
  • 10Gbps network
  • Java 7

Performance Measurement of Cassandra Cluster

The setup used one seed node and one replica; the Java tests use the DataStax driver. The replication factor was two when not explicitly defined. After some initial issues, my final CQL create script was:

CREATE COLUMNFAMILY recording (
    PARTKEY INT,
    ID uuid,
    VS_ID uuid,
    URL ascii,
    MESSAGE_TIME bigint,
    DATA blob,
    PRIMARY KEY((PARTKEY, VS_ID), ID)
);

Batch Size

The DataStax driver allows statements to be executed in batches; how does this scale?

Blob size \ Batch size [TPS]    128      256      512      1024     8196     32768
100 bytes (8 clients)         67 100   63 900   55 400   55 800   23 500   timed out
1 kb (4 clients)              15 600   20 600   20 300   19 400    5 500   N/A
5 kb (1 client)                  N/A    8 300    9 200   10 000    3 000

256 rows seems to be the best batch size.
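
Expressed in code, a batched insert against the schema above might look like this minimal sketch; it assumes the DataStax Java driver 2.x, and the contact point, keyspace name and URL value are made up:

import java.nio.ByteBuffer;
import java.util.UUID;
import com.datastax.driver.core.*;

public class BatchInsert {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("perftest"); // keyspace name is an assumption
        PreparedStatement insert = session.prepare(
                "INSERT INTO recording (PARTKEY, VS_ID, ID, URL, MESSAGE_TIME, DATA)"
                + " VALUES (?, ?, ?, ?, ?, ?)");
        byte[] payload = new byte[100];    // 100-byte blob, as in the first table row
        UUID vsId = UUID.randomUUID();
        BatchStatement batch = new BatchStatement();
        for (int i = 0; i < 256; i++) {    // 256 rows per batch, the sweet spot above
            batch.add(insert.bind(i % 8, vsId, UUID.randomUUID(),
                    "http://example.com/service", System.currentTimeMillis(),
                    ByteBuffer.wrap(payload)));
        }
        session.execute(batch);
        cluster.close();
    }
}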

Number of Connections

The tests use a variable number of connections to the Cassandra cluster. Note that the batch size was 256 and the consistency level was set to ONE.

Blob size \ Connections [TPS]       1        2        4        8       24
100 b                          27 000   53 000   75 000   62 000   62 000
1 kb                           17 900   31 000   42 000   40 000
5 kb                            9 000   14 600   16 000   10 300
20 kb                           3 300    4 500    3 800

You can see that 2 or 4 connections per client are enough, while a single connection slows down the overall throughput.
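
The driver also lets you pin the number of connections per host explicitly; a minimal sketch, again assuming the DataStax Java driver 2.x (the keyspace name is made up):

import com.datastax.driver.core.*;

public class ConnectionTuning {
    public static void main(String[] args) {
        // Pin the per-host pool at 4 connections, the sweet spot found above.
        PoolingOptions pooling = new PoolingOptions()
                .setCoreConnectionsPerHost(HostDistance.LOCAL, 4)
                .setMaxConnectionsPerHost(HostDistance.LOCAL, 4);
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withPoolingOptions(pooling)
                .build();
        Session session = cluster.connect("perftest"); // keyspace name is an assumption
        // ... run the insert workload here ...
        cluster.close();
    }
}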

Number of Partitions

The table contains a partition key; see the CQL snippet above. The test generates a variable number of distinct values for the partition key. The batch size was again 256 and the consistency level ONE.

Blob size \ Partitions [TPS]        1        8       64
100 b                          58 000   71 000   70 000
10 kb                          13 200    9 100    7 300
The partition key determines where the data will reside; it defines the target machine. As two nodes are involved in my tests, it is better to use more than one partition.
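
You can even ask the driver which nodes own a given partition key. A sketch using the DataStax Java driver 2.x metadata API; note that the recording table actually has a composite partition key ((PARTKEY, VS_ID)), so the single serialized int below only illustrates the lookup:

import java.nio.ByteBuffer;
import java.util.Set;
import com.datastax.driver.core.*;

public class PartitionLookup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        cluster.init(); // connect and load cluster metadata without opening a session
        // Serialize a single int partition key and ask which hosts replicate it.
        ByteBuffer key = ByteBuffer.allocate(4).putInt(0, 42);
        Set<Host> replicas = cluster.getMetadata().getReplicas("perftest", key);
        System.out.println("Partition key 42 lives on: " + replicas);
        cluster.close();
    }
}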

Variable Consistency

Tunable consistency is the most beautiful feature of Cassandra. You can define the consistency level for every single request made through the Cassandra driver in any language. There are various levels; see the documentation.
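
With the DataStax Java driver it is a one-liner per statement; a minimal sketch (driver 2.x API assumed):

import com.datastax.driver.core.*;

public class ConsistencyTuning {
    // Executes a bound statement, waiting for acknowledgment from two replicas.
    // Swapping in ConsistencyLevel.ANY or ONE changes the guarantee, not the code.
    static ResultSet writeWithTwoAcks(Session session, BoundStatement bound) {
        bound.setConsistencyLevel(ConsistencyLevel.TWO);
        return session.execute(bound);
    }
}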

The test used 8 partitions, 8 connections, and a batch size of 256.

Blob size \ Consistency [TPS]     Any      One      Two
100 b                          67 000   66 000   66 000

Well, I'm surprised. There is almost no difference between the consistency levels for such a blob size, despite their totally different purpose and implementation. This could be due to the almost ideal environment: a very fast network and fast machines.

Variable Replicas

I used two replicas by default, but this test tries to reveal the price of keeping a second copy of the data.

Blob size \ Replicas [TPS]          1        2
100 b                         121 000   72 000
10 kb                          19 200    8 200

There is an apparent penalty for the second replica.
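
For completeness, the replication factor is chosen when the keyspace is created. A minimal sketch; the keyspace name is made up and SimpleStrategy is assumed, which fits this single-datacenter test cluster:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class KeyspaceSetup {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();
        // Two copies of every row, matching the default setup of these tests.
        session.execute("CREATE KEYSPACE perftest WITH replication ="
                + " {'class': 'SimpleStrategy', 'replication_factor': 2}");
        cluster.close();
    }
}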

Long Running

The previous tests used a few million inserts/rows, but the current one uses hundreds of millions. This is the way to definitely rule out the advantage of various caches and clever algorithms. It simulates a heavyweight load test of the server.

8 partitions, 8 connections, 256 batch size.

Blob size \ Replicas+Consistency [TPS]    1+ONE    2+ONE    2+TWO
100 b (1 billion rows)                  121 000   63 000   62 000
10 kb (30 million rows)                   8 200

Large Messages

The largest message in these tests was 20 kb, but how does the database behave when our app meets a large file as an attachment?
Consistency ONE, 2 partitions, 2 connections, batch size 2.

Blob size   [TPS]
500 kb         90
5 mb          9.5

Occupied Space

Apache Cassandra has enabled compression automatically since version 1.1.

Blob size   Occupied space per message
100 b       89 b
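
The codec and chunk size can still be chosen per table. A hedged sketch using the Cassandra 1.2/2.x option syntax; the compressor and chunk size are just example choices:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CompressionTuning {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("perftest"); // keyspace name is an assumption
        // Compression is on by default since 1.1; this overrides the codec.
        session.execute("ALTER TABLE recording WITH compression ="
                + " {'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': 64}");
        cluster.close();
    }
}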

Here is the structure of the serialized entity:

CREATE COLUMNFAMILY recording (
    PARTKEY INT,
    ID uuid,
    VS_ID uuid,
    URL ascii,
    MESSAGE_TIME bigint,
    DATA blob,
    PRIMARY KEY((PARTKEY, VS_ID), ID)
);

Conclusion

This kind of testing is almost the opposite of the Netflix tests. They analyzed scalability, but we have tested a simple cluster of only two machines. We need to figure out the performance of a single node (obviously in a cluster deployment).
I was impressed, as the numbers above are good; the overall throughput on the network coming out of one client reached a hundred Mbps.
What next? The same kinds of tests against Kafka, Redis, Couchbase and Aerospike.

