Wednesday, July 30, 2014

Performance Battle of NoSQL blob storages #1: Cassandra

Martin Podval

Preface

We spend last five years on HP Service Virtualization using MsSQL database. Non-clustered server. Our app utilizes this system for all kinds of persistence. No polyglot so far. As we tuned the performance of the response time - we started at 700ms/call and we achieved couple milliseconds per call at the end when DB involved - we had to learn a lot of stuff.

Transactions, lock escalation, isolation levels, clustered and non clustered indexes, buffered reading, index structure and it's persistence, GUID ids in clustered indexes, bulk importing, omit slow joins, sparse indexes, and so on. We also rewrite part of NHibernate to support multiple tables for one entity type which allows use scaling up without lock escalation. It was good time. The end also showed us that famous Oracle has half of our favorite features once we decided to support this database.

Well, as I'm thinking about all issues which we encountered during the development, unpredictive behavior is the feeling I have once I did all this with my DB specialist colleague. When we started to care about the performance more and more we must have understood those features and we usually took out default behavior and plug-in some special one - we could have done that as we understood essentials of the system. This is the way.

As I've described those features above, a lot of NoSQL storages allow to use them on your own. You can easily provide inverted index in key-value store even if the database itself does not support such functionality. What you need to do is to write error-less code and maintain indexes by your own code. Look at twitter example for famous key-value store Redis. It is not so simple as three/four tables in MySql with build-in indexing, isn't it?

There is nothing bad about SQL databases. I still believe that they fit the purpose of 80% of programs as they are best for difficult querying and/or less than massive scaling. But who need real massive scaling? Everyone?

Why we are going to use NoSQL?

Here in this article and the whole series, I'm going to evaluate certain attributes of storages for persistence of binary data of variable length. The goal is to persist network traffic in binary data format.

No relation decomposition, no querying of particular attributes of particular messages. No implicit transactions complying with ACID. No indexes and so on. You can find the reasons in the preface, we just hope that we know what we do :-)

Cassandra

The first evaluated storage is Cassandra. It's famous NoSQL database, commercial support provides virtual confidence for fast support and so on.

Cassandra is somewhere in the middle between schemaless and schemaware (SQL like) databases as it supports schema and CQL query language. I was pretty surprised when I have saw features like time series with dynamic columns, build-in lists and sets. Next good stuff is OpsCenter monitoring tool, I always show some screenshots from the application when I want to demonstrate easy-of-use monitor for some distributed app. Pretty nice.

Hardware

Hardware used within the tests:
  • DB cluster:
    • HP Proliant BL460c gen 8, 32core 2.6 GHZ, 32GB RAM, W12k server
    • HP Proliant BL460c gen 8, 32core 2.6 GHZ, 192GB RAM, ubuntu 12 server
  • Tests executor:
    • xeon, 16 cores, 32gb ram, w8k server
  • 10Gbps network
  • Java 7

Performance Measurement of Cassandra Cluster

The setup used one seed and one replica, java tests use datastax driver. Replication factor two when not explicitly defined. After some issues:


... my final CQL create script was:
  1. CREATE COLUMNFAMILY recording (
  2. PARTKEY INT,
  3. ID uuid,
  4. VS_ID uuid,
  5. URL ascii,
  6. MESSAGE_TIME BIGINT,
  7. DATA BLOB,
  8. PRIMARY KEY((PARTKEY, VS_ID), ID)
  9. );

Batch Size


Datastax driver allows to use queries in batches, how does it scale?

Blob size \ Batch Size [TPS]1282565121024819632768
100 bytes (8 clients)67 10063 90055 40055 80023 500Timeouted
1 kb (4 clients)15 60020 60020 30019 4005 500N/A
5 kb (1 client)N/A8 3009 20010 0003 000

256 rows seems like best size of the batch.

Number of Connections


The tests utilizes variable count of connections to the Cassandra cluster. Note that the batch size was 256 and consistency was set to ONE.


Blob size \ Connections [TPS]124824
100 b27 00053 00075 00062 00062 000
1 kb17 90031 00042 00040 000--
5 kb9 00014 60016 00010 300--
20 kb3 3004 5003 800

You can see that 2 or 4 connections per one client are enough whilst one connection slows down the overall performance.

Number of Partitions


The table contains partition key, see the CQL code snippet. The test generates variable values for the partition key. Batch size was also 256 and consistency one.

Blob size \ Partitions [TPS]1864
100 b58 00071 00070 000
10 kb13 20091007 300

Partition key says where the data will reside, it defines target machine. As two nodes are involved in my tests, its better to use more than one partition.

Variable Consistency


Tuning of consistency is the most beautiful attribute in Cassandra. You can define consistency level in every aspect of the communication with cassandra driver in any language. There are different levels, see the documentation.

The test used 8 partitions, 8 connections and 256 batch size.

Blob size \ Consisency [TPS]AnyOneTwo
100 b67 00066 00066 000

Well, I'm surprised. There is almost no difference between consistency levels for such blob size despite totally different purpose or implementation. This could happen due to almost ideal environment, very fast network and machines as well.

Variable Replicas


I used two replicas by default but this tests tries to reveal the price for second copy of the data.

Blob size \ Replicas [TPS]12
100 b121 00072 000
10 kb19 2008 200

There is apparent penalty for second replica.

Long Running


Previous tests used a few millions of inserts/rows but the current one uses hundred of millions. This is the way how to definitely omit the advantage of various caches and clever algorithms. This simulates heavy weight load test to the server.

8 partitions, 8 connections, 256 batch size.

Blob size \ Replicas+Consistency [TPS]1+ONE2+ONE2+TWO
100 b (1 billion)121 00063 00062 00
10 kb (30 m)--8 200--

Large Messages


The largest message in these tests has 20 kb so far, but what is the behavior when our app would meet some large file as a attachment?

Consistency One, 2 partitions, 2 connections, 2 batch size.

Blob size  [TPS]
500 kb90
5 mb9.5

Occupied Space


Apache Cassandra automatically enables compression since 1.1 version.


Blob size \ [Occupied space in bytes per message]
100b89b

Here is the structure of serialized entity.

CREATE COLUMNFAMILY recording (
    PARTKEY INT,
    ID uuid,
    VS_ID uuid,
    URL ascii,
    MESSAGE_TIME bigint,
    DATA blob,
    PRIMARY KEY((PARTKEY, VS_ID), ID)
);

Conclusion

This kind of testing is almost opposite in comparison to neflix tests. They analyzed scalability but we have done testing of simple cluster having two machines only. We need to figure out the performance of one node (obviously in cluster deployment).

I was impressed as the numbers above are good, the overall throughput on the network reached hundred mbps coming out from one client. 

What next. Same kinds of tests agains Kafka, Redis, Couchbase, Aerospike, Couchbase.

2 komentářů:

About me

Powered by Blogger.