Friday, November 20, 2015

Building a Go project (Kubernetes Ingress) from scratch with no Go experience

Martin Podval
I have been working with Kubernetes and yesterday I wanted to build its contrib repository. The nginx implementation of Kubernetes Ingress is written in Go, and even though I only needed to change a const string, that required recompilation.

Go is not Java, and Go's build system is not Maven. Setting up the environment was not straightforward. I ran into a couple of problems, but let me take it from the beginning. My laptop runs Ubuntu 15.04 - well, 15.10 since 9 pm :-) - and I had never installed Go before.

Go installation on Ubuntu

First of all, you need to install Go. The official Ubuntu repository contains the older version 1.3, but do not install it via apt-get: Kubernetes and its dependencies require a newer version of Go. Of course, I originally installed 1.3 and hit a fatal error later, which forced me to do the manual installation anyway.

Here is simple tutorial.
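In short, the tutorial boils down to downloading the official tarball and exporting two variables. This is a sketch: the URL below points at the 1.5.1 release, which was current at the time - adjust it to your version:

```shell
# download and unpack the official binary distribution into /usr/local/go
wget https://storage.googleapis.com/golang/go1.5.1.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.5.1.linux-amd64.tar.gz

# the last two lines: make the go binary visible in the current terminal
export GOROOT=/usr/local/go
export PATH=$PATH:/usr/local/go/bin
```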


The last two lines affect only the current terminal. You should update /etc/environment in case you intend to use Go in the future.

Then you can check the installed Go version:

$ go version
go version go1.5.1 linux/amd64

Go compilation and building


Now you can clone your git project, e.g.:

git clone https://github.com/kubernetes/contrib

There is a build command in Go. You can type it into your terminal:


What's wrong? We have not set up GOPATH yet. This directory is something like the Maven local repository (cache). It's pretty simple: just point the variable at your home directory, e.g.
export GOPATH=/home/martin/.go
Try to build it again. There are no errors, but nothing happens either. I found in a Stack Overflow discussion that there is a get command as well, which fetches the dependencies a build needs anyway. Then I found the -v flag for verbose output.


Actually, we are on the right track. Fetching the dependencies takes a while, so it seemed that go build or go get was not working, but that's not true.

That's all; now the build should work:
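For the record, my full sequence looked roughly like this (a sketch; the controller path inside contrib is from memory and may differ in the current repository layout):

```shell
export GOPATH=$HOME/.go
cd contrib/ingress/controllers/nginx   # hypothetical path to the nginx ingress controller
go get -v ./...                        # fetch dependencies, verbosely
go build -v .                          # compile; the binary lands in the current directory
```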


Go 1.3 vs 1.5

For the record, here is why to prefer the manual installation over the Ubuntu repository: building the Kubernetes project with Go 1.3 from the official Ubuntu repo failed:

Tuesday, November 17, 2015

Using CoreOS stack and Kubernetes #2: Why use CoreOS as Cloud Operating System

Martin Podval
In this part I'd like to cover the potential benefits of using CoreOS as the operating system in your cloud deployment. You can install Kubernetes on various operating systems, so there is a decision to make. Why CoreOS? What is my experience?

Etcd, Fleet and Flannel Preinstalled

The first reason is obvious: CoreOS always ships recent versions of all the components of a Kubernetes cluster.

My experience: we profited from the pre-installed components from the beginning. For example, in the early stages when etcd came out with its beautiful and powerful new API (v2), CoreOS shipped both versions - old and new - so we just enabled one of them. Setting up all the components together is not trivial, so the pre-installed and pre-configured CoreOS can save you a couple of hours.

No Package Manager, Read Only Partitions

It sounds more like a disadvantage than a benefit, but ...

Look at the CoreOS release notes to see what it consists of.

For example, CoreOS includes the basic Linux utilities, so you can use many popular command line tools. But it is not recommended to install anything else. Take what is installed, and all machines within the cluster can easily be added, removed and/or replaced. All parts of your application are supposed to be distributed as Docker containers.

The CoreOS installation also uses a layout of nine disk partitions. Some of them are read-only, some contain the operating system. This forces an administrator to keep mutable data on a dedicated one, which again improves node replaceability.

My experience: this is great for operations. Adding a new node is a matter of seconds. However, it is sometimes tough to work with CoreOS when you are used to relying on certain tools, like htop. Speaking of which, nothing prevents you from downloading such a tool manually, e.g. via the cloud config.

Online Updates

There is a great update methodology: you can set up a CoreOS node to update automatically. What does that mean in practice?

You choose an update channel (alpha, beta, stable) and CoreOS checks for new versions automatically. You can also manage updates manually from the command line using the update_engine_client tool. This is useful for debugging in the early stages, when you have not yet set up updates properly and they might fail.
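For example (the flags below are the ones I remember from the CoreOS docs; check update_engine_client -help on your node):

```shell
update_engine_client -status            # show the current version and updater state
update_engine_client -check_for_update  # trigger an update check immediately
```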

Once the update engine detects a new version, it immediately starts downloading it. There is a notion of active and passive partitions: the current boot runs from the active partition while the download goes to the passive one.

CoreOS needs a reboot to apply the new version of the operating system. However, consider a cluster of many nodes: what would happen when they all downloaded a new operating system version? They would all reboot together!

This is where the locksmith tool comes in. It uses etcd's persistent storage to implement a simple semaphore shared by all running - and potentially rebooting - CoreOS nodes. In short, this distributed lock guarantees that only one machine reboots at a time.

My experience: this is one of the best things about CoreOS. You just subscribe to a channel with a proper reboot strategy and your cluster stays continuously up-to-date - the Linux kernel, fleet, etcd, the Linux tools, or the newly added kubelet.

We have also encountered problems with one of the new versions of CoreOS. For example, it came with a new version of Go, and Docker started to hang once it finished pulling an image. You can manually roll back or downgrade the CoreOS version; this tutorial just switches the current node to the passive read-only disk partition holding the previous version of CoreOS.

Cloud Configuration File

Setting up and configuring a machine freshly installed with a new operating system is always a lengthy procedure. Therefore, CoreOS brings the concept of cloud config files.

The point is to have a single file which contains the whole configuration of a node.

I'll dedicate one chapter to this concept. However, it's usual to store the following in cloud configs:

  • set CoreOS specifics, e.g. the update channel, reboot strategy, etc.
  • adjust any systemd service
  • write files, like proxy settings, certificates, etc.
  • set the node hostname
  • configure the etcd, fleet, kubernetes or docker tools
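A minimal sketch of such a cloud config might look like this (the keys follow the CoreOS cloud-config format, all values are illustrative):

```yaml
#cloud-config
hostname: node-01
coreos:
  update:
    group: stable              # update channel
    reboot-strategy: etcd-lock # locksmith: one reboot at a time
  etcd2:
    discovery: https://discovery.etcd.io/<token>
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
write_files:
  - path: /etc/environment
    content: |
      HTTP_PROXY=http://proxy.example.com:3128
```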
My experience: it's pretty useful to have one cloud config for the whole cluster. You can put it in some storage, your git repository or an artifactory. All nodes fetch this single instance and apply its content during boot, which guarantees that all nodes have the same configuration.

There are a lot of other useful things in CoreOS, but those above are the major ones. I'd like to dedicate the next article to the installation.

Here is a link to the whole series.

Monday, November 9, 2015

Using CoreOS stack and Kubernetes #1: Introduction

Martin Podval
We were lucky enough in December 2014 to join the group of teams using the CoreOS stack and Kubernetes on their way to the next generation of cloud infrastructure. It has been almost one year, so I'd like to provide an article series about our experience with the whole stack.

The Motivation

You usually want to model your business domain, provide useful APIs, break your application into pieces, services, and so on. Well, it's your work.

Distributed computing is one of the most challenging disciplines in computer science. Why? Because of the asynchronicity introduced by remote calls among distributed components. There are no locks like in your favorite language; instead there are remote calls with no guarantee of any response, at any time.

It's pretty challenging to provide a highly available application with no downtime during updates or crashes; an application that scales according to need; an application with data consistency guarantees.

What are the typical questions and considerations when you start to build such an app?

  • how can I run exactly 3 instances of a service in my app?
  • how can I detect that some instance has failed?
  • how can I run a new replica instead of the dead one?
  • what if there are more than 3 instances because the dead replica was not so dead after all and is now back in the cluster?
  • what if there are two replicas - the dead one and the new one - processing the same part of the data?
  • how can I guarantee that all replicas see the same configuration?
  • where can service B discover a link to a running service A?
  • where can service B discover a new instance of service A because the first one failed?
  • how can I install all that mess onto one operating system?
I could write many more questions like those above. CoreOS and Kubernetes allow you to address many of them.

The CoreOS stack and Kubernetes provide a well-tested yet tiny platform for your cluster/cloud infrastructure. You can focus on your business, not on the infrastructure.

Components

Here is a diagram of how all the tools fit together:



  • CoreOS: a very simple Linux distribution prepared for cluster/cloud deployment.
  • Fleet is responsible for running units (services) on remote nodes.
  • Etcd is a distributed key-value store using the Raft consensus algorithm. Its purpose is to store configuration in namespaces. I've already written some articles about etcd.
  • Flannel provides private networking among nodes - or Docker containers, in this case.
  • Kubernetes ties all the tools together to provide cluster management. You describe your application via Kubernetes descriptors and use kubectl or the REST API to run, scale or fail over your app - obviously in cooperation with the application itself. One could say it's a PaaS for dockerized applications, and one would be right.

What should I read to become more familiar with all these?

If you have only 30 minutes, check out this video:



What's next?

I'd like to write an article series about our experience with CoreOS and Kubernetes, and to cover the installation in the next article.

Here is a link to the whole series.

Sunday, September 6, 2015

Apache Kafka Presentation for CZJUG

Martin Podval
Apache Kafka is a famous technology these days. While it looks like an almost traditional messaging system from the user's point of view, it also supports scalability, high throughput and failover. I've already written an article about it.

The guys from the Czech Java User Group gave me a chance to give a talk about Kafka. Here is a video of the talk, in Czech.



Slides are also published on slideshare.

Sunday, July 26, 2015

Designing Key/Value Repository API with Java Optional

Martin Podval
I spent some time last month defining our repository API. A repository is a component commonly used by the service layer of an application to persist data. In the age of polyglot persistence, we use the repository design discussed in this article to persist our business domain model - designed according to (our experience with) domain-driven design.

Lessons Learned

We have plenty of experience from an earlier product version where we used NHibernate as the persistence framework. The first, naive, idea was to let every programmer write database queries on their own. Unfortunately, the idea failed quickly. This scenario relied heavily on the belief that every programmer knows how persistence and databases work and wants to write queries efficiently. It inevitably produced error-prone and inefficient queries. Essentially, nobody was responsible for the repositories because everyone contributed to them; the persistence component was just a framework.

The whole experience taught us to design a very strong and highly reviewable API, created by technology-aware engineers, usually with a strong commitment to all dependent layers.

Technical Implications Affect the API

The API must obviously reflect the functional requirements - they define what we want the repository to do. According to our experience, the API must also reflect technical and implementation implications. Designing without knowing whether the implementation will use a SQL database or a NoSQL key/value store, or where the boundaries of the domain aggregates lie, will result in an inefficient implementation.

To give a more realistic example, let's talk about an address and consider it an aggregate. The repository usually provides CRUD methods for the address. But what if there is a functional requirement to return only the address's street? Should the API contain such a method, e.g. getting the street by address id?

It depends on technical implementation:
  1. What is the typical maximum size of a serialized address, e.g. the resulting JSON? Does it fit into one TCP packet traveling through the network, or into one read operation from the hard drive on the storage node? In other words: does it even make sense to fetch a partial entity instead of the full one?
  2. How often is the street read and/or written? What is the read/write ratio?
    1. Is it better to duplicate the data - store the street separately as well as within the full JSON - because it is read so often?
    2. Is it better to store the whole address together because updates outnumber reads?
Let's say you ignore these questions and provide all the methods required from the user's point of view: you allow fetching the street and the address in two different methods. Say there is also a functional requirement to fetch the zip code from the address. Developers who are not familiar with the repository internals will typically call the method fetching the street, followed by the fetch of the zip code on the next line - it's natural thinking to compose methods of an API. However, this is obviously inefficient because it makes two remote calls to the storage.

If you answer questions like these, you can easily decide that the only reasonable implementation is to provide getAddress only - to return the whole address aggregate. Developers now have no other choice than to use this method and work with the address as a whole.

You define the repository API in the most efficient way; you tell developers how to use the underlying persistence.

Implementation

Once we know what kinds of methods to place on the repository API, there are some implementation constraints worth mentioning.

Repository is not a Map

... so do not try to express the CRUD methods like some remote (hash)map.

Every programmer - probably every human - craves patterns and solves problems according to his or her past and current experience. CRUD on a key/value store sounds like an application of a map. This idea almost implies that the repository interface can mirror the Map interface, for both method arguments and return types.

However, there are certain circumstances you need to keep in mind.

1. Error States

An in-memory map either CRUDs or it doesn't: in the case of GET, the key is either there or not. A repository, on the other hand, makes remote calls over an unreliable network to an unreliable (set of) node(s). Therefore there is a broad range of potential failures you can encounter.

2. Degraded Access

Look at Map's remove: the method returns the entity being removed. In the case of a map, that's just fine. For a repository, on the other hand, it looks like overhead considering the slow network access - not to mention things like consensus or QUORUM evaluation, which are not cheap. I also doubt anyone would use the returned value; the caller just needs to remove an entity by its identifier.

Excluding simple in-memory implementations, repository methods usually perform one or more remote calls. Contrary to local in-memory calls, these use a slow network under the hood. What are the implications? Considering the GET method, there are more states than just key-exists/key-does-not-exist. And returning the current value when removing a key can take time.

Optimistic Locking

Basically, every one of our entities contains a long version used for a CAS-like operation - optimistic locking. Contention thus happens on the storage system itself; it's up to the system, or up to the query, how to handle it. Especially in a distributed system, this is the kind of problem you do not want to solve yourself.

Most NoSQL storages use a lightweight approach usually called compare-and-set. Redis supports non-blocking transactions via the MULTI, EXEC and WATCH primitives. Cassandra takes a different approach, built into its query language.

Java Optional<T> as Returning Type

We eventually decided to use Java's Optional<T> so that our API never returns null. However, there is one exception: a method with a three-state result type. There is a nice discussion on Stack Overflow about where to use this construct and where not to.

The implementation later confirmed this was a good approach. The point is that everyone who uses a method with an Optional return type is much more aware of the empty state - Optional.empty(), for the record. During the refactoring I found that 40% of the code using the previous (in-memory) repository version did not handle null as a valid return value.

Generic Repository API Interface

We eventually ended up with the following API.

/**
 * @throws timeout exception
 * @throws generic persistence exception
 */
interface Repository<T> {

    /**
     * @throws if the entity already exists
     */
    void add(T entity);
    /**
     * @return {@link java.util.Optional} with the entity when found, {@link java.util.Optional#empty()} otherwise
     */
    Optional<T> get(Id id);

    /**
     * @return {@link java.util.Optional} with the entity when found and the persisted version differs from the given <code>version</code>,
     * {@link java.util.Optional#empty()} when there is no persisted entity for the given id,
     * <code>null</code> when found and the persisted version is the same as the provided one
     */
    Optional<T> getIfNotMatched(Id id, long version);

    boolean exist(Id id);

    /**
     * Persists the given <code>entity</code> and increments its version on success
     * @throws stale entity exception when the given entity's version differs from the persisted one
     * @throws if the entity does not exist
     */
    void update(T entity);

    /**
     * Deletes the whole entity hierarchy including all children
     * @throws stale entity exception when the given entity's version differs from the persisted one
     * @throws if the entity does not exist
     */
    void delete(Id id, long version);
}
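To illustrate how the Optional-based contract reads on the caller side, here is a small self-contained sketch. InMemoryAddressRepository and the string ids are made up for the example; they stand in for the real storage-backed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the storage-backed repository above;
// string ids and street values are made up for the example.
class InMemoryAddressRepository {

    private final Map<String, String> streetsById = new HashMap<>();

    /** @throws IllegalStateException if the entity already exists */
    void add(String id, String street) {
        if (streetsById.putIfAbsent(id, street) != null) {
            throw new IllegalStateException("entity already exists: " + id);
        }
    }

    /** @return the street when found, Optional.empty() otherwise - never null */
    Optional<String> get(String id) {
        return Optional.ofNullable(streetsById.get(id));
    }
}

public class RepositoryDemo {
    public static void main(String[] args) {
        InMemoryAddressRepository repo = new InMemoryAddressRepository();
        repo.add("42", "Main Street");

        // the caller is forced to handle the empty case explicitly
        System.out.println(repo.get("42").orElse("unknown")); // Main Street
        System.out.println(repo.get("43").orElse("unknown")); // unknown
    }
}
```

The caller can no longer "forget" the missing-entity case: the compiler pushes the decision - orElse, orElseThrow, map - to the call site.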

Wednesday, June 3, 2015

Java, Docker, Spring boot ... and signals

Martin Podval
I spent the last couple of weeks working on Java apps running in Docker containers deployed on clustered CoreOS machines. It's pretty simple to run a Java app in a Docker container: you just choose a base image for your app and write a Dockerfile.

Note that the Docker registry contains many Java images, usually based on OpenJDK. We use our internal image for Oracle's Java 8, built on top of something like this Dockerfile. Once you have decided between Oracle and OpenJDK, you can start writing your own Dockerfile.

FROM dockerfile/java:oracle-java8
ADD your.jar /opt/your-app/
ADD /dependencies /opt/your-app/dependency
WORKDIR /opt/your-app
CMD ["java", "-jar", "/opt/your-app/your.jar"]

However, your app will probably require some parameters, so the last line usually calls a shell script. Such a script then validates the number and format of those parameters, among other things. This is also useful during development, because none of us wants to rebuild and restart an image every time something changes and needs testing. So the last line of your Dockerfile usually looks like the next snippet:

CMD ["/opt/your-app/start.sh"]

So far so good. Let's say I've just finished my code and got the integration tests working. Once you are satisfied with your solution, you try something like this:

sudo docker run --name your-app-name -e ENVIRONMENT_PARAM=VALUE your-image ...
...
sudo docker stop your-app-name

It's pretty reasonable to use Spring Boot when your app is based on Spring contexts. Thanks, Dagi. You just put a Maven dependency into the root pom.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Spring Boot removes the boilerplate code and does a lot of things you would eventually have to do anyway - see the documentation. I based my main class on CommandLineRunner, and I have to admit the main method itself is one of the shortest I have ever written.

A new problem appeared once I tried to stop the application. According to the documentation, my process should receive SIGTERM and then SIGKILL. A nice thing about Spring Boot is the shutdown hook automatically registered within your application. Unfortunately, the application log obtained via docker logs showed no such thing as ...

will receive SIGTERM, and after a grace period, SIGKILL.
... the Docker daemon just killed my app. A bit of research gave me a clue: signals are not propagated into child processes of shell scripts. I finally found an interesting article describing the issues with Java, shell scripts and signals in full detail. Being a lucky developer, I just slightly changed the last line of the shell script to something like:

exec java -jar ...
This simple change allows a graceful shutdown, and my Java app can now close the running Spring context. This transitively means that all registered auto-closeable beans are terminated as intended.
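The difference is easy to demonstrate in a plain shell: without exec, the inner command runs as a child with a new PID, while exec replaces the wrapper in place:

```shell
# Without exec: the inner shell is a child with a different PID; a signal
# delivered to the wrapper's PID never reaches the application.
sh -c 'echo "wrapper pid: $$"; sh -c "echo \"child pid:   \$\$\""'

# With exec: the inner command replaces the wrapper and inherits its PID,
# so the SIGTERM sent by docker stop reaches the application directly.
sh -c 'echo "wrapper pid: $$"; exec sh -c "echo \"execed pid:  \$\$\""'
```

The first command prints two different PIDs, the second prints the same PID twice.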

Wednesday, March 4, 2015

ETCD: POST vs. PUT understanding

Martin Podval
ETCD is a distributed key-value store used as a core component in CoreOS. I already sent out a post earlier this week. There is a page describing how to use the basic ETCD commands - the ETCD API. The code snippets on that page mostly use PUT, but ETCD allows POST as well. Most of us understand the difference between those two verbs in the notion of a REST(ful) service, but how do they work in a key-value store?

POST

An example is worth many words.

curl -v http://127.0.0.1:2379/v2/keys/test -XPOST -d value="some value"
curl -v http://127.0.0.1:2379/v2/keys/test -XPOST -d value="some value"

Running the same command twice results in the following content:

{
  "action": "get",
  "node": {
    "key": "/test",
    "dir": true,
    "nodes": [
      {
        "key": "/test/194",
        "value": "",
        "modifiedIndex": 194,
        "createdIndex": 194
      },
      {
        "key": "/test/195",
        "value": "",
        "modifiedIndex": 195,
        "createdIndex": 195
      }
    ],
    "modifiedIndex": 194,
    "createdIndex": 194
  }
}

So ETCD generates an index value and puts it into the resulting key - which is also the path to the value. For instance:

curl -v http://127.0.0.1:2379/v2/keys/test/194 -XGET

This allows you to get that specific key; the index is explicitly expressed in the URL.

PUT

The PUT command just adds or updates the given key. Let's say I use the following example:

curl -v http://127.0.0.1:2379/v2/keys/test -XPUT -d value="some value"

The resulting content of the test key is as expected:

{
  "action": "get",
  "node": {
    "key": "/test",
    "value": "",
    "modifiedIndex": 198,
    "createdIndex": 198
  }
}

How to Model Add and Update Method?

My current task is to model and implement a repository using ETCD under the hood. The usual repository contains CRUD methods for a particular set of entities. A reasonable approach is to separate add from update so as not to silently replace an existing object, e.g. when using optimistic locking.

I don't want to see revision (index) numbers within the keys, so the POST command is not useful here. ETCD provides the prevExist parameter for these use cases.

I want an add method which expects that there is no content under the given key. I'll use the following statement:

curl -v http://127.0.0.1:2379/v2/keys/test?prevExist=false -XPUT -d value="some value"

If you did not delete the key beforehand - as I did not - you get the following error:

{
  "errorCode": 105,
  "message": "Key already exists",
  "cause": "/test",
  "index": 198
}

On the other hand, use prevExist=true to express an update of an existing entity.

curl -v http://127.0.0.1:2379/v2/keys/test?prevExist=true -XPUT -d value="some value"

This command results in a positive response.

< HTTP/1.1 200 OK

The repository uses PUT for both the add and update methods; the value of prevExist makes the difference.
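The resulting repository primitives can be sketched as two tiny shell functions. etcd_add and etcd_update are made-up names for this sketch, and the endpoint assumes a local etcd:

```shell
ETCD=http://127.0.0.1:2379/v2/keys

# add: PUT with prevExist=false - fails with errorCode 105 if the key already exists
etcd_add()    { curl -s "$ETCD/$1?prevExist=false" -XPUT -d value="$2"; }

# update: PUT with prevExist=true - fails with errorCode 100 if the key is missing
etcd_update() { curl -s "$ETCD/$1?prevExist=true"  -XPUT -d value="$2"; }

etcd_add test "initial value"     # succeeds only once
etcd_update test "changed value"  # succeeds only after the add
```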

Monday, March 2, 2015

Playing with ETCD cluster in Docker on Local

Martin Podval
I started writing a management component last week. We would like to utilize CoreOS with the whole stack as much as possible, at least in this early phase of our project.

The core component of our solution is ETCD, a distributed key-value store - something like my favorite piece of software, Redis. The word 'distributed' means that the core of everything in your solution needs to be synchronized, or 'consensused'; ETCD uses Raft. I'd love to know how the component I depend on behaves in a real environment where everything can die.

In the age of Docker - where every piece of software is dockerized - it's pretty simple to start an ETCD cluster locally in a second. The following piece of code starts three etcd instances linked together in one cluster.

docker run -d -p 4001:4001 -p 2380:2380 -p 2379:2379 --net=host --name etcd0 quay.io/coreos/etcd:v2.0.3 \
 -name etcd0 \
 -advertise-client-urls http://localhost:2379,http://localhost:4001 \
 -listen-client-urls http://localhost:2379,http://localhost:4001 \
 -initial-advertise-peer-urls http://localhost:2380 \
 -listen-peer-urls http://localhost:2380 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://localhost:2380,etcd1=http://localhost:2480,etcd2=http://localhost:2580

docker run -d -p 4101:4101 -p 2480:2480 -p 2479:2479 --net=host --name etcd1 quay.io/coreos/etcd:v2.0.3 \
 -name etcd1 \
 -advertise-client-urls http://localhost:2479,http://localhost:4101 \
 -listen-client-urls http://localhost:2479,http://localhost:4101 \
 -initial-advertise-peer-urls http://localhost:2480 \
 -listen-peer-urls http://localhost:2480 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://localhost:2380,etcd1=http://localhost:2480,etcd2=http://localhost:2580

docker run -d -p 4201:4201 -p 2580:2580 -p 2579:2579 --net=host --name etcd2 quay.io/coreos/etcd:v2.0.3 \
 -name etcd2 \
 -advertise-client-urls http://localhost:2579,http://localhost:4201 \
 -listen-client-urls http://localhost:2579,http://localhost:4201 \
 -initial-advertise-peer-urls http://localhost:2580 \
 -listen-peer-urls http://localhost:2580 \
 -initial-cluster-token etcd-cluster-1 \
 -initial-cluster etcd0=http://localhost:2380,etcd1=http://localhost:2480,etcd2=http://localhost:2580

The inspiration is obvious, but this simply runs everything on your computer. The parameter --net=host provides full transparency from the port and network point of view.

You can now open the following URL in a browser:

http://localhost:4101/v2/keys/?recursive=true

It is also good to check all the members of your cluster - you will kill them later.

http://localhost:2379/v2/members

Once you have done your tests, you can easily delete all keys in the XYZ namespace using curl. Note that you can delete only your own keys, so you can't perform the following command on the root namespace.

curl http://127.0.0.1:2379/v2/keys/XYZ?recursive=true -XDELETE

I also prefer to see the HTTP status code, as ETCD makes good use of status codes:

curl -v http://127.0.0.1:2379/v2/keys/XYZ

In addition to status codes, it always returns a JSON body with its own error codes - see the snippet near the end of the following listing. You can get something similar to:

* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 2379 (#0)
> GET /v2/keys/XYZ HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:2379
> Accept: */*

< HTTP/1.1 404 Not Found
< Content-Type: application/json
< X-Etcd-Cluster-Id: 65a1e86cb62588c5
< X-Etcd-Index: 6
< Date: Sun, 01 Mar 2015 22:55:14 GMT
< Content-Length: 69

{"errorCode":100,"message":"Key not found","cause":"/XYZ","index":6}
* Connection #0 to host localhost left intact

When you are done playing with the ETCD cluster, you will probably want to remove all the etcd containers. I use a simple command which removes every Docker container, but you can improve it with grep to remove only those hosting ETCD.

sudo docker rm -f `docker ps --no-trunc -aq`
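A grep-based variant might look like this (a sketch; it matches any container whose listing line contains etcd):

```shell
sudo docker rm -f $(sudo docker ps -a --no-trunc | grep etcd | awk '{print $1}')
```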

The last interesting thing is performance. I was reminded of Redis, which can handle one million transactions per second using one thread, so I was surprised when ETCD usually responded in 20-30 ms. Worse, I also encountered client timeouts caused by 400-500 ms response times per request. Raft is obviously not free. But the purpose of ETCD is massive read scalability. Well, good to know.

Sunday, February 1, 2015

Performance Battle of NoSQL blob storages #3: Redis

Martin Podval
We have already measured performance stats for Apache Cassandra and Apache Kafka. Including Redis in a comparison of persistent storages may look like a misunderstanding at first sight. On the other hand, there are certain use cases where it makes sense to think about storing data in main memory, especially in private data centers - primarily once your cluster includes a machine whose hard drive and RAM are almost the same size :-)

Redis is an enterprise, or advanced, key-value store with optional persistence. There are a couple of reasons why everyone loves Redis. Why do I?

1. It's pretty simple.

The following command installs the Redis server on Ubuntu. That's all.

apt-get install redis-server

2. It's incredibly fast. Look at the tables below: one million remote operations per second.

3. It supports a large set of commands. More than a database, it is an enterprise remote hash-map, hash-set, sorted-list or pub/sub channel solution, supporting TTL and a Lua scripting environment, among others. See all commands.

4. It's optimized to use few computer resources, both CPU and RAM. Even though the Redis server is a single-threaded app, it achieves this great performance.

I already touched on our purposes at the beginning. We primarily targeted two points. First, we can use Redis in a super-fast deployment of our app when latency matters.

Second, I wanted to compare in-memory and persistent stores. Is such an in-memory solution really worth considering?

Setup

I used the following setup:
  • Redis server 2.2.6: HP Proliant BL460c gen 8, 32core 2.6 GHZ, 192GB RAM, ubuntu 12 server
  • Tests executor: xeon, 16 cores, 32gb ram, w8k server
  • 10Gbps network
  • jedis java client
  • kryo binary serialization
As the Redis cluster feature was still in development at the time these numbers were measured, I used only one machine. 32 cores was a real overestimate, as Redis indeed used one core plus one more.

Performance Measurement of Redis

Batch Size

Appending with LPUSH to eight different keys.

Blob size \ Batch size [TPS]    128     256     1024    32768
100b                            570k    570k    557k    600k
20kb                            38k     40k     35k     33k

As only main memory is touched, it's all about network transmission; throughput is almost the same for all batch sizes.

Variable Connections


LPUSH again, appending to a varying number of keys; every key is accessed over its own connection.

Blob size \ Connections [TPS]   1       2       4       8       32      128
100b                            446k    750k    646k    560k    960k    998k
20kb                            9.2k    16.8k   20.8k   34k     35k     52k
Ohhh. One million inserted messages per second into one Redis instance - incredible. The Java client uses NIO, which explains why it scales with many more TCP connections: increasing the number of network pipes through which a client can push data enables better throughput.

Occupied Memory


Blob within a list.

Blob size   Bytes per message
100b        152
20kb        19kb
There is some built-in compression, which showed up in the large-message test.

Long Running


The goal of this test is to fill the main memory (192GB) through one Redis instance to find out whether there are scalability limitations.

Blob size   TPS
100b        842k
20kb        18.2k
Redis filled the main memory with blob messages until OOM. The throughput curve over time is almost flat.


Persistence


Even though Redis stores the data in main memory, there is a way to persist it.

Blob size [TPS]   AOF (every second)   AOF (always)   Without persistence
100b              800k                 330k           960k

Redis forks for I/O. The numbers are almost the same when the data is persisted in one-second frames. TPS drops significantly when Redis writes every updated key to the drive, but that mode ensures the best durability.

Large Messages


How is performance affected when the message size increases to tens of megabytes?

Blob size   TPS
500 kb      1418
5 mb        85
100 mb      1.2

Conclusion

The numbers are impressive: a single-threaded app successfully processes almost one million small messages per second in memory.

Redis performance is incredible. One could expect limitations because of the single-threaded design, which effectively serializes all operations; maybe that lockless implementation is exactly why the Redis server can handle such throughput.

On the other hand, a fair comparison against the other competitors has to use the Redis persistence feature - and there the performance is much worse, three times worse. The persistence requirement can be the decision maker.

I've already mentioned the great command set; you can model almost any behavior with it. Even though Redis primarily targets caches, there are commands for calculating various stats, holding unique sets, etc. A script composed of these commands is powerful and very fast.

Everything is always about performance and features. Redis has both of them :-)
