c3.8xlarge Websocket Baseline
The tests below provide a localhost baseline we can use to check the machine-to-machine testing, and also give us visibility of the sort of message behaviour we can expect. This builds on the HTTP Polling tests, which provide a view of the subscribe behaviour using long polling.
All the WS tests were run using the examples available in the test application, e.g.:
~~~
$ ./run.sh 2 com.rubris.client.runners.WSSubscribeMessageRunner 127.0.0.1 8091 login 1000 false 1000 timeHandler 100000
~~~
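To make the shape of these runners concrete, the sketch below shows a minimal standalone websocket subscribe client written against the JDK 11 java.net.http WebSocket API. It is not the Rubris WSSubscribeMessageRunner itself: the endpoint path, the subscribe payload and the assumption that each pushed frame carries a server send timestamp are placeholders for illustration only.
~~~java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class WsSubscribeSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint path; the real path is defined by the Rubris test application.
        URI uri = URI.create("ws://127.0.0.1:8091/ws");
        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(uri, new WebSocket.Listener() {
                    @Override
                    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
                        // Record the receive time for each pushed frame; the real runners compare
                        // this against a send timestamp embedded by the server-side handler.
                        System.out.println(System.currentTimeMillis() + " <- " + data);
                        webSocket.request(1); // keep asking for the next frame
                        return null;
                    }
                })
                .join();

        // Hypothetical subscribe message; the actual wire format is Rubris-specific.
        ws.sendText("{\"subscribe\":\"/topic/example.1\"}", true).join();

        Thread.sleep(10_000);                          // receive pushes for a while
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}
~~~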
2k Users with 4 modules
We started the Websocket clients at the same rate as the previous HTTP Polling tests for 2k users so that we can compare the behaviour directly.
Client instance 1
Subscribes took ~3 sec (from start at 09:09:00.131 to the final "Subscribe sent 1000" at 09:09:03.031).
~~~
2016-08-31 09:09:00,131 main DEBUG started OK.
09:09:00.880 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 100
09:09:00.885 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Channels complete 100
09:09:00.886 [nioEventLoopGroup-2-3] INFO HttpSubscribeOnly - Subscribe sent 100
09:09:00.914 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Clients ready 200
09:09:00.932 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Subscribe complete 100
09:09:01.015 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Channels complete 200
09:09:01.015 [nioEventLoopGroup-2-3] INFO HttpSubscribeOnly - Subscribe sent 200
09:09:01.087 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Subscribe complete 200
09:09:01.091 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Clients ready 300
09:09:01.122 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Channels complete 300
09:09:01.123 [nioEventLoopGroup-2-9] INFO HttpSubscribeOnly - Subscribe sent 300
09:09:01.286 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 300
09:09:01.386 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Clients ready 400
09:09:01.530 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Channels complete 400
09:09:01.531 [nioEventLoopGroup-2-9] INFO HttpSubscribeOnly - Subscribe sent 400
09:09:01.649 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Subscribe complete 400
09:09:01.652 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 500
09:09:01.730 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Channels complete 500
09:09:01.731 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 500
09:09:02.019 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Subscribe complete 500
09:09:02.196 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Clients ready 600
09:09:02.299 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Channels complete 600
09:09:02.300 [nioEventLoopGroup-2-1] INFO HttpSubscribeOnly - Subscribe sent 600
09:09:02.482 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Subscribe complete 600
09:09:02.486 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Clients ready 700
09:09:02.580 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Channels complete 700
09:09:02.581 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 700
09:09:02.679 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Clients ready 800
09:09:02.755 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Subscribe complete 700
09:09:02.755 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Channels complete 800
09:09:02.755 [nioEventLoopGroup-2-1] INFO HttpSubscribeOnly - Subscribe sent 800
09:09:02.765 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Clients ready 900
09:09:02.868 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Channels complete 900
09:09:02.868 [nioEventLoopGroup-2-9] INFO HttpSubscribeOnly - Subscribe sent 900
09:09:02.891 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 800
09:09:02.965 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 1000
09:09:03.012 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 900
09:09:03.030 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Channels complete 1000
09:09:03.031 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 1000
~~~
Client instance 2
Subscribes took ~4 sec (from start at 09:09:02.722 to "Subscribe complete 1000" at 09:09:06.759).
~~~
2016-08-31 09:09:02,722 main started OK.
09:09:03.732 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Clients ready 100
09:09:03.949 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Channels complete 100
09:09:03.952 [nioEventLoopGroup-2-3] INFO HttpSubscribeOnly - Subscribe sent 100
09:09:04.378 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Clients ready 200
09:09:04.395 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 100
09:09:04.570 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Channels complete 200
09:09:04.570 [nioEventLoopGroup-2-1] INFO HttpSubscribeOnly - Subscribe sent 200
09:09:04.699 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 300
09:09:04.745 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Clients ready 400
09:09:04.751 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Subscribe complete 200
09:09:04.922 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Channels complete 300
09:09:04.923 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 300
09:09:04.952 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Channels complete 401
09:09:04.953 [nioEventLoopGroup-2-1] INFO HttpSubscribeOnly - Subscribe sent 400
09:09:04.973 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 500
09:09:05.123 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Subscribe complete 300
09:09:05.144 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Channels complete 500
09:09:05.144 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 500
09:09:05.200 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 400
09:09:05.201 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Clients ready 600
09:09:05.350 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 500
09:09:05.406 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Channels complete 600
09:09:05.407 [nioEventLoopGroup-2-7] INFO HttpSubscribeOnly - Subscribe sent 600
09:09:05.415 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Clients ready 700
09:09:05.553 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 800
09:09:05.610 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Channels complete 700
09:09:05.611 [nioEventLoopGroup-2-3] INFO HttpSubscribeOnly - Subscribe sent 700
09:09:05.616 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Subscribe complete 600
09:09:05.689 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Clients ready 900
09:09:05.808 [nioEventLoopGroup-2-7] INFO HttpSubscribeLimit - Channels complete 800
09:09:05.809 [nioEventLoopGroup-2-9] INFO HttpSubscribeOnly - Subscribe sent 800
09:09:05.880 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Subscribe complete 700
09:09:06.201 [nioEventLoopGroup-2-3] INFO HttpSubscribeLimit - Channels complete 900
09:09:06.202 [nioEventLoopGroup-2-3] INFO HttpSubscribeOnly - Subscribe sent 900
09:09:06.206 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Clients ready 1000
09:09:06.409 [nioEventLoopGroup-2-9] INFO HttpSubscribeLimit - Subscribe complete 800
09:09:06.490 [nioEventLoopGroup-2-1] INFO HttpSubscribeLimit - Subscribe complete 900
09:09:06.517 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Channels complete 1000
09:09:06.517 [nioEventLoopGroup-2-5] INFO HttpSubscribeOnly - Subscribe sent 1000
09:09:06.759 [nioEventLoopGroup-2-5] INFO HttpSubscribeLimit - Subscribe complete 1000
~~~
Here the effect of the subscribes is much more obvious: outside this period of contention the max message timings are significantly better than in the HTTP Polling tests.
Outside the subscribes the max latencies have dropped to ~300ms.
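For reference, the latency figures quoted here and below are per-message times measured on the client. A minimal sketch of that bookkeeping is shown below, assuming each pushed payload carries the server-side send time (these are localhost runs, so clock skew is not a concern); it is an illustration, not the actual runner code.
~~~java
/**
 * Minimal per-message latency bookkeeping, as assumed for the figures in this section.
 * Illustration only - not the Rubris client code.
 */
public final class LatencyTracker {
    private long count;
    private long totalMillis;
    private long maxMillis;

    /** Record one message given the send time embedded in its payload (epoch millis). */
    public synchronized void onMessage(long sendEpochMillis) {
        long latency = System.currentTimeMillis() - sendEpochMillis;
        count++;
        totalMillis += latency;
        maxMillis = Math.max(maxMillis, latency);
    }

    public synchronized String summary() {
        double mean = count == 0 ? 0.0 : (double) totalMillis / count;
        return String.format("messages=%d mean=%.1fms max=%dms", count, mean, maxMillis);
    }
}
~~~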
Network throughput
Throughput is sustained at ~18Gb/s.
CPU usage
~~~
12528 ec2-user 20 0 27.2g 6.6g 5.0g S 1200.1 11.1 14:35.73 java
12818 ec2-user 20 0 20.3g 3.0g 20m S 262.3 5.2 3:23.39 java
12729 ec2-user 20 0 20.3g 4.0g 19m S 259.3 6.7 3:30.57 java
9474 ec2-user 20 0 20.3g 3.2g 20m S 0.3 5.4 0:45.34 java
12528 ec2-user 20 0 27.2g 6.6g 5.0g S 1200.1 11.1 15:11.87 java
12818 ec2-user 20 0 20.3g 3.1g 20m S 270.3 5.3 3:31.53 java
12729 ec2-user 20 0 20.3g 4.0g 19m S 259.7 6.8 3:38.39 java
~~~
The CPU usage is lower for the clients, as each is doing much less work (there are no poll requests to make in order to receive the data). The server is also using slightly less CPU: since it does not have to deal with the inbound polling requests, more time can be given straight to the push.
We can see the extra throughput in the network figures above, and the overall performance per client instance below.
Client instance 1
~~~
09:10:37.585 [main] WARN HttpSubscribeClientRunner - Total ms/sec 1091172
~~~
Client instance 2
~~~
09:10:40.145 [main] WARN HttpSubscribeClientRunner - Total ms/sec 1089323
~~~
We see about a 10% improvement in throughput and about a 40% reduction in latencies compared to the HTTP Polling tests run on the same resources.
4k Users with 8 modules
In order to see the effect of increasing to 8 modules we added more users, as with 2k users we found that the clients were overwhelmed by the server, resulting in large max latency deviations as the client sockets were repeatedly pushed into a temporarily blocked state (this is discussed in the Websocket Rate Limiting documentation).
In practice we do not really care about this, as it is unlikely we will push data this quickly and the clients recover fine (albeit with some larger message latencies). However, for this test we are interested in sustained throughput, so we bumped the number of users to 4k, spread across 4 client instances, to force the server to service more sockets rather than push to a more limited number of users. This also pushed the number of subscriptions to 200,000 topics spread across the 4k users (an average of 50 per user).
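As an illustration of what "temporarily blocked" means here (a generic Netty-style sketch, not the Rubris server code): once a client stops draining data fast enough, the channel's outbound buffer fills past its high-water mark and the channel reports itself as non-writable until the socket buffer drains again.
~~~java
import io.netty.channel.Channel;

// Illustration only (not Rubris server code): push to a websocket channel only while it is
// writable. A "blocked" socket is one where isWritable() has gone false because the outbound
// buffer is above its high-water mark, i.e. the client is not draining fast enough.
final class PushIfWritable {
    /** Returns true if the frame was written, false if the channel is currently blocked. */
    static boolean tryPush(Channel channel, Object frame) {
        if (!channel.isWritable()) {
            // Back-pressure: queue or drop until the client catches up and the buffer drains.
            return false;
        }
        channel.writeAndFlush(frame);
        return true;
    }
}
~~~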
Below we can see the effect of the subscribes, which all occur in the first few seconds.
Client Instance 1
Client Instance 2
Client Instance 3
Client Instance 4
Network Throughput
The network throughput averages about 23Gb/s.
If we reduce the number of clients we can push up to 30Gb/s; however, as mentioned above, the clients then become overwhelmed and we see a fairly large number of blocked sockets and longer latencies as a result.
At this level each client receives about 650k messages/s, giving us a total in the region of 2.6 million messages/s.