Configuration
Many aspects of Rubris can be modified via configuration objects. All the properties are encapsulated in the ModuleConfig object. The default ModuleConfig can be altered by retrieving it from the server object and setting properties, or alternatively new child config objects can be created and set on the ModuleConfig. Indeed, the ModuleConfig itself can be replaced entirely if required.
The configuration object and its child config objects are structured as follows:
ModuleConfig {
writerConfig,
socketConfig,
endpointConfig,
readerPoolConfig,
writerPoolConfig,
transportConfig {
HTTP: HTTPConfig,
WS: websocketConfig,
EIO: engineIOConfig
},
sslConfig,
fileCache
}
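For example, the defaults can be tweaked before the server is started. The sketch below is illustrative only: the getModuleConfig accessor and setLicense setter are assumed names (not confirmed by this document), while the fileCache field access matches the example later in this section.
ModuleConfig config = server.getModuleConfig(); // assumed accessor name
config.setLicense(licenseKey);                  // assumed setter; see License below
FileCache cache = config.fileCache;             // child config field access, as documented below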
Module Config
License (default demo license for 5 days)
This should be set with the correct dev, test or production license.
Client Timeouts (default 900,000 ms)
Client timeouts are used to clean up abandoned or broken sessions. See UserManagement for details.
- clientTimeout - (default 900,000ms) how long a client may be inactive before its session is considered abandoned and cleaned up
- timeoutCheck - (default 900,000ms) the frequency at which the inactivity check is performed
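A minimal sketch of adjusting both values; the setter names are assumptions derived from the property names above:
// Assumed setter names - check the actual ModuleConfig API.
config.setClientTimeout(600_000); // clean up clients idle for 10 minutes
config.setTimeoutCheck(300_000);  // run the check every 5 minutes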
FileCache Configuration (default off)
The file cache can be used to serve static data such as .js, .css and .html files for times when you want to bundle some content with the application itself, rather than serve it from a web server or CDN. The file cache is configured with four settings:
- resourceDir (default null, which disables the fileCache completely)
- The base resource directory for the static files
- mimeTypeFile (default “/rubris/config/mime.conf”)
- a file defining the mimetype mappings for the static resources for the HTTP content-type headers. Note: This is in the JAR and covers all the current standard mimetypes.
- reloadable (default false)
- Whether the filesystem is watched for file changes
- gzipSize (default 4096)
- the size in bytes above which files are also cached in gzip form. Compression is done once on load, not at request time.
Each of these can be configured by either obtaining the FileCache from a newly created ModuleConfig, or creating your own and setting it on the ModuleConfig:
FileCache cache = config.fileCache;
// if it starts with classpath: it will try to load as a resource -
// otherwise it loads as a full file path
cache.setResourceDir(resourceDir);
// always tries to load from the classpath (very unusual to override)
cache.setMimeTypeFile(mimeTypeFile);
If reloadability is enabled the fileCache will register a FileWatcher on the directories under the resourceDir and reload a file if it is changed (or a new one is added). This can be turned off using the reloadable property on the FileCache and should be set to false for production. When the resource path is in a jar this is automatically disabled.
Setting the resource directory to null will prevent any path being loaded and remove all static file support in the application.
Once these are loaded as part of the initialisation, Rubris ignores the file system completely (apart from reloads) and any attempt to read a file not in the path or not loaded will return a 404. There is NO loading from the file system for resource requests in any manner.
The resources are then made available on the HTTP url rooted at / for the configured directory root. For example, if the files were under /resources/HTTP/files on the classpath and the root was set as /resources/HTTP, then the paths would be http://host:port/files/xxx (with xxx being any file or directory/file under that path). If the resource directory was instead configured as /resources/HTTP/files then the path would be http://host:port/xxx.
As the file system is not used for anything other than loading the files into memory at startup, there is no directory listing or file path navigation that can be used from the client.
By default, files larger than 4096 bytes are put in the cache in both gzip and raw form. This is configurable by setting the gzipSize property on the FileCache. Browsers that explicitly send an Accept-Encoding header that includes gzip will get the gzip version returned; otherwise the raw form is returned.
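A short sketch completing the FileCache setup; the setReloadable and setGzipSize setter names are assumed from the property names above and are not confirmed by this document:
// Assumed setters derived from the documented property names;
// cache obtained as shown earlier.
cache.setReloadable(false); // recommended for production
cache.setGzipSize(8192);    // only gzip-cache files larger than 8k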
ModuleCount (default 1)
The module count configures the server to run with N-modules. Each module consists of:
- A reader thread and epoll selector
- for reading incoming data from connections
- An event notifier thread
- A thread that pushes outbound data for each client that has an event available (an event in this sense is an outgoing message/reply)
- A TimerWheel thread
- That is used to register for and service future scheduled events (such as polling requests)
- A writer thread and epoll selector
- A thread that is used to register blocked writes for callbacks when the TCP write buffer is full
- A Reader Memory Slab
- A memory mapped file that is used to back the Memory Blocks used by the Reader
- A Writer Memory Slab
- A memory mapped file that is used to back the Memory Blocks used by the event notifier
If the OS supports the SO_REUSEPORT option (enabling multiple threads/processes to multiplex the listen port at the OS level) then each module also has its own:
- Acceptor Thread
- A thread that listens for incoming connections and creates new user connections and assigns to a reader.
Otherwise a single Acceptor is shared among all the modules and sockets are allocated round-robin on each accept cycle. Note: as the accept cycle accepts as many sockets as are pending in one block, the socket allocation between modules may not be numerically symmetrical.
Each user connection is assigned to a module for its lifetime and then threads in that module explicitly service the connection lifecycle.
When scaling the application for more users it is generally expected that the number of modules will increase (you can think of each module as an independent processing unit within the library). However, this also requires an increase in the number of CPUs (and memory) available. A rule of thumb is 2 CPUs per module: 2 CPUs for 1 module, 4 CPUs for 2, etc.
A single module will provide good sustained throughput for about 500 flat-out connections, 1000 very busy connections, or up to 8192 mostly idle connections (this also depends on other factors such as ping timings, polling frequency, message size etc.).
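As an illustration of the rule of thumb above, provisioning an 8-CPU host for several thousand busy connections might look like this; setModuleCount is an assumed setter name:
// Assumed setter - 2 CPUs per module rule of thumb:
// 4 modules on an 8-CPU host, ~1000 busy connections each.
config.setModuleCount(4);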
clientSubscriptionSize (default 1024)
This is used to size the expected number of client subscriptions per client. It is mainly used for the initial sizing of the maps that hold the subscriptions, so if you only have a few subscriptions you can reduce the default memory usage by reducing this value. If you overrun the sizing the map will be resized upwards, but setting it too small will potentially lead to lots of resizes.
clientDirectQueueSize (default 128)
This is used to size the Queue for RPC responses/Direct message responses.
All RPC messages use a single DirectMessageQueue private to each client. Attempting to send responses in a stream has the potential to overrun the queue and cause the message offer to be rejected.
An unbounded direct message queue size is not supported.
Generally you will not need to increase the size, as the nature of the polling means that RPC messages will not usually accumulate into double digits; however, for WS clients with long-running or bursty asynchronous reply behaviour this may need to be increased.
serverUserName (default $SERVER_USER$)
This sets the username used for server-side pinned topics. It is not recommended to change this, but it can be done if required.
addressRequired (default true)
This determines whether the connection address is obtained on accept and cached for logging, or for use later in the accept permission callbacks. In some cases it may be better to defer this until it is needed.
longPollSlotDuration (default 5000ms)
This sets the size of the long poll slots in the timer wheel. If no pending data is available for the long poll then the connection is placed in the current slot + 1. When the TimerWheel thread reaches this slot it releases the polling thread (unless an event happens in the meantime, in which case it has already been fired). The maximum time a poll is held on the server is therefore 2x this value; in reality it will be some fraction plus the slot time, depending on when it enters the wheel. This enables longer waits to be set on the server. Unless this is causing an issue and you want to reduce the poll frequency, it is generally best to leave this value alone, as it forms a reasonable balance between repeat polling and liveness.
maxFailedReads (default 256)
This sets the maximum number of zero-length reads before terminating the socket in the current request read cycle. It is reset after a complete read.
partialReadDecrementSize (default 8192)
This sets the byte count for a single read from a socket above which the failed-read count for a protocol request is decremented, if the socket has previously returned 0 bytes for a read in the current request read cycle. This interacts closely with maxFailedReads.
When using very large HTTP requests it is easy for the protocol read cycle to overrun maxFailedReads, as each partial read of the request is followed by a speculative read which can return 0 bytes on a slow connection. For example, a 512k request with a 1400-byte network MTU will make around 390 reads if TCP packets are delivered singly; if the connection is slow enough to trigger a speculative failed read after each packet, this request will easily reach the default maxFailedReads. In practice, even on a slowish network, a significant proportion of these TCP packets will arrive in the OS TCP buffer together, and if more bytes than partialReadDecrementSize are read in a single socket read, the failed count (if above 1) is decremented to offset an earlier 0-byte read in this request.
This can be particularly useful if an upstream proxy or the network delivers requests in jittery 4/8/16k bursts, and should still allow the request to complete even with a number of small interleaved 0-byte reads, provided it is not reduced to trickling bytes continually.
This does NOT measure the gaps between the packets, just the size when the OS notifies there is data to be read.
Technically, this setting amends maxFailedReads to become the number of times a read supplied less than this amount followed by a 0-byte speculative read. Therefore, even for a very large (1MB+) request, provided the upstream fills the OS TCP buffer with this amount per read, the fail count will never increment beyond 1.
If the network supplies a large number of very small packets, then the fail limit will still be reached. It is best to empirically determine the delivery throughput (proxy chunking behaviour etc) against the largest regular request size to see what is reasonable for the network structure.
maxFailedWrites (default 256)
This sets the maximum number of zero-length writes before terminating the socket on the current buffer write. It is reset after the socket is re-enabled for writing.
maxPrivateQueueSize (default 256*1024)
This sets the maximum number of messages allowed in an “unbounded” queue.
proxyProtocolVersion (default 0)
This sets the server to expect that initial connections will be prefixed with a proxy protocol string as defined at <http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-proxy-protocol.html>. The only version currently supported is 1.
useNative config
Currently this configures the server to replace the NIO acceptor and sockets with a native library that bypasses the Java NIO layer.
The native dependency is currently only available for Linux x86 64-bit. It also requires a minimum kernel version of 3.9 (released October 2013).
See Native Libraries docs for details of loading and usage.
maxUnCompressSize config (default (131072*2) * 4)
This sets the maximum size that a compressed subscribe message can be decompressed into. The maxUnCompressSize is also the maximum compressed message size, as the edge case is that they are both the same (we do not handle the pathological case where the compressed size is larger, if it exceeds this value).
If you find yourself adjusting this value upwards it is strongly recommended that you send such data using a standard web upload or revisit your design decisions.
Reader/Writer Pool Configs
The reader/writer pools are really large memory-mapped files that are divided into blocks. Each file represents a slab of memory that a BlockPool uses as the backing for its pool of blocks. A block is the basic structure in which the bytes read from a socket are stored and acted upon, or to which outgoing bytes are written for the writer to push onto the socket. The Reader and Writer use isolated pools which can be configured separately and therefore do not share their memory. In this sense Rubris is memory-heavy by design: on modern machines memory is cheap, and we can easily allocate multiple GBs to take advantage of this as long as we do not impact the heap (this memory does not).
Multiple Instances and Mapped files
As Rubris maps its files using the “serviceName” in the pattern rubris_reader_$MODULE_NUM_$Servicename-$SEQUENCE_$SET (e.g. rubris_reader_0_prices_1_0.mem, rubris_reader_1_prices_1_0.mem), multiple instances on the same machine with the same serviceName will share the same mapped file unless the directory location is configured differently.
This is especially important as SO_REUSEPORT and SO_REUSEADDRESS mean that 2 instances can easily be run simultaneously (if a previous one has not shut down correctly), which will lead to apparent memory corruption.
Blocks and modules
The pool config options are (per reader/writer per module):
- blocks (default 2048)
- the number of blocks in a pool. This should be sized around the maximum concurrent active reads in progress per module. In reality the actual number of blocks in use will be less than this as each block is returned to the pool after each complete read.
- However, if the client is still sending data, or the read cycle can only get a partial message from the socket read, then that read block is retained. The conservative approach is therefore to have the same number of blocks as users per module; in practice a pool sized at around 25% of users should be sufficient under normal operations.
- Attempting to use more blocks than the pool provides (e.g. if we adopt a more aggressive memory strategy) will cause allocation of unpooled blocks, which, while not a huge problem, does allocate new native memory as direct buffers. One should be careful about relying on this, as it can lead to OOM and native memory exhaustion.
- baseDir (default null - the current working directory that the app was run from)
- The directory in which to store the mapping file.
- blockSize (default 131072*2 - 256k)
- A power of 2 is required; this is the maximum number of bytes that can be read/written for a single HTTP message or WS frame. Where HTTP requests/WS frames are larger than this, the “read until complete” idiom will cause progressively larger unpooled blocks to be allocated until enough bytes are available in the buffer or the maxUnpooledSize is reached (at which point an error is thrown).
- For writing, the writer will attempt to pack as many messages as available into a single block and push these out as a single WS frame/HTTP poll response. Left-over messages will be picked up on subsequent sends. If a single message is larger than the block (minus space for the HTTP/WS headers) then an unpooled block will again be allocated, up to the maximum unpooled block size.
- maxUnpooledSize (default blockSize * 16)
- This is the maximum single inbound/outbound HTTP request or WS frame that can be supported. An unpooled block is allocated when the large pool is too small for a single message or the pool is exhausted. Unlike the other blocks it is destroyed once freed, so it does add some garbage pressure (although the memory itself is direct).
- LargeBlockScaleFactor (default 4)
- A smaller number of pooled blocks are provided at a size scaled up by this factor. So if you have a normal block size of 128K and a scale factor of 8, the large block size will be 1MB. The maximum file size for this pool is 500MB, therefore the larger the size the fewer the blocks available.
- The large blocks are expected to be used when we have occasional large messages that are not the norm, but nevertheless occur with enough frequency that we do not want to pay the cost of allocation (e.g. when we have a large State of the World call for each user at login).
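The sketch below shows how these pool settings combine; the type and setter names (setBlocks, setBlockSize, setLargeBlockScaleFactor) are assumed from the documented option names and may differ from the real API:
// Assumed type and setter names, derived from the documented options.
ReaderPoolConfig reader = config.readerPoolConfig;
reader.setBlocks(2048);             // blocks per module
reader.setBlockSize(131072 * 2);    // 256k; must be a power of 2
reader.setLargeBlockScaleFactor(4); // large blocks of 4 * 256k = 1MB
// Slab per module: 2048 * 256k = 512MB, in line with the ~500MB figure below.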
As Rubris is primarily expected to be used as a messaging solution it is not expected that these limits will need to be raised in normal practice. If you need very large file/data handling then it is strongly advised you run a normal web application alongside Rubris, as it is likely to be better suited to the task.
The above figures make each mapped file about 500MB in length. mmap means that not all of this file is held in memory initially; rather, pages are brought in by the OS when required. In general, once the entire file has been paged into memory you want it to stay there, which means the physical memory on the box should have capacity to hold all the memory pools without paging back to disk, or the application may suffer significant jitter.
The memory is not part of Java’s heap so does not affect garbage collection.
Java’s limitations mean that the largest practical mmap is 2GB, so to support larger block sizes or counts the MemorySlab will map multiple 1GB files if the desired size or number of blocks overruns the 1GB size.
For instance, to support the default size of 256k and still have 8192 blocks per module, 2 files are required; similarly, supporting 16384 blocks of 128k requires 2 files.
The block pool size is limited to a maximum of 2GB and there are careful trade-offs to be made here. Large blocks, while pushing more data per cycle, proportionally use up more time per user operation (as more data is potentially available for each read/write cycle). This potentially increases the latency for each user.
Reducing the block size decreases the time spent, but also limits the size of the messages that can be sent and introduces more delay, as very large messages can end up repeatedly being copied to larger buffers, so increasing the write time.
The number of blocks available is predicated on partial writes/reads potentially occurring for each user. For example, if we have more than 2048 users on a single module and the OS does not flush the pending bytes in their entirety for a large proportion of those users in a cycle, the blocks will be retained and will not be available to other users until future writes release them.
This can then obviously lead to pool exhaustion which will fall back to trying to create new DirectMemory for each read/write which can be slow and eventually lead to memory issues for the application/OS.
While not all users will exhibit this behaviour, and in normal operations relatively few blocks will be retained, one should be hesitant about undersized pools: a network issue could affect the box as a whole, causing an attempt to create many unpooled blocks, which can lead to OOM issues.
The read and write blocks do not need to be sized equally, so if you have large outbound but small inbound messages they can be sized accordingly. In general it is better to size the blocks according to your message sizes and deal with occasional very large messages by setting the large block scaling factor (which assigns a much reduced number of larger blocks as a fallback), or by setting a very large unpooled block size to deal with the outliers that overrun the large block fallback pool.
Scaling is expected to be achieved by increasing the module count rather than scaling up the counts within individual pools.
Socket Config
The socket config provides a large number of options that control the accept behaviour, the port and interfaces the application listens on, and some of the native elements.
The config options that are most relevant are:
- bindAddress (default 0.0.0.0)
- The network address to listen on
- listenPort (default 8089)
- the port to serve the HTTP/WS connections from
- reuseaddress (default true)
- enables SO_REUSEADDR on the listening socket
- reuseport (default false)
- for OSes which support multiple threads bound to the same listening port (Linux 3.9 and above). This is not meaningful for Windows and is only supported when the native options are enabled, as Java does not allow custom socket options to be set.
- sendbuf (default 131072*2)
- the buffer size (on the server side) for the outgoing TCP send buffer. (This figure interacts with the send/receive block size)
- rcvbuf (default 131072)
- the buffer size (on the server side) for the incoming TCP receive buffer. (This figure interacts with the send/receive block size)
- fastStart (default false)
- enable TCP Fast Open (Linux 3.7 and above) - only available with native options enabled (temporarily disabled)
- connectionBacklog (default 8192*2)
- length of accept backlog queue
- selectorArray (default true)
- use a customised selector set instead of Java’s normal one. This allocates no memory, unlike Java’s built-in one, and is sized to the same amount as maxConnections.
- maxConnections
- the maximum number of connections per module. By default this is set to the same amount as the connectionBacklog.
- TCPnodelay (default true)
- enables TCP_NODELAY (disables Nagle’s algorithm) on the sockets
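A minimal sketch of a typical socket setup; the socketConfig field follows the structure shown at the top of this document, while the type and setter names are assumptions:
// Assumed type and setter names derived from the documented options.
SocketConfig socket = config.socketConfig;
socket.setBindAddress("0.0.0.0");      // listen on all interfaces
socket.setListenPort(8089);
socket.setReuseport(true);             // native options + Linux 3.9+ only
socket.setConnectionBacklog(8192 * 2); // accept backlog length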
Endpoint Config
endpoints (default 128)
By default the server can support up to 128 discrete endpoints. This can be extended by setting the endpoints config option. However, if you find you are approaching this number it may be worthwhile examining the granularity of the endpoints vs the topics used within them.
wildcard (default *)
The character to use as the wildcard in wildcard subscriptions.
WriterConfig
- blockedWriteScaler (5)
- The number of blocked writes at which the multiplier is incremented, e.g. a blocked count < 5 is 1 slot, 5-10 is 2 slots in duration, etc.
- failedSendResetTime (60000 ms)
- The time which must elapse to reset the failed count to 0. A gap longer than this must therefore occur to reset the blocked count multiplier.
- maxBlockedTime (10000 ms)
- The upper limit on the maximum blocked pause time; this caps the scaled multiplier.
- slotDuration (100)
- the time width of each slot in the timer wheel and the tick rate for the Timer thread - essentially this is the minimum resolvable granularity (it is not advised to change this)
- slots (256)
- the number of slots in the wheel - this should cover more than the maximum blocked time to allow for drift and thread pauses.
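To make the interaction concrete, here is a hedged reading of the arithmetic these defaults imply (an approximation, not the library’s actual code):
// Hedged illustration of the backoff arithmetic implied above.
int blockedCount = 12;
int slots = blockedCount / 5 + 1;     // blockedWriteScaler = 5 -> 3 slots
long pauseMs = Math.min(slots * 100L, // slotDuration = 100 ms per slot
                        10_000L);     // capped by maxBlockedTime
// -> ~300 ms pause; a 60s failure-free gap resets blockedCount to 0.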
SSLConfig
See the SSLConfig.md file for details
Transport configuration
AllowedOrigins
AllowedOrigins is used for 2 purposes:
- Websocket allowed origins
- CORS allowed origins for HTTP requests
The current config is shared for both purposes, as it is assumed that the origin serving the websocket JS/HTML code is the same one from which CORS requests would be allowed.
The origin supports wildcards and by default is “*”, which allows all origins except an empty one.
For real-world purposes it is strongly suggested that wildcards are not used, as they are difficult to restrict and invite attempts at exploitation.
Transport configs per transport type
The transportConfigMap is a map keyed by transport type. Currently:
- ModuleConfig.HTTP
- ModuleConfig.WS
- ModuleConfig.EIO
Each object is used to configure its transport individually. The WS implementation uses some of the HTTP config for the first part of the upgrade mechanism.
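Retrieving a per-transport config might look like the sketch below; the transportConfigMap field name comes from this document, while the cast and the setAllowedOrigins setter are assumptions:
// Assumed access pattern - the map is keyed by transport type.
HTTPConfig http = (HTTPConfig) config.transportConfigMap.get(ModuleConfig.HTTP);
WSConfig ws = (WSConfig) config.transportConfigMap.get(ModuleConfig.WS);
http.setAllowedOrigins("https://app.example.com"); // avoid wildcard origins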
HTTPConfig
The HTTP config provides a few options:
- serviceName
- The service name, which may be used for origin checking for approval. This is not currently used, but if host-reflected information is required it will be used in preference to relying on the Host header (which has been subject to attacks)
- servicePath
- the path portion of the URL for engineio, e.g. used as http://server/servicePath
- customCorsHeaders (default [])
- add custom headers that can be returned as part of the cors allowed headers.
- Currently this is:
Access-Control-Allow-Headers: Content-Type, Authorization, X-Authorization, X-Requested-With, Location
- allowedOrigins (default *)
- the allowed origins determines which origins are allowed to make CORS requests
WSConfig
The WS config currently supports the following options:
- allowedOrigins (default *)
- the allowed origins determines which origins are allowed to make WS requests
- requires Host Header (default true)
- requires Origin Header (default true)
- rateLimited (true)
- rateLimitBytesPerPeriod (256 * 1024)
- rateLimitPeriodMillis (200 ms)
The requires-origin/requires-host settings can be relaxed if proxy servers strip a header (this allows non-RFC-compliant connectivity).
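For example, loosening the default WS rate limit (256k per 200 ms) might look like this; the setter names are assumptions based on the option names above:
// Assumed setters derived from the documented option names.
WSConfig ws = (WSConfig) config.transportConfigMap.get(ModuleConfig.WS);
ws.setRateLimited(true);
ws.setRateLimitBytesPerPeriod(512 * 1024); // double the default 256k budget
ws.setRateLimitPeriodMillis(200);          // per 200 ms window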
EngineIOConfig
The config is used to set the following for the Engine-IO client:
- PingTimeout (30000 ms)
- Ping Frequency (30000 ms)
- Websocket Upgrade Enabled (true)
- logPings (true)
- Whether to log heartbeat pings from the client
- maxBrowserDriftTime (0 ms)
- Log browser request times that are greater than this value. Set to 0 to disable.
The settings above are used to construct the handshake data returned to the JS client, which Engine-IO then uses to set its ping timeout behaviour and whether to upgrade to Websockets or not.
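A hedged sketch of adjusting the handshake values returned to the JS client; the setter names, and the ModuleConfig.EIO key taken from the structure at the top of this document, are assumptions:
// Assumed key and setter names - check the actual EngineIOConfig API.
EngineIOConfig eio = (EngineIOConfig) config.transportConfigMap.get(ModuleConfig.EIO);
eio.setPingTimeout(60_000);           // give slow clients longer before timing out
eio.setWebsocketUpgradeEnabled(true); // allow upgrade from polling to WS
eio.setLogPings(false);               // quieten heartbeat ping logging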