Configuration

Many aspects of Rubris can be modified via configuration objects. All of the properties are encapsulated in the ModuleConfig object. The default module config can be altered by retrieving it from the server object and setting properties; alternatively, new config objects can be created and set on the ModuleConfig. Indeed, the ModuleConfig itself can be replaced if required.

The configuration object and its child config objects are structured as follows:

  ModuleConfig {
    writerConfig,
    socketConfig,
    endpointConfig,
    readerPoolConfig,
    writerPoolConfig,
    transportConfig {
      HTTP: HTTPConfig,
      WS: websocketConfig,
      EIO: engineIOConfig
    },
    sslConfig,
    fileCache
  }
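As a sketch of the pattern described above (the accessor and setter names here are illustrative assumptions, not confirmed Rubris API):

```java
// Hypothetical sketch - method and field names are assumptions for illustration.
ModuleConfig config = server.getModuleConfig();   // retrieve the default config

// alter a property on an existing child config
config.socketConfig.setPort(8080);

// or create a new child config object and set it on the module config
SSLConfig ssl = new SSLConfig();
config.setSslConfig(ssl);

// or replace the ModuleConfig entirely
server.setModuleConfig(new ModuleConfig());
```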

Module Config

License (default demo license for 5 days)

This should be set with the correct dev, test or production license.

Client Timeouts (default 900,000 ms)

Client timeouts are used to clean up abandoned or broken sessions. See UserManagement for details.

FileCache Configuration (default off)

The file cache can be used to serve static data such as .js, .css and .html files for cases where you want to bundle some content with the application itself, rather than serve it from a web server or CDN. The file cache is configured with three settings: the resource directory, the MIME type file and the re-loadable flag.

Each of these can be configured by either obtaining the FileCache from a newly created ModuleConfig, or creating your own and setting it on the ModuleConfig:

  FileCache cache = config.fileCache;

  // if starts with classpath: will try to load as resource -
  // otherwise load as a full file
  cache.setResourceDir(resourceDir);

  // always try to load from the classpath (very unusual to override)
  cache.setMimeTypeFile(mimeTypeFile);

If re-loadability is enabled the fileCache registers a FileWatcher on the directories under the resourceDir and reloads a file when it changes (or a new one is added). This can be turned off using the re-loadable property on the FileCache and should be set to false for production. When the resource path is in a jar this is automatically disabled.
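For example, assuming the re-loadable property is exposed as a bean-style setter (the exact method name is an assumption):

```java
// hypothetical setter name, inferred from "the re-loadable property"
cache.setReloadable(false); // recommended for production
```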

Setting the resource directory to null will prevent any path being loaded and remove all static file support in the application.

Once these are loaded as part of the initialisation, Rubris ignores the file system completely (apart from reloads), and any attempt to read a file not in the path or not loaded will return a 404. There is NO loading from the file system for resource requests in any manner.

The resources are then made available over HTTP at URLs rooted at / for the configured directory root. For example, if the files were under /resources/HTTP/files on the classpath and the root was set as /resources/HTTP, then the paths would be http://url:port/files/xxx (with xxx being any file, or directory/file, under that path).

If the resource directory was instead configured as /resources/HTTP/files then the path would be http://url:port/xxx.

As the file system is not used for anything other than loading the files into memory at startup, there is no such thing as directory listing or file path navigation that can be used from the client.

By default, files larger than 4096 bytes are put in the cache in both a gzip and a raw form. This is configurable by setting the gzipSize property on the FileCache. Browsers that explicitly send an Accept-Encoding header that includes gzip will get the gzip version returned; otherwise the raw form is returned.
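For example, to raise the threshold so that only larger files are stored in gzip form (the setter name is assumed from the gzipSize property):

```java
// only files above 16 KB get a gzip variant (default threshold is 4096 bytes)
cache.setGzipSize(16 * 1024);
```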

ModuleCount (default 1)

The module count configures the server to run with N modules. Each module consists of its own servicing threads and reader/writer block pools.

If the OS supports the SO_REUSEPORT option (which enables multiple threads/processes to multiplex the listen port at the OS level) then each module also has its own Acceptor.

Otherwise a single Acceptor is shared among all the modules and sockets are allocated round-robin on each accept cycle. Note: as the accept cycle accepts as many sockets as are pending in a block, the socket allocation between modules may not be numerically symmetrical.

Each user connection is assigned to a module for its lifetime and then threads in that module explicitly service the connection lifecycle.

When scaling the application for more users it is generally expected that the number of modules will increase (you can think of each module as an independent processing unit within the library). However, this also requires an increase in the number of CPUs (and memory) available. A rule of thumb is 2 CPUs for 1 module, 4 CPUs for 2, etc.

A single module will provide good sustained throughput for about 500 flat-out connections, 1000 very busy connections, or up to 8192 mostly idle connections (this also depends on other factors such as ping timings, polling frequency, message size etc.).
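Following the rule of thumb above, a box with 8 CPUs might be configured with 4 modules (the setter name is an assumption):

```java
// 2 CPUs per module: 8 CPUs -> 4 modules (default is 1)
config.setModuleCount(4);
```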

clientSubscriptionSize (default 1024)

This sizes the expected number of client subscriptions per client. It is mainly used for the initial sizing of the maps that hold the subscriptions, so if you only have a few subscriptions you can reduce the default memory used by lowering this value. If you overrun the sizing the map will be resized upwards, but setting it too small will potentially lead to lots of resizes.

clientDirectQueueSize (default 128)

This is used to size the Queue for RPC responses/Direct message responses.

All RPC messages use a single DirectMessageQueue private to each client. Attempting to send responses in a stream has the potential to overrun the queue and for the message offer to be rejected.

An unbounded direct message queue size is not supported.

Generally you will not need to increase the size, as the nature of polling means that RPC messages will not usually accumulate into double digits. However, for WS clients with long or bursty async reply behaviour this may need to be increased.
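A sketch of tuning both sizing properties (the setter names are assumed from the property names):

```java
// few subscriptions per client: shrink the per-client subscription maps
config.setClientSubscriptionSize(64);    // default 1024

// WS clients with bursty async RPC replies: allow more queued responses
config.setClientDirectQueueSize(256);    // default 128
```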

serverUserName (default $SERVER_USER$)

This sets the username used for server-side pinned topics. Changing it is not recommended, but can be done if required.

addressRequired (default true)

This causes the connection address to be obtained on accept and cached for logging, or for use later in the accept permission callbacks (default true). In some cases it may be better to defer this until it is needed.

longPollSlotDuration (default 5000ms)

This sets the size of the long poll slots in the timer wheel. If no pending data is available for the long poll, the connection is placed in the current slot + 1. When the TimerWheel thread reaches this slot it releases the polling thread (unless an event has happened in the meantime, in which case it has already been fired). The maximum time a thread is held on the server is therefore 2 × this value; in reality it will be some fraction plus the slot time, depending on when the connection enters the wheel. This enables longer waits to be set on the server. Unless this is causing an issue and you want to reduce the poll frequency, it is generally best to leave this value alone, as it forms a reasonable balance between repeat polling and liveness.
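For example (setter name assumed), halving the slot duration roughly halves the worst-case hold time, at the cost of more frequent re-polls:

```java
// worst-case server-side hold is 2 x the slot duration:
// 2500 ms slots -> polls released within at most ~5000 ms
config.setLongPollSlotDuration(2500);    // default 5000 ms
```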

maxFailedReads (default 256)

This sets the maximum number of 0-length reads before terminating the socket in the current request read cycle. It is reset after a complete read.

partialReadDecrementSize (default 8192)

This sets the byte count for a single read from a socket above which the failed-read count for a protocol request will be decremented, if the socket has previously returned 0 bytes for a read in the current request read cycle. This interacts closely with maxFailedReads.

When using very large HTTP requests it is easy for the protocol read cycle to overrun maxFailedReads, as each partial read of the request is followed by a speculative read which can return 0 bytes on a slow connection. For example, a 512k request with a 1400-byte network MTU will make around 390 reads if TCP packets are delivered singly, and if slow enough to trigger a speculative failed read after each packet, this request will easily reach the default maxFailedReads. In practice, even on a slowish network a significant proportion of these TCP packets will arrive in the OS TCP buffer together, and if more bytes than this setting's value are read in a single socket read, the failed count (if above 1) will be decremented to offset an earlier 0-byte read in this request.

This can be particularly useful if an upstream proxy or the network delivers requests in jittery 4/8/16k bursts; it should still allow the request to complete, even with a number of small or interleaved 0-byte reads, provided the request is not reduced to continually trickling bytes.

This does NOT measure the gaps between the packets, just the size when the OS notifies there is data to be read.

Technically, this setting amends maxFailedReads to become the number of times a read supplied less than this amount followed by a 0-byte speculative read. Therefore, even for a very large (1 MB+) request, providing the upstream fills the OS TCP buffer with this amount per read, the fail count will never increment beyond 1.

If the network supplies a large number of very small packets, then the fail limit will still be reached. It is best to empirically determine the delivery throughput (proxy chunking behaviour etc) against the largest regular request size to see what is reasonable for the network structure.
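A sketch of tuning the two read settings together (setter names assumed from the property names):

```java
// tolerate more speculative 0-byte reads for very large uploads...
config.setMaxFailedReads(512);                 // default 256

// ...and credit back a failed read whenever a single socket read
// delivers at least 16 KB (default 8192 bytes)
config.setPartialReadDecrementSize(16 * 1024);
```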

maxFailedWrites (default 256)

This sets the maximum number of 0-length writes before terminating the socket on the current buffer write. It is reset after the socket is re-enabled for writing.

maxPrivateQueueSize (default 256*1024)

This sets the maximum number of messages allowed in an “unbounded” queue.

proxyProtocolVersion (default 0)

This sets the server to expect initial connections to be prefixed with a proxy protocol string, as defined at <http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-proxy-protocol.html>. The only version currently supported is 1.

Proxy Protocol
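For reference, a version 1 proxy protocol header is a single human-readable line that the proxy prepends to the connection, for example:

```
PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
```

The fields are the protocol, source address, destination address, source port and destination port; the server consumes this line before normal protocol processing begins.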

useNative config

Currently this configures the server to replace the NIO acceptor and sockets with a native library that bypasses the Java NIO layer.

The native dependency is currently only available for Linux x86 64-bit. It also requires a minimum kernel version of 3.9 (released April 2013).

See Native Libraries docs for details of loading and usage.

maxUnCompressSize config (default (131072 * 2) * 4, i.e. 1 MB)

This sets the maximum size that a compressed subscribe message can be decompressed into. The maxUnCompressSize is also the maximum compressed message size, as the edge case is that both are the same (we do not handle the pathological case where the compressed size is larger, if it exceeds this value).

If you find yourself adjusting this value upwards it is strongly recommended that you send such data using a standard web upload or revisit your design decisions.

Reader/Writer Pool Configs

The reader/writer pools are really large memory-mapped files that are divided into blocks. Each file represents a slab of memory that a BlockPool uses as the backing for its pool of blocks. A block is the basic structure into which the bytes read from a socket are stored and acted upon, or into which bytes are written for the writer to push onto the socket. The Reader and Writer use isolated pools which can be configured separately and therefore do not share their memory. In this sense Rubris is memory-heavy by design: on modern machines memory is cheap, and multiple GBs can easily be allocated to take advantage of this, as long as the heap is not impacted (this memory is off-heap).

Multiple Instances and Mapped files

As Rubris maps its files using the “serviceName” in the pattern rubris_reader_$MODULE_NUM_$Servicename_$SEQUENCE_$SET (e.g. rubris_reader_0_prices_1_0.mem, rubris_reader_1_prices_1_0.mem), multiple instances on the same machine with the same serviceName will share the same mapped file unless the directory location is configured differently.

This is especially important as SO_REUSEPORT and SO_REUSEADDR mean that 2 instances can easily be run simultaneously (if a previous one has not shut down correctly), which will lead to apparent memory corruption.

Blocks and modules

The pool config options are (per reader/writer per module):

As Rubris is primarily expected to be used as a messaging solution, it is not expected that these limits need to be raised in normal practice. If you need very large file/data handling then it is strongly advised that you run a normal web application alongside Rubris, as it is likely to be better suited to this task.

The above figures make each mapped file about 500 MB in length. mmap means that not all of this file is held in memory initially; rather, pages are brought in when required by the OS. In general, once the entire file has been paged into memory you want it to stay there, which means the physical memory on the box should have capacity to hold all the memory pools without paging back to disk, or the application may suffer significant jitter.

The memory is not part of Java’s heap so does not affect garbage collection.

Java’s limitations mean that the largest practical mmap is 2GB, so to support larger block sizes or counts the MemorySlab will map as many 1GB files as are needed for the desired size or number of blocks.

For instance, to support a default size of 256k and still have 8192 blocks per module, 2 files are required; similarly, supporting 16384 blocks of 128k requires 2 files.
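The file-count arithmetic above can be checked with a few lines of plain Java (this snippet is independent of Rubris itself):

```java
public class SlabFiles {
    static final long FILE_SIZE = 1L << 30; // each backing file is 1 GB

    // number of 1 GB mapped files needed to back blockCount blocks of blockSize bytes
    static long filesNeeded(long blockSize, long blockCount) {
        long total = blockSize * blockCount;
        return (total + FILE_SIZE - 1) / FILE_SIZE; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(filesNeeded(256 * 1024, 8192));  // 256k x 8192 = 2 GB -> 2
        System.out.println(filesNeeded(128 * 1024, 16384)); // 128k x 16384 = 2 GB -> 2
    }
}
```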

The block pool size is limited to a maximum of 2GB, and there are careful trade-offs to be made here. Large blocks, while pushing more data per cycle, proportionally use up more time per user operation (as more data is potentially available for each read/write cycle). This potentially increases the latency for each user.

Reducing the block size decreases the time spent, but also limits the size of the messages that can be sent and introduces more delay, as very large messages can end up being repeatedly copied to larger buffers, increasing the write time.

The number of blocks available is predicated on partial writes/reads potentially occurring for each user. For example, if there are > 2048 users on a single module and the OS does not flush the pending bytes in their entirety for a large proportion of those users in a cycle, the blocks will be retained and will not be available to other users until future writes release them.

This can then lead to pool exhaustion, which falls back to creating new DirectMemory for each read/write; this can be slow and eventually lead to memory issues for the application/OS.

While not all users will exhibit this behaviour, and in normal operation relatively few blocks will be retained, one should be hesitant about undersizing the pools: a network issue could affect the box as a whole, causing an attempt to create many unpooled blocks, which can lead to OOM issues.

The read and write blocks do not need to be sized equally, so if you have large outbound but small inbound messages they can be sized accordingly. In general it is better to size the blocks according to your message sizes, and deal with occasional very large messages by setting the large block scaling factor (which assigns a much reduced number of larger blocks as a fallback), or by setting a very large unpooled block size to deal with the outliers that overrun the large block fallback pool.

Scaling is expected to be achieved by increasing the module count rather than scaling up the counts within individual pools.

Socket Config

The socket config provides a large number of options that control the accept behaviour, the port and interfaces the application listens on, and some of the native elements.

The config options that are most relevant are:

Endpoint Config

endpoints (default 128)

By default the server can support up to 128 discrete endpoints. This can be extended by setting the endpoints config option. However, if you find you are approaching this number, it may be worthwhile examining the granularity of the endpoints vs the topics used within them.
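For example (the setter name is an assumption based on the endpoints option):

```java
// raise the discrete endpoint limit from the default 128
endpointConfig.setEndpoints(256);
```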

wildcard (default *)

The character to use as the wildcard in wildcard subscriptions.

WriterConfig

SSLConfig

See the SSLConfig.md file for details

Transport configuration

AllowedOrigins

AllowedOrigins is used for 2 purposes: checking the Origin header on WebSocket connections, and restricting which origins may make CORS requests.

The same config is used for both purposes, as it is assumed that the origin serving the websocket JS/HTML code is the same one you would allow CORS requests from.

The origin supports wildcards and by default is “*”, which allows all origins except an empty one.

For real-world purposes it is strongly suggested that wildcards are not used, as they make exploitation attempts difficult to restrict.
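A sketch of restricting origins explicitly rather than relying on the wildcard (the method name is an assumption):

```java
// allow only the site that serves the JS/HTML, instead of the default "*"
httpConfig.setAllowedOrigins("https://app.example.com");
```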

Transport configs per transport type

The transportConfigMap is a map keyed by transport type. Currently:

  1. ModuleConfig.HTTP
  2. ModuleConfig.WS
  3. ModuleConfig.EIO

Each object is used to configure the transports individually. The WS implementation does use some of the HTTP config for the first part of the upgrade mechanism.

HTTPConfig

The HTTP config provides a few options:

WSConfig

The only config the WS currently supports is the allowed origin restrictions.

The “requires origin”/“requires host” settings can be used if proxy servers strip a header (this allows non-RFC-compliant connectivity).

EngineIOConfig

The config is used to set the following for the Engine-IO client:

The settings above are used to construct the handshake data returned to the JS client, which Engine-IO then uses to set its ping timeout behaviour and to decide whether to upgrade to WebSockets.