Skip to content

Configure the key value store for the DPS toolkit

brandtol edited this page Sep 12, 2017 · 3 revisions

Configure the K/V store for the Streams DPS toolkit

Per default the configuration file is located in the applications etc directory and is named no-qsl-kv-store-servers.cfg. Any line starting with # is a comment line. You must correctly configure the type of the store and the parameters for the specific store.

In the very first non-comment regular line of this file, specify the name of the key value store product you will be using. It can ONLY be one of these NoSQL K/V data store product names supported by the DPS toolkit:

  • memcached
  • redis
  • cassandra
  • cloudant
  • hbase
  • mongo
  • couchbase
  • aerospike
  • redis-cluster

Below the line containing the K/V data store product name, please list your valid memcached or redis or cassandra or cloudant or hbase or mongo or couchbase or aerospike or redis-cluster server names or IP addresses.

Read below about the expected format of this for your chosen NoSQL data store product.

memcached

For memcached, you must list all the participating servers one per line (names or IP addresses). (e-g:)

Machine1
Machine2
Machine3

redis

For redis, (i.e. redis running in a non cluster mode) please specify one or more servers (name or IP address) one per line or if you are working on a single machine and if you prefer to use the Unix domain socket instead of TCP, you can simply specify unixsocket instead of a server name. For redis, if you decide to use the unixsocket, you must also ensure that your redis.conf file on the server side is configured properly for a unix domain socket pointing to /tmp/redis.sock file and the usual redis port set to 0.

IMPORTANT TIP ABOUT RUNNING MULTIPLE REDIS SERVERS (in a non cluster redis configuration):
(If you have a heavy workload of put/get requests, then using multiple Redis servers may give scaling to improve the overall throughput. If you are going to consider using multiple Redis servers that are not configured in a redis cluster mode, the dps toolkit will automatically do the client side partitioning to shard (i.e. spread) your data across those servers. You may also decide to run multiple Redis server instances on a single machine to take advantage of the available CPU cores. If you run multiple Redis servers on a single machine, then you must configure each server instance with a unique port in a separate redis.conf.X file where X is your Redis server instance number. [For example, if you run 3 instances on the same machine, you can have redis.conf.1, redis.conf.2, and redis.conf.3] If you are planning to run multiple Redis server instances on the same machine, you may also want to assign unique PID file names in their respective configuration files. Having done this, you can start each of those redis server instances by running a command from the src directory of your redis installation directory: ./redis-server ../redis.conf.X (substitute X with your redis server instance number). When there are multiple Redis servers running on a single machine, then you must properly specify below each ServerName:Port on a separate line. (e-g:)

Machine1:7001
Machine1:7002
Machine1:7003

When there are multiple Redis servers configured, using the dpsXXXXTTL APIs will provide a good scaling factor for your high volume put/get requests in comparison to using the APIs that are based on the user created named stores. You are encouraged to choose between the TTL and non-TTL based APIs according to your functional and scaling needs.

cassandra

In Cassandra, all servers assume an equal role. There is no special master or coordinator server. For Cassandra, you can list either all or just only one of the seed servers (names or IP addresses as it is specified in your cassandra.yaml file). (e-g:)

Machine1
Machine2
Machine3

cloudant

In Cloudant, you must have a personalized URL registered with the Cloudant public web offering or with a "Cloudant Local" on-premises private installation. Since you have to enter the user id and password below, it is better to create a user id specifically for working with the Streams dps toolkit. For the server name, you must enter a single line in the following format:

http://user:password@user.cloudant.com

if you are using the Cloudant service on the web (OR)

http://user:password@XXXXX

where XXXXX is a name or IP address of your on-premises Cloudant Local load balancer machine.

hbase

For HBase, you have to specify the HBase REST server address(es) in the format shown below. User id and password in the URL below should match the Linux user id and password on the machine where the HBase REST server program is running. If you have a multi-machine HBase cluster, then you can run the HBase REST server on multiple machines and configure one or more REST servers [one per line] below. Configuring multiple REST servers will let the DPS toolkit to spray (load balance) your heavy HBase workload to those multiple servers and that may be a factor in slightly improving the HBase read/write performance. REST server address format: http://user:password@HBase-REST-ServerNameOrIPAddress:port (e-g:)

http://user:password@Machine1:8080
http://user:password@Machine3:8080
http://user:password@Machine3:8080

mongo

For Mongo, you can specify a single server or a replica set (for redundancy and high availability using automatic fail-over) or a sharded cluster's query router mongos seed servers (for HA, load balancing, high throughput and fail-over). Please provide it in the following way for one of those three modes of MongoDB configuration. (e-g for single server, replica set and sharded cluster in that order.)

Machine1:27017
Machine1:27017,Machine1:27018,Machine1:27019/?replicaSet=YOUR_MONGO_REPLICA_SET_NAME
Machine1:27069,Machine2:27069,Machine3:27069

[If you are using a sharded cluster, then you must manually enable sharding for the ibm_dps database and he collections created by the DPS in Mongo.]

couchbase

For Couchbase, you can specify one or more server names. Please specify one server name per line. You must have already created a Couchbase admin user id and password using the Couchbase web console.
You must specify that admin user id and password as part of any one of the server names you will add below. (e-g:)

user:password@Machine1
Machine2
Machine3

aerospike

For Aerospike, you can specify one or more server names. Please specify one server name per line. You can optionally specify the Aerospike port number along with the server name. (e-g:)

Machine1:3000
Machine2
Machine3:3000

redis-cluster

For redis-cluster, you must install and configure Redis instances enabled with the clustering option in the redis.conf file. This HA feature is available only in Redis version 3 and above. In such cases, please specify the server names or IP addresses and a port numbers for at least one master and its slave node. It is recommended to specify all masters and slaves, for maximum availability. (e-g:)

Machine1-master:7001
Machine2-slave:7002
Machine3-master:7003
Machine4-slave:7004

In your SPL application, you need to check the return codes of the put and get functions and use the dpsReconnect() API method to reconnect to redis after the connection got lost.