Running a collection of distributed Traptor streams.
To run traptor in a distributed environment, you’ll need to figure out approximately what your collection needs are. The Twitter API offers different limits for different types of rules. As of this writing the following API limits are in place for the Public Streaming API.
- follow: 5000 rules
- track: 400 rules
- location: 25 rules
This means that the number of rules you can add per traptor application is limited. For example, if you have 5,500 “follow” rules and 352 “track” rules, you will need 3 traptor connections (2 for “follow”, 1 for “track”). Each connection should use a different API key and a different connection IP address.
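The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration that assumes the limits listed above; the names `RULE_LIMITS` and `connections_needed` are hypothetical and are not part of Traptor.

```python
import math

# Per-connection rule limits quoted above for the Public Streaming API.
RULE_LIMITS = {"follow": 5000, "track": 400, "location": 25}


def connections_needed(rule_counts):
    """Return how many traptor connections each rule type requires."""
    needed = {}
    for rule_type, count in rule_counts.items():
        limit = RULE_LIMITS[rule_type]
        # Each connection holds at most `limit` rules, so round up.
        needed[rule_type] = math.ceil(count / limit)
    return needed


plan = connections_needed({"follow": 5500, "track": 352})
print(plan)                # {'follow': 2, 'track': 1}
print(sum(plan.values()))  # 3
```

The total is the number of API keys (and hosts) the deployment needs.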
To handle a distributed deployment, you can use Ansible. Ansible lets you dynamically configure inventories based on roles to do semi-automated deployments.
Using the example from above, my Ansible inventory may look something like this:
    [traptor-follow-nodes]
    server01
    server02

    [traptor-track-nodes]
    server03

    [traptor-location-nodes]
    server04
    server05

    [traptor-nodes:children]
    traptor-follow-nodes
    traptor-track-nodes
    traptor-location-nodes
The best way to manage a pool of API keys is in a traptor-nodes group_vars file. Since traptor-follow-nodes and traptor-track-nodes are both children of traptor-nodes, the API keys can be used by either traptor type. Continuing with the example above, the file might look like this:
    ---
    traptor_kafka_topic: 'my_traptor'
    apikeys:
      - consumer_key: 'YOUR_INFO'
        consumer_secret: 'YOUR_INFO'
        access_token: 'YOUR_INFO'
        access_token_secret: 'YOUR_INFO'
      - consumer_key: 'YOUR_INFO'
        consumer_secret: 'YOUR_INFO'
        access_token: 'YOUR_INFO'
        access_token_secret: 'YOUR_INFO'
      - consumer_key: 'YOUR_INFO'
        consumer_secret: 'YOUR_INFO'
        access_token: 'YOUR_INFO'
        access_token_secret: 'YOUR_INFO'
traptor_kafka_topic is linked to the traptor localsettings template and overrides the default traptor topic name with one of your choosing. The apikeys list contains 3 sets of API connection info, one for each traptor node.
Coming soon... how to set up Ansible tasks (link to sample code)
Redis PubSub for Automatic Rule Refresh
When your Twitter rule set changes, the Traptor to which rules have been either added or deleted can be automatically restarted. While running, Traptor continuously checks a Redis pubsub channel for a message for itself, in the following format:
An example message is:
In order to use this functionality, add a message as formatted above to the Redis pubsub channel for each Traptor for which the rules changed.
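Publishing one message per changed Traptor might be sketched as follows. This is a minimal sketch, not Traptor's own code: the helper `publish_rule_changes`, the `RecordingClient` stand-in, the channel name, and the message strings are all hypothetical placeholders. Substitute the channel and the message format that your Traptor deployment actually listens for.

```python
def publish_rule_changes(client, channel, messages):
    """Publish one message per Traptor whose rules changed.

    `client` is any object with a `publish(channel, message)` method,
    such as a redis-py `redis.Redis` instance. Returns the number sent.
    """
    sent = 0
    for message in messages:
        client.publish(channel, message)
        sent += 1
    return sent


class RecordingClient:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.published = []

    def publish(self, channel, message):
        self.published.append((channel, message))


# The channel name and message strings below are placeholders (assumptions),
# not values defined by Traptor.
client = RecordingClient()
count = publish_rule_changes(
    client,
    "example-channel",
    ["message-for-traptor-a", "message-for-traptor-b"],
)
print(count)
```

With the redis-py package installed, a real connection such as `redis.Redis(host="localhost", port=6379)` could be passed as `client` in place of the stand-in, since it exposes the same `publish(channel, message)` method.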