Production¶
Running a collection of distributed Traptor streams.
Planning¶
To run Traptor in a distributed environment, first estimate your collection needs. The Twitter API sets different limits for different types of rules. As of this writing, the following limits are in place for the Public Streaming API:
- follow: 5000 rules
- track: 400 rules
- location: 25 rules
This means the number of rules you can add per Traptor application is limited. For example, if you have 5,500 “follow” rules and 352 “track” rules, you will need 3 Traptor connections (2 for “follow”, 1 for “track”). Each connection should use a different API key and a different connection IP address.
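The arithmetic in this example can be sketched in Python. The limits are the ones listed above; the names RULE_LIMITS and connections_needed are illustrative and not part of Traptor.

```python
import math

# Rule limits for the Public Streaming API, as listed above.
RULE_LIMITS = {"follow": 5000, "track": 400, "location": 25}


def connections_needed(rule_counts):
    """Return how many connections each rule type requires.

    rule_counts maps a rule type ("follow", "track", "location")
    to the number of rules of that type you want to collect.
    """
    return {
        rule_type: math.ceil(count / RULE_LIMITS[rule_type])
        for rule_type, count in rule_counts.items()
    }


needed = connections_needed({"follow": 5500, "track": 352})
print(needed)                # {'follow': 2, 'track': 1}
print(sum(needed.values()))  # 3
```

Because a single connection handles one rule type, each type is rounded up separately rather than pooling all rules together.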
Ansible¶
To handle a distributed deployment, you can use Ansible. Ansible lets you dynamically configure inventories based on roles to do semi-automated deployments.
Inventory¶
Using the example from above, my Ansible inventory may look something like this:
[traptor-follow-nodes]
server01
server02
[traptor-track-nodes]
server03
[traptor-location-nodes]
server04
server05
[traptor-nodes:children]
traptor-follow-nodes
traptor-track-nodes
traptor-location-nodes
Group_vars¶
The best way to manage a pool of API keys is in a traptor-nodes group_vars file. Since both traptor-track-nodes and traptor-follow-nodes are children of traptor-nodes, the API keys can be used by either Traptor type. Continuing with the example above, the file might look like this:
---
traptor_kafka_topic: 'my_traptor'
apikeys:
  - consumer_key: 'YOUR_INFO'
    consumer_secret: 'YOUR_INFO'
    access_token: 'YOUR_INFO'
    access_token_secret: 'YOUR_INFO'
  - consumer_key: 'YOUR_INFO'
    consumer_secret: 'YOUR_INFO'
    access_token: 'YOUR_INFO'
    access_token_secret: 'YOUR_INFO'
  - consumer_key: 'YOUR_INFO'
    consumer_secret: 'YOUR_INFO'
    access_token: 'YOUR_INFO'
    access_token_secret: 'YOUR_INFO'
The traptor_kafka_topic variable is used by the Traptor localsettings template to override the default traptor topic name with one of your choosing. The apikeys list contains 3 sets of API connection info, one for each Traptor node.
Tasks¶
Coming soon… how to set up Ansible tasks (link to sample code)
Redis PubSub for Automatic Rule Refresh¶
When your Twitter rule set changes, any Traptor whose rules have been added or deleted can be restarted automatically. While running, each Traptor continuously checks a Redis pubsub channel for a message addressed to it, in the following format:
<traptor-type>:<traptor-id>
An example message is:
track:0
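As a rough illustration of the message format, the sketch below splits a message into its type and ID and checks whether it is addressed to a given Traptor. It is not Traptor's actual implementation: the function names are hypothetical, and it assumes the ID is a non-negative integer, as in the example message.

```python
def parse_refresh_message(message):
    """Split '<traptor-type>:<traptor-id>' into a (type, id) tuple."""
    traptor_type, _, traptor_id = message.partition(":")
    # Assumption: the ID is a non-negative integer, as in "track:0".
    if not traptor_type or not traptor_id.isdigit():
        raise ValueError(f"malformed refresh message: {message!r}")
    return traptor_type, int(traptor_id)


def is_for_me(message, my_type, my_id):
    """Return True if the message is addressed to this Traptor."""
    return parse_refresh_message(message) == (my_type, my_id)


print(parse_refresh_message("track:0"))  # ('track', 0)
print(is_for_me("track:0", "follow", 0))  # False
```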
To use this functionality, publish a message in the format above to the Redis pubsub channel for each Traptor whose rules changed.
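A minimal sketch of sending such a message from Python follows. It assumes a client object with a publish(channel, message) method, which is what the redis-py Redis client provides. The channel name is not given here, so "traptor-notify" in the usage comment is a placeholder, and notify_rule_change is a hypothetical helper rather than part of Traptor.

```python
def notify_rule_change(client, channel, traptor_type, traptor_id):
    """Publish a '<traptor-type>:<traptor-id>' message and return it.

    client is any object with a publish(channel, message) method,
    for example a redis-py Redis instance.
    """
    message = f"{traptor_type}:{traptor_id}"
    client.publish(channel, message)
    return message


# Usage with redis-py (requires the redis package and a running server).
# The channel name below is a placeholder; use the one your Traptors
# are configured to listen on.
#
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   notify_rule_change(client, "traptor-notify", "track", 0)
```

Call the helper once for each Traptor whose rules changed, since each message addresses a single type and ID.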