Traptor API

class traptor.traptor.MyBirdyClient(consumer_key, consumer_secret, access_token, access_token_secret)
static get_json_object_hook(data)
class traptor.traptor.Traptor(redis_conn, pubsub_conn, heartbeat_conn, traptor_notify_channel='traptor-notify', rule_check_interval=60, traptor_type='track', traptor_id=0, apikeys=None, kafka_enabled='true', kafka_hosts='localhost:9092', kafka_topic='traptor', use_sentry='false', sentry_url=None, log_level='INFO', log_dir='/var/log/traptor', log_file_name='traptor.log', test=False)
_add_heartbeat_message_to_redis(*args, **kw)

Add a heartbeat message to Redis.

_add_iso_created_at(tweet_dict)

Add the created_at_iso to the tweet.

Parameters:tweet_dict – tweet in JSON format
Returns:tweet_dict with the created_at_iso field added
_check_redis_pubsub_for_restart()

Subscribe to Redis PubSub and restart if necessary.

Check the Redis PubSub channel and restart Traptor if a message for this Traptor is found.

_create_birdy_stream()

Create a birdy Twitter stream. If a TwitterApiError occurs, the process exits with status code 3. This prevents services like supervisor from automatically restarting the process, which would risk getting locked out of the Twitter API.

Creates self.birdy_stream.

_create_kafka_producer(*args, **kw)

Create the Kafka producer.

_create_rule_counter(rule_id)

Create a rule counter.

Parameters:rule_id – id of the rule to create a counter for
Returns:stats_collector: StatsCollector rolling time window
_create_traptor_obj(tweet_dict)

Add the traptor dict and id to the tweet.

Parameters:tweet_dict – tweet in JSON format
Returns:tweet_dict with additional traptor fields
_create_twitter_follow_stream(*args, **kw)

Create a Twitter follow stream.

_create_twitter_locations_stream(*args, **kw)

Create a Twitter locations stream.

_create_twitter_track_stream(*args, **kw)

Create a Twitter track stream.

_delete_rule_counters()

Stop and then delete the existing rule counters.

_enrich_tweet(tweet)

Enrich the tweet with additional fields, rule matching and stats collection.

Returns:
  • enriched_data (dict) – tweet dict with additional enrichments
  • tweet (dict) – non-tweet message with no additional enrichments
_find_rule_matches(tweet_dict)

Find a rule match for the tweet.

This code expects only one match. If there is more than one, the last match found is used, since each new match overwrites the previous one.

Parameters:tweet_dict (dict) – The dictionary twitter object.
Returns:a dict with the augmented data fields.
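
The last-match-wins behavior described above can be illustrated with a minimal sketch. This is not Traptor's implementation: the function name, the rule fields (rule_id, value) and the case-insensitive substring test are all assumptions made for the example.

```python
def find_rule_match(tweet_text, rules):
    """Return the fields of the last rule whose value appears in the text.

    Hypothetical rule layout: each rule is a dict with 'rule_id' and 'value'.
    """
    match = {}
    text = tweet_text.lower()
    for rule in rules:
        if rule["value"].lower() in text:
            # A later match replaces an earlier one, so only the last survives.
            match = {"rule_id": rule["rule_id"], "rule_value": rule["value"]}
    return match


rules = [
    {"rule_id": "5", "value": "python"},
    {"rule_id": "9", "value": "kafka"},
]
result = find_rule_match("Streaming Python data into Kafka", rules)
# Both rules match; the result holds rule "9" because it was checked last.
```

If more than one match must be kept, collecting the matches in a list instead of a single dict would avoid the overwrite.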
_gen_kafka_failure()
_gen_kafka_success()
_get_locations_traptor_rule()

Get the locations rule.

Create a dict with the single rule the locations traptor collects on.

_get_redis_rules(*args, **kw)

Yields a traptor rule from Redis. This function expects the Redis keys to be set up as follows:

traptor-<traptor_type>:<traptor_id>:<rule_id>

For example,

traptor-follow:0:34

traptor-track:0:5

traptor-locations:0:2

For ‘follow’ twitter streaming, each traptor may only follow 5000 twitter ids, as per the Twitter API.

For ‘track’ twitter stream, each traptor may only track 400 keywords, as per the Twitter API.

For ‘locations’ twitter stream, each traptor may only track 25 bounding boxes, as per the Twitter API.

Returns:Yields a traptor rule from redis.
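
As a rough sketch of the key layout and limits described above, the helpers below build a match pattern for one traptor and split a key back into its parts. The helper names and the RULE_LIMITS table are hypothetical; only the key format and the three limits come from the description above.

```python
# Per-traptor rule limits quoted in the description above.
RULE_LIMITS = {"follow": 5000, "track": 400, "locations": 25}


def make_rule_key_pattern(traptor_type, traptor_id):
    """Build a glob-style pattern matching every rule key of one traptor."""
    return "traptor-{0}:{1}:*".format(traptor_type, traptor_id)


def parse_rule_key(key):
    """Split 'traptor-<traptor_type>:<traptor_id>:<rule_id>' into its parts."""
    prefix, traptor_id, rule_id = key.split(":")
    if not prefix.startswith("traptor-"):
        raise ValueError("not a traptor rule key: {0}".format(key))
    traptor_type = prefix[len("traptor-"):]
    # The parts are returned as strings; no assumption is made about id types.
    return traptor_type, traptor_id, rule_id


def within_limit(traptor_type, rule_count):
    """Check a rule count against the limit for the given stream type."""
    return rule_count <= RULE_LIMITS[traptor_type]
```

For example, parse_rule_key("traptor-follow:0:34") returns ("follow", "0", "34"), and make_rule_key_pattern("track", 0) returns "traptor-track:0:*".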
_increment_limit_message_counter(*args, **kw)

Increment the limit message counter.

Parameters:limit_count – the integer value from the limit message
_increment_rule_counter(*args, **kw)

Increment a rule counter.

Parameters:rule_value – the value of the rule to increment the counter for
_main_loop()

Main loop for iterating through the twitter data.

This method iterates through the birdy stream, does any pre-processing, and adds enrichments to the data. If kafka is enabled it will write to the kafka topic defined when instantiating the Traptor class.

_make_limit_message_counter()

Make a limit message counter to track the values of incoming limit messages.

_make_rule_counters()

Make the rule counters to collect stats on the rule matches.

Returns:dict: rule_counters
_make_twitter_rules(rules)

Convert the rules from redis into a format compatible with the Twitter API.

Parameters:rules (list) – The rules are expected to be a list of dictionaries that comes from redis.
Returns:A str of Twitter rules that can be loaded into a birdy Twitter stream.
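
A minimal sketch of this conversion, assuming each rule dict stores its term under a hypothetical 'value' field and that the stream takes a comma-separated string (the form the Twitter streaming track and follow parameters use). The real field names in Traptor may differ.

```python
def make_twitter_rules(rules):
    """Join the rule values into one comma-separated string."""
    # 'value' is an assumed field name for the keyword, user id or bounding box.
    return ",".join(rule["value"] for rule in rules)


rules = [
    {"rule_id": "1", "value": "python"},
    {"rule_id": "2", "value": "apache kafka"},
]
twitter_rules = make_twitter_rules(rules)
# twitter_rules == "python,apache kafka"
```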
_message_is_limit_message(message)

Check if the message is a limit message.

Parameters:message – message to check
Returns:True if yes, False if no
_message_is_tweet(message)

Check if the message is a tweet.

Parameters:message – message to check
Returns:True if yes, False if no
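
The two checks above can be sketched as follows. The limit test relies on the documented shape of Twitter streaming limit notices, {"limit": {"track": N}}. The tweet test (presence of 'id_str' and 'text') is only an assumed heuristic, and the function names are illustrative rather than Traptor's own.

```python
def message_is_limit_message(message):
    """Return True if the message is a streaming limit notice."""
    return "limit" in message


def message_is_tweet(message):
    """Return True if the message looks like a tweet (assumed heuristic)."""
    return "id_str" in message and "text" in message


def get_limit_count(message):
    """Extract the integer count carried by a limit notice."""
    return message["limit"].get("track", 0)


limit_message = {"limit": {"track": 1234}}
tweet_message = {"id_str": "1", "text": "hello"}
```

Here message_is_limit_message(limit_message) is True and get_limit_count(limit_message) is 1234, while message_is_tweet(limit_message) is False.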
_send_enriched_data_to_kafka(*args, **kw)

Send the enriched data to Kafka.

Parameters:
  • tweet – the original tweet
  • enriched_data – the enriched data to send
_send_heartbeat_message()

Add an expiring key to Redis as a heartbeat on a timed basis.

_setup()

Set up Traptor.

Load everything up. Note that any arg here will override both default and custom settings.

_setup_birdy()

Set up a birdy Twitter stream. If a TwitterApiError occurs, the process exits with status code 3. This prevents services like supervisor from automatically restarting the process, which would risk getting locked out of the Twitter API.

Creates self.birdy_conn.

_setup_kafka()

Set up a Kafka connection.

static _tweet_time_to_iso(tweet_time)

Convert tweet created_at to ISO time format.

Parameters:tweet_time – created_at date of a tweet
Returns:A string of the ISO formatted time.
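
One way to perform this conversion with the standard library is sketched below. It assumes created_at uses Twitter's usual layout, such as "Wed Aug 27 13:08:45 +0000 2008", and an English locale for the day and month names. Traptor's own method may use a different parser.

```python
from datetime import datetime

# Assumed layout of the created_at field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_time_to_iso(tweet_time):
    """Parse a created_at string and return it in ISO 8601 format."""
    parsed = datetime.strptime(tweet_time, TWITTER_TIME_FORMAT)
    return parsed.isoformat()


iso_time = tweet_time_to_iso("Wed Aug 27 13:08:45 +0000 2008")
# iso_time == "2008-08-27T13:08:45+00:00"
```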
_wait_for_rules()

Wait for the Redis rules to appear.

run()

Run method for running a traptor instance.

It sets up logging and connections, grabs the rules from Redis, and starts writing data to Kafka if enabled.

traptor.traptor.main()

Command line interface to run a traptor instance.

Accepts flags for debug levels and a --stdout mode, in which it writes to stdout instead of Kafka.