Consider an application that uses the Twitter Streaming API as an input to perform Natural Language Processing. The analysis could require a lot of computing resources, so we might want to control the average number of tweets processed per unit of time. We might also want the analysis to focus on tweets from people with many followers. Twitter streams are very volatile, so how can we control the load? This article discusses how to limit the average number of events and assumes that a stream of tweets is already in place.
We can break down the problem into three steps:
- Estimate the current rate of events
- Based on the rate, adjust the follower-count threshold
- Check if a tweet reaches the follower threshold or not
Let’s tackle them one by one.
The rate of accepted tweets is simply the number of tweets accepted per unit of time. This is easier said than computed. For this application, we don’t want to use a sliding window approach, which could easily fail if the rate varies over many orders of magnitude. Instead, we estimate the rate based on exponential decay. The rate decays exponentially between tweet events and jumps every time we observe a new tweet. Programmatically, we can do this with:
    # Compute rate after delta seconds since last tweet
    rate = math.exp(-delta * damp_rate) * rate

    # For every accepted tweet
    rate += damp_rate
The algorithm has one free parameter, damp_rate, which controls the rate of exponential decay. In turn, this determines how fast the limiter can adapt to changes in the stream.
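To make the two update rules concrete, here is a minimal sketch that wraps them in a small class. The class name `DecayRateEstimator` and its methods are hypothetical and chosen for illustration only; this is not necessarily how the tweetlimiter package organizes its code.

```python
import math


class DecayRateEstimator:
    """Hypothetical helper: exponentially decaying event-rate estimate."""

    def __init__(self, damp_rate, initial_rate=0.0):
        self.damp_rate = damp_rate  # in 1/seconds; larger values adapt faster
        self.rate = initial_rate    # estimated events per second
        self.last_time = None       # timestamp of the last update, in seconds

    def decay_to(self, now):
        """Decay the estimate up to time `now` and return it."""
        if self.last_time is not None:
            delta = max(0.0, now - self.last_time)
            self.rate *= math.exp(-delta * self.damp_rate)
        self.last_time = now
        return self.rate

    def record_event(self, now):
        """Register one accepted event at time `now`."""
        self.decay_to(now)
        self.rate += self.damp_rate
        return self.rate


# Feed one event per second for 200 seconds.
estimator = DecayRateEstimator(damp_rate=0.1)
for second in range(200):
    estimator.record_event(float(second))
print(round(estimator.rate, 3))  # close to 1 event per second
```

For a steady stream of λ events per second, the decay removes damp_rate * rate per second on average while the jumps add damp_rate * λ, so the estimate settles around λ. It sits slightly above λ right after an event and slightly below just before the next one.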
Control engineering has a solution to this kind of problem. We can use a PID controller to adjust a follower-count threshold. Only tweets from users with more followers than the threshold are accepted by our system. The input of the controller is our rate estimate, and the output of the controller is the threshold.
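As an illustration, a textbook PID controller could look like the sketch below. The class `PIDController`, the gain values, and the clamping of the output to a non-negative threshold are assumptions made for this example; they are not taken from the tweetlimiter package, and real gains would need tuning against an actual stream.

```python
class PIDController:
    """Hypothetical textbook PID controller, for illustration only."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd  # proportional, integral, derivative gains
        self.setpoint = setpoint                # target rate in tweets per second
        self.integral = 0.0
        self.previous_error = None

    def update(self, measurement, dt):
        """Return the control output for a new measurement taken dt seconds later."""
        # Positive error means the measured rate is above the target.
        error = measurement - self.setpoint
        self.integral += error * dt
        if self.previous_error is None or dt <= 0:
            derivative = 0.0
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Placeholder gains: the measured rate (0.5/s) is well above the target (1/60 per second).
controller = PIDController(kp=1000.0, ki=100.0, kd=0.0, setpoint=1 / 60)
output = controller.update(measurement=0.5, dt=1.0)
threshold = max(0.0, output)  # a follower threshold cannot be negative
print(threshold)
```

The sign convention matters here: when the estimated rate exceeds the target, the error is positive and the threshold rises, so fewer tweets pass. When the rate falls below the target, the threshold drops again.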
The last step is simple compared to the previous steps. We can implement the filtering by comparing the author's follower count against the current threshold.
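A minimal sketch of this check, assuming tweets are accepted only when the author has strictly more followers than the threshold. The function name `accept_tweet` and the sample follower counts are made up for illustration.

```python
def accept_tweet(followers, threshold):
    """Hypothetical check: accept only authors with more followers than the threshold."""
    return followers > threshold


# Made-up follower counts, filtered against a fixed threshold of 1000.
follower_counts = [12, 4500, 380, 120000]
accepted = [n for n in follower_counts if accept_tweet(n, threshold=1000)]
print(accepted)  # [4500, 120000]
```

In the full limiter the threshold would not be fixed; it would be the latest controller output at the time the tweet arrives.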
Putting it together
A full working example is available through the tweetlimiter package:
    from tweetlimiter import Limiter

    limiter = Limiter(target=1/60)  # Your target rate in tweets per second

    # For each new tweet
    decision = limiter.filter(TWEET_TIMESTAMP, NUMBER_OF_FOLLOWERS)
    if decision:
        # Perform costly analysis
        pass