Added rate limit protection #254

Open: g4cko wants to merge 1 commit into main

@g4cko commented Nov 15, 2024

Added methods to track the rate limits (returned via response headers) for each endpoint.

Summary by Sourcery

Implement rate limit protection by adding methods to track and update rate limits for API requests, ensuring compliance with rate limits and preventing excessive requests.

New Features:

  • Introduce rate limit tracking for API requests by storing rate limit information for each URL.

Enhancements:

  • Add methods to check and update rate limits based on response headers.

Summary by CodeRabbit

  • New Features

    • Introduced a rate limiting mechanism to improve request management for the Twitter API.
    • Added methods to check and update rate limits, ensuring compliance with API usage restrictions.
  • Bug Fixes

    • Enhanced error handling for requests exceeding rate limits, providing clearer retry instructions.


sourcery-ai bot commented Nov 15, 2024

Reviewer's Guide by Sourcery

This PR implements rate limit tracking and protection in the Twitter API client. The implementation adds a dictionary to store rate limit information per URL endpoint and introduces methods to check and update rate limits based on response headers. The rate limit check is integrated into the main request flow to prevent exceeding API limits.

Class diagram for rate limit tracking in Twitter API client

classDiagram
    class Client {
        +dict rate_limits
        +async rate_limit_check(url) bool
        +async rate_limit_update(url, response: Response) void
        +async request(method, url, **kwargs) tuple[dict | Any, Response]
    }
    note for Client "Added rate limit tracking and protection methods"

File-Level Changes

Added rate limit tracking mechanism (twikit/client/client.py)
  • Added a rate_limits dictionary to store limit information per URL
  • Implemented a rate_limit_check method to verify whether requests can be made
  • Created a rate_limit_update method to parse and store rate limit headers

Integrated rate limit protection into request flow (twikit/client/client.py)
  • Added a pre-request rate limit check to prevent exceeding limits
  • Updated rate limit information on successful responses
  • Enhanced error handling to update rate limits on 429 (Too Many Requests) responses
  • Added a more detailed error message with retry time information


coderabbitai bot commented Nov 15, 2024

Walkthrough

The changes introduce a rate limiting mechanism to the Client class within twikit/client/client.py. A new dictionary attribute, rate_limits, is added to track rate limit information for different URLs. Two asynchronous methods, rate_limit_check and rate_limit_update, are implemented to manage these limits. The request method is modified to incorporate rate limit checks before making HTTP requests, raising an exception if limits are exceeded, and updating the rate limit data based on API responses.
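To make the check, request, update cycle concrete, here is a small standalone sketch of the same flow. It is not the PR's code: RateLimitTracker, RateLimitExceeded, and the send transport callable are hypothetical names, and the 50-request / 900-second fallbacks are copied from the PR's initial values.

```python
import asyncio
import time
from typing import Awaitable, Callable, Mapping

# Hypothetical transport: takes a URL, returns (status_code, headers).
Send = Callable[[str], Awaitable[tuple[int, Mapping[str, str]]]]


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for twikit's TooManyRequests."""


class RateLimitTracker:
    """Per-URL bookkeeping shaped like the PR's rate_limits dictionary."""

    def __init__(self, default_max: int = 50, default_window: float = 900.0) -> None:
        self.default_max = default_max
        self.default_window = default_window
        self.rate_limits: dict[str, dict[str, float]] = {}

    def check(self, url: str) -> bool:
        now = time.time()
        limits = self.rate_limits.setdefault(url, {
            "reset": now + self.default_window,
            "rate_limit_max": self.default_max,
            "remaining": self.default_max,
        })
        if now > limits["reset"]:
            # The window has passed: restore the full quota.
            limits["remaining"] = limits["rate_limit_max"]
        return limits["remaining"] > 0

    def update(self, url: str, headers: Mapping[str, str]) -> None:
        limits = self.rate_limits[url]
        if "x-rate-limit-limit" in headers:
            limits["rate_limit_max"] = int(headers["x-rate-limit-limit"])
        if "x-rate-limit-remaining" in headers:
            limits["remaining"] = int(headers["x-rate-limit-remaining"])
        if "x-rate-limit-reset" in headers:
            # Unix timestamp in seconds, comparable with time.time().
            limits["reset"] = float(headers["x-rate-limit-reset"])


async def request(tracker: RateLimitTracker, url: str, send: Send) -> int:
    # 1. Refuse locally instead of sending a request the server would reject.
    if not tracker.check(url):
        wait = tracker.rate_limits[url]["reset"] - time.time()
        raise RateLimitExceeded(f"Rate limit exceeded, retry after {wait:.1f} seconds")
    # 2. Send, then 3. record whatever limits the server reported.
    status, headers = await send(url)
    tracker.update(url, headers)
    return status


async def _demo() -> None:
    async def fake_send(url: str) -> tuple[int, Mapping[str, str]]:
        # Pretend the server reports the quota as used up for the next 60 s.
        return 200, {"x-rate-limit-remaining": "0",
                     "x-rate-limit-reset": str(time.time() + 60)}

    tracker = RateLimitTracker()
    print(await request(tracker, "https://api.example.com/x", fake_send))
    try:
        await request(tracker, "https://api.example.com/x", fake_send)
    except RateLimitExceeded as exc:
        print(exc)


if __name__ == "__main__":
    asyncio.run(_demo())
```

The first call goes through and records remaining = 0 from the headers; the second is refused locally with the retry time, without touching the network.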

Changes

twikit/client/client.py:
  • Added self.rate_limits = {} to store rate limit info.
  • Introduced async def rate_limit_check(self, url) -> bool to check whether requests can proceed.
  • Introduced async def rate_limit_update(self, url, response: Response) -> None to update limits.
  • Modified the request method to include rate limit checks and updates.

Poem

In the land of code, where requests take flight,
A rabbit checks limits, both day and night.
With a hop and a skip, it knows when to wait,
For the Twitter API, it won’t tempt fate.
So here’s to the changes, both clever and bright,
Rate limits in place, making everything right! 🐇✨



sourcery-ai bot left a comment

Hey @g4cko - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider documenting the default rate limit values (50 requests, 900 seconds) and their source/reasoning
  • The empty try-except block in rate_limit_update silently swallows all exceptions, which could hide important errors. Consider logging the exception or handling specific exception types
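For the first point, one option is to name the fallback values and keep per-URL state in a small record instead of a bare dict. This is only a sketch: DEFAULT_RATE_LIMIT_MAX, DEFAULT_RATE_LIMIT_WINDOW, and RateLimitState are hypothetical names, and the values simply mirror the PR's literals rather than any documented per-endpoint limit.

```python
import time
from dataclasses import dataclass, field

# Fallbacks used only until the first response supplies x-rate-limit-* headers.
# They mirror the PR's literals (50 requests per 900 s); they are an assumption,
# not a documented limit for every endpoint.
DEFAULT_RATE_LIMIT_MAX = 50
DEFAULT_RATE_LIMIT_WINDOW = 15 * 60  # seconds


@dataclass
class RateLimitState:
    """Rate limit bookkeeping for one URL."""

    rate_limit_max: int = DEFAULT_RATE_LIMIT_MAX
    remaining: int = DEFAULT_RATE_LIMIT_MAX
    # Unix timestamp (seconds) at which the current window resets.
    reset: float = field(default_factory=lambda: time.time() + DEFAULT_RATE_LIMIT_WINDOW)

    def seconds_until_reset(self) -> float:
        return max(0.0, self.reset - time.time())
```

A dataclass also gives attribute access (state.remaining) instead of string keys, so a typo in a field name fails loudly rather than silently creating a new key.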
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


@@ -113,6 +115,34 @@ def __init__(

self.gql = GQLClient(self)
self.v11 = V11Client(self)
self.rate_limits = {}
issue (bug_risk): The rate_limits dictionary access should be protected against race conditions in async context

Consider using asyncio.Lock to synchronize access to self.rate_limits to prevent potential race conditions when multiple coroutines access it simultaneously.
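Because asyncio switches coroutines only at await points, a check whose body contains no await cannot be interleaved with another coroutine. The gap that matters is between the check and the header update, which happens only after the response has been awaited, so several concurrent requests can all pass the check against the same remaining count. A sketch under stated assumptions: RateLimitGate is a hypothetical name; it decrements the count when a slot is reserved; and, unlike the PR, it waits for the reset instead of raising, which is the case where an asyncio.Lock is genuinely needed. It tracks its own window with time.monotonic() rather than the x-rate-limit-reset header.

```python
import asyncio
import time


class RateLimitGate:
    """Hypothetical per-URL gate that reserves a slot before each request.

    Unlike the PR, which raises when the quota is exhausted, this variant
    waits for the window to reset; the lock is what keeps that wait safe.
    """

    def __init__(self, max_requests: int, window: float) -> None:
        self.max_requests = max_requests
        self.window = window  # seconds
        self._remaining: dict[str, int] = {}
        self._reset: dict[str, float] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    async def acquire(self, url: str) -> None:
        # One lock per URL, so waiting on one endpoint does not stall others.
        lock = self._locks.setdefault(url, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            if url not in self._reset or now >= self._reset[url]:
                self._remaining[url] = self.max_requests
                self._reset[url] = now + self.window
            if self._remaining[url] <= 0:
                # Sleeping while holding the lock keeps other coroutines from
                # passing the check and spending the refreshed quota first.
                await asyncio.sleep(self._reset[url] - now)
                self._remaining[url] = self.max_requests
                self._reset[url] = time.monotonic() + self.window
            # Decrement at reservation time, not after the response arrives,
            # so concurrent callers cannot all see the same remaining count.
            self._remaining[url] -= 1
```

With max_requests=2, two concurrent acquire() calls return immediately and a third waits roughly one window length.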

        self.rate_limits[url]["remaining"] = int(response.headers["x-rate-limit-remaining"])
    if "x-rate-limit-reset" in response.headers:
        self.rate_limits[url]["reset"] = float(response.headers["x-rate-limit-reset"])
except Exception:
issue (bug_risk): Avoid silently catching all exceptions in rate limit handling

Consider logging the exception or handling specific exception types to avoid masking potential issues with rate limit tracking.
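As a sketch of what narrower handling could look like (apply_rate_limit_headers is a hypothetical helper, not part of twikit): once absent headers are skipped, the only expected failure is ValueError from int() or float(), so that is all it catches, and it logs the bad value instead of passing. It assumes lowercase header names, which is what a plain dict needs; if the response comes from httpx, its Headers mapping is case-insensitive anyway.

```python
import logging
from typing import Mapping

logger = logging.getLogger(__name__)

# Header name -> (key in the per-URL dict, converter), following the PR's naming.
_RATE_LIMIT_HEADERS = {
    "x-rate-limit-limit": ("rate_limit_max", int),
    "x-rate-limit-remaining": ("remaining", int),
    "x-rate-limit-reset": ("reset", float),
}


def apply_rate_limit_headers(limits: dict, headers: Mapping[str, str], url: str) -> None:
    """Copy x-rate-limit-* headers into limits, logging (not hiding) bad values."""
    for header, (key, convert) in _RATE_LIMIT_HEADERS.items():
        raw = headers.get(header)
        if raw is None:
            continue  # header absent: keep the previous value
        try:
            limits[key] = convert(raw)
        except ValueError:
            # Malformed value: keep the old number but leave a trace in the logs.
            logger.warning("Ignoring malformed %s=%r for %s", header, raw, url)
```

A malformed header then leaves the previous value in place and shows up as a warning, while unexpected errors (for example, a missing url entry in rate_limits) still propagate.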

self.rate_limits = {}


async def rate_limit_check(self, url) -> bool:
issue (complexity): Consider using a context manager pattern to handle rate limiting logic

The rate limiting logic can be simplified using a context manager pattern. This will:

  1. Reduce nesting in the request method
  2. Consolidate rate limit handling in one place
  3. Make error handling explicit
from contextlib import asynccontextmanager
from typing import AsyncIterator
import time

from twikit.errors import TooManyRequests  # twikit's exception; import path assumed

@asynccontextmanager
async def _handle_rate_limits(self, url: str) -> AsyncIterator[dict]:
    now = time.time()
    limits = self.rate_limits.setdefault(
        url, {"reset": now + 900, "rate_limit_max": 50, "remaining": 50}
    )
    if now > limits["reset"]:
        # Window has passed: restore the full quota.
        limits["remaining"] = limits["rate_limit_max"]
    elif limits["remaining"] <= 0:
        raise TooManyRequests(f"Rate limit exceeded, retry after {limits['reset'] - now:.1f} seconds")

    # An @asynccontextmanager generator may yield only once, so the caller
    # hands the response back by storing it in this dict.
    ctx: dict = {}
    try:
        yield ctx
    finally:
        # Also runs when the body raises (e.g. on a 429), if a response arrived.
        response = ctx.get("response")
        if response is not None:
            headers = response.headers
            limits["rate_limit_max"] = int(headers.get("x-rate-limit-limit", limits["rate_limit_max"]))
            limits["remaining"] = int(headers.get("x-rate-limit-remaining", limits["remaining"]))
            limits["reset"] = float(headers.get("x-rate-limit-reset", limits["reset"]))

async def request(self, method: str, url: str, auto_unlock: bool = True, raise_exception: bool = True, **kwargs):
    cookies_backup = self.get_cookies().copy()
    async with self._handle_rate_limits(url) as ctx:
        response = await self.http.request(method, url, **kwargs)
        ctx["response"] = response
        self._remove_duplicate_ct0_cookie()
        # ... rest of error handling code ...

This approach:

  • Combines rate limit checking and updating into a single method
  • Handles rate limit headers safely without silent exception handling
  • Reduces nesting in the request method
  • Makes the control flow easier to follow

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
twikit/client/client.py (5)

123-132: Refactor: Simplify the rate_limit_check method

The if-elif-else structure in the rate_limit_check method can be simplified by combining conditions using logical operators and returning the condition directly. This enhances code readability and conciseness.

Apply this diff to simplify the method:

 def rate_limit_check(self, url) -> bool:
     if url not in self.rate_limits:
         self.rate_limits[url] = {
             "reset": time.time() + 900,
             "rate_limit_max": 50,
             "remaining": 50
         }
         return True
-    elif time.time() > self.rate_limits[url]["reset"]:
-        self.rate_limits[url]["remaining"] = self.rate_limits[url]["rate_limit_max"]
-        return True
-    elif self.rate_limits[url]["remaining"] > 0:
-        return True
-    else:
-        return False
+    if time.time() > self.rate_limits[url]["reset"]:
+        self.rate_limits[url]["remaining"] = self.rate_limits[url]["rate_limit_max"]
+    return self.rate_limits[url]["remaining"] > 0
🧰 Tools
🪛 Ruff

126-129: Combine if branches using logical or operator

Combine if branches

(SIM114)


128-132: Return the condition self.rate_limits[url]['remaining'] > 0 directly

Replace with return self.rate_limits[url]['remaining'] > 0

(SIM103)


126-129: Refactor: Combine if branches using logical or operator

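As suggested by the static analysis hint (SIM114), you can combine the if branches to simplify the code. Note that this diff also deletes the assignment that restores remaining once the reset time has passed, so applying it as-is would reintroduce the reset issue flagged in the inline comment on lines +126 to +132.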

Apply this diff:

-    elif time.time() > self.rate_limits[url]["reset"]:
-        self.rate_limits[url]["remaining"] = self.rate_limits[url]["rate_limit_max"]
-        return True
-    elif self.rate_limits[url]["remaining"] > 0:
+    elif time.time() > self.rate_limits[url]["reset"] or self.rate_limits[url]["remaining"] > 0:
         return True

128-132: Refactor: Return the condition directly

According to static analysis hint (SIM103), you can return the condition directly to make the code more concise.

Apply this diff:

-    elif self.rate_limits[url]["remaining"] > 0:
-        return True
-    else:
-        return False
+    return self.rate_limits[url]["remaining"] > 0


136-144: Refactor: Avoid broadly catching Exception without handling

In the rate_limit_update method, catching all exceptions with except Exception: and then using pass can suppress unexpected errors, making debugging difficult. It's better to catch specific exceptions or log the exception details.

Apply this diff to handle exceptions appropriately:

 try:
     if "x-rate-limit-limit" in response.headers:
         self.rate_limits[url]["rate_limit_max"] = int(response.headers["x-rate-limit-limit"])
     if "x-rate-limit-remaining" in response.headers:
         self.rate_limits[url]["remaining"] = int(response.headers["x-rate-limit-remaining"])
     if "x-rate-limit-reset" in response.headers:
         self.rate_limits[url]["reset"] = float(response.headers["x-rate-limit-reset"])
-except Exception:
-    pass
+except KeyError as e:
+    # Handle missing header keys
+    print(f"Missing expected header in response: {e}")
+except ValueError as e:
+    # Handle invalid header values
+    print(f"Invalid header value: {e}")

135-135: Nitpick: Remove empty comment

There's an empty comment at line 135 that doesn't serve any purpose. Removing it can clean up the code.

Apply this diff:

-       #  
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f265840 and 0265b82.

⛔ Files ignored due to path filters (1)
  • twikit/__pycache__/__init__.cpython-313.pyc is excluded by !**/*.pyc
📒 Files selected for processing (1)
  • twikit/client/client.py (5 hunks)

Comment on lines +126 to +132
elif time.time() > self.rate_limits[url]["reset"]:
    return True
elif self.rate_limits[url]["remaining"] > 0:
    return True
else:
    return False

⚠️ Potential issue

Critical Issue: Reset rate limit data when reset time has passed

In the rate_limit_check method, when the current time exceeds the reset time (time.time() > self.rate_limits[url]["reset"]), the remaining count is not reset. This means that requests might be incorrectly blocked even after the reset time has passed because remaining may still be zero. To fix this, reset the rate limit data when the reset time has passed.

Apply this diff to reset the rate limit data:

 elif time.time() > self.rate_limits[url]["reset"]:
+    self.rate_limits[url]["remaining"] = self.rate_limits[url]["rate_limit_max"]
     return True
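To see why the added assignment matters, here is the check logic rewritten as a standalone function with the clock injected. This rate_limit_check is a module-level sketch for illustration, not the method in twikit/client/client.py; the 50 / 900 defaults are the PR's.

```python
import time
from typing import Callable


def rate_limit_check(rate_limits: dict, url: str,
                     now: Callable[[], float] = time.time) -> bool:
    """Standalone version of the check, with the clock injectable for tests."""
    if url not in rate_limits:
        # First request to this URL: assume the PR's 50 requests / 900 s defaults.
        rate_limits[url] = {"reset": now() + 900, "rate_limit_max": 50, "remaining": 50}
        return True
    limits = rate_limits[url]
    if now() > limits["reset"]:
        # The fix: once the reset time has passed, restore the quota.
        limits["remaining"] = limits["rate_limit_max"]
    return limits["remaining"] > 0


# Exhausted quota, reset scheduled at t=1500.
clock = [1000.0]
limits = {"u": {"reset": 1500.0, "rate_limit_max": 50, "remaining": 0}}
print(rate_limit_check(limits, "u", lambda: clock[0]))  # False: still inside the window
clock[0] = 1600.0
print(rate_limit_check(limits, "u", lambda: clock[0]))  # True: window passed, quota restored
```

Without the assignment, the second call would also return False, because remaining stays at 0 until a new response updates it.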
📝 Committable suggestion


Suggested change

Before:
elif time.time() > self.rate_limits[url]["reset"]:
    return True
elif self.rate_limits[url]["remaining"] > 0:
    return True
else:
    return False

After:
elif time.time() > self.rate_limits[url]["reset"]:
    self.rate_limits[url]["remaining"] = self.rate_limits[url]["rate_limit_max"]
    return True
elif self.rate_limits[url]["remaining"] > 0:
    return True
else:
    return False
