
Requests 104 Error on tweets search in api 2.0 #140

Open
markowanga opened this issue Nov 5, 2021 · 2 comments
@markowanga

Describe the bug
When I search for tweets with API 2.0, I hit a problem in the requests library. For the first few days I did not see this error; I have since restarted my PC and rebuilt the Docker containers. Here are the logs:

swps_worker | 2021-11-05 06:29:57,447 [searchtweets.result_stream    ] INFO     using bearer token for authentication
swps_worker | 2021-11-05 06:29:57,447 [searchtweets.result_stream    ] DEBUG    sending request
swps_worker | 2021-11-05 06:29:57,450 [urllib3.connectionpool        ] DEBUG    Starting new HTTPS connection (1): api.twitter.com:443
swps_worker | 2021-11-05 06:29:58,540 [urllib3.connectionpool        ] DEBUG    https://api.twitter.com:443 "GET /2/tweets/search/all?query=%28%22%23absolwenci%22+OR+%22%23covid%22+OR+%22%23COVID-19%22+OR+%22%23Covid19%22+OR+%22%23doros%C5%82o%C5%9B%C4%87%22+OR+%22%23generacjaZ%22+OR+%22%23genX%22+OR+%22%23genZ%22+OR+%22%23koronawirus%22+OR+%22%23koronawiruspolska%22+OR+%22%23liceum%22+OR+%22%23lockdown%22+OR+%22%23matura%22+OR+%22%23matura2020%22+OR+%22%23matura2021%22+OR+%22%23millenialsi%22+OR+%22%23m%C5%82odzi%22+OR+%22%23pandemia%22+OR+%22%23pierwszami%C5%82o%C5%9B%C4%87%22+OR+%22%23praca2021%22+OR+%22%23pracazdalna%22+OR+%22%23rekrutacja2020%22+OR+%22%23rekrutacja2021%22+OR+%22%23rodzina%22+OR+%22%23siedznadupie%22+OR+%22%23solidarno%C5%9B%C4%87%22+OR+%22%23strajkkobiet%22+OR+%22%23studia2020%22+OR+%22%23studia2021%22+OR+%22%23studiazdalne%22+OR+%22%23szko%C5%82a%22+OR+%22%23zdalne%22+OR+%22%23zdalnenauczanie%22+OR+%22%23zostanwdomu%22%29+lang%3Apl&start_time=2020-03-03T00%3A00%3A00Z&end_time=2020-03-04T00%3A00%3A00Z&max_results=100&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Ctext%2Cwithheld&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics%2Calt_text&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&expansions=attachments.poll_ids%2Cattachments.media_keys%2Cauthor_id%2Centities.mentions.username%2Cgeo.place_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id HTTP/1.1" 200 61537
swps_worker | 2021-11-05 06:29:58,672 [searchtweets.result_stream    ] INFO     paging; total requests read so far: 1
swps_worker | 2021-11-05 06:30:00,674 [searchtweets.result_stream    ] DEBUG    sending request
swps_worker | Traceback (most recent call last):
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
swps_worker |     httplib_response = self._make_request(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
swps_worker |     six.raise_from(e, None)
swps_worker |   File "<string>", line 3, in raise_from
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
swps_worker |     httplib_response = conn.getresponse()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
swps_worker |     response.begin()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
swps_worker |     version, status, reason = self._read_status()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
swps_worker |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
swps_worker |   File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
swps_worker |     return self._sock.recv_into(b)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
swps_worker |     return self.read(nbytes, buffer)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
swps_worker |     return self._sslobj.read(len, buffer)
swps_worker | ConnectionResetError: [Errno 104] Connection reset by peer
swps_worker | 
swps_worker | During handling of the above exception, another exception occurred:
swps_worker | 
swps_worker | Traceback (most recent call last):
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
swps_worker |     resp = conn.urlopen(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
swps_worker |     retries = retries.increment(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
swps_worker |     raise six.reraise(type(error), error, _stacktrace)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
swps_worker |     raise value.with_traceback(tb)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
swps_worker |     httplib_response = self._make_request(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
swps_worker |     six.raise_from(e, None)
swps_worker |   File "<string>", line 3, in raise_from
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
swps_worker |     httplib_response = conn.getresponse()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
swps_worker |     response.begin()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
swps_worker |     version, status, reason = self._read_status()
swps_worker |   File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
swps_worker |     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
swps_worker |   File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
swps_worker |     return self._sock.recv_into(b)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
swps_worker |     return self.read(nbytes, buffer)
swps_worker |   File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
swps_worker |     return self._sslobj.read(len, buffer)
swps_worker | urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
swps_worker | 
swps_worker | During handling of the above exception, another exception occurred:
swps_worker | 
swps_worker | Traceback (most recent call last):
swps_worker |   File "app/main.py", line 34, in <module>
swps_worker |     single_work()
swps_worker |   File "app/main.py", line 25, in single_work
swps_worker |     worker_service.run()
swps_worker |   File "/app/app/application/worker_service.py", line 53, in run
swps_worker |     tweets = self._scrap_service.scrap(
swps_worker |   File "/app/app/infrastructure/official_twitter_scrap_service.py", line 54, in scrap
swps_worker |     tweets = collect_results(
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 467, in collect_results
swps_worker |     return list(rs.stream())
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 375, in stream
swps_worker |     self.execute_request()
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 415, in execute_request
swps_worker |     resp = request(session=self.session,
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 77, in retried_func
swps_worker |     raise exc
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 73, in retried_func
swps_worker |     resp = func(*args, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/searchtweets/result_stream.py", line 140, in request
swps_worker |     result = session.get(url, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
swps_worker |     return self.request('GET', url, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
swps_worker |     resp = self.send(prep, **send_kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
swps_worker |     r = adapter.send(request, **kwargs)
swps_worker |   File "/root/.cache/pypoetry/virtualenvs/swps-tweet-infrastructure-9TtSrW0h-py3.8/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
swps_worker |     raise ConnectionError(err, request=request)
swps_worker | requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

To Reproduce
I'm running the library wrapped in this service:

class OfficialTwitterScrapService(ScrapService):
    _config_file: str
    _premium_search_args: Dict[str, Any]

    def __init__(self, config_file: str):
        self._config_file = config_file
        self._premium_search_args = load_credentials(self._config_file,
                                                     yaml_key="search_tweets_premium",
                                                     env_overwrite=False)

    def scrap(
            self,
            query: str,
            since: Arrow,
            until: Arrow
    ) -> List[RawJsonTwitterResponse]:
        logger.info(
            f'run scrap query :: {query}'
            f' | since :: {since.isoformat()}'
            f' | until :: {until.isoformat()}'
        )
        query = gen_request_parameters(
            query=query,
            granularity=None,
            results_per_call=100,
            start_time=self._get_string_time_from_arrow(since),
            end_time=self._get_string_time_from_arrow(until),
            expansions='attachments.poll_ids,attachments.media_keys,author_id,'
                       'entities.mentions.username,geo.place_id,in_reply_to_user_id,'
                       'referenced_tweets.id,referenced_tweets.id.author_id',
            media_fields='duration_ms,height,media_key,preview_image_url,type,url,width,'
                         'public_metrics,alt_text',
            place_fields='contained_within,country,country_code,full_name,geo,id,name,place_type',
            tweet_fields='attachments,author_id,context_annotations,conversation_id,created_at,'
                         'entities,geo,id,in_reply_to_user_id,lang,public_metrics,'
                         'possibly_sensitive,referenced_tweets,reply_settings,source,'
                         'text,withheld',
            user_fields='created_at,description,entities,id,location,name,pinned_tweet_id,'
                        'profile_image_url,protected,public_metrics,url,username,verified,withheld'
        )
        tweets = collect_results(
            query,
            max_tweets=10_000_000,
            result_stream_args=self._premium_search_args
        )
        return [RawJsonTwitterResponse(json.dumps(it)) for it in tweets]

    @staticmethod
    def _get_string_time_from_arrow(time: Arrow) -> str:
        return time.isoformat()[:-9]

Expected behavior
I expect the library to collect tweets without connection errors.

Environment

  • Ubuntu 20.10 -> docker image python:3.8
  • searchtweets-v2 = "^1.1.1"
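
If the resets turn out to be transient network failures, one generic mitigation is to retry the whole call with exponential backoff. This is only a sketch: `retry_on_connection_error` is a hypothetical helper, not part of searchtweets, and note that `collect_results` does not resume pagination, so a full retry re-reads pages from the start unless you checkpoint `next_token` yourself.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_connection_error(
    func: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call func(), retrying with exponential backoff on transient
    connection errors such as ECONNRESET (errno 104)."""
    for attempt in range(attempts):
        try:
            return func()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")
```

For this issue's traceback, the reset surfaces as `requests.exceptions.ConnectionError` (which is not a subclass of the builtin `ConnectionError`), so the call would look like `retry_on_connection_error(lambda: collect_results(query, ...), retryable=(requests.exceptions.ConnectionError,))`.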
@markowanga markowanga added the bug label Nov 5, 2021
@igorbrigadir

Connection reset by peer can come up if a firewall is blocking something, or from another network-related problem such as SSL certificates. Generally it is not something caused by a code or authentication error. Since you're running it in a Docker container: how exactly is the container running? Docker Swarm, plain docker run, Kubernetes, or something else?
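
If the resets are transient, a `requests.Session` with urllib3's `Retry` mounted on its adapters can absorb them. A minimal sketch, assuming you can inject your own session; from the traceback, `ResultStream` uses its own `self.session`, so wiring this in likely means patching that attribute:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_resilient_session() -> requests.Session:
    """Build a requests Session whose adapters retry transient connect/read
    failures (such as ECONNRESET, errno 104) with exponential backoff."""
    retry = Retry(
        total=5,             # overall retry budget
        connect=5,           # retries on connection errors (resets, refusals)
        read=5,              # retries when the peer dies mid-response
        backoff_factor=1.0,  # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

By default urllib3 only retries idempotent methods, which covers the GET requests used by the search endpoint.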

@markowanga
Author

I'm running everything in Docker, and it works at first. The error occurs after a few hours.
