Get Full Article Error #104

Open
satorugojos opened this issue Oct 11, 2024 · 1 comment

Comments

@satorugojos

from gnews import GNews
google_news = GNews(max_results=2)

news = google_news.get_news_by_topic("WORLD")

article = google_news.get_full_article(news[0]['url'])

print(article.text)

Value of news[0]:
{'title': 'Live Updates: Nobel Peace Prize Is Awarded to Japanese Group of Atomic Bomb Survivors - The New York Times', 'description': 'Live Updates: Nobel Peace Prize Is Awarded to Japanese Group of Atomic Bomb Survivors The New York TimesNobel peace prize awarded to Japanese atomic bomb survivors’ group The GuardianJapanese atomic bomb survivors group wins Nobel Peace Prize AxiosJapanese atomic bomb survivors win Nobel Peace Prize BBC.com', 'published date': 'Fri, 11 Oct 2024 12:42:33 GMT', 'url': 'https://news.google.com/rss/articles/CBMiekFVX3lxTFBIcUMzdi11a0t6by1WdUpibEI2YWp4VUVCOEhkNUNNS3R0QXBiZXFIbHQ2ams4MTlqeXhyUkRhVmNZQWEtYVdUcEVna1VNNjNXQktIeWFBdGxoZXZ4T3d1WUFMdmV5QkRzXzlzMnQxcjZaYTcydjFscy13?oc=5&hl=en-US&gl=US&ceid=US:en', 'publisher': {'href': 'https://www.nytimes.com', 'title': 'The New York Times'}}

When I run this code, article comes back empty. Debugging shows that the article cannot be fetched from the RSS URL, because clicking the RSS URL redirects to this link: https://www.nytimes.com/live/2024/10/11/world/nobel-peace-prize-winner

How do I solve this problem?

@Isaaq-Khader
Isaaq-Khader commented Oct 14, 2024

Hey!

So the error you're seeing has been an ongoing issue with the GNews library. When you click the RSS URL in a browser, you are automatically redirected to the correct link. The code, however, uses the RSS URL directly, and that URL cannot provide the article's text.
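
To make the distinction concrete, here is a minimal sketch of how you might detect an unresolved link before passing it to get_full_article. It assumes Google News RSS links always look like the one in the report above (host news.google.com, path starting with /rss/articles/). The helper name is_google_news_rss_url is hypothetical and is not part of GNews.

```python
from urllib.parse import urlparse


def is_google_news_rss_url(url: str) -> bool:
    """Return True if url looks like an unresolved Google News RSS link.

    Assumption: such links use the host news.google.com and a path that
    starts with /rss/articles/, as in the URL from the original report.
    """
    parsed = urlparse(url)
    # hostname is lowercased by urlparse and is None for relative URLs
    return (
        parsed.hostname == "news.google.com"
        and parsed.path.startswith("/rss/articles/")
    )
```

A URL for which this returns True still needs to be resolved to the publisher's address before the article text can be fetched.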

The long effort to figure out how to resolve these URL errors is tracked in #101.

The tl;dr is to use this solution: https://gist.github.com/huksley/bc3cb046157a99cd9d1517b32f91a99e?permalink_comment_id=5224328#gistcomment-5224328

Essentially, you have to resolve the RSS URL yourself and then fetch the full news article using the resolved URL. The solution linked above provides a way to decode the RSS URL. If you decide to use the googlenewsdecoder package, it would look like the following:

from gnews import GNews
from googlenewsdecoder import new_decoderv1

google_news = GNews(max_results=2)

news = google_news.get_news_by_topic("WORLD")

interval_time = 5 # delay in seconds before decoding; this can be changed, but 5 is recommended
decoded_url = new_decoderv1(news[0]['url'], interval=interval_time)

article = google_news.get_full_article(decoded_url['decoded_url'])

print(article.text)
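
The snippet above assumes that decoding always succeeds. A slightly more defensive sketch is below. It assumes, going by the googlenewsdecoder README, that new_decoderv1 returns a dict with a truthy "status" key and a "decoded_url" key on success. resolve_news_url is a hypothetical helper name, and the decoder is passed in as an argument so the logic can be tried without network access.

```python
from typing import Callable, Optional


def resolve_news_url(
    rss_url: str,
    decoder: Callable[..., dict],
    interval: int = 5,
) -> Optional[str]:
    """Return the publisher URL for a Google News RSS link, or None on failure.

    Assumption: decoder behaves like googlenewsdecoder.new_decoderv1 and
    returns {"status": True, "decoded_url": ...} when decoding works.
    """
    try:
        result = decoder(rss_url, interval=interval)
    except Exception:
        # Network errors and rate limiting are treated as "could not resolve"
        return None
    if result.get("status"):
        return result.get("decoded_url")
    return None
```

With the real libraries this would be called as resolve_news_url(news[0]['url'], new_decoderv1), and get_full_article would only be called when the result is not None.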
