-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SRCH-5469-spidermon #50
Conversation
CreateCustomFileReport works with Periodic Monitor to create the custom report
@IsabelLaurenceau what do you think about a way for us to easily disable or enable the email notifications based on environment? For instance I don't think we would usually want to send emails if there was an error locally or even from a future dev or staging environment. A blunt tool could be to set the value of SPIDERMON_ENABLED to an environment variable, perhaps default it to off and something has to be set to turn it on? |
@IsabelLaurenceau what did you end up using for email? did you configure your personal gmail to send these or something else? Were there any steps you followed? I know you said it took a while to figure it out so I'm hoping you can point us in the right direction at least. |
That makes sense to me. I think we could just set it to an environment variable like you said. I'm not sure if this is what you mean by having something set to turn it on but set that variable to false in all environments other than Prod. |
I personally use an apple email so I used my apple email credentials for SPIDERMON_EMAIL_SENDER, SPIDERMON_SMTP_HOST, SPIDERMON_SMTP_PORT, SPIDERMON_SMTP_USER, SPIDERMON_SMTP_PASSWORD. It depends on what email server you want to use. The tricky part I had with my apple email was that the password has to be an app password and not just your iCloud password (which is what I had assumed) and the smto_user has to be the entire email address including the domain. I also had to use port 587 with TLS |
In case it helps anyone else, I got the smtp setup by adding an app password https://myaccount.google.com/apppasswords to my fearless gmail account and then using these environment variables:
|
@IsabelLaurenceau Did you ever get this error when sending emails? The emails are sent but this error occurs. This looks like an open issue with spidermon: scrapinghub/spidermon#412
|
@IsabelLaurenceau |
I don't usually get more than one email. That is not expected behavior. Did you change any of the spidermon variables when this happened? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 🥇
Do we really need email support? I know it's really easy to be classified as a spam bot if we implement it wrong or send more emails than the "allowed" threshold.
I understand slack bot/notifications are hard to get approved, but I think it's so much more valuable than emails. *But, only because my email is already spammed out by datadog and newrelic 🙃
@@ -7,6 +7,10 @@ | |||
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html | |||
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html | |||
|
|||
import os | |||
from pathlib import Path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this import isn't used and can be removed
@@ -0,0 +1,18 @@ | |||
import os |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this import isn't used and can be removed
dirname= os.path.dirname(__file__) | ||
body_html_template = os.path.join(dirname, 'actions', 'results.jinja') | ||
|
||
SPIDERMON_ENABLED = os.environ.get('SPIDERMON_ENABLED') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though scrapy's get_bool function https://github.com/scrapy/scrapy/blob/212e848402a63b43fe8b7204e19d47fa7c4f0cd9/scrapy/settings/__init__.py#L126 allows this to be really flexible, I think it would be more clear to add a default so that someone can understand that this is meant to be a boolean: os.environ.get('SPIDERMON_ENABLED', 'False')
for example. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, generally works as expected. If you wanted to fix the few small things I put in that would be great otherwise no problem, seems like we will have some more work to do on this before enabling it in prod anyway.
I'm also going to add in the comments, just for tracking purposes, three things that I think we all agree are out of scope for this PR:
- The snippet I slacked you about changing the email mime type
- The error I'm seeing when I send emails (although emails still get sent)
- Some more "proof" that multiple emails are getting sent but only one report file is present at the end of the run (perhaps being overwritten again and again).
For future use in making the emails more pretty: Spidermon uses a scrapy MailSender class: https://github.com/scrapinghub/spidermon/blob/de6921541f38613f62384efc682ed2a0282b08fa/spidermon/contrib/actions/email/smtp.py#L79 class CreateCustomEmailReport(SendSmtpEmail):
dirname = os.path.dirname(__file__)
body_html_template = os.path.join(dirname, "actions", "results.jinja")
def send_message(self, message, **kwargs):
mail_sender = MailSender(
smtphost=self.smtp_host,
mailfrom=self.sender,
smtpuser=self.smtp_user,
smtppass=self.smtp_password,
smtpport=self.smtp_port,
smtptls=self.smtp_enforce_tls,
smtpssl=self.smtp_enforce_ssl,
)
mail_sender.send(
to=self.to,
subject=message["Subject"],
body=message.as_string(),
cc=self.cc,
mimetype="text/html",
_callback=kwargs.get("_callback"),
) |
For future use in investigating discrepancy in file report output and number of emails sent: For this test I used the built in email mock setting https://spidermon.readthedocs.io/en/latest/actions/email-action.html#spidermon-email-fake so that I wasn't actually sending emails but I could log when they are being sent. I redirected logs to a file and then grep'ed for the number of times it sent an email... which was 11 here. At the end of this run I only saw one report file. (.venv) ➜ search_gov_crawler git:(SRCH-5469-Spidermon) ✗ scrapy crawl domain_spider -a allowed_domains=quotes.toscrape.com -a start_urls=https://quotes.toscrape.com/ > scrapy_log.txt 2>&1
(.venv) ➜ search_gov_crawler git:(SRCH-5469-Spidermon) ✗ grep -c "SendSmtpEmail... OK" scrapy_log.txt
11
(.venv) ➜ search_gov_crawler git:(SRCH-5469-Spidermon) ✗ ls -1 *spidermon_file_report.html | wc -l
1 |
For future use investigating bio_read errors while sending email as mentioned in #50 (comment) With default settings and email enabled from dself@fearless.com to daniel.self@gsa.gov:
|
Summary
Testing Instructions:
Prep:
Testing:
scrapy crawl domain_spider -a allowed_domains=quotes.toscrape.com/tag -a start_urls=https://quotes.toscrape.com
Post Merge Steps:
Future Work:
There are a few things I would like to do that I don't think really fit into the scope of this ticket:
Checklist
Please ensure you have addressed all concerns below before marking a PR "ready for review" or before requesting a re-review. If you cannot complete an item below, replace the checkbox with the⚠️
:warning:
emoji and explain why the step was not completed.Functionality Checks
You have merged the latest changes from the target branch (usually
main
) into your branch.Your primary commit message is of the format SRCH-#### <description> matching the associated Jira ticket.
PR title is either of the format SRCH-#### <description> matching the associated Jira ticket (i.e. "SRCH-123 implement feature X"), or Release - SRCH-####, SRCH-####, SRCH-#### matching the Jira ticket numbers in the release.
Automated checks pass. If Code Climate checks do not pass, explain reason for failures:
Process Checks