Add logging #2329
Conversation
Nice improvement so far! Left some comments.
connectors/sources/confluence.py (Outdated)

```diff
@@ -714,6 +726,7 @@ async def fetch_server_space_permission(self, space_key):
             return {}

         url = URLS[SPACE_PERMISSION].format(space_key=space_key)
+        self._logger.debug(f"Fetching permissions for '{space_key}' space")
```
Should this be `info` log-level as it indicates progress?
Yes, this should be `info`.
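For reference, a minimal sketch of what the agreed change would look like (the context line is copied from the hunk above; the PR's final code may differ):

```python
url = URLS[SPACE_PERMISSION].format(space_key=space_key)
self._logger.info(f"Fetching permissions for '{space_key}' space")
```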
connectors/sources/mssql.py (Outdated)

```python
self._logger.info(
    "Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rule is enabled."
)
```
Suggested change:

```diff
 self._logger.info(
-    "Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rule is enabled."
+    "Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rules are enabled."
 )
```
I think we should also split the log message instead of concatenating two possible reasons with `or`; this adds ambiguity IMO.
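A rough sketch of the split being suggested, assuming the guard has the same shape as the postgresql hunk further down (`is_wildcard(tables) or is_filtering`) — names and structure here are assumptions, not the PR's final code:

```python
tables = configured_tables(self.tables)
if is_wildcard(tables):
    self._logger.info(
        "Fetching all tables as the configuration field 'tables' is set to '*'."
    )
elif is_filtering:
    self._logger.info("Fetching all tables as advanced sync rules are enabled.")
```

When both conditions hold, this reports only the wildcard reason; logging each applicable reason on its own line would also remove the ambiguity.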
connectors/sources/mssql.py (Outdated)

```python
self._logger.debug(
    f"Streaming records from database for table: {table} or using query: {query}"
)
```
I'm not the biggest fan of using `or` inside a log statement as it adds ambiguity. I think we should have two distinct log messages depending on the value of `query`. You could also move the assignment of the parameter for `get_cursor` into this condition so it isn't checked twice.
Sure, I've bifurcated this and made it dependent on the value of `query`.
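Presumably something along these lines — two distinct messages, with the `get_cursor` argument picked in the same branch so `query` is checked only once (the `get_cursor` call shape is an assumption based on the comment, not the actual diff):

```python
if query:
    self._logger.debug(f"Streaming records from database using query: {query}")
    cursor_source = query
else:
    self._logger.debug(f"Streaming records from database for table: {table}")
    cursor_source = table
cursor = self.get_cursor(cursor_source)  # hypothetical call shape
```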
connectors/sources/mssql.py (Outdated)

```diff
@@ -553,6 +568,9 @@ async def fetch_documents_from_query(self, tables, query):
         Yields:
             Dict: Document to be indexed
         """
+        self._logger.debug(
```
Should this also be `info` level as it indicates progress?
connectors/sources/postgresql.py (Outdated)

```diff
@@ -218,6 +219,9 @@ async def ping(self):
     async def get_tables_to_fetch(self, is_filtering=False):
         tables = configured_tables(self.tables)
         if is_wildcard(tables) or is_filtering:
+            self._logger.info(
```
Same reasoning as in `mssql.py`: I think we should have two distinct log messages instead of using one containing `or`.
connectors/sources/postgresql.py (Outdated)

```diff
@@ -537,6 +546,9 @@ async def fetch_documents_from_query(self, tables, query):
         Yields:
             Dict: Document to be indexed
         """
+        self._logger.debug(
```
Should this be `info` level?
Please do not combine multiple connectors in the same PR. As for the changes: can you include example logs for the sync, before/after, for each connector with the INFO log level and separately with DEBUG? E.g.:
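As a minimal illustration of how the two levels differ in output volume — plain Python standard-library `logging`, not the connector framework's actual logger:

```python
import logging

logging.basicConfig(format="[%(levelname)s] %(message)s", level=logging.INFO)
logger = logging.getLogger("connector")

logger.info("Fetching all tables")    # shown at both INFO and DEBUG
logger.debug("Raw row payload: ...")  # shown only when level=logging.DEBUG
```

At INFO only the first message appears; re-running with `level=logging.DEBUG` shows both, which is why a full-sync DEBUG capture is so much larger than the INFO one.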
@artem-shelkovnikov We're planning to update the logging functionality across roughly 20 connectors. Let me know if you have a differing opinion on raising four PRs, each containing 5 connectors.
I think we can inspect the logs from E2E tests in Buildkite rather than creating four separate log files (including before/after logs for both INFO and DEBUG levels). Generating four files for approximately 20 connectors seems very time-consuming.
That is my concern as well, but reviewing logs from E2E tests in Buildkite side by side is also time-consuming for each reviewer; I think it's fair to at least attach the links to E2E test runs here for each connector, before/after. Also, E2E only sends DEBUG logs, which produces a lot more output than INFO, and manually filtering these out is huge work as well. Without actually looking at the generated logs we cannot tell if the logs are good now or not.
Indeed, your point sounds reasonable. How about a hybrid approach, where we generate a before/after and an INFO/DEBUG log file for just one connector within each PR, for all 4 PRs?
But I think the point is also that for some connectors (like S3/GCS/ABS) the logs will likely be similar, so for such connectors we can do that, but for different styles of connectors (S3 vs Jira vs GitHub) the expected logs are going to be different. We can try to break the connectors into grouped PRs by log simplicity, but I'm also not super sure about this. Let me bring this up with the team.
We can organize our pull requests by grouping similar types of connectors together. For instance, this PR includes 3 database connectors, so we could produce before/after logs specifically for MSSQL as a representative. In a subsequent PR, if we bundle together connectors like S3, GCS, Box, OneDrive, and Dropbox, which are all storage services, we can create a single set of logs for one of them as an example. We can apply the same strategy for future PRs, streamlining our logging efforts by focusing on a representative connector from each group.
I've opened the discussion with the team; we'll get back to you about it shortly.
Hey @moxarth-elastic, as mentioned in Slack, can we have logs in DEBUG `log_level` for each connector changed here to continue the review?
@artem-shelkovnikov Let me upload the log files for the connectors of this PR, so you can continue your review.
@artem-shelkovnikov You can check the log files in this drive folder: https://drive.google.com/drive/folders/13-P99PjgjFdjI9hrmPsuhNst6Kdu5C0-
Can you break this PR into smaller PRs, one per connector? I've checked the logs and will have feedback for each connector; addressing all of it in one PR will be pretty tough IMO.
@artem-shelkovnikov Split this PR into smaller connector-wise PRs.
Closing in favour of other PRs.
Part of #2299
This PR includes logging for the following connectors: Confluence, MSSQL, and PostgreSQL.
Also, there are a few changes in Confluence as per this PR: #2326
Drive folder for log files with DEBUG log_level: https://drive.google.com/drive/folders/13-P99PjgjFdjI9hrmPsuhNst6Kdu5C0-