Add logging #2329

moxarth-rathod · 2024-04-03T06:37:49Z

Part of #2299

This PR includes logging for the following connectors:

Azure Blob Storage
Confluence
MS SQL
Oracle
PostgreSQL

Also, there are few changes in confluence as per this PR: #2326

Drive folder for log files with DEBUG log_level: https://drive.google.com/drive/folders/13-P99PjgjFdjI9hrmPsuhNst6Kdu5C0-

Checklists

Pre-Review Checklist

this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
this PR has a meaningful title
this PR links to all relevant github issues that it fixes or partially addresses
if there is no GH issue, please create it. Each PR should have a link to an issue
this PR has a thorough description
Covered the changes with automated tests

timgrein

Nice improvement so far! Left some comments

timgrein · 2024-04-03T08:32:25Z

connectors/sources/confluence.py

@@ -714,6 +726,7 @@ async def fetch_server_space_permission(self, space_key):
            return {}

        url = URLS[SPACE_PERMISSION].format(space_key=space_key)
+        self._logger.debug(f"Fetching permissions for '{space_key}' space")


Should this be info log-level as it indicates progress?

Yes, this should be info.

timgrein · 2024-04-03T08:33:18Z

connectors/sources/mssql.py

+            self._logger.info(
+                "Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rule is enabled."
+            )


Suggested change

self._logger.info(

"Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rule is enabled."

)

self._logger.info(

"Fetching all tables as the configuration field 'tables' is set to '*' or advanced sync rules are enabled."

)

I think we should also split the log message without concatenating two possible reasons with or. This adds ambiguity IMO.

timgrein · 2024-04-03T08:35:20Z

connectors/sources/mssql.py

+        self._logger.debug(
+            f"Streaming records from database for table: {table} or using query: {query}"
+        )


I'm not the biggest fan of using or inside a log statement as it adds ambiguity. I think we should have two distinct log messages depending on the value of query. You could also move the assignment of the parameter for get_cursor also into this condition to not check it twice.

Sure, i've bifurcated this and made it dependant on the value of query.

timgrein · 2024-04-03T08:35:49Z

connectors/sources/mssql.py

@@ -553,6 +568,9 @@ async def fetch_documents_from_query(self, tables, query):
        Yields:
            Dict: Document to be indexed
        """
+        self._logger.debug(


Should this also be info level as it indicates progress?

timgrein · 2024-04-03T08:37:23Z

connectors/sources/postgresql.py

@@ -218,6 +219,9 @@ async def ping(self):
    async def get_tables_to_fetch(self, is_filtering=False):
        tables = configured_tables(self.tables)
        if is_wildcard(tables) or is_filtering:
+            self._logger.info(


Same reasoning as in mssql.py: I think we should've two distinct log messages instead of using one containing or.

timgrein · 2024-04-03T08:39:02Z

connectors/sources/postgresql.py

@@ -537,6 +546,9 @@ async def fetch_documents_from_query(self, tables, query):
        Yields:
            Dict: Document to be indexed
        """
+        self._logger.debug(


Should this be info level?

artem-shelkovnikov · 2024-04-03T08:55:50Z

Please do not combine multiple connectors in same PR.

As for the changes - can you include example logs for the sync before/after for each connector with INFO and separately with DEBUG log level? E.g.:

ABS Before INFO
ABS After INFO
ABS Before DEBUG
ABS After DEBUG
Confluence Before INFO
Confluence After INFO
etc...

moxarth-rathod · 2024-04-03T10:21:00Z

Please do not combine multiple connectors in same PR.

@artem-shelkovnikov We're planning to update the logging functionality across roughly 20 connectors.
During our weekly meeting, we discussed with @DianaJourdan that bundling the changes for every 5 connectors into a single pull request would streamline the process.
Since the changes are confined to the logger functionality, reviewing the updates for 5 connectors at a time should not be overly time-consuming or complicated. This approach avoids the complexity of simultaneously managing 20 open pull requests per connector.
Consequently, we will proceed by consolidating the updates into four separate pull requests, each encompassing 5 connectors.

Let me know if you have differing opinion for raising four PRs each containing 5 connector.

moxarth-rathod · 2024-04-03T11:01:22Z

can you include example logs for the sync before/after for each connector with INFO and separately with DEBUG log level?

I think we can inspect the logs from E2E tests in Buildkite rather than creating four separate log files (including before/after logs for both INFO and DEBUG levels). Generating four files for approximately 20 connectors seems very time-consuming.

artem-shelkovnikov · 2024-04-03T11:05:05Z

I think we can inspect the logs from E2E tests in Buildkite rather than creating four separate log files (including before/after logs for both INFO and DEBUG levels). Generating four files for approximately 20 connectors seems very time-consuming.

That is my concern as well, but also reviewing logs for E2E tests in Buildkite side by side for each reviewer is time consuming, I think it's fair to at least attach the links to E2E test runs here for each connector before/after. Also E2E only sends DEBUG logs, which produces a lot more than INFO, manually filtering these outs is huge work as well.

Without actually looking at logs generated we cannot tell if logs are good now or not.

parth-elastic · 2024-04-03T12:25:51Z

Without actually looking at logs generated we cannot tell if logs are good now or not.

Indeed, your point sounds reasonable. How about a hybrid approach, where we generate a before/after and an Info/Debug log file for just one connector within each PR for all the 4 PRs.
Since the logs for different connectors are likely to be similar, this method would enable us to check the log changes effectively with reduced effort. WDYT?

artem-shelkovnikov · 2024-04-03T12:32:52Z

Since the logs for different connectors are likely to be similar, this method would enable us to check the log changes effectively with reduced effort. WDYT?

But I think also the point is that for some connectors (like S3/GCS/ABS) they will likely be similar, so for such connectors we can do that, but for different style connectors (S3 vs Jira vs Github) expected logs are gonna be different. We can try to break connectors by log simplicity into grouped PRs, but I'm also not super sure about this. Let me bring this up with the team

parth-elastic · 2024-04-03T12:47:36Z

the point is that for some connectors (like S3/GCS/ABS) they will likely be similar, so for such connectors we can do that

We can organize our pull requests by grouping similar types of connectors together. For instance, this PR includes 3 database connectors, so we could produce before/after logs specifically for MSSQL as a representative. In a subsequent PR, if we bundle together connectors like S3, GCS, Box, OneDrive, and Dropbox, which are all storage services, we can create a single set of logs for one of them as an example. We can apply the same strategy for future PRs, streamlining our logging efforts by focusing on a representative connector from each group.

artem-shelkovnikov · 2024-04-03T12:49:32Z

I've opened the discussion with the team, we'll get back to you about it shortly

artem-shelkovnikov · 2024-04-11T12:44:31Z

Hey @moxarth-elastic, as mentioned in Slack, can we have logs in DEBUG log_level for each connector changed here to continue the review?

moxarth-rathod · 2024-04-12T05:38:17Z

Hey @moxarth-elastic, as mentioned in Slack, can we have logs in DEBUG log_level for each connector changed here to continue the review?

@artem-shelkovnikov Let me upload the log files for the connectors of this PR, so you can continue your review

moxarth-rathod · 2024-04-12T09:31:47Z

@artem-shelkovnikov You can check the log file from this drive folder: https://drive.google.com/drive/folders/13-P99PjgjFdjI9hrmPsuhNst6Kdu5C0-

artem-shelkovnikov · 2024-05-07T14:45:47Z

Can you break this PR into smaller PRs per connector? I've checked the logs and will have feedback for each connector, addressing all those in one PR will be pretty tough IMO

moxarth-rathod · 2024-05-08T12:47:37Z

Can you break this PR into smaller PRs per connector? I've checked the logs and will have feedback for each connector, addressing all those in one PR will be pretty tough IMO

@artem-shelkovnikov Split this PR into smaller connector-wise PRs
ABS - #2530
Confluence - #2526
Oracle - #2528
MSSQL - #2527
PostgreSQL - #2529

artem-shelkovnikov · 2024-07-23T11:40:24Z

Closing in favour of other PRs

moxarth-rathod added 2 commits April 3, 2024 11:47

Add logs in connectors

9836026

add test cases for confluence

3369b43

moxarth-rathod requested a review from a team April 3, 2024 06:37

github-actions bot added auto-backport v8.14.0.0 labels Apr 3, 2024

moxarth-rathod added azure-blob-storage generic-database confluence team:external mssql postgresql labels Apr 3, 2024

timgrein reviewed Apr 3, 2024

View reviewed changes

address suggestions

2af6064

moxarth-rathod requested a review from timgrein April 4, 2024 05:34

moxarth-rathod mentioned this pull request Apr 4, 2024

Add logging for cloud based connectors #2339

Closed

7 tasks

update if confition for advanced sync rule

03ab12b

sync with main

753995e

artem-shelkovnikov requested a review from a team April 15, 2024 16:29

resolve conflicts

df1d1fa

artem-shelkovnikov closed this Jul 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add logging #2329

Add logging #2329

moxarth-rathod commented Apr 3, 2024 •

edited

Loading

timgrein left a comment

timgrein Apr 3, 2024

moxarth-rathod Apr 3, 2024

timgrein Apr 3, 2024

timgrein Apr 3, 2024

timgrein Apr 3, 2024

moxarth-rathod Apr 3, 2024

timgrein Apr 3, 2024

timgrein Apr 3, 2024

timgrein Apr 3, 2024

artem-shelkovnikov commented Apr 3, 2024

moxarth-rathod commented Apr 3, 2024

moxarth-rathod commented Apr 3, 2024

artem-shelkovnikov commented Apr 3, 2024

parth-elastic commented Apr 3, 2024

artem-shelkovnikov commented Apr 3, 2024

parth-elastic commented Apr 3, 2024 •

edited

Loading

artem-shelkovnikov commented Apr 3, 2024

artem-shelkovnikov commented Apr 11, 2024

moxarth-rathod commented Apr 12, 2024

moxarth-rathod commented Apr 12, 2024

artem-shelkovnikov commented May 7, 2024

moxarth-rathod commented May 8, 2024

artem-shelkovnikov commented Jul 23, 2024

Add logging #2329

Add logging #2329

Conversation

moxarth-rathod commented Apr 3, 2024 • edited Loading

Part of #2299

Checklists

Pre-Review Checklist

timgrein left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

artem-shelkovnikov commented Apr 3, 2024

moxarth-rathod commented Apr 3, 2024

moxarth-rathod commented Apr 3, 2024

artem-shelkovnikov commented Apr 3, 2024

parth-elastic commented Apr 3, 2024

artem-shelkovnikov commented Apr 3, 2024

parth-elastic commented Apr 3, 2024 • edited Loading

artem-shelkovnikov commented Apr 3, 2024

artem-shelkovnikov commented Apr 11, 2024

moxarth-rathod commented Apr 12, 2024

moxarth-rathod commented Apr 12, 2024

artem-shelkovnikov commented May 7, 2024

moxarth-rathod commented May 8, 2024

artem-shelkovnikov commented Jul 23, 2024

moxarth-rathod commented Apr 3, 2024 •

edited

Loading

parth-elastic commented Apr 3, 2024 •

edited

Loading