[Azure Blob Storage] Add logs #2530
Looks good. One additional thing I noticed is that in `get_docs` we have:
async def get_docs(self, filtering=None):
...
async for container_data in self.get_container(container_list=self.containers):
if container_data:
...
I don't have the context to know why we need `if container_data`, but if it's needed I would log a debug statement in the `else` clause saying that we are skipping the container because it's empty(?) / doesn't have data?
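A minimal sketch of the `else` branch suggested above. The `get_container` stand-in, the logger name, and the log wording are assumptions for illustration only; they are not the connector's actual code.

```python
import asyncio
import logging

logger = logging.getLogger("azure_blob_storage_sketch")  # hypothetical logger name


async def get_container(container_list):
    # Hypothetical stand-in for the connector's get_container():
    # yields a dict per container, or None when nothing came back.
    for name in container_list:
        yield {"name": name} if name else None


async def get_docs(container_list):
    docs = []
    async for container_data in get_container(container_list):
        if container_data:
            docs.append(container_data)
        else:
            # The branch under discussion: record why the container is skipped.
            logger.debug("Skipping container: no data returned")
    return docs


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(asyncio.run(get_docs(["alpha", "", "beta"])))
```

Whether the skipped container's name is available at that point depends on what `get_container` yields, so the message here stays generic.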
@@ -212,6 +216,7 @@ async def get_container(self, container_list):
        Yields:
            dictionary: Container document with name & metadata
        """
        self._logger.debug("Fetching containers")
In `get_blob` we have log level INFO; maybe worth changing the log level to INFO here as well, to keep it consistent?
@@ -185,6 +185,7 @@ async def get_content(self, blob, timestamp=None, doit=None):
        if not self.can_file_be_downloaded(file_extension, filename, file_size):
            return

        self._logger.debug(f"Downloading content for file: {filename}")
Looks like a duplicate line: you already have `f"Downloading content for blob: {blob_name} from {container_name} container"`, which is logged right after.
@@ -247,6 +252,7 @@ async def get_blob(self, container):
        Yields:
            dictionary: Formatted blob document
        """
        self._logger.info(f"Fetching blobs for '{container['name']}' container")
- self._logger.info(f"Fetching blobs for '{container['name']}' container")
+ self._logger.info(f"Fetching blobs for container '{container['name']}'")
- It would also be good to log how many blobs are found in the container. It would also be nice to report progress in the format "Extracted X out of Y blobs for container {container}", reported every 100 lines and at the end of the sync.
- The log line "Generating connection string." should be DEBUG.
- The log line "Successfully connected to the Azure Blob Storage" should be DEBUG.
- We need a DEBUG log line saying which containers were found, and then which ones will actually be synced after all validations.
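The progress reporting requested in the first bullet could look roughly like the sketch below. It assumes "every 100 lines" means every 100 blobs and that the total is known before extraction starts, which may not hold for a paged listing. `extract_blobs`, `PROGRESS_INTERVAL`, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("azure_blob_storage_sketch")  # hypothetical logger name

PROGRESS_INTERVAL = 100  # assumption: report once per 100 blobs


def extract_blobs(container_name, blobs, interval=PROGRESS_INTERVAL):
    """Yield blobs, logging progress every `interval` blobs and once at the end."""
    total = len(blobs)
    logger.info(f"Found {total} blobs in container '{container_name}'")
    extracted = 0
    for blob in blobs:
        yield blob
        extracted += 1
        # Skip the interval message when it would duplicate the final one.
        if extracted % interval == 0 and extracted != total:
            logger.info(
                f"Extracted {extracted} out of {total} blobs "
                f"for container '{container_name}'"
            )
    logger.info(
        f"Extracted {extracted} out of {total} blobs for container '{container_name}'"
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for _ in extract_blobs("demo", list(range(250))):
        pass
```

With 250 blobs this emits one "Found" line, progress lines at 100 and 200, and a final summary line.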
This condition was added due to this
@moxarth-elastic could you share the updated example DEBUG output before we hit ✅?
Here is the updated log file: https://drive.google.com/file/d/1HJ36WDN87P3q4fCeQGu8FiQUrh9texD-/view?usp=drive_link
Buildkite test this
self._logger.info(
    f"Fetched {blob_count} blobs from '{container['name']}' container"
)
This should go into a `finally` block so that it's always shown.
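A sketch of what moving the summary log into `finally` might look like, assuming `get_blob` is an async generator that counts blobs as it yields them. `list_blobs` and its `fail_at` key are hypothetical stand-ins used only to simulate a listing that fails midway.

```python
import asyncio
import logging

logger = logging.getLogger("azure_blob_storage_sketch")  # hypothetical logger name


async def list_blobs(container):
    # Hypothetical stand-in for the storage listing call; raises at index
    # container["fail_at"] (if set) to simulate a mid-listing failure.
    for i in range(3):
        if container.get("fail_at") == i:
            raise RuntimeError("listing failed")
        yield {"id": f"{container['name']}/{i}"}


async def get_blob(container):
    blob_count = 0
    logger.info(f"Fetching blobs for container '{container['name']}'")
    try:
        async for blob in list_blobs(container):
            blob_count += 1
            yield blob
    finally:
        # Runs on success, on error, and when the consumer closes the generator.
        logger.info(
            f"Fetched {blob_count} blobs from '{container['name']}' container"
        )


async def collect(container):
    seen = []
    try:
        async for blob in get_blob(container):
            seen.append(blob)
    except RuntimeError:
        pass  # the summary line has already been logged by `finally`
    return seen


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(collect({"name": "demo", "fail_at": 2}))
```

The count reported in the failure case is the number of blobs yielded before the error, which is usually the more useful figure when debugging a partial sync.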
Any chance we can have logs of actual requests to the ABS?
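One possible way to get request-level logs is to raise the level of the Azure SDK's own `azure` logger, sketched below with the standard library only. The helper name `enable_azure_request_logging` is hypothetical, and how this would be wired into the connector's logger is not something this thread settles.

```python
import logging
import sys


def enable_azure_request_logging(level=logging.DEBUG):
    """Send records from the Azure SDK's `azure.*` loggers to stdout at `level`."""
    azure_logger = logging.getLogger("azure")
    azure_logger.setLevel(level)
    handler = logging.StreamHandler(stream=sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    azure_logger.addHandler(handler)
    return azure_logger


# Assumption: for full request/response detail the client is also created
# with logging_enable=True, for example:
#   BlobServiceClient.from_connection_string(conn_str, logging_enable=True)

if __name__ == "__main__":
    enable_azure_request_logging()
```

Request logs at this level can be verbose, so it may make sense to enable them only when the connector itself runs at DEBUG.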
Part of #2299
Adds more logs to the Azure Blob Storage connector.
Log file: https://drive.google.com/file/d/1RMuieFYk-llbqZUj__eYJbnUY5W2wq7K/view?usp=drive_link
Updated file: https://drive.google.com/file/d/1dVEpn6ccoz5uKAgzSrahtDp1vyifJDIh/view?usp=drive_link