-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Create Logs SDK LoggerProvider #1517
feat: Create Logs SDK LoggerProvider #1517
Conversation
rescue StandardError => e | ||
OpenTelemetry.handle_error(exception: e, message: FORCE_FLUSH_ERROR) | ||
Export::FAILURE | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not a rescue
in the TracerProvider#force_flush method, but the method documentation states "Export::FAILURE
should be returned if a non-specific failure occurred." I'm assuming a non-specific failure would be any StandardError
instance.
- Is there a condition to return
Export::FAILURE
on non-specific errors inTracerProvider#force_flush
that I'm missing? - Similar documentation exists for the
LoggerProvider
/TracerProvider#shutdown
methods. If we choose to add arescue
here, should the#shutdown
methods also have a similarrescue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be assuming that processors will handle their own errors, but that might be optimistic. I would be fine with rescuing from StandardError
in force_flush
and shutdown
across the board. I'll let others chime in if they have opinions.
# {LogRecordProcessor} to add to this LoggerProvider. | ||
def add_log_record_processor(log_record_processor) | ||
@mutex.synchronize do | ||
@log_record_processors = log_record_processors.dup.push(log_record_processor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copied from TracerProvider#add_span_processor
. @fallwith and I were chatting about this LOC. He was asking about the use of .dup.push
here.
Does dup.push
provide atomicity in a way that the push
method alone couldn't deliver? Or is there a different reason dup.push
was used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain, but @fbogsany probably knows. I think it has to do with mutating the processors
array. Looking at TracerProvider
a span gets a reference to processors
on start. If a span processor is added after a span is started, and the processors
array is mutated, a span will see a processor that wasn't there when it was started. I'm not sure if this would happen in the real world, or if this is what we're protecting against, but that is a reason why we might prefer dup.push
over push
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly the reason - each SDK Span
instance has a reference to the list of span processors, and that list should not change between the span calling on_start
and on_finish
. IDK if that applies in your case - I need to read the rest of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth considering whether it should be valid to add log record processors after @stopped
. In the TracerProvider
, we warn and don't allow that.
# Copyright The OpenTelemetry Authors | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This matches the pattern in OpenTelemetry::SDK::Trace::Export
The ExportError class is not present because it was not needed for the LoggerProvider. It will be added once it's used in subsequent PRs.
@@ -0,0 +1,22 @@ | |||
# frozen_string_literal: true | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is a boilerplate/placeholder so the LogRecordProcessor methods can be called. They will be implemented in a later PR.
@@ -0,0 +1,33 @@ | |||
# frozen_string_literal: true | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Logger SDK will be fully written in a separate PR. The code here represents what's necessary to fulfill the LoggerProvider SDK spec.
@@ -24,15 +24,18 @@ Gem::Specification.new do |spec| | |||
spec.require_paths = ['lib'] | |||
spec.required_ruby_version = '>= 3.0' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating dependencies to more recent versions. This came about when I realized the SimpleCov version that was installed didn't include a feature I wanted to use.
# Copyright The OpenTelemetry Authors | ||
# | ||
# SPDX-License-Identifier: Apache-2.0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specs for SDK LoggerProvider and SDK TracerProvider are extremely similar.
Given this, the code is very similar to the already implemented TracerProvider in the SDK gem.
name ||= '' | ||
version ||= '' | ||
|
||
OpenTelemetry.logger.warn(EMPTY_NAME_ERROR) if name.empty? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where an invalid name (null or empty string) is specified, a working Logger MUST be returned as a fallback rather than returning null or throwing an exception, its name SHOULD keep the original invalid value, and a message reporting that the specified value is invalid SHOULD be logged.
source
This logic is drawn from TracerProvider#tracer. The null or empty string will not stop InstrumentationScope from being generated during Logger initialization.
OpenTelemetry.logger.warn(EMPTY_NAME_ERROR) if name.empty? | ||
|
||
@mutex.synchronize do | ||
OpenTelemetry::SDK::Logs::Logger.new(name, version, self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input provided by the user MUST be used to create an InstrumentationScope instance which is stored on the created Logger.
The name and version here are passed to the Logger to create the InstrumentationScope.
# | ||
# @param [LogRecordProcessor] log_record_processor The | ||
# {LogRecordProcessor} to add to this LoggerProvider. | ||
def add_log_record_processor(log_record_processor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LoggerProvider MAY provide methods to update the configuration. If configuration is updated (e.g., adding a LogRecordProcessor), the updated configuration MUST also apply to all already returned Loggers (i.e. it MUST NOT matter whether a Logger was obtained from the LoggerProvider before or after the configuration change). Note: Implementation-wise, this could mean that Logger instances have a reference to their LoggerProvider and access configuration only via this reference.
Every logger instance has a reference to the logger provider that created it. (see L50) This gives loggers access to the log record processors.
# @param [optional Numeric] timeout An optional timeout in seconds. | ||
# @return [Integer] Export::SUCCESS if no error occurred, Export::FAILURE if | ||
# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred. | ||
def shutdown(timeout: nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See: https://opentelemetry.io/docs/specs/otel/logs/sdk/#shutdown
- Only called once by setting
@stopped
totrue
L92 - Provides a way to let the caller know whether it succeeded, failed, or timed out by returning
OpenTelemetry::SDK::Logs::EXPORT
constants (L81, L87, L93) - Implements a timeout through calls to
OpenTelemetry::Common::Utilities.timeout_timestamp
andOpenTelemetry::Common::Utilities.maybe_timeout
(L84, L86) - Invokes shutdown on all the processors (L85 - 89)
# Immediately export all {LogRecord}s that have not yet been exported | ||
# for all the registered {LogRecordProcessor}s. | ||
# | ||
# This method should only be called in cases where it is absolutely | ||
# necessary, such as when using some FaaS providers that may suspend | ||
# the process after an invocation, but before the {LogRecordProcessor} | ||
# exports the completed {LogRecord}s. | ||
# | ||
# @param [optional Numeric] timeout An optional timeout in seconds. | ||
# @return [Integer] Export::SUCCESS if no error occurred, Export::FAILURE if | ||
# a non-specific failure occurred, Export::TIMEOUT if a timeout occurred. | ||
def force_flush(timeout: nil) | ||
@mutex.synchronize do | ||
return Export::SUCCESS if @stopped | ||
|
||
start_time = OpenTelemetry::Common::Utilities.timeout_timestamp | ||
results = log_record_processors.map do |processor| | ||
remaining_timeout = OpenTelemetry::Common::Utilities.maybe_timeout(timeout, start_time) | ||
return Export::TIMEOUT if remaining_timeout&.zero? | ||
|
||
processor.force_flush(timeout: remaining_timeout) | ||
end | ||
|
||
results.max || Export::SUCCESS | ||
end | ||
rescue StandardError => e | ||
OpenTelemetry.handle_error(exception: e, message: FORCE_FLUSH_ERROR) | ||
Export::FAILURE | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec reference: https://opentelemetry.io/docs/specs/otel/logs/sdk/#forceflush
Out of time atm, but I'll circle back with more 1:1 details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I have a few minor comments.
# {LogRecordProcessor} to add to this LoggerProvider. | ||
def add_log_record_processor(log_record_processor) | ||
@mutex.synchronize do | ||
@log_record_processors = log_record_processors.dup.push(log_record_processor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uncertain, but @fbogsany probably knows. I think it has to do with mutating the processors
array. Looking at TracerProvider
a span gets a reference to processors
on start. If a span processor is added after a span is started, and the processors
array is mutated, a span will see a processor that wasn't there when it was started. I'm not sure if this would happen in the real world, or if this is what we're protecting against, but that is a reason why we might prefer dup.push
over push
.
rescue StandardError => e | ||
OpenTelemetry.handle_error(exception: e, message: FORCE_FLUSH_ERROR) | ||
Export::FAILURE | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be assuming that processors will handle their own errors, but that might be optimistic. I would be fine with rescuing from StandardError
in force_flush
and shutdown
across the board. I'll let others chime in if they have opinions.
28d4f3a
to
3c0dd57
Compare
@mwear - Re: this comment I've updated #shutdown to include error handling to match #force flush in: 3c0dd57 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @kaylareopelle!
* Add OpenTelemetry::SDK::Logs::Export constants * Create OpenTelemetry::SDK::Logs::LoggerProvider * Implement LoggerProvider#logger * Implement LoggerProvider#shutdown * Implement LoggerProvider#force_flush * Create no-op LogRecordProcessor
7f4b1f1
to
de7f36c
Compare
Thanks, @mwear! I've rebased the branch into one commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we gain a better understanding of duping the log processors...
module Logs | ||
# The SDK implementation of OpenTelemetry::Logs::LoggerProvider. | ||
class LoggerProvider < OpenTelemetry::Logs::LoggerProvider | ||
attr_reader :resource, :log_record_processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_record_processors
need safe concurrent access. By exposing this as a reader, external callers may gain access to the array in an unsafe way.
This attribute should not be exposed as a public value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also not exposed in the spec.
attr_reader :resource, :log_record_processors | |
attr_reader :resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No changes yet - I just want to block so I have a chance to read it closely.
module OpenTelemetry | ||
module SDK | ||
module Logs | ||
# The export module contains result codes for LoggerProvider#force_flush |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should also be result codes for exporters - their use shouldn't just relate to these two methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. Updated to be more general in: d63a4e9
# Called when a {LogRecord} is emitted. Subsequent calls are not | ||
# permitted after shutdown is called. | ||
# @param [LogRecord] log_record The emitted {LogRecord} | ||
# @param [Context] context The resolved Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "resolved Context" here? Who does the resolution? (Note that I haven't read this part of the spec or all of this PR, or the logs API, so maybe this is answered clearly somewhere else - just asking as a naive reviewer.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "resolved Context" here is coming from the Logs SDK spec description for the method arguments.
https://opentelemetry.io/docs/specs/otel/logs/sdk/#onemit
The description has a bit more detail to include that might clear things up. I've updated the description to include the rest of the text in: 5528277
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure who does the resolution. I'm also not quite sure how we'll use this argument.
I'll consult some of the implementations in other languages and come back with more info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fbogsany - I can't find any examples where the context
argument is used in other languages.
- Python doesn't include it anywhere.
- PHP and Java have it as an argument but do not use it in the
onEmit
functions. - Javascript doesn't include it as an argument in their Simple or Batch processors, but the duck type includes it as an argument.
So I don't have a great answer about who does the resolution. I didn't end up needing this argument in the Simple or Batch processors because I used the SpanContext to pull trace_id
, span_id
, and trace_flags
. Those are the only context-related attributes in the Logs Data Model.
Do you have a sense of what approach you'd prefer Ruby take?
module Logs | ||
# The SDK implementation of OpenTelemetry::Logs::Logger | ||
class Logger < OpenTelemetry::Logs::Logger | ||
attr_reader :instrumentation_scope, :logger_provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why we expose these attributes. In the Tracer
and Meter
, they're private, so the inconsistency is a little weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attr_reader
call has been removed. I'm not sure why I initially broke from Tracer
and Meter
.
# @api private | ||
# | ||
# Returns a new {OpenTelemetry::SDK::Logs::Logger} instance. This should | ||
# not be called directly. New loggers should be created using | ||
# {LoggerProvider#logger}. | ||
# | ||
# @param [String] name Instrumentation package name | ||
# @param [String] version Instrumentation package version | ||
# @param [LoggerProvider] logger_provider The {LoggerProvider} that | ||
# initialized the logger | ||
# | ||
# @return [OpenTelemetry::SDK::Logs::Logger] | ||
def initialize(name, version, logger_provider) | ||
@instrumentation_scope = InstrumentationScope.new(name, version) | ||
@logger_provider = logger_provider | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @api private
comment is our standard way of marking "internal" methods. There is no practical way to enforce it.
# @return [OpenTelemetry::SDK::Logs::LoggerProvider] | ||
def initialize( | ||
resource: OpenTelemetry::SDK::Resources::Resource.create, | ||
log_record_processors: [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing the log_record_processors
at creation time is MAY in the spec. We don't do that in the MeterProvider
(for metric_readers
) or the TracerProvider
(for span_processors
). Instead, we rely on the add_...
methods to add them individually. I would start with not exposing it during creation. We can always add it later if that's a common pattern (and then consider adding to TracerProvider
and MeterProvider
as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to remove the arg from the initialize method. I'll double-check about its commonality in other languages. I've also removed the public reader. See: 37befe6
# @param [optional String] version Instrumentation package version | ||
# | ||
# @return [OpenTelemetry::SDK::Logs::Logger] | ||
def logger(name = nil, version = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This signature is inconsistent between TracerProvider
and MeterProvider
. In the latter, we used:
def meter(name, version: nil)
so name
is required and version
is a keyword arg. IIRC we're going to have to add one or two more args at some point 🔜 , so keywords for optional args are vastly preferred.
We should go back and fix this for TracerProvider
. I think that can be done in a backwards-compatible fashion.
For MeterProvider
, we opted to require name
. If a user explicitly passes nil
or ""
, we can handle it, but those are both considered invalid - I'd rather have an explicitly invalid required arg than an invalid default arg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great. I prefer the keyword args, I just defaulted to what was in TracerProvider since that was already stable.
Updated in: ca7c15f
# {LogRecordProcessor} to add to this LoggerProvider. | ||
def add_log_record_processor(log_record_processor) | ||
@mutex.synchronize do | ||
@log_record_processors = log_record_processors.dup.push(log_record_processor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's worth considering whether it should be valid to add log record processors after @stopped
. In the TracerProvider
, we warn and don't allow that.
rescue StandardError => e | ||
OpenTelemetry.handle_error(exception: e, message: UNEXPECTED_ERROR_MESSAGE % __method__) | ||
Export::FAILURE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the TracerProvider
and MeterProvider
we expect the processors/readers to not raise
from #shutdown
or #force_flush
and instead only return one of the 3 supported result codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Fixed in: a619d36
Co-authored-by: Francis Bogsanyi <francis.bogsanyi@shopify.com>
Co-authored-by: Francis Bogsanyi <francis.bogsanyi@shopify.com>
@fbogsany, checking back in here! Have your concerns been addressed? |
Fulfills the Logs SDK specs for Logger Provider
The Logger Provider implementation is drawn mainly from the Tracer Provider implementation, as they have very similar specs.
If merged, this PR would fulfill the following elements on the spec-compliance-matrix for Logs
Here are some other implementations of LoggerProvider for comparison:
Resolves #1483