add promql-to-scrape #74
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
FROM golang:1.21-alpine | ||
|
||
WORKDIR /usr/src/app | ||
|
||
COPY go.mod go.sum ./ | ||
RUN go mod download && go mod verify | ||
|
||
COPY . . | ||
RUN go build -v -o /usr/local/bin/promql-to-scrape ./cmd/promql-to-scrape/main.go | ||
|
||
ENTRYPOINT ["/usr/local/bin/promql-to-scrape"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# promql-to-scrape | ||
|
||
This basic application provides an example of how one could use the Temporal Cloud Observability endpoint to expose a typical Prometheus `/metrics` endpoint. | ||
|
||
**This example is provided as-is, without support. It is intended as reference material only.** | ||
|
||
## How to Use | ||
|
||
Place your client certificate and key at `client.crt` and `tls.key`, and have your Temporal Cloud account number ready. The account must have the observability endpoint enabled. | ||
|
||
``` | ||
go mod tidy | ||
go build -o promql-to-scrape cmd/promql-to-scrape/main.go | ||
./promql-to-scrape -client-cert client.crt -client-key tls.key \ | ||
  -prom-endpoint https://<account>.tmprl.cloud/prometheus --config-file examples/config.yaml --debug | ||
~~~ | ||
time=2023-11-16T17:43:20.260-06:00 level=DEBUG msg="successful metric retrieval" time=3.529039083s | ||
``` | ||
|
||
This means you can now hit http://localhost:9001/metrics on your machine and see your metrics. | ||
|
||
### Important Usability Information | ||
|
||
**Important:** Scrape this endpoint with a **60s** scrape interval unless you are meaningfully modifying this code. The example queries all assume a 1-minute rate window, and the scrape interval should match it. | ||
|
||
**Very Important:** The data you will see here is approximately 1 minute delayed (should you conform to the guidance above). Due to the aggregation that happens before metrics are presented to you, it's necessary for us to send the queries from this application to look 60 seconds in the past. Otherwise data aggregation would not be complete, and there would be no results for each query. | ||
|
||
## Deployment | ||
|
||
Some example Kubernetes manifests are provided in the `/examples` directory. Filling in your certificates and account should get you going pretty quickly. | ||
|
||
## Generating Config | ||
|
||
There is a second binary you can build that can help you build a default configuration of queries to scrape and export. | ||
|
||
``` | ||
go build -o genconfig cmd/genconfig/main.go | ||
./genconfig -client-cert client.crt -client-key tls.key -prom-endpoint https://<account>.tmprl.cloud/prometheus | ||
... | ||
``` | ||
|
||
This will generate an example config at `config.yaml` that you may use. It looks for all the existing metrics and generates a reasonable query for you to export. | ||
- For counters, it queries `rate(counter[1m])` | ||
- For gauges, it simply queries `gauge` | ||
- For histograms, it computes a p99 aggregated by `temporal_namespace` and `operation`: `histogram_quantile(0.99, sum(rate(metric[1m])) by (le, operation, temporal_namespace))` | ||
|
||
Modify at your own risk. You may find, for instance, that you'd like to add a global latency across all namespaces. You can add such queries to your config file. |
84 changes: 84 additions & 0 deletions
cloud/observability/promql-to-scrape/cmd/genconfig/main.go
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
package main | ||
|
||
import ( | ||
	"flag" | ||
	"fmt" | ||
	"log" | ||
	"os" | ||
	"sort" | ||
|
||
	"github.com/temporalio/samples-server/cloud/observability/promql-to-scrape/internal" | ||
|
||
	"gopkg.in/yaml.v3" | ||
) | ||
|
||
func main() { | ||
	set := flag.NewFlagSet("app", flag.ExitOnError) | ||
	promURL := set.String("prom-endpoint", "", "Required Prometheus API endpoint for the server eg. https://<account>.tmprl.cloud/prometheus") | ||
	serverRootCACert := set.String("server-root-ca-cert", "", "Optional path to root server CA cert") | ||
	clientCert := set.String("client-cert", "", "Required path to client cert") | ||
	clientKey := set.String("client-key", "", "Required path to client key") | ||
	serverName := set.String("server-name", "", "Optional server name to use for verifying the server's certificate") | ||
	insecureSkipVerify := set.Bool("insecure-skip-verify", false, "Skip verification of the server's certificate and host name") | ||
|
||
	if err := set.Parse(os.Args[1:]); err != nil { | ||
		log.Fatalf("failed parsing args: %s", err) | ||
	} else if *promURL == "" || *clientCert == "" || *clientKey == "" { | ||
		log.Fatalf("-prom-endpoint, -client-cert and -client-key are required") | ||
	} | ||
|
||
	client, err := internal.NewAPIClient( | ||
		internal.APIConfig{ | ||
			TargetHost:         *promURL, | ||
			ServerRootCACert:   *serverRootCACert, | ||
			ClientCert:         *clientCert, | ||
			ClientKey:          *clientKey, | ||
			ServerName:         *serverName, | ||
			InsecureSkipVerify: *insecureSkipVerify, | ||
		}, | ||
	) | ||
	if err != nil { | ||
		log.Fatalf("Failed to create Prometheus client: %s", err) | ||
	} | ||
|
||
	counters, gauges, histograms, err := client.ListMetrics("temporal_cloud_v0") | ||
	if err != nil { | ||
		log.Fatalf("Failed to pull metric names: %s", err) | ||
	} | ||
	fmt.Println(counters) | ||
	fmt.Println(gauges) | ||
	fmt.Println(histograms) | ||
|
||
	conf := internal.Config{} | ||
|
||
	for _, counter := range counters { | ||
		conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
			MetricName: fmt.Sprintf("%s:rate1m", counter), | ||
			Query:      fmt.Sprintf("rate(%s[1m])", counter), | ||
		}) | ||
	} | ||
	for _, gauge := range gauges { | ||
		conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
			MetricName: gauge, | ||
			Query:      gauge, | ||
		}) | ||
	} | ||
	for _, histogram := range histograms { | ||
		conf.Metrics = append(conf.Metrics, internal.Metric{ | ||
			MetricName: fmt.Sprintf("%s:histogram_quantile_p99_1m", histogram), | ||
			Query:      fmt.Sprintf("histogram_quantile(0.99, sum(rate(%s[1m])) by (le, operation, temporal_namespace))", histogram), | ||
		}) | ||
	} | ||
|
||
	sort.Sort(internal.ByMetricName(conf.Metrics)) | ||
|
||
	yamlData, err := yaml.Marshal(&conf) | ||
	if err != nil { | ||
		log.Fatalf("error marshalling yaml: %v", err) | ||
	} | ||
|
||
	err = os.WriteFile("config.yaml", yamlData, 0644) | ||
	if err != nil { | ||
		log.Fatalf("error writing config.yaml: %v", err) | ||
	} | ||
} |
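The three query templates that genconfig emits could be factored into small helpers, which also makes the README's "global latency across all namespaces" idea easy to try. The following is a sketch under assumptions, not part of this PR: `Metric` is a local stand-in for `internal.Metric`, the helper names are hypothetical, and the `_global` suffix on the metric name is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Metric is a local stand-in for the two fields genconfig sets on
// internal.Metric, so that this sketch compiles on its own.
type Metric struct {
	MetricName string
	Query      string
}

// counterMetric builds the 1-minute rate query used for counters.
func counterMetric(name string) Metric {
	return Metric{
		MetricName: fmt.Sprintf("%s:rate1m", name),
		Query:      fmt.Sprintf("rate(%s[1m])", name),
	}
}

// gaugeMetric passes a gauge through unchanged.
func gaugeMetric(name string) Metric {
	return Metric{MetricName: name, Query: name}
}

// histogramP99Metric builds a p99 query grouped by "le" plus any extra labels.
func histogramP99Metric(name string, groupBy ...string) Metric {
	labels := append([]string{"le"}, groupBy...)
	return Metric{
		MetricName: fmt.Sprintf("%s:histogram_quantile_p99_1m", name),
		Query: fmt.Sprintf("histogram_quantile(0.99, sum(rate(%s[1m])) by (%s))",
			name, strings.Join(labels, ", ")),
	}
}

func main() {
	const latency = "temporal_cloud_v0_service_latency_bucket"

	// Same grouping as the generated config.
	perNamespace := histogramP99Metric(latency, "operation", "temporal_namespace")

	// Global p99: group by "le" only. The "_global" suffix is a made-up name.
	global := histogramP99Metric(latency)
	global.MetricName += "_global"

	metrics := []Metric{
		counterMetric("temporal_cloud_v0_total_action_count"),
		gaugeMetric("temporal_cloud_v0_frontend_service_pending_requests"),
		perNamespace,
		global,
	}
	for _, m := range metrics {
		fmt.Printf("- metric_name: %s\n  query: %s\n", m.MetricName, m.Query)
	}
}
```

Keeping `le` in every grouping matters because `histogram_quantile` needs the bucket boundaries to compute a quantile.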
59 changes: 59 additions & 0 deletions
cloud/observability/promql-to-scrape/cmd/promql-to-scrape/main.go
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
package main | ||
|
||
import ( | ||
	"flag" | ||
	"log" | ||
	"os" | ||
|
||
	"github.com/temporalio/samples-server/cloud/observability/promql-to-scrape/internal" | ||
|
||
	"golang.org/x/exp/slog" | ||
) | ||
|
||
func main() { | ||
	set := flag.NewFlagSet("promql-to-scrape", flag.ExitOnError) | ||
	promURL := set.String("prom-endpoint", "", "Required Prometheus API endpoint for the server eg. https://<account>.tmprl.cloud/prometheus") | ||
	configFile := set.String("config-file", "", "Config file for promql-to-scrape") | ||
	serverRootCACert := set.String("server-root-ca-cert", "", "Optional path to root server CA cert") | ||
	clientCert := set.String("client-cert", "", "Required path to client cert") | ||
	clientKey := set.String("client-key", "", "Required path to client key") | ||
	serverName := set.String("server-name", "", "Optional server name to use for verifying the server's certificate") | ||
	insecureSkipVerify := set.Bool("insecure-skip-verify", false, "Skip verification of the server's certificate and host name") | ||
	serverAddr := set.String("bind", "0.0.0.0:9001", "address:port to expose the metrics server on") | ||
	debugLogging := set.Bool("debug", false, "Toggle debug logging") | ||
|
||
	if err := set.Parse(os.Args[1:]); err != nil { | ||
		log.Fatalf("failed parsing args: %v", err) | ||
	} else if *clientCert == "" || *clientKey == "" || *configFile == "" || *promURL == "" { | ||
		log.Fatalf("-client-cert, -client-key, -config-file, -prom-endpoint are required") | ||
	} | ||
|
||
	logLevel := slog.LevelInfo | ||
	if *debugLogging { | ||
		logLevel = slog.LevelDebug | ||
	} | ||
	h := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: logLevel}) | ||
	slog.SetDefault(slog.New(h)) | ||
|
||
	client, err := internal.NewAPIClient( | ||
		internal.APIConfig{ | ||
			TargetHost:         *promURL, | ||
			ServerRootCACert:   *serverRootCACert, | ||
			ClientCert:         *clientCert, | ||
			ClientKey:          *clientKey, | ||
			ServerName:         *serverName, | ||
			InsecureSkipVerify: *insecureSkipVerify, | ||
		}, | ||
	) | ||
	if err != nil { | ||
		log.Fatalf("failed to create Prometheus client: %v", err) | ||
	} | ||
|
||
	conf, err := internal.LoadConfig(*configFile) | ||
	if err != nil { | ||
		log.Fatalf("failed to load config file: %v", err) | ||
	} | ||
|
||
	s := internal.NewPromToScrapeServer(client, conf, *serverAddr) | ||
	s.Start() | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
metrics: | ||
  - metric_name: temporal_cloud_v0_frontend_service_error_count:rate1m | ||
    query: rate(temporal_cloud_v0_frontend_service_error_count[1m]) | ||
  - metric_name: temporal_cloud_v0_frontend_service_pending_requests | ||
    query: temporal_cloud_v0_frontend_service_pending_requests | ||
  - metric_name: temporal_cloud_v0_frontend_service_request_count:rate1m | ||
    query: rate(temporal_cloud_v0_frontend_service_request_count[1m]) | ||
  - metric_name: temporal_cloud_v0_poll_success_count:rate1m | ||
    query: rate(temporal_cloud_v0_poll_success_count[1m]) | ||
  - metric_name: temporal_cloud_v0_poll_success_sync_count:rate1m | ||
    query: rate(temporal_cloud_v0_poll_success_sync_count[1m]) | ||
  - metric_name: temporal_cloud_v0_poll_timeout_count:rate1m | ||
    query: rate(temporal_cloud_v0_poll_timeout_count[1m]) | ||
  - metric_name: temporal_cloud_v0_resource_exhausted_error_count:rate1m | ||
    query: rate(temporal_cloud_v0_resource_exhausted_error_count[1m]) | ||
  - metric_name: temporal_cloud_v0_schedule_action_success_count:rate1m | ||
    query: rate(temporal_cloud_v0_schedule_action_success_count[1m]) | ||
  - metric_name: temporal_cloud_v0_schedule_buffer_overruns_count:rate1m | ||
    query: rate(temporal_cloud_v0_schedule_buffer_overruns_count[1m]) | ||
  - metric_name: temporal_cloud_v0_schedule_missed_catchup_window_count:rate1m | ||
    query: rate(temporal_cloud_v0_schedule_missed_catchup_window_count[1m]) | ||
  - metric_name: temporal_cloud_v0_service_latency_bucket:histogram_quantile_p99_1m | ||
    query: histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[1m])) by (le, operation, temporal_namespace)) | ||
  - metric_name: temporal_cloud_v0_service_latency_count:rate1m | ||
    query: rate(temporal_cloud_v0_service_latency_count[1m]) | ||
  - metric_name: temporal_cloud_v0_service_latency_sum:rate1m | ||
    query: rate(temporal_cloud_v0_service_latency_sum[1m]) | ||
  - metric_name: temporal_cloud_v0_state_transition_count:rate1m | ||
    query: rate(temporal_cloud_v0_state_transition_count[1m]) | ||
  - metric_name: temporal_cloud_v0_total_action_count:rate1m | ||
    query: rate(temporal_cloud_v0_total_action_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_cancel_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_cancel_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_continued_as_new_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_continued_as_new_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_failed_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_failed_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_success_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_success_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_terminate_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_terminate_count[1m]) | ||
  - metric_name: temporal_cloud_v0_workflow_timeout_count:rate1m | ||
    query: rate(temporal_cloud_v0_workflow_timeout_count[1m]) |
49 changes: 49 additions & 0 deletions
cloud/observability/promql-to-scrape/examples/configmap.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
  name: promql-to-scrape-config | ||
data: | ||
  config.yaml: | | ||
    metrics: | ||
      - metric_name: temporal_cloud_v0_frontend_service_error_count:rate1m | ||
        query: rate(temporal_cloud_v0_frontend_service_error_count[1m]) | ||
      - metric_name: temporal_cloud_v0_frontend_service_pending_requests | ||
        query: temporal_cloud_v0_frontend_service_pending_requests | ||
      - metric_name: temporal_cloud_v0_frontend_service_request_count:rate1m | ||
        query: rate(temporal_cloud_v0_frontend_service_request_count[1m]) | ||
      - metric_name: temporal_cloud_v0_poll_success_count:rate1m | ||
        query: rate(temporal_cloud_v0_poll_success_count[1m]) | ||
      - metric_name: temporal_cloud_v0_poll_success_sync_count:rate1m | ||
        query: rate(temporal_cloud_v0_poll_success_sync_count[1m]) | ||
      - metric_name: temporal_cloud_v0_poll_timeout_count:rate1m | ||
        query: rate(temporal_cloud_v0_poll_timeout_count[1m]) | ||
      - metric_name: temporal_cloud_v0_resource_exhausted_error_count:rate1m | ||
        query: rate(temporal_cloud_v0_resource_exhausted_error_count[1m]) | ||
      - metric_name: temporal_cloud_v0_schedule_action_success_count:rate1m | ||
        query: rate(temporal_cloud_v0_schedule_action_success_count[1m]) | ||
      - metric_name: temporal_cloud_v0_schedule_buffer_overruns_count:rate1m | ||
        query: rate(temporal_cloud_v0_schedule_buffer_overruns_count[1m]) | ||
      - metric_name: temporal_cloud_v0_schedule_missed_catchup_window_count:rate1m | ||
        query: rate(temporal_cloud_v0_schedule_missed_catchup_window_count[1m]) | ||
      - metric_name: temporal_cloud_v0_service_latency_bucket:histogram_quantile_p99_1m | ||
        query: histogram_quantile(0.99, sum(rate(temporal_cloud_v0_service_latency_bucket[1m])) by (le, operation, temporal_namespace)) | ||
      - metric_name: temporal_cloud_v0_service_latency_count:rate1m | ||
        query: rate(temporal_cloud_v0_service_latency_count[1m]) | ||
      - metric_name: temporal_cloud_v0_service_latency_sum:rate1m | ||
        query: rate(temporal_cloud_v0_service_latency_sum[1m]) | ||
      - metric_name: temporal_cloud_v0_state_transition_count:rate1m | ||
        query: rate(temporal_cloud_v0_state_transition_count[1m]) | ||
      - metric_name: temporal_cloud_v0_total_action_count:rate1m | ||
        query: rate(temporal_cloud_v0_total_action_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_cancel_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_cancel_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_continued_as_new_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_continued_as_new_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_failed_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_failed_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_success_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_success_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_terminate_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_terminate_count[1m]) | ||
      - metric_name: temporal_cloud_v0_workflow_timeout_count:rate1m | ||
        query: rate(temporal_cloud_v0_workflow_timeout_count[1m]) |
47 changes: 47 additions & 0 deletions
cloud/observability/promql-to-scrape/examples/deployment.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
apiVersion: apps/v1 | ||
kind: Deployment | ||
metadata: | ||
  name: promql-to-scrape | ||
  labels: | ||
    app: promql-to-scrape | ||
spec: | ||
  replicas: 1 | ||
  selector: | ||
    matchLabels: | ||
      app: promql-to-scrape | ||
  template: | ||
    metadata: | ||
      labels: | ||
        app: promql-to-scrape | ||
    spec: | ||
      containers: | ||
        - name: promql-to-scrape | ||
          image: ghcr.io/temporalio/promql-to-scrape:7c0e91a | ||
          args: | ||
            - --client-cert=/var/run/secrets/ca_crt | ||
            - --client-key=/var/run/secrets/ca_key | ||
            - --prom-endpoint=https://<account>.tmprl.cloud/prometheus | ||
            - --config-file=/etc/promql-to-scrape/config.yaml | ||
            - --debug | ||
          ports: | ||
            - containerPort: 9001 | ||
          volumeMounts: | ||
            - name: secrets | ||
              mountPath: /var/run/secrets | ||
              readOnly: true | ||
            - name: config-volume | ||
              mountPath: /etc/promql-to-scrape | ||
          resources: | ||
            limits: | ||
              cpu: "100m" | ||
              memory: "256Mi" | ||
      volumes: | ||
        - name: secrets | ||
          secret: | ||
            secretName: promql-to-scrape-secrets | ||
        - name: config-volume | ||
          configMap: | ||
            name: promql-to-scrape-config | ||
            items: | ||
              - key: config.yaml | ||
                path: config.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
apiVersion: v1 | ||
kind: Secret | ||
type: Opaque | ||
metadata: | ||
  name: promql-to-scrape-secrets | ||
  labels: | ||
    app: promql-to-scrape | ||
data: | ||
  ca_crt: "<cert | base64>" | ||
  ca_key: "<key | base64>" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
module github.com/temporalio/samples-server/cloud/observability/promql-to-scrape | ||
|
||
go 1.21 | ||
|
||
require ( | ||
	github.com/prometheus/client_golang v1.17.0 | ||
	github.com/prometheus/common v0.45.0 | ||
	golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa | ||
	gopkg.in/yaml.v3 v3.0.1 | ||
) | ||
|
||
require ( | ||
	github.com/json-iterator/go v1.1.12 // indirect | ||
	github.com/kr/text v0.2.0 // indirect | ||
	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect | ||
	github.com/modern-go/reflect2 v1.0.2 // indirect | ||
) |
Suggestion: Could share the duplicate args between commands.