OpsVerse's Kubernetes agent collects logs and several out-of-the-box metrics from your Kubernetes cluster. The agent also enables the collection of APM traces by default. Follow the instructions documented here to run the agent on a Kubernetes cluster.
However, there may be additional tweaks you want to make for your specific environment. This page shows common changes you can make to the agent's values.yaml before re-running the agent based on these instructions.
Metrics FAQs
How can I fine-tune my metrics configuration?
Moving the metrics scrape config to the property victoria-metrics-agent.config.scrape_configs in your values.yaml file will enable you to customize it. The default metrics scrape config is:
```yaml
victoria-metrics-agent:
  config:
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        # Keep only the default/kubernetes service endpoints for the https port. This
        # will add targets for each API server which Kubernetes adds an endpoint to
        # the default/kubernetes service.
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      - job_name: 'kubernetes-nodes-cadvisor'
        # Default to scraping over https. If required, just disable this or change to
        # `http`.
        scheme: https
        # This TLS & bearer token file config is used to connect to the actual scrape
        # endpoints for cluster components. This is separate to discovery auth
        # configuration because discovery & scraping are two separate concerns in
        # Prometheus. The discovery auth config is automatic if Prometheus runs inside
        # the cluster. Otherwise, more config options have to be provided within the
        # <kubernetes_sd_config>.
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # If your node certificates are self-signed or use a different CA to the
          # master CA, then disable certificate verification below. Note that
          # certificate verification is an integral part of a secure infrastructure
          # so this should only be disabled in a controlled environment. You can
          # disable certificate verification by uncommenting the line below.
          # insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        # This configuration will work only on kubelet 1.7.3+
        # As the scrape endpoints for cAdvisor have changed
        # if you are using older version you need to change the replacement to
        # replacement: /api/v1/nodes/$1:4194/proxy/metrics
        # more info here https://github.com/coreos/prometheus-operator/issues/633
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      # Scrape config for service endpoints.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      #   to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      #   service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      # Scrape config for slow service endpoints; same as above, but with a larger
      # timeout and a larger interval
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/scrape-slow`: Only scrape services that have a value of `true`
      # * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
      #   to set this to `https` & most likely set the `tls_config` of the scrape config.
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: If the metrics are exposed on a different port to the
      #   service then set this appropriately.
      - job_name: 'kubernetes-service-endpoints-slow'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      - job_name: 'prometheus-pushgateway'
        honor_labels: true
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: pushgateway

      # Example scrape config for probing services via the Blackbox Exporter.
      #
      # The relabeling allows the actual service scrape endpoint to be configured
      # via the following annotations:
      #
      # * `prometheus.io/probe`: Only probe services that have a value of `true`
      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      # Example scrape config for pods
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          - source_labels: [__meta_kubernetes_pod_phase]
            regex: Pending|Succeeded|Failed
            action: drop
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop

      # Example scrape config for pods which should be scraped slower. A useful example
      # would be stackdriver-exporter which queries an API on every scrape of the pod
      #
      # The relabeling allows the actual pod scrape endpoint to be configured via the
      # following annotations:
      #
      # * `prometheus.io/scrape-slow`: Only scrape pods that have a value of `true`
      # * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
      # * `prometheus.io/port`: Scrape the pod on the indicated port instead of the default of `9102`.
      - job_name: 'kubernetes-pods-slow'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_slow]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          - source_labels: [__meta_kubernetes_pod_phase]
            regex: Pending|Succeeded|Failed
            action: drop
        metric_relabel_configs:
          - source_labels: [__name__]
            regex: 'go_.*|coredns_.*'
            action: drop
          - regex: 'id|helm_sh_chart|app_kubernetes_io_managed_by|controller_revision_hash|pod_template_generation'
            action: labeldrop
```
How can I ignore an entire namespace?
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can ignore specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
```yaml
- action: drop
  regex: <namespace to be dropped>
  source_labels:
    - __meta_kubernetes_namespace
```
The following example snippet drops all metrics from the namespaces default and test:
```yaml
- action: drop
  regex: default|test
  source_labels:
    - __meta_kubernetes_namespace
```
This strategy can be used with any of the other labels attached to your metrics.
How can I collect metrics from just the specified namespaces?
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can keep only specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
```yaml
- action: keep
  regex: <namespace to be included>
  source_labels:
    - __meta_kubernetes_namespace
```
The following example snippet includes just default and prod namespaces and drops metrics from all other namespaces.
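In relabel_configs form, the keep rule for those two namespaces looks like this (the regex alternation is the only addition to the template above):

```yaml
- action: keep
  regex: default|prod
  source_labels:
    - __meta_kubernetes_namespace
```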
This strategy can be used with any of the other labels attached to your metrics.
How can I prevent a specific service/pod from getting scraped?
By default, the OpsVerse agent scrapes all services/pods that have the prometheus.io/scrape: true annotation. If you would like to prevent a service/pod with this annotation from being scraped, remove the annotation from it.
Alternatively, you can add the annotation opsverse.io/scrape-ignore: "true" to the service/pod, which will prevent the OpsVerse agent from scraping it.
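For example, a Service annotated as below keeps its scrape annotation but is still skipped by the agent (the service name is hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service  # hypothetical service name
  annotations:
    prometheus.io/scrape: "true"
    opsverse.io/scrape-ignore: "true"  # the agent skips this service
```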
How can I change the scrape interval for metrics?
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can change the scrape interval for metrics by adding the following snippet to the victoria-metrics-agent.config property of your values.yaml file:
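As a sketch, assuming the standard Prometheus-style global block under victoria-metrics-agent.config, this scrapes every 30 seconds (the interval value is illustrative):

```yaml
victoria-metrics-agent:
  config:
    global:
      scrape_interval: 30s  # applies to all jobs unless a job sets its own interval
```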
scrape_interval can take values like 1s, 1m, 1h for seconds, minutes, and hours respectively.
Logs FAQs
How can I fine-tune my logs configuration?
Moving the logs scrape config to the property daemonSet.config in your values.yaml file will enable you to customize it. The default logs scrape config can be found under daemonSet.config in the agent's default values.yaml.
How can I collect logs from just the specified namespaces?
Follow the instructions in the previous FAQ to set up your logs scraping configs for customization. You can keep only specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
```yaml
- action: keep
  regex: <namespace to be included>
  source_labels:
    - __meta_kubernetes_namespace
```
The following example snippet includes just default and prod namespaces and drops logs from all other namespaces.
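In relabel_configs form, the keep rule for those two namespaces looks like this (the regex alternation is the only addition to the template above):

```yaml
- action: keep
  regex: default|prod
  source_labels:
    - __meta_kubernetes_namespace
```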
This strategy can be used with any of the other labels attached to your logs.
Log lines can also be dropped based on regular expressions via a drop pipeline stage. This can be used in multiple scenarios; below are a few examples of how regex-based log dropping can be used:
Example #1: If there is a need to drop all log lines of level DEBUG, a snippet like the following can be used:
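A minimal sketch using a promtail-style drop stage; it assumes your log lines contain the literal token DEBUG:

```yaml
- drop:
    expression: ".*DEBUG.*"  # drops any line matching this RE2 expression
```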
Example #2: If there is a need to drop all log lines for a specific API, a snippet like the following can be used:
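A sketch along the same lines; the API path here is purely illustrative:

```yaml
- drop:
    expression: ".*/api/v1/example.*"  # hypothetical endpoint whose log lines are dropped
```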
What is the multi-line snippet field in my values file?
Sometimes, applications write a multi-line event into a log file. We want such an event to be treated as a single log entry, so this block identifies the timestamp as the first line of a multi-line log event.
This should suffice for the majority of use cases, but if your organization uses a different convention, you can update the regex in this block for better multi-line support.
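As a sketch (the actual snippet in your values file may differ), a promtail-style multiline stage that starts a new event at each ISO-8601-style timestamp looks like this:

```yaml
- multiline:
    firstline: '^\d{4}-\d{2}-\d{2}'  # a line starting with YYYY-MM-DD begins a new event
    max_wait_time: 3s                # flush a partial event after 3s without a new first line
```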
How can I hide sensitive information from my logs?
Sometimes you may come across sensitive information, such as passwords or credit card numbers, visible in your logs. You might want to hide these, replace them with other text or characters, or remove them altogether.
This can be done by adding the maskPrivateInfoSnippet snippet to the pipelineStages inside logs in your agent values file, as shown below. The code block below illustrates a few example use cases.
```yaml
# replace block 1 - replaces the password string following the word password with "****".
# replace block 2 - To obfuscate sensitive data, you can combine the replace stage with the Hash template method.
# replace block 3 - The given expression will remove the string following the numbers 11.11.11.11
logs:
  pipelineStages:
    maskPrivateInfoSnippet: |
      - replace:
          expression: "password (\\S+)"  # An RE2 regular expression
          replace: "****"                # Value to which the captured group will be replaced
      - replace:
          # creditcard
          expression: '((?:\d[ -]*?){13,16})'
          replace: '*creditcard*{{ .Value | Hash "salt" }}*'
      - replace:
          expression: "11.11.11.11 - (\\S+\\s)"
          replace: ""
```
To learn more about replace and how to define/configure the block, refer to this documentation.
How can I add new labels to logs based on the contents of the log lines?
This can be achieved by using the regex pipeline stage in the agent. Here is an example of extracting the value of duration from a log line and adding it as an additional label:
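A sketch of such a stage pair, assuming log lines of the hypothetical form `... duration=0.42 ...`:

```yaml
pipeline_stages:
  - regex:
      # The named capture group "duration" lands in the extracted value map
      expression: 'duration=(?P<duration>[0-9.]+)'
  - labels:
      # Promote the extracted value to a label on the log entry
      duration:
```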
This section is defined under the pipeline_stages section of your log scrape configs.
Traces FAQs
What if I want to change the default sampling rate of the OpenTelemetry collector?
By default, we ingest 100% of traces exported to the OpenTelemetry collector. While this provides the most complete picture of the system being monitored, it can lead to increased ingestion costs. If you wish to sample a given percentage of traces instead, you can add an otelcollector.traceSamplePercentage key to your values.yaml when installing/updating the agent.
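For example, to keep roughly a quarter of traces (the value 25 is illustrative):

```yaml
otelcollector:
  traceSamplePercentage: 25
```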
How can I use tail sampling with the OpenTelemetry collector?
Tail sampling is where the decision to sample a trace happens after all the spans in a request have been completed. The tail sampling processor samples traces based on a set of defined policies.
While tail sampling provides you the option to filter your traces based on specific criteria of the system being monitored, it can lead to increased memory usage.
To implement tail sampling in the OpenTelemetry collector, add the following YAML snippet to the agent's values.yaml under otelcollector config and run the agent based on these instructions.
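A sketch of such a snippet; the exact nesting under otelcollector may differ in your chart version, but the tail_sampling fields are those of the OpenTelemetry tail sampling processor:

```yaml
otelcollector:
  config:
    processors:
      tail_sampling:
        decision_wait: 10s
        num_traces: 100
        expected_new_traces_per_sec: 10
        policies:
          - name: always-sample
            type: always_sample  # default policy: sample every trace
```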
The config section is configurable; adjust the values and policies to fit your requirements.
The configurable fields are:
decision_wait: The time to wait from the arrival of the first span of a trace until a sampling decision is made for it.
num_traces: The number of traces kept in memory. Most of a trace's data is typically released after a sampling decision is taken.
expected_new_traces_per_sec: The expected number of new traces sent to the tail sampling processor per second. This helps allocate data structures close to the actual usage size.
policies: The policies used to make sampling decisions. The default policy is always_sample, which samples all traces. Multiple policies can be configured.