Kubernetes
OpsVerse's Kubernetes agent collects logs and several out-of-the-box metrics from your Kubernetes cluster. The agent also enables the collection of APM traces by default. Follow the instructions documented here to run the agent on a Kubernetes cluster.
However, there may be additional tweaks you want to make for your specific environment. This page will show common config changes you can make to the agent's values.yaml and re-run the agent based on these instructions.
Moving the metrics scrape config to the property victoria-metrics-agent.config.scrape_configs in your values.yaml file will enable you to customize it. The default metrics scrape config is:
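The agent's actual default is not reproduced in the sketch below. It is only a rough illustration of where the property sits in values.yaml, assuming a Prometheus-compatible scrape config that uses Kubernetes service discovery. The job name is hypothetical:

```yaml
# Illustrative sketch only; this is NOT the agent's actual default scrape config.
victoria-metrics-agent:
  config:
    scrape_configs:
      - job_name: kubernetes-pods        # hypothetical job name
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            regex: "true"
            action: keep
```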
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can ignore specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
The following example snippet drops all metrics from the namespaces default and test:
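A minimal sketch of such a rule, assuming the scrape job uses kubernetes_sd_configs so that the `__meta_kubernetes_namespace` label is available during relabeling:

```yaml
relabel_configs:
  # Drop every target discovered in the "default" or "test" namespace.
  # The regex is anchored, so it must match the whole namespace name.
  - source_labels: [__meta_kubernetes_namespace]
    regex: default|test
    action: drop
```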
This strategy can be used with any of the other labels attached to your metrics.
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can include only specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
The following example snippet includes just default and prod namespaces and drops metrics from all other namespaces.
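A minimal sketch of such a rule, again assuming the job uses kubernetes_sd_configs so that `__meta_kubernetes_namespace` is populated:

```yaml
relabel_configs:
  # Keep only targets in the "default" or "prod" namespace;
  # targets from every other namespace are dropped.
  - source_labels: [__meta_kubernetes_namespace]
    regex: default|prod
    action: keep
```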
This strategy can be used with any of the other labels attached to your metrics.
By default, OpsVerse Agent scrapes all services/pods that have the prometheus.io/scrape: true annotation. To prevent a service/pod from being scraped, remove this annotation from it.
Alternatively, you can add an additional annotation opsverse.io/scrape-ignore: "true" to the service/pod and that will prevent OpsVerse Agent from scraping the service/pod.
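As an illustration, a Service that keeps its prometheus.io/scrape annotation but opts out of OpsVerse Agent scraping might look like the sketch below. The service name, selector, and port are hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service                 # hypothetical name
  annotations:
    prometheus.io/scrape: "true"        # existing annotation, left in place
    opsverse.io/scrape-ignore: "true"   # tells OpsVerse Agent to skip this service
spec:
  selector:
    app: example-app                    # hypothetical selector
  ports:
    - port: 8080                        # hypothetical port
```

Annotation values must be strings, so `"true"` is quoted.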
Follow the instructions in the previous FAQ to set up your scrape configs for customization. You can change the scrape interval for metrics by adding the following snippet to the victoria-metrics-agent.config property of your values.yaml file:
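A minimal sketch, assuming the interval is set in the `global` block as in a standard Prometheus-compatible config. The 30s value is only an example:

```yaml
victoria-metrics-agent:
  config:
    global:
      scrape_interval: 30s   # example value; applies to jobs that do not set their own interval
```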
scrape_interval can take values like 1s, 1m, 1h for seconds, minutes, and hours respectively.
Moving the logs scrape config to the property daemonSet.config in your values.yaml file will enable you to customize it. The default logs scrape config is:
You can ignore specific namespaces by adding the following snippet to the pipelineStages inside logs in your agent values file:
The following example snippet drops all logs from the namespaces default and test:
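A sketch of such a stage, assuming the agent accepts Promtail-style pipeline stages and that each log stream carries a `namespace` label. Only the stage itself is shown; where it nests in your values file follows the description above:

```yaml
# Drop every log stream whose namespace label is "default" or "test".
- match:
    selector: '{namespace=~"default|test"}'
    action: drop
```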
This strategy can be used with any of the other labels attached to your logs.
Follow the instructions in the previous FAQ to set up your log scrape configs for customization. You can include only specific namespaces by adding the following snippet to the relabel_configs section of the relevant scrape jobs:
The following example snippet includes just default and prod namespaces and drops logs from all other namespaces.
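A sketch of such a rule for a log scrape job, assuming Promtail-style relabeling with Kubernetes service discovery (so `__meta_kubernetes_namespace` is set on each discovered target):

```yaml
relabel_configs:
  # Collect logs only from pods in "default" or "prod"; all other targets are dropped.
  - source_labels: [__meta_kubernetes_namespace]
    regex: default|prod
    action: keep
```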
This strategy can be used with any of the other labels attached to your logs.
You can drop log lines that match a regular expression by adding the following snippet to the pipelineStages inside logs in your agent values file:
This strategy can be used with any of the other labels attached to your logs.
This can be used in multiple scenarios. Below are a few examples of how regex-based log dropping can be used:
Example #1: To drop all log lines of level DEBUG, the following example snippet can be used:
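A sketch of this case, assuming a Promtail-style `drop` stage and that the level appears in the line as the literal text `DEBUG`. Adjust the expression to your log format:

```yaml
# Drop any log line that contains the text "DEBUG".
- drop:
    expression: ".*DEBUG.*"
```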
Example #2: To drop all log lines for a specific API, the following example snippet can be used:
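A sketch of this case, assuming a Promtail-style `drop` stage. The `/healthz` endpoint is a hypothetical example of a noisy API path:

```yaml
# Drop any log line that mentions the hypothetical /healthz endpoint.
- drop:
    expression: ".*GET /healthz.*"
```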
Sometimes, applications may write a multi-line event into a log file. We want these to be treated as a single log event, so this block identifies the timestamp as the first line of a multi-line log event.
This should suffice for the majority of use cases, but if your organization uses a different convention, the regex can be updated in this block if you want better multi-line support.
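As an illustration of what such a block can look like, here is a sketch using a Promtail-style `multiline` stage. The regex below is an assumption (an ISO-like timestamp at the start of the line), not the agent's actual default:

```yaml
# Treat a line starting with a timestamp such as "2024-01-31 12:00:00" or
# "2024-01-31T12:00:00" as the first line of a new event. Following lines that
# do not match are appended to the same event.
- multiline:
    firstline: '^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}'
    max_wait_time: 3s   # flush a pending event if no new line arrives within 3s
```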
Sometimes you may come across sensitive information such as passwords or credit card numbers in your logs. You might want to hide these values, replace them with other text or characters, or remove them altogether.
This can be done by adding the maskPrivateInfoSnippet snippet to the pipelineStages inside logs in your agent values file as shown below. The code block below illustrates a few example use cases.
To learn more about the replace stage and how to define and configure it, refer to this documentation.
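A sketch of two masking rules, assuming Promtail-style `replace` stages, in which the text matched by each capture group is replaced. Only the stages are shown, and the patterns are illustrative assumptions about how the sensitive values appear in your logs:

```yaml
# Mask the value in "password=<value>": only the captured group is replaced.
- replace:
    expression: 'password=(\S+)'
    replace: '****'
# Mask 16-digit card numbers, with or without spaces or dashes between groups.
- replace:
    expression: '(\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4})'
    replace: '****'
```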
This can be achieved by using the regex pipeline stage at the agent. Here is an example of extracting the value of duration from a log line and adding it as an additional label:
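A sketch of this approach, assuming Promtail-style stages and log lines that contain text of the form `duration=<value>`:

```yaml
# Capture the value after "duration=" into the extracted field "duration".
- regex:
    expression: 'duration=(?P<duration>\S+)'
# Promote the extracted field to a label with the same name.
- labels:
    duration:
```

Labels with many distinct values increase the number of log streams, so it is usually safer to promote only values with a small, bounded set of possibilities.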
This section is defined under the pipeline_stages sections of your log scrape configs.
By default, we ingest 100% of traces exported to the OpenTelemetry collector. While this provides the most complete picture of the system being monitored, it can lead to increased ingestion costs. If you wish to sample a given percentage of traces instead, you can add an otelcollector.traceSamplePercentage key to your values.yaml when installing/updating the agent.
For example,
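A minimal sketch, assuming the value is a percentage between 0 and 100. The 25 is only an example:

```yaml
otelcollector:
  traceSamplePercentage: 25   # assumed scale: keep roughly 25% of traces
```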
Tail sampling is where the decision to sample a trace happens after all the spans in a request have been completed. The tail sampling processor samples traces based on a set of defined policies.
While tail sampling lets you filter traces based on specific criteria of the system being monitored, it can lead to increased memory usage, because spans are held in memory until the sampling decision is made.
To implement tail sampling in the OpenTelemetry collector, add the following YAML snippet to the agent's values.yaml under otelcollector config and run the agent based on these instructions.
Adjust the values and policies in the config section to suit your requirements.
The config section consists of:
- decision_wait: The time to wait, from the arrival of the first span of a trace, before the sampling decision for that trace is made.
- num_traces: The number of traces kept in memory. Typically most of the data of a trace is released after a sampling decision is taken.
- expected_new_traces_per_sec: The expected number of new traces sent to the tail sampling processor per second. This helps the processor allocate data structures sized closer to actual usage.
- policies: Policies are used to make sampling decisions. The default policy is set to always_sample which samples all traces. Multiple policies can be configured.
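The fields above can be sketched as an OpenTelemetry Collector `tail_sampling` processor block. The values and policy names are illustrative assumptions, and how the block nests under the otelcollector key of the agent's values.yaml is not shown here:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s                  # wait 10s after the first span before deciding
    num_traces: 50000                   # traces held in memory at once
    expected_new_traces_per_sec: 100    # sizing hint for internal data structures
    policies:
      # Keep every trace that contains a span with an error status.
      - name: errors-policy             # hypothetical policy name
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep every trace that took longer than 5 seconds.
      - name: slow-traces-policy        # hypothetical policy name
        type: latency
        latency:
          threshold_ms: 5000
```

A trace is sampled if any policy decides to sample it, so adding an `always_sample` policy to this list would keep every trace.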
You can refer to this documentation for more information on tail sampling and to explore sample policies.