---
title: metrics_config
weight: 200
aliases:
- ../../configuration/prometheus-config/
- ../../configuration/metrics-config/
---

# metrics_config

The metrics_config block is used to define a collection of metrics instances. Each instance defines a collection of Prometheus-compatible scrape_configs and remote_write rules. Most users will only need to define one instance.

```yaml
# Configures the optional scraping service to cluster agents.
[scraping_service: <scraping_service_config>]

# Configures the gRPC client used for agents to connect to other
# clustered agents.
[scraping_service_client: <scraping_service_client_config>]

# Configures values for all Prometheus instances.
[global: <global_config>]

# Configures the directory used by instances to store their WAL.
#
# The Grafana Agent assumes that all folders within wal_directory are managed by
# the agent itself. This means if you are using a PVC, you must point
# wal_directory to a subdirectory of the PVC mount.
[wal_directory: <string> | default = ""]

# How long an abandoned WAL (one not associated with an instance) may go
# without being written to before it becomes eligible for deletion.
[wal_cleanup_age: <duration> | default = "12h"]

# How often to check for abandoned WALs to delete.
# A value of 0 disables periodic cleanup of abandoned WALs.
[wal_cleanup_period: <duration> | default = "30m"]

# Disables HTTP Keep-Alives when scraping; the Agent will use each
# outgoing connection for a single request only.
[http_disable_keepalives: <boolean> | default = false]

# The maximum amount of time an idle Keep-Alive connection can remain
# idle before closing itself. Zero means no limit.
# The setting is ignored when `http_disable_keepalives` is enabled.
[http_idle_conn_timeout: <duration> | default = "5m"]

# The list of Prometheus instances to launch with the agent.
configs:
  [- <metrics_instance_config>]

# If an instance crashes abnormally, how long to wait before trying
# to restart it. 0s disables the backoff period and restarts the instance
# immediately.
[instance_restart_backoff: <duration> | default = "5s"]

# How to spawn instances based on instance configs. Supported values: shared,
# distinct.
[instance_mode: <string> | default = "shared"]
```
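A minimal sketch of a filled-in `metrics_config` block follows. It uses only fields from the reference above plus standard Prometheus `scrape_config` and `remote_write` fields. The WAL path, target address and remote_write URL are hypothetical placeholders. The single instance defines no `remote_write` of its own, so it falls back to the list in `global`.

```yaml
# Contents of a metrics_config block (a sketch; paths, targets and URLs are placeholders).
wal_directory: /var/lib/agent/wal   # hypothetical path; on a PVC, use a subdirectory of the mount
wal_cleanup_age: 12h                # delete abandoned WALs untouched for this long
wal_cleanup_period: 30m             # how often to look for abandoned WALs
http_idle_conn_timeout: 5m
global:
  scrape_interval: 15s
  remote_write:
    - url: https://prometheus.example/api/v1/write   # placeholder endpoint
configs:
  - name: default                   # required; added as a label to agent metrics
    scrape_configs:
      - job_name: example
        static_configs:
          - targets: ['app.example:8080']            # hypothetical target
instance_restart_backoff: 5s
instance_mode: shared
```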

## scraping_service_config

The scraping_service block configures the [scraping service]({{< relref "scraping-service/" >}}), an operational mode where configurations are stored centrally in a KV store and a cluster of agents distributes discovery and scrape load between nodes.

```yaml
# Whether to enable scraping service mode. When enabled, local configs
# cannot be used.
[enabled: <boolean> | default = false]

# Note: the naming of the next 3 options is less than ideal because of
# backwards compatibility.

# How often the agent should manually refresh the configuration. Useful if KV
# change events are not sent by an agent.
[reshard_interval: <duration> | default = "1m"]

# The timeout for configuration refreshes. A refresh can occur on cluster
# events or on the reshard interval. A timeout of 0 indicates no timeout.
[reshard_timeout: <duration> | default = "30s"]

# The timeout for cluster reshard events. A timeout of 0 indicates no timeout.
[cluster_reshard_event_timeout: <duration> | default = "30s"]

# Configuration for the KV store used to store configurations.
kvstore: <kvstore_config>

# When set, allows configs pushed to the KV store to specify configuration
# fields that can read secrets from files.
#
# This is disabled by default. When enabled, a malicious user can craft an
# instance config that reads arbitrary files on the machine the Agent runs
# on and sends their contents to a specifically crafted remote_write endpoint.
#
# If enabled, ensure that no untrusted users have access to the Agent API.
[dangerous_allow_reading_files: <boolean> | default = false]

# Configuration for how agents will cluster together.
lifecycler: <lifecycler_config>
```
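The sketch below enables scraping service mode inside `metrics_config`, using Consul both for stored configurations and for the hash ring. The Consul address and the `collectors/` ring prefix are hypothetical. No local `configs` list is given, because local configs cannot be used once the scraping service is enabled.

```yaml
# A sketch of a scraping_service block inside metrics_config.
scraping_service:
  enabled: true                     # local `configs` can no longer be used
  reshard_interval: 1m              # periodic refresh in case KV change events are missed
  reshard_timeout: 30s
  cluster_reshard_event_timeout: 30s
  dangerous_allow_reading_files: false
  kvstore:
    store: consul
    prefix: configurations/         # must end in /
    consul:
      host: consul.example:8500     # hypothetical Consul address
  lifecycler:
    ring:
      kvstore:
        store: consul
        prefix: collectors/         # hypothetical prefix, kept separate from configs
        consul:
          host: consul.example:8500
```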

## kvstore_config

The kvstore_config block configures the KV store used as storage for configurations in the scraping service mode.

```yaml
# Which underlying KV store to use. Can be either consul or etcd.
[store: <string> | default = ""]

# Key prefix to store all configurations with. Must end in /.
[prefix: <string> | default = "configurations/"]

# Configuration for a Consul client. Only applies if store
# is "consul".
consul:
  # The hostname and port of Consul.
  [host: <string> | default = "localhost:8500"]

  # The ACL Token used to interact with Consul.
  [acltoken: <string>]

  # The HTTP timeout when communicating with Consul.
  [httpclienttimeout: <duration> | default = 20s]

  # Whether or not consistent reads to Consul are enabled.
  [consistentreads: <boolean> | default = true]

# Configuration for an etcd v3 client. Only applies if
# store is "etcd".
etcd:
  # The etcd endpoints to connect to.
  endpoints:
    - <string>

  # The dial timeout for the etcd connection.
  [dial_timeout: <duration> | default = 10s]

  # The maximum number of retries for failed operations against etcd.
  [max_retries: <int> | default = 10]
```
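As a sketch, here are two alternative `kvstore` blocks, one per supported store, written as separate YAML documents divided by `---`. Only one of them would be used in a real configuration. Hostnames and endpoints are placeholders. Only the `store` matching the sub-block takes effect, since each client section applies only to its own store.

```yaml
# Alternative 1: Consul (host is a placeholder).
kvstore:
  store: consul
  prefix: configurations/           # must end in /
  consul:
    host: consul.example:8500
---
# Alternative 2: etcd v3 (endpoints are placeholders).
kvstore:
  store: etcd
  prefix: configurations/
  etcd:
    endpoints:
      - etcd-0.example:2379
      - etcd-1.example:2379
    max_retries: 10
```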

## lifecycler_config

The lifecycler_config block configures the lifecycler, the component that Agents use to cluster together.

```yaml
# Configures the distributed hash ring storage.
ring:
  # KV store for getting and sending distributed hash ring updates.
  kvstore: <kvstore_config>

  # Specifies when other agents in the cluster should be considered
  # unhealthy if they haven't sent a heartbeat within this duration.
  [heartbeat_timeout: <duration> | default = "1m"]

# Number of tokens to generate for the distributed hash ring.
[num_tokens: <int> | default = 128]

# How often agents should send a heartbeat to the distributed hash
# ring.
[heartbeat_period: <duration> | default = "5s"]

# How long to wait for tokens from other agents after generating
# a new set to resolve collisions. Useful only when using a gossip
# KV store.
[observe_period: <duration> | default = "0s"]

# Period to wait before joining the ring. 0s means to join immediately.
[join_after: <duration> | default = "0s"]

# Minimum duration to wait before marking the agent as ready to receive
# traffic. Used to work around race conditions for multiple agents exiting
# the distributed hash ring at the same time.
[min_ready_duration: <duration> | default = "1m"]

# Network interfaces to resolve addresses defined by other agents
# registered in distributed hash ring.
[interface_names: <string array> | default = ["eth0", "en0"]]

# Duration to sleep before exiting. Ensures that metrics get scraped
# before the process quits.
[final_sleep: <duration> | default = "30s"]

# File path to store tokens. If empty, tokens will not be stored during
# shutdown and will not be restored at startup.
[tokens_file_path: <string> | default = ""]

# Availability zone of the host the agent is running on. Default is an
# empty string which disables zone awareness for writes.
[availability_zone: <string> | default = ""]
```
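The following is a sketch of a `lifecycler` block that stores the hash ring in etcd and persists tokens across restarts. The etcd endpoint, ring prefix and token file path are hypothetical. `heartbeat_timeout` sits under `ring`, while the other timing fields sit at the lifecycler level, matching the reference above.

```yaml
# A sketch of a lifecycler block (endpoint, prefix and path are placeholders).
lifecycler:
  ring:
    kvstore:
      store: etcd
      prefix: collectors/           # hypothetical ring prefix
      etcd:
        endpoints:
          - etcd.example:2379
    heartbeat_timeout: 1m           # peers silent this long are considered unhealthy
  num_tokens: 128
  heartbeat_period: 5s
  join_after: 0s                    # join the ring immediately
  min_ready_duration: 1m
  final_sleep: 30s
  tokens_file_path: /var/lib/agent/tokens   # hypothetical; restores tokens at startup
```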

## scraping_service_client_config

The scraping_service_client_config block configures how clustered Agents will generate gRPC clients to connect to each other.

```yaml
grpc_client_config:
  # Maximum size in bytes the gRPC client will accept from the connected server.
  [max_recv_msg_size: <int> | default = 104857600]

  # Maximum size in bytes the gRPC client will send to the connected server.
  [max_send_msg_size: <int> | default = 16777216]

  # Whether messages should be gzipped.
  [use_gzip_compression: <boolean> | default = false]

  # The rate limit for gRPC clients; 0 means no rate limit.
  [rate_limit: <float64> | default = 0]

  # gRPC burst allowed for rate limits.
  [rate_limit_burst: <int> | default = 0]

  # Whether the client should retry the request when a rate limit is hit.
  [backoff_on_ratelimits: <boolean> | default = false]

  # Configures the retry backoff when backoff_on_ratelimits is
  # true.
  backoff_config:
    # The minimum delay when backing off.
    [min_period: <duration> | default = "100ms"]

    # The maximum delay when backing off.
    [max_period: <duration> | default = "10s"]

    # The number of times to back off and retry before failing.
    [max_retries: <int> | default = 10]
```
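As a sketch, the `scraping_service_client` block below turns on gzip compression and client-side rate limiting. Because `backoff_on_ratelimits` is true, requests that hit the limit are retried using `backoff_config`. The `rate_limit` and `rate_limit_burst` values are hypothetical and not recommendations.

```yaml
# A sketch of a scraping_service_client block inside metrics_config.
scraping_service_client:
  grpc_client_config:
    use_gzip_compression: true
    rate_limit: 10                  # hypothetical limit; 0 disables rate limiting
    rate_limit_burst: 20            # hypothetical burst size
    backoff_on_ratelimits: true     # retry instead of failing when limited
    backoff_config:
      min_period: 100ms
      max_period: 10s
      max_retries: 10
```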

## global_config

The global_config block configures global values for all launched Prometheus instances.

```yaml
# How frequently Prometheus instances should scrape.
[scrape_interval: <duration> | default = "1m"]

# How long to wait before timing out a scrape from a target.
[scrape_timeout: <duration> | default = "10s"]

# A map of static labels to add to all metrics.
external_labels:
  { <string>: <string> }

# Default set of remote_write endpoints. If an instance doesn't define any
# remote_writes, it will use this list.
remote_write:
  - [<remote_write>]
```

Note: For more information on remote_write, refer to the Prometheus documentation
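Here is a sketch of a `global` block. The label name, label value and remote_write URL are placeholders. The `url` field comes from the Prometheus `remote_write` configuration. Every instance inherits the scrape settings and external labels, and any instance without its own `remote_write` sends to the endpoint listed here.

```yaml
# A sketch of a global block inside metrics_config (values are placeholders).
global:
  scrape_interval: 30s
  scrape_timeout: 10s               # keep at or below scrape_interval
  external_labels:
    cluster: example-cluster        # hypothetical label
  # Used by any instance that does not define its own remote_write.
  remote_write:
    - url: https://prometheus.example/api/v1/write
```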

## metrics_instance_config

The metrics_instance_config block configures an individual metrics instance, which acts as its own mini Prometheus-compatible agent, though without support for the TSDB.

```yaml
# Name of the instance. Must be present. Will be added as a label to agent
# metrics.
name: <string>

# Whether this agent instance should only scrape from targets running on the
# same machine as the agent process.
[host_filter: <boolean> | default = false]

# Relabel configs to apply against discovered targets. The relabeling is
# temporary and just used for filtering targets.
host_filter_relabel_configs:
  [ - <relabel_config> ... ]

# How frequently the WAL truncation process should run. Every iteration of
# the truncation will checkpoint old series and remove old samples. If data
# has not been sent within this window, some of it may be lost.
#
# The size of the WAL will increase with less frequent truncations. Making
# truncations more frequent reduces the size of the WAL but increases the
# chances of data loss when remote_write is failing for longer than the
# specified frequency.
[wal_truncate_frequency: <duration> | default = "60m"]

# The minimum amount of time that series and samples should exist in the WAL
# before being considered for deletion. The consumed disk space of the WAL will
# increase by making this value larger.
#
# Setting this value to 0s is valid, but may delete series before all
# remote_write shards have been able to write all data, and may cause errors on
# slower machines.
[min_wal_time: <duration> | default = "5m"]

# The maximum amount of time that series and samples may exist within the WAL
# before being considered for deletion. Series that have not received writes
# since this period will be removed, and all samples older than this period will
# be removed.
#
# This value is useful in long-running network outages, preventing the WAL from
# growing forever.
#
# Must be larger than min_wal_time.
[max_wal_time: <duration> | default = "4h"]

# Deadline for flushing data when a Prometheus instance shuts down
# before giving up and letting the shutdown proceed.
[remote_flush_deadline: <duration> | default = "1m"]

# When true, writes staleness markers to all active series to
# remote_write.
[write_stale_on_shutdown: <boolean> | default = false]

# A list of scrape configuration rules.
scrape_configs:
  - [<scrape_config>]

# A list of remote_write targets.
remote_write:
  - [<remote_write>]
```
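The sketch below shows one entry in the `configs` list, setting the WAL retention fields explicitly. Per the rule above, `max_wal_time` must be larger than `min_wal_time`. The job name, target and URL are hypothetical, and `job_name`, `static_configs` and `url` are standard Prometheus fields. Because this instance defines its own `remote_write`, it does not use the `global` list.

```yaml
# A sketch of one metrics instance (names, targets and URL are placeholders).
configs:
  - name: node                      # required; added as a label to agent metrics
    host_filter: false
    wal_truncate_frequency: 60m     # shorter = smaller WAL, more risk if remote_write stalls
    min_wal_time: 5m
    max_wal_time: 4h                # must be larger than min_wal_time
    remote_flush_deadline: 1m
    write_stale_on_shutdown: true
    scrape_configs:
      - job_name: node
        static_configs:
          - targets: ['node.example:9100']   # hypothetical target
    remote_write:
      - url: https://prometheus.example/api/v1/write
```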

Note: More information on the following types can be found on the Prometheus website: