Amazon Web Services (AWS)

## Collecting AWS CloudWatch Logs

Logs generated by AWS managed services like RDS, ElastiCache, MSK, etc. are available only in CloudWatch Logs. These logs can be pulled into the OpsVerse Observability stack using the log shipper Lambda function, which lets you bring all your logs into one central system.

This Lambda function is delivered as a Terraform script. Download this zip file: https://opsverse-public.s3.amazonaws.com/integrations/aws-lambda/lambda.zip and unzip it on a computer from which you can run Terraform.

### Structure

When unzipped, it should look like this:

```
./lambda/
├── iam.tf
├── main.tf
├── triggers.tf
└── variables.tf
```

### Running the Lambda

To run the Lambda, provide the necessary variables defined in `variables.tf`. Initialize the working directory once with `terraform init`, then run the Lambda script using the following two commands:

```shell
terraform plan
terraform apply
```

Here's a sample configuration:

```hcl
module "logshipper_lambda" {
  source = "/path/to/extracted/dir/lambda"

  create_execution_role = true
  aws_region            = "us-east-1"
  loki_endpoint         = "https://example.com"
  loki_username         = "testuser"
  loki_password         = "testpass"

  # List of log groups to ship to Loki
  cw_log_groups = [
    "/aws/rds/instance/database-1/slowquery",
    "/aws/rds/instance/database-2/slowquery"
  ]

  aws_account_id = "123456789012"
}
```

Run it once for each region you need it in. The first time you run it in your account, make sure `create_execution_role = true` is set. Remove this setting in subsequent runs for other regions in the same account: IAM resources are global, not region-specific, so there's no need to re-create them.

### Summary of what it creates

- A Lambda function (the log shipper), created by pulling OpsVerse's function package from S3. The supported AWS regions are `us-east-1`, `us-west-2`, `eu-central-1` and `ap-south-1`.
- The IAM role and policies that allow the function to:
  - use the specific permissions it needs on the log groups passed in
  - create its own log group (for debug output)
- A subscription (trigger) from each of the log groups to the newly created Lambda function.

# AWS CloudWatch Metrics

## Collecting AWS CloudWatch Metrics

Metrics generated by AWS managed services like RDS, ElastiCache, MSK, etc. are available only in AWS CloudWatch. The OpsVerse agent can pull those metrics into the same metrics backend, so you can visualize all your metrics in one place.

## Tag your AWS resources

Set the following tags in AWS on the resources you want to monitor. This lets you easily identify the AWS metrics from your resources and correlate them with the rest of your telemetry data.

| Resource | Key | Value |
| --- | --- | --- |
| RDS | `opsverse-database-name` | A value of your choice, matched by the `searchTags` entry in the values file |
| CloudFront | `opsverse-monitor` | A value of your choice, matched by the `searchTags` entry in the values file |

## Setting up authentication

The following types of authentication can be set up for the CloudWatch agent:

- Configure using an AWS role
- Configure using `aws_access_key_id` and `aws_secret_access_key`
- Configure using a pre-created secret in which AWS credentials are stored
- Service account based authentication

### Using AWS access and secret keys

You can configure `aws_access_key_id` and `aws_secret_access_key` in the values file under `aws` as a method of authentication. When these values are set, the chart creates a secret that is used for authentication.

### Using an existing secret

Copy the following Kubernetes secret template into a file named `cloudwatch-secret.yaml` and update the access and secret keys. The AWS access key ID is assumed to be in a field called `access_key`, the AWS secret access key in a field called `secret_key`, and the session token, if it exists, in a field called `security_token`.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-cloudwatch-agent
  namespace: devopsnow
type: Opaque
data:
  access_key: <base64-encoded-aws-access-key-id>
  secret_key: <base64-encoded-aws-secret-access-key>
```

Use the following command to create the secret:

```shell
kubectl apply -f cloudwatch-secret.yaml -n devopsnow
```

This creates a secret named `aws-cloudwatch-agent` in the `devopsnow` namespace. You can now reference this secret in the values file under `aws`.
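If you would rather not base64-encode the keys by hand, the following minimal Python sketch (standard library only) builds the same Secret manifest and prints it as JSON, which `kubectl` accepts as well as YAML. It assumes the secret name `aws-cloudwatch-agent`, the `devopsnow` namespace and the `access_key` / `secret_key` / `security_token` field names from the template above; `b64` and `cloudwatch_secret` are hypothetical helper names, not part of the OpsVerse chart.

```python
import base64
import json
from typing import Optional


def b64(value: str) -> str:
    """Base64-encode a string, as Kubernetes expects for values in a Secret's `data` map."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def cloudwatch_secret(access_key_id: str, secret_access_key: str,
                      session_token: Optional[str] = None,
                      name: str = "aws-cloudwatch-agent",
                      namespace: str = "devopsnow") -> dict:
    """Return the Secret manifest from the template above as a dict (hypothetical helper)."""
    data = {
        "access_key": b64(access_key_id),
        "secret_key": b64(secret_access_key),
    }
    # Only temporary credentials carry a session token; add it when one is given.
    if session_token:
        data["security_token"] = b64(session_token)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": data,
    }


if __name__ == "__main__":
    # Placeholder credentials for illustration only; substitute your own.
    manifest = cloudwatch_secret("AKIAEXAMPLE", "example-secret-key")
    # kubectl accepts JSON manifests as well as YAML ones.
    print(json.dumps(manifest, indent=2))
```

The output can be piped straight into `kubectl apply -n devopsnow -f -`. If you include a session token, also set `includesSessionToken: true` in the values file. Alternatively, `kubectl create secret generic aws-cloudwatch-agent --from-literal=access_key=<...> --from-literal=secret_key=<...> -n devopsnow` performs the base64 encoding for you.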
### Service account based auth

#### Prerequisites

1. Check whether your AWS EKS cluster already has an associated OpenID Connect provider URL. To do this, navigate to the Overview section of your EKS cluster in the AWS console and look for "OpenID Connect provider URL". If one is not associated with the cluster, use the following AWS doc to associate one: Authenticating users via OIDC provider.
2. Navigate to Access management > Identity providers in AWS IAM and check for an entry corresponding to your OIDC provider URL. If none is present, add an identity provider for the corresponding OIDC provider URL using the following AWS doc: Creating OIDC provider.

#### Step 1: Creating a custom policy

Navigate to IAM > Access management > Policies and create a new IAM policy using the following JSON:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Allow",
      "Action": [
        "tag:GetResources",
        "cloudwatch:GetMetricData",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics",
        "apigateway:GET",
        "aps:ListWorkspaces",
        "autoscaling:DescribeAutoScalingGroups",
        "dms:DescribeReplicationInstances",
        "dms:DescribeReplicationTasks",
        "ec2:DescribeTransitGatewayAttachments",
        "ec2:DescribeSpotFleetRequests",
        "shield:ListProtections",
        "storagegateway:ListGateways",
        "storagegateway:ListTagsForResource",
        "iam:ListAccountAliases"
      ],
      "Resource": "*"
    }
  ]
}
```

#### Step 2: Creating a new IAM role

Navigate to Access management > Roles in AWS IAM. Create a new role of type Web identity using the identity provider corresponding to your OIDC provider URL, and set the audience to `sts.amazonaws.com`. Attach the policy created in Step 1 to the role, then create the role with an appropriate name.

#### Step 3: Verify trust relationships

Navigate to the Trust relationships section of your role. Verify and update the trust relationships using the following JSON:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<aws-account-id>:oidc-provider/<openid-connect-url>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "<openid-connect-url>:aud": "sts.amazonaws.com",
          "<openid-connect-url>:sub": "system:serviceaccount:devopsnow:cloudwatch-agent"
        }
      }
    }
  ]
}
```

Do not include the `https://` prefix when using the OIDC URL from the EKS console. The `sub` value has the form `system:serviceaccount:<namespace>:<service-account-name>`; the one above matches the `cloudwatch-agent` service account installed into the `devopsnow` namespace.

## Configuring values.yaml and installing the chart

Copy the following template into a file named `cloudwatch-values.yaml` and configure the values appropriately. Depending on whether you use an AWS role, AWS access keys, a pre-created secret or the service account for authentication, update the values file accordingly. Only one authentication method needs to be configured; remove the other authentication configurations from the values.

```yaml
podAnnotations:
  prometheus.io/path: /metrics
  prometheus.io/port: "5000"
  prometheus.io/scrape: "true"

# Use one of the following methods to authenticate with AWS
aws:
  # Configure using an AWS role
  role: <role>

  # OR
  # Configure using aws_access_key_id and aws_secret_access_key (not recommended for prod)
  aws_access_key_id: "<EDIT: add your access key>"
  aws_secret_access_key: "<EDIT: add your access secret>"

  # OR
  # Configure using an existing secret
  secret:
    name: aws-cloudwatch-agent
    includesSessionToken: false

# OR
# Configure using a service account
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<12-digit-aws-account-id>:role/<cloudwatch-metrics-role-name>
  name: cloudwatch-agent

config: |
  apiVersion: v1alpha1
  sts-region: us-east-1
  discovery:
    exportedTagsOnMetrics:
      rds:
        - opsverse-database-name
      cloudfront:
        - opsverse-monitor
    jobs:
      - type: rds
        searchTags:
          - key: opsverse-database-name
            value: "<EDIT: add RDS tag value here>"
        length: 60
        period: 60
        regions:
          - us-east-1
        metrics:
          - name: CPUUtilization
            statistics:
              - Average
          - name: BinLogDiskUsage
            statistics:
              - Average
          - name: BurstBalance
            statistics:
              - Average
          - name: CPUCreditUsage
            statistics:
              - Average
          - name: DatabaseConnections
            statistics:
              - Average
          - name: DiskQueueDepth
            statistics:
              - Average
          - name: EBSByteBalance%
            statistics:
              - Average
          - name: NetworkReceiveThroughput
            statistics:
              - Average
          - name: FailedSQLServerAgentJobsCount
            statistics:
              - Average
          - name: MaximumUsedTransactionIDs
            statistics:
              - Average
          - name: FreeableMemory
            statistics:
              - Average
          - name: FreeStorageSpace
            statistics:
              - Average
          - name: NetworkTransmitThroughput
            statistics:
              - Average
          - name: OldestReplicationSlotLag
            statistics:
              - Average
          - name: ReadIOPS
            statistics:
              - Average
          - name: ReadLatency
            statistics:
              - Average
          - name: ReadThroughput
            statistics:
              - Average
          - name: ReplicaLag
            statistics:
              - Average
          - name: ReplicationSlotDiskUsage
            statistics:
              - Average
          - name: SwapUsage
            statistics:
              - Average
          - name: TransactionLogsDiskUsage
            statistics:
              - Average
          - name: TransactionLogsGeneration
            statistics:
              - Average
          - name: WriteIOPS
            statistics:
              - Average
          - name: WriteLatency
            statistics:
              - Average
          - name: WriteThroughput
            statistics:
              - Average
      - type: cloudfront
        searchTags:
          - key: opsverse-monitor
            value: "<EDIT: add CloudFront tag value here>"
        length: 300
        period: 300
        regions:
          - us-east-1
        metrics:
          - name: 4xxErrorRate
            statistics:
              - Average
          - name: 401ErrorRate
            statistics:
              - Average
          - name: 403ErrorRate
            statistics:
              - Average
          - name: 404ErrorRate
            statistics:
              - Average
          - name: 5xxErrorRate
            statistics:
              - Average
          - name: 502ErrorRate
            statistics:
              - Average
          - name: 503ErrorRate
            statistics:
              - Average
          - name: 504ErrorRate
            statistics:
              - Average
          - name: BytesDownloaded
            statistics:
              - Sum
          - name: BytesUploaded
            statistics:
              - Sum
          - name: CacheHitRate
            statistics:
              - Average
          - name: OriginLatency
            statistics:
              - percentile
          - name: Requests
            statistics:
              - Sum
          - name: TotalErrorRate
            statistics:
              - Average
      - type: AmazonMWAA
        length: 60
        period: 60
        regions:
          - us-east-1
        metrics:
          - name: SLAMissed
            statistics:
              - Average
          - name: FailedSLACallback
            statistics:
              - Average
          - name: Updates
            statistics:
              - Average
          - name: Orphaned
            statistics:
              - Average
          - name: FailedCeleryTaskExecution
            statistics:
              - Average
          - name: FilePathQueueUpdateCount
            statistics:
              - Average
          - name: CriticalSectionBusy
            statistics:
              - Average
          - name: DagBagSize
            statistics:
              - Average
          - name: DagCallbackExceptions
            statistics:
              - Average
          - name: FailedSLAEmailAttempts
            statistics:
              - Average
          - name: TaskInstanceFinished
            statistics:
              - Average
          - name: JobEnd
            statistics:
              - Average
          - name: JobHeartbeatFailure
            statistics:
              - Average
          - name: JobStart
            statistics:
              - Average
          - name: ManagerStalls
            statistics:
              - Average
          - name: OperatorFailures
            statistics:
              - Average
          - name: OperatorSuccesses
            statistics:
              - Average
          - name: OtherCallbackCount
            statistics:
              - Average
          - name: Processes
            statistics:
              - Average
          - name: SchedulerHeartbeat
            statistics:
              - Average
          - name: StartedTaskInstances
            statistics:
              - Average
          - name: SLACallbackCount
            statistics:
              - Average
          - name: TasksKilledExternally
            statistics:
              - Average
          - name: TaskTimeoutError
            statistics:
              - Average
          - name: TaskInstanceCreatedUsingOperator
            statistics:
              - Average
          - name: TaskInstancePreviouslySucceeded
            statistics:
              - Average
          - name: TaskInstanceFailures
            statistics:
              - Average
          - name: TaskInstanceSuccesses
            statistics:
              - Average
          - name: TaskRemovedFromDAG
            statistics:
              - Average
          - name: TaskRestoredToDAG
            statistics:
              - Average
          - name: TriggersSucceeded
            statistics:
              - Average
          - name: TriggersFailed
            statistics:
              - Average
          - name: TriggersBlockedMainThread
            statistics:
              - Average
          - name: TriggerHeartbeat
            statistics:
              - Average
          - name: ZombiesKilled
            statistics:
              - Average
          - name: DagFileRefreshError
            statistics:
              - Average
          - name: ImportErrors
            statistics:
              - Average
          - name: ExceptionFailures
            statistics:
              - Average
          - name: ExecutedTasks
            statistics:
              - Average
          - name: InfraFailures
            statistics:
              - Average
          - name: LoadedTasks
            statistics:
              - Average
          - name: TotalParseTime
            statistics:
              - Average
          - name: TriggeredDagRuns
            statistics:
              - Average
          - name: TriggersRunning
            statistics:
              - Average
          - name: PoolDeferredSlots
            statistics:
              - Average
          - name: DagFileProcessingLastRunSecondsAgo
            statistics:
              - Average
          - name: OpenSlots
            statistics:
              - Average
          - name: OrphanedTasksAdopted
            statistics:
              - Average
          - name: OrphanedTasksCleared
            statistics:
              - Average
          - name: PokedExceptions
            statistics:
              - Average
          - name: PokedSuccess
            statistics:
              - Average
          - name: PokedTasks
            statistics:
              - Average
          - name: PoolFailures
            statistics:
              - Average
          - name: PoolStarvingTasks
            statistics:
              - Average
          - name: PoolOpenSlots
            statistics:
              - Average
          - name: PoolQueuedSlots
            statistics:
              - Average
          - name: PoolRunningSlots
            statistics:
              - Average
          - name: ProcessorTimeouts
            statistics:
              - Average
          - name: QueuedTasks
            statistics:
              - Average
          - name: RunningTasks
            statistics:
              - Average
          - name: TasksExecutable
            statistics:
              - Average
          - name: TasksPending
            statistics:
              - Average
          - name: TasksRunning
            statistics:
              - Average
          - name: TasksStarving
            statistics:
              - Average
          - name: TasksWithoutDagRun
            statistics:
              - Average
          - name: CollectDBDags
            statistics:
              - Average
          - name: CriticalSectionDuration
            statistics:
              - Average
          - name: CriticalSectionQueryDuration
            statistics:
              - Average
          - name: DAGDependencyCheck
            statistics:
              - Average
          - name: DAGDurationFailed
            statistics:
              - Average
          - name: DAGDurationSuccess
            statistics:
              - Average
          - name: DAGFileProcessingLastDuration
            statistics:
              - Average
          - name: DAGScheduleDelay
            statistics:
              - Average
          - name: FirstTaskSchedulingDelay
            statistics:
              - Average
          - name: SchedulerLoopDuration
            statistics:
              - Average
          - name: TaskInstanceDuration
            statistics:
              - Average
          - name: TaskInstanceQueuedDuration
            statistics:
              - Average
          - name: TaskInstanceScheduledDuration
            statistics:
              - Average
```

Run the following command against your Kubernetes cluster:

```shell
helm upgrade --install cloudwatch-agent -n devopsnow --create-namespace cloudwatch-agent \
  --repo https://registry.devopsnow.io/chartrepo/public \
  -f cloudwatch-values.yaml
```
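The job entries in the values template are long and repetitive, and a hand-maintained list can easily pick up a repeated metric name. The following minimal Python sketch generates one job entry from a plain list of metric names, dropping repeats while keeping order. It assumes the field names used in the template above (`type`, `length`, `period`, `regions`, `searchTags`, `metrics`, `statistics`); `yace_job` is a hypothetical helper, and the tag value `orders` in the usage is illustrative only.

```python
import json
from typing import Dict, Iterable, Optional


def yace_job(job_type: str, metric_names: Iterable[str], statistic: str = "Average",
             period: int = 60, length: int = 60,
             regions: Iterable[str] = ("us-east-1",),
             search_tags: Optional[Dict[str, str]] = None) -> dict:
    """Build one entry for `discovery.jobs` (hypothetical helper)."""
    seen = set()
    metrics = []
    for name in metric_names:
        # Skip names that were already added, keeping the first occurrence's position.
        if name in seen:
            continue
        seen.add(name)
        metrics.append({"name": name, "statistics": [statistic]})
    job = {
        "type": job_type,
        "length": length,
        "period": period,
        "regions": list(regions),
        "metrics": metrics,
    }
    # Tag filters are optional (the MWAA job in the template has none).
    if search_tags:
        job["searchTags"] = [{"key": k, "value": v} for k, v in search_tags.items()]
    return job


if __name__ == "__main__":
    job = yace_job("rds", ["CPUUtilization", "FreeableMemory", "CPUUtilization"],
                   search_tags={"opsverse-database-name": "orders"})
    # Single-line JSON is also valid YAML flow syntax.
    print(json.dumps(job))
```

The printed single-line JSON can be pasted as a `- ` item under `jobs`, because YAML parsers accept JSON-style flow mappings; converting it to block style is equally fine.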

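The trust policy from Step 3 can also be generated instead of edited by hand, which avoids two common mistakes: leaving the `https://` prefix on the OIDC URL, and omitting the namespace from the `sub` value. This is a minimal sketch that assumes the `devopsnow` namespace and the `cloudwatch-agent` service account used in the values file; `irsa_trust_policy` is a hypothetical helper, and the issuer URL in the example is a placeholder in the `oidc.eks.<region>.amazonaws.com/id/<id>` form that EKS uses.

```python
import json

HTTPS_PREFIX = "https://"


def irsa_trust_policy(account_id: str, oidc_url: str,
                      namespace: str = "devopsnow",
                      service_account: str = "cloudwatch-agent") -> dict:
    """Build a trust policy letting one Kubernetes service account assume the role (hypothetical helper)."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    # The EKS console shows the issuer with https://; the IAM ARN and condition keys omit it.
    issuer = oidc_url.strip()
    if issuer.startswith(HTTPS_PREFIX):
        issuer = issuer[len(HTTPS_PREFIX):]
    issuer = issuer.rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringLike": {
                        f"{issuer}:aud": "sts.amazonaws.com",
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and issuer URL; substitute your own.
    policy = irsa_trust_policy(
        "123456789012",
        "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE",
    )
    print(json.dumps(policy, indent=2))
```

Paste the printed JSON into the role's Trust relationships editor. If you install the chart into a different namespace or rename the service account, pass those values instead of the defaults.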
Prometheus-Compatible Metrics Endpoints

Amazon Managed Workflows for Apache Airflow (MWAA)