Integrating the OKA Suite with your AWS cluster

Efficiently running an HPC infrastructure is complex, and often lacks the proper tools to track down and get insights on how the users are behaving and how the cluster is responding to the demand. This is even more complicated when using Cloud clusters. Due to their transient and dynamic nature, information about instances types, location and costs are important assets to monitor, especially when you pay for what you use. The task of managing and presenting these metrics becomes increasingly difficult when the scale of the infrastructure grows or undergoes changes over time, which is a common situation in Cloud environments.

“Standard” metrics and information provided by the job scheduler might not be sufficient to efficiently manage the Cloud clusters. For example, tracking the cost of running jobs becomes even more important in order to monitor and manage your budget, and redistribute the costs to your users/departments. Due to the wide variety of compute instance types in the cloud, it can also be interesting to track on which instance types the jobs have run, in order to check their performances and associated costs, and further improve the placement and instance selection of the jobs.

OKA offers many benefits to monitor your jobs, and deep-dive into how your clusters behave and how they are being used by your end-users. Accessing in OKA information about the cloud is straightforward, if you have configured your environment properly. In this article, we present a simple integration that can be made in a Slurm cluster in AWS to retrieve the type of instances jobs runs on, their pricing information (on-demand/spot, per hour price…), the AWS region, and virtually any information about the AWS environment you are using.
The scripts provided below are given as examples, and can easily be adapted to retrieve more detailed information, or be adapted to work with other job schedulers (e.g., LSF, PBS…).

There are many ways to create a cluster in AWS, the details are out of the scope of this article, but you can use for example AWS ParallelCluster or CCME.

Note: the solution presented here is extracted from CCME where it is available out of the box.

The principle depicted here is very simple, and relies on two components:

A Slurm epilog script that will gather information about the AWS environment on which the job runs, and store this information as comma separated values (CSV) in the Comment field of the job. The gathered information are:
- instance type
- instance id of the “main” job node
- availability zone
- region
- instance price
- cost type: ondemand or spot
- tenancy: shared, reserved…
An OKA Data Enhancer that will parse the values of the Comment field, and store them as additional information with each job.

The principle depicted here can also easily adapted to other Cloud providers. For example, you could follow the indications presented in the Azure integration with Slurm presented here in the “Granular Cost Control” section.

Slurm epilog script

This Slurm epilog script retrieves information about the instance type and its pricing when the job ends,
and stores the information in the Comment field of the job in sacct. The user provided comments are kept, and the information are added at the end after a semicolon. The format of the Comment field is the following:

Copy to clipboard

The following packages are required and should be available on all the nodes of the cluster:

jq
awscli

Also note that this solution requires that Slurm has been configured to keep accounting information about the jobs. See Slurm documentation to configure accounting manually, or if you are using AWS ParallelCluster you can follow this guide.

As the epilog script contacts AWS APIs to gather the information, it needs to run on instances having (at least) the following policy attached to the role of the instance:

Copy to clipboard

Script

Copy to clipboard

#!/bin/bash
declare logFile="/var/log/ccme.slurmepilog.log"
touch "${logFile}"; chmod -v 600 "${logFile}"
exec  > >(awk '{printf "[%s] %s\n", strftime("%FT%T%z"), $0; fflush()}' >>"${logFile}" || true)
exec 2>&1;
# Uncomment the following line for debug traces
#set -x

slurm_env="/opt/slurm/etc/slurm.sh"
if [[ -f "${slurm_env}" ]]; then
  # shellcheck source=/dev/null
  . "${slurm_env}"
fi

declare -A regions
regions["af-south-1"]="Africa (Cape Town)"
regions["ap-east-1"]="Asia Pacific (Hong Kong)"
regions["ap-east-1"]="Asia Pacific (Hong Kong)"
regions["ap-northeast-1"]="Asia Pacific (Tokyo)"
regions["ap-northeast-2"]="Asia Pacific (Seoul)"
regions["ap-south-1"]="Asia Pacific (Mumbai)"
regions["ap-south-2"]="Asia Pacific (Hyderabad)"
regions["ap-southeast-1"]="Asia Pacific (Singapore)"
regions["ap-southeast-2"]="Asia Pacific (Sydney)"
regions["ap-southeast-3"]="Asia Pacific (Jakarta)"
regions["ap-southeast-4"]="Asia Pacific (Melbourne)"
regions["ca-central-1"]="Canada (Central)"
regions["eu-central-1"]="EU (Frankfurt)"
regions["eu-central-2"]="Europe (Zurich)"
regions["eu-north-1"]="EU (Stockholm)"
regions["eu-south-1"]="Europe (Milan)"
regions["eu-south-2"]="Europe (Spain)"
regions["eu-west-1"]="EU (Ireland)"
regions["eu-west-2"]="EU (London)"
regions["eu-west-3"]="EU (Paris)"
regions["me-central-1"]="Middle East (UAE)"
regions["me-south-1"]="Middle East (Bahrain)"
regions["sa-east-1"]="South America (Sao Paulo)"
regions["us-east-1"]="US East (N. Virginia)"
regions["us-east-2"]="US East (Ohio)"
regions["us-west-1"]="US West (N. California)"
regions["us-west-2"]="US West (Oregon)"

# Gather Information about job environment on AWS
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
insttype=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" -v http://169.254.169.254/latest/meta-data/instance-type)
instid=$(curl -s  -H "X-aws-ec2-metadata-token: ${TOKEN}" -v http://169.254.169.254/latest/meta-data/instance-id)
az=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" -v http://169.254.169.254/latest/meta-data/placement/availability-zone)
region=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" -v http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region || true)

# We try to detect if we use spot through AWS APIs
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-purchasing-options.html#check-instance-lifecycle
# Use the following describe-instances command:
# aws ec2 describe-instances --instance-ids i-1234567890abcdef0
# - If the instance is running on a Dedicated Host, the output contains the following information: "Tenancy": "host"
# - If the instance is a Dedicated Instance, the output contains the following information: "Tenancy": "dedicated"
# - If the instance is a Spot Instance, the output contains the following information: "InstanceLifecycle": "spot"
# - If the instance is a Scheduled Instance, the output contains the following information: "InstanceLifecycle": "scheduled"
# - Otherwise, the output does not contain InstanceLifecycle.
lifecycle=$(aws --region="${region}" ec2 describe-instances --instance-ids "${instid}" | jq -r ".Reservations[0].Instances[0].InstanceLifecycle" || true)
tenancy=$(aws --region="${region}" ec2 describe-instances --instance-ids "${instid}" | jq -r ".Reservations[0].Instances[0].Placement.Tenancy" || true)

costtype="ondemand"  # Default value
if [[ "${lifecycle}" == "spot" ]]; then
  costtype="spot"
elif [[ "${lifecycle}" == "" ]]; then
  costtype="ondemand"
fi

# Get instance price
if [[ "${costtype}" == "ondemand" ]]; then
  if [[ "${tenancy}" == "default" ]]; then
    tenancy="shared"
  fi

filters=(
  "Type=TERM_MATCH,Field=instanceType,Value=${insttype}"
  "Type=TERM_MATCH,Field=location,Value=${regions[${region}]}"
  "Type=TERM_MATCH,Field=operatingSystem,Value=Linux"
  "Type=TERM_MATCH,Field=preInstalledSw,Value=NA"
  "Type=TERM_MATCH,Field=capacitystatus,Value=Used"
  "Type=TERM_MATCH,Field=tenancy,Value=${tenancy}"
  )
  # Warning: if tenancy==host (reserved), then the price will be $0.00
  instprice=$(aws --region=us-east-1 pricing get-products --service-code AmazonEC2 --filter "${filters[@]}" | jq -rc '.PriceList[0]' | jq -r '[.terms.OnDemand[].priceDimensions[].pricePerUnit.USD][0]' || true)
elif [[ "${costtype}" == "spot" ]]; then
  instprice=$(aws --region "${region}" ec2 describe-spot-price-history --availability-zone "${az}" --instance-types "${insttype}" --start-time "$(date '+%Y-%m-%dT%H:%M:%S')" --product-descriptions "Linux/UNIX" | jq -r '.SpotPriceHistory[0].SpotPrice' || true)
else
  # Currently we do not manage other types of pricing
  instprice=0
fi

# shellcheck disable=SC2154
comment=$(scontrol show job "${SLURM_JOBID}" | grep 'Comment=' | awk -F'Comment=' '{print $2}' || true)
comment+=":PricingInfo=${insttype};${instid};${az};${region};${instprice};${costtype};${tenancy}"

echo "Setting comment for job ${SLURM_JOBID}: ${comment}"

sacctmgr -i -Q modify job where JobID="${SLURM_JOBID}" set Comment="${comment}"

Installation

Copy the epilog script to a folder accessible on all nodes, e.g., /shared_nfs/slurm/slurm-epilog.sh, and give it execution rights: chmod +x /shared_nfs/slurm/slurm-epilog.sh
Edit /etc/slurm/slurm.conf (on all nodes), and set the Epilog option to /shared_nfs/slurm/slurm-epilog.sh
Reconfigure Slurm daemons: scontrol reconfigure, or restart them: systemctl restart slurmd

Then submit a job. Once finished, check that in the output of sacct you have in the Comment field the expected output:
sacct --format "jobid,comment".

OKA Data Enhancer

A Data Enhancer needs to be created and configured in OKA in order to parse the additional data gathered by the Slurm Epilog script. We propose here an example Data Enhancer that you can adapt to your needs (included in comment is the generation of “fake” data if you wish to test it first):

Copy to clipboard

import logging
    import numpy as np

logger = logging.getLogger("oka_main")

class EnhancerAWSSLURMFeature():
        VERSION = "1.0.0"

def parse_comment(comment):
            # :pricinginfo=${instance type};${instance id};${availability zone};${region};${price};${spot|ondemand};{tenancy}
            # e.g., :pricinginfo=c5n.18xlarge;i-0cd3c13fa4599d4d5;eu-west-1b;eu-west-1;1.241400;spot;shared
            scsv = comment.split(':pricinginfo=')[1].split(';')
            insttype = scsv[0]
            instid = scsv[1]
            az = scsv[2]
            region = scsv[3]
            instprice = float(scsv[4]) if scsv[4].lower() not in ["na", "nan"] else 0.0
            costtype = scsv[5]
            tenancy = scsv[6]
            return insttype, instid, az, region, instprice, costtype, tenancy

def run(self, data, **kwargs):
            try:
                # Uncomment the following lines for testing with fake data
                # from random import random, randint
                # azs = [["a", "b", "c"][randint(0,2)] for y in range(len(data))]
                # prices = [random() for y in range(len(data))]
                # inst = [["c5n.18xlarge", "c5n.9xlarge", "c5n.4xlarge", "c5.4xlarge", "g4dn.8xlarge"][randint(0,4)] for y in range(len(data))]
                # data.loc[:, "Comment"] = ["comment:pricinginfo={};id12;eu-west-1{};eu-west-1;{};spot;shared".format(inst[x], azs[x], prices[x]) for x in range(len(data))]

# Fill missing values with a set of default values
                data["Comment"].fillna(":pricinginfo=NA;NA;NA;NA;0;ondemand;shared", inplace=True)

data.loc[:, "instance_type"], data.loc[:, "instance_id"], data.loc[:, "availability_zone"], data.loc[:, "region"], instprice, data.loc[:, "price_type"], data.loc[:, "tenancy"] = zip(*data["Comment"].apply(EnhancerAWSSLURMFeature.parse_comment))

duration = (data["End"].astype(np.datetime64) - data["Start"].astype(np.datetime64)).astype('timedelta64[s]')/3600.0
                data.loc[:, "Cost"] = instprice * duration * data["Allocated_Nodes"]

except Exception as e:
                logger.error(f"Cost information not available: {e}")

Installation

Please refer to the Data Enhancer section for explanations on how to install and configure this Data Enhancer in the ingestion pipeline.

Accessing AWS information in OKA

The information ingested through the Data Enhancer are then available in OKA in multiple plugins and through the filters.
We present below a few examples of where the information can be accessed and used to analyze your workloads:

AWS costs in KPI

Filters allow to select workloads based on the information gathered from AWS.

Detailed information in Plugin Consumers

Cost per job status in Plugin State to detect waste.

Conclusion

This article presented a simple integration approach for a Slurm cluster in AWS. By leveraging a Slurm epilog script and an OKA Data Enhancer, valuable information about the AWS environment can be retrieved and analyzed.

By utilizing the integrated AWS information in OKA, administrators gain access to various plugins and filters for analyzing and visualizing workloads. This enables better cost management, granular control, and identification of wasteful practices.

Overall, the integration of an AWS cluster with OKA empowers administrators to optimize their HPC infrastructure, gain insights into resource utilization and costs, and make data-driven decisions for efficient cluster management in cloud environments.