OKA Core

Simulation is strategic: it gives industries a competitive advantage and helps move scientific research forward, and with the explosion of data and artificial intelligence it is becoming essential to our lives. Yet running an HPC infrastructure efficiently is complex, and administrators often lack the proper tools to understand how users behave and how the cluster responds to demand.

UCit has packaged its HPC and machine learning expertise into a software tool that helps HPC system administrators be even more effective. OKA Core provides an extensible platform that presents the state of your HPC infrastructure through simple, comprehensible dashboards. Whether you need high-level KPIs to report cluster usage or low-level information to track down the origin of an issue, OKA Core gives you the right level of detail.

OKA Core is the backbone of the OKASuite

Identify Atypical User Behaviors

Did you spot that novice user submitting bursts of jobs in the last 2 days?

Or that user with fewer than 10% of their jobs ending correctly?

Improve Cluster Quality of Service

How long do your jobs spend in the queue compared to their actual runtime?

Do you have a high proportion of failed, cancelled, or timed-out jobs?

Limit Waste of Compute Resources

Which resources do your users request but leave unused?

How many of your jobs could run on cheaper nodes?

Plan Future Cluster Evolution

When do you have peak capacity needs that require additional resources?

How should you size your future clusters?

Try OKA Core for free now

The all-in-one platform to optimize the use of your HPC resources and become more efficient.

Features

Number of jobs and core-hours consumed per job status

Allocated cores over time, and number of jobs allocated per node

Submission frequency, slowdown, interarrival time, number of active jobs

Number of users active on the cluster (running or requesting jobs)

Cores, memory, nodes… used by the jobs

Detailed information about jobs grouped along multiple categories

Detection and detailed analysis of resubmitted jobs

Cluster state (Optimal, Acceptable, Contention, Congestion), and jobs life cycle
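
To make a few of the metrics listed above concrete, here is a minimal Python sketch of how slowdown, interarrival time, and the share of unsuccessful jobs are commonly computed from job accounting records. It is not OKA Core's implementation: the `Job` record, its field names, the state labels, and the sample data are all hypothetical, and slowdown uses the usual scheduling definition (wait time plus runtime, divided by runtime).

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical accounting record; times are in seconds."""
    submit: float
    start: float
    end: float
    state: str  # assumed labels, e.g. "COMPLETED", "FAILED", "TIMEOUT"


def slowdown(job: Job) -> float:
    """(wait + run) / run: 1.0 means the job never waited in the queue."""
    wait = job.start - job.submit
    run = job.end - job.start
    if run <= 0:
        raise ValueError("runtime must be positive to compute slowdown")
    return (wait + run) / run


def interarrival_times(jobs: list[Job]) -> list[float]:
    """Gaps between consecutive submissions, in submission order."""
    submits = sorted(job.submit for job in jobs)
    return [later - earlier for earlier, later in zip(submits, submits[1:])]


def unsuccessful_share(jobs: list[Job]) -> float:
    """Fraction of jobs whose final state is anything but COMPLETED."""
    bad = sum(1 for job in jobs if job.state != "COMPLETED")
    return bad / len(jobs)


# Illustrative sample data, not taken from a real cluster.
jobs = [
    Job(submit=0, start=60, end=120, state="COMPLETED"),
    Job(submit=30, start=120, end=420, state="FAILED"),
    Job(submit=90, start=420, end=480, state="COMPLETED"),
    Job(submit=100, start=480, end=540, state="TIMEOUT"),
]

for job in jobs:
    print(f"slowdown: {slowdown(job):.2f}")
print("interarrival times (s):", interarrival_times(jobs))
print("unsuccessful share:", unsuccessful_share(jobs))
```

A slowdown close to 1 means jobs start almost immediately, while large values point to short jobs waiting a long time in the queue, which is the kind of signal the quality-of-service questions above are after.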