VESSL Cloud enforces resource limits on Workspaces and Jobs based on the resource spec selected at launch. This page describes how those limits are enforced and how the platform behaves when a workload exceeds them.

Limits enforced

| Resource | Where it is defined |
| --- | --- |
| CPU | Resource spec (for example, 8 vCPUs) |
| Memory (RAM) | Resource spec (for example, 64 GiB) |
| GPU memory (VRAM) | GPU model (for example, H100 80 GiB) |
| Ephemeral local storage | Allocated per container at launch |

Resource specs are selected when you create a Workspace or Job. To review available specs, see Create a workspace or Create a job.
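
To see how close a running process is to its memory limit from inside the container, one option is to read the Linux cgroup v2 memory files. This is a minimal sketch that assumes the container exposes cgroup v2 at `/sys/fs/cgroup`; that layout is a general Linux convention and is not confirmed by this page, so treat the paths as an assumption and verify them in your own Workspace or Job.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: cgroup v2 layout. These paths are standard on many Linux hosts
# but are not taken from VESSL documentation.
CGROUP_DIR = Path("/sys/fs/cgroup")


def read_memory_usage(cgroup_dir: Path = CGROUP_DIR) -> Tuple[int, Optional[int]]:
    """Return (current_bytes, limit_bytes); the limit is None when it is "max"."""
    current = int((cgroup_dir / "memory.current").read_text().strip())
    raw_limit = (cgroup_dir / "memory.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit


def memory_fraction_used(cgroup_dir: Path = CGROUP_DIR) -> Optional[float]:
    """Return usage as a fraction of the limit, or None if no limit is set."""
    current, limit = read_memory_usage(cgroup_dir)
    if limit is None or limit == 0:
        return None
    return current / limit


if __name__ == "__main__":
    try:
        fraction = memory_fraction_used()
    except (OSError, ValueError):
        print("cgroup v2 memory files not found or unreadable")
    else:
        if fraction is None:
            print("no memory limit set")
        else:
            print(f"{fraction:.1%} of the memory limit in use")
```

A long-running process could call `memory_fraction_used()` periodically and reduce its working set, or write a checkpoint, before it reaches the limit.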

Enforcement behavior

| Limit exceeded | What happens |
| --- | --- |
| Memory | Container is killed (OOMKilled) |
| Ephemeral local storage | Container is removed from the node |
| CPU | Workload runs at reduced speed but continues |
| GPU memory (VRAM) | Process receives a CUDA out-of-memory (OOM) error; behavior after the error depends on the application code |
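
Because a GPU memory violation surfaces as an error inside the process rather than a container kill, application code can choose how to react. The sketch below shows one common pattern: retry the failing step with a smaller batch size. The helper `run_with_batch_backoff` and the stand-in `fake_step` are hypothetical names introduced for illustration, and the example uses Python's built-in `MemoryError` so that it runs without a GPU.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_batch_backoff(
    step: Callable[[int], T],
    batch_size: int,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    min_batch_size: int = 1,
) -> Tuple[T, int]:
    """Call step(batch_size), halving batch_size after each out-of-memory error.

    Returns (result, batch_size_that_worked). Re-raises the error once the
    batch size can no longer be reduced.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except oom_errors:
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size: int) -> str:
    # Stand-in for a training step: pretend batches over 16 do not fit in VRAM.
    if batch_size > 16:
        raise MemoryError("simulated out-of-memory")
    return f"ok with batch_size={batch_size}"


result, used = run_with_batch_backoff(fake_step, batch_size=128)
print(result, used)  # prints "ok with batch_size=16 16"
```

With PyTorch, the same helper could be called with `oom_errors=(torch.cuda.OutOfMemoryError,)`, and `torch.cuda.empty_cache()` could be called before each retry. Whether backing off is appropriate depends on the workload; for some Jobs, failing fast and relaunching on a larger GPU is the better choice.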

After a container is killed

When a container is killed for exceeding its Memory or Ephemeral local storage limit, recovery depends on the workload type.

| Workload | Recovery behavior |
| --- | --- |
| Workspace | The platform automatically restarts the container on a healthy node. Cluster storage and Object storage volumes are re-attached. Temporary data and in-memory state are lost. |
| Job | The Job is marked as failed. There is no automatic retry. Submit a new Job to try again. |

Data persistence after a violation

| Data location | After a violation |
| --- | --- |
| Container memory state | Lost |
| Temporary local storage | Lost |
| Cluster storage volumes | Preserved |
| Object storage volumes | Preserved |

Data on Cluster storage and Object storage volumes is unaffected by resource-limit violations. Store any data that must survive a violation on one of these volumes.
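
One way to apply this advice is to write progress checkpoints to a persistent volume and resume from the latest one after a restart or resubmission. The sketch below is illustrative only: the function names are hypothetical, the checkpoint is a small JSON file, and the caller supplies the path where a Cluster storage or Object storage volume is mounted. The temp-file-plus-rename step is atomic on POSIX local filesystems; network or object-storage mounts may behave differently, so verify this on your volume type.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def save_checkpoint(state: Dict[str, Any], path: Path) -> None:
    """Write state to a temp file, then rename it over `path`, so a kill
    during the write does not leave a truncated checkpoint behind."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_checkpoint(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def run(path: Path, total_steps: int) -> int:
    """Resume from the last saved step and checkpoint after every step."""
    state = load_checkpoint(path) or {"step": 0}
    step = state["step"]
    while step < total_steps:
        # ... one unit of real work would go here ...
        step += 1
        save_checkpoint({"step": step}, path)
    return step
```

A Job would call `run(Path("<persistent-volume-mount>/checkpoint.json"), total_steps)`, where the mount path is a placeholder for wherever your volume is attached. If the Job fails on a memory violation, a newly submitted Job pointed at the same path continues from the last completed step instead of starting over.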

Where to check usage

Track resource usage in real time and review historical patterns: