VESSL Cloud Documentation

VESSL Cloud enforces resource limits on Workspaces and Jobs based on the resource spec selected at launch. This page describes how those limits are enforced and how the platform behaves when a workload exceeds them.

Limits enforced

Resource	Where it is defined
CPU	Resource spec (for example, 8 vCPUs)
Memory (RAM)	Resource spec (for example, 64 GiB)
GPU memory (VRAM)	GPU model (for example, H100 80 GiB)
Ephemeral local storage	Allocated per container at launch

Resource specs are selected when you create a Workspace or Job. To review available specs, see Create a workspace or Create a job.
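To confirm from inside a running container which memory limit actually applies, one option is to read the Linux cgroup limit files. The sketch below assumes the container exposes the standard cgroup v2 or v1 paths, which this page does not state; the function names are illustrative and not part of any VESSL tooling.

```python
from pathlib import Path
from typing import Optional

# Standard Linux cgroup locations. Whether a given container exposes them
# is an assumption, not something this page documents.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_memory_limit(raw: str) -> Optional[int]:
    """Parse a cgroup memory limit value; return None when it reads "max"."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    return int(value)


def read_memory_limit(paths=(CGROUP_V2_LIMIT, CGROUP_V1_LIMIT)) -> Optional[int]:
    """Return the limit in bytes from the first readable file, else None."""
    for path in paths:
        try:
            return parse_memory_limit(Path(path).read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next location
    return None


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("No memory limit detected")
    else:
        print(f"Memory limit: {limit / 2**30:.1f} GiB")
```

If a limit is found, comparing it with the resource spec you selected is a quick sanity check before starting a long run.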

Enforcement behavior

Limit exceeded	What happens
Memory	Container is killed (`OOMKilled`)
Ephemeral local storage	Container is evicted (removed from the node)
CPU	Workload runs at reduced speed but continues
GPU memory (VRAM)	Process receives a CUDA out-of-memory (OOM) error; behavior after the error depends on the application code
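Because a GPU memory error is delivered to the process rather than killing the container, application code can choose how to react. One common pattern is to retry the failing step with a smaller batch. The sketch below is framework-agnostic: `run_with_smaller_batches` and its parameters are hypothetical names, and the built-in `MemoryError` stands in for the framework's own exception. In PyTorch you would likely pass `torch.cuda.OutOfMemoryError` instead.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_smaller_batches(
    step: Callable[[int], T],
    batch_size: int,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    min_batch_size: int = 1,
) -> Tuple[T, int]:
    """Call step(batch_size), halving batch_size after each OOM error.

    Returns the step result and the batch size that succeeded. Re-raises
    the last OOM error if even min_batch_size does not fit.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except oom_errors:
            if batch_size <= min_batch_size:
                raise  # nothing smaller left to try
            batch_size = max(min_batch_size, batch_size // 2)
```

Halving is only one possible policy. Whether retrying is safe depends on the workload: a training step that partially updated state before failing may need extra cleanup that this sketch does not attempt.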

After a container is killed

When a container is killed for exceeding its Memory or Ephemeral local storage limit, recovery depends on the workload type.

Workload	Recovery behavior
Workspace	The platform automatically restarts the container on a healthy node. Cluster storage and Object storage volumes are re-attached. Data on ephemeral local storage and in-memory state are lost.
Job	The Job is marked as `failed`. There is no automatic retry. Submit a new Job to try again.

Data persistence after a violation

Data location	After a violation
Container memory state	Lost
Ephemeral local storage	Lost
Cluster storage volumes	Preserved
Object storage volumes	Preserved

Data on Cluster storage and Object storage volumes is unaffected by resource-limit violations. Store any data that must survive a violation on one of these volumes.
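One way to act on this is to checkpoint progress to a persistent volume at regular intervals and to resume from the latest checkpoint after a restart or resubmission. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the state is small enough to serialize as JSON, and the target directory supports atomic rename, which holds for POSIX file systems but may not hold for every object storage mount.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional


def save_checkpoint(state: Dict[str, Any], path: Path) -> None:
    """Write state as JSON so readers never see a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to storage before renaming
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if the write or rename did not complete.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_checkpoint(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None if no checkpoint exists yet."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return None
```

At startup, a workload would call `load_checkpoint` with a path on a mounted volume (for example a hypothetical `/mnt/cluster-storage/run-1/checkpoint.json`) and continue from the saved step if a checkpoint exists. If the container is killed in the middle of a save, the previous checkpoint remains intact.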

Where to check usage

Track resource usage in real time and review historical patterns:

Monitor workspace metrics — GPU, VRAM, CPU, memory, and network usage
Monitor job metrics — same metrics for Jobs
Workspace logs — review OOMKilled and eviction events
Job logs — review failure causes

Related

Credit exhaustion policy — what happens when your credit balance reaches zero
Container Compute service terms — contractual terms for resource limits
