VESSL Cloud enforces resource limits on Workspaces and Jobs based on the resource spec selected at launch. This page describes how those limits are enforced and how the platform behaves when a workload exceeds them.
Limits enforced
| Resource | Where it is defined |
|---|---|
| CPU | Resource spec (for example, 8 vCPUs) |
| Memory (RAM) | Resource spec (for example, 64 GiB) |
| GPU memory (VRAM) | GPU model (for example, H100 80 GiB) |
| Ephemeral local storage | Allocated per container at launch |
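To confirm which CPU and memory limits a running Workspace actually sees, you can read them from inside the container. The sketch below assumes a Linux container using cgroup v2, where the limits appear in `memory.max` and `cpu.max` under `/sys/fs/cgroup`. This page does not say how VESSL Cloud applies the limits, so treat that mapping as an assumption. GPU memory and ephemeral local storage are not exposed through these files. The function names are illustrative.

```python
import os
from typing import Optional

# Assumption: the container runs on Linux with cgroup v2 mounted at this path.
CGROUP_ROOT = "/sys/fs/cgroup"


def parse_memory_max(text: str) -> Optional[int]:
    """Parse memory.max: a byte count, or "max" when no limit is set."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cpu.max ("<quota> <period>", in microseconds) into a vCPU count."""
    quota, period = text.split()
    if quota == "max":
        return None  # no CPU quota configured
    return int(quota) / int(period)


def read_limits(root: str = CGROUP_ROOT) -> dict:
    """Return the memory (bytes) and CPU (vCPUs) limits visible to this container."""
    limits = {}
    try:
        with open(os.path.join(root, "memory.max")) as f:
            limits["memory_bytes"] = parse_memory_max(f.read())
        with open(os.path.join(root, "cpu.max")) as f:
            limits["cpus"] = parse_cpu_max(f.read())
    except OSError:
        # cgroup v1, a non-Linux host, or no access: the files are unavailable.
        pass
    return limits


if __name__ == "__main__":
    print(read_limits())
```

If the resource spec is applied directly as cgroup limits, an 8-vCPU, 64 GiB spec would show up as `cpus` of `8.0` and `memory_bytes` of `68719476736` (64 GiB in bytes).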
Enforcement behavior
| Limit exceeded | What happens |
|---|---|
| Memory | Container is killed (OOMKilled) |
| Ephemeral local storage | Container is removed from the node |
| CPU | Workload is throttled: it keeps running, but at reduced speed |
| GPU memory (VRAM) | Process receives a CUDA out-of-memory (OOM) error; what happens next depends on how the application code handles that error |
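Because running out of VRAM surfaces as an exception rather than a container kill, application code can catch it and adapt. One common pattern is to halve the batch size and retry the same slice of work. The sketch below is framework-agnostic: `run_with_batch_backoff` and its parameters are hypothetical, and it defaults to Python's built-in `MemoryError` so it can run without a GPU.

```python
from typing import Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def run_with_batch_backoff(
    step: Callable[[Sequence[T]], None],
    items: Sequence[T],
    batch_size: int,
    oom_error: Type[BaseException] = MemoryError,
    on_oom: Optional[Callable[[], None]] = None,
) -> int:
    """Run `step` over `items` in batches, halving the batch size after each OOM error.

    Returns the batch size in use when every item has been processed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            step(batch)
        except oom_error:
            if batch_size == 1:
                raise  # even a single item does not fit; let the caller decide
            if on_oom is not None:
                on_oom()  # e.g. release cached GPU memory before retrying
            batch_size //= 2  # retry the same slice with a smaller batch
            continue
        start += len(batch)
    return batch_size
```

With PyTorch 1.13 or later, you could pass `oom_error=torch.cuda.OutOfMemoryError` and `on_oom=torch.cuda.empty_cache`. Smaller batches trade throughput for fitting in memory, so log the final batch size and use it as the starting value next time.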
After a container is killed
When a container is killed for exceeding its memory or ephemeral local storage limit, recovery depends on the workload type.
| Workload | Recovery behavior |
|---|---|
| Workspace | The platform automatically restarts the container on a healthy node. Cluster storage and Object storage volumes are re-attached. Temporary data and in-memory state are lost. |
| Job | The Job is marked as failed. There is no automatic retry. Submit a new Job to try again. |
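Because a failed Job is not retried, a resubmitted Job starts from scratch unless it can find earlier progress. One common approach, sketched here under assumptions, is to save numbered checkpoints to a Cluster storage volume and have each new Job resume from the newest one. The `step_<N>.ckpt` naming scheme and the `latest_checkpoint` helper are hypothetical.

```python
import os
import re
from typing import Optional

# Hypothetical naming scheme: checkpoints are saved as step_<N>.ckpt.
CKPT_PATTERN = re.compile(r"^step_(\d+)\.ckpt$")


def latest_checkpoint(directory: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint in `directory`, or None."""
    if not os.path.isdir(directory):
        return None
    best_step, best_name = -1, None
    for name in os.listdir(directory):
        match = CKPT_PATTERN.match(name)
        # Compare step numbers as integers: "step_100" must beat "step_40".
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return None if best_name is None else os.path.join(directory, best_name)
```

At the start of the Job, call `latest_checkpoint()` on a checkpoint directory in your mounted volume and start from step 0 if it returns `None`. The same check lets a Workspace pick up after an automatic restart, since in-memory state is lost but the volume is re-attached.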
Data persistence after a violation
| Data location | After a violation |
|---|---|
| Container memory state | Lost |
| Ephemeral local storage | Lost |
| Cluster storage volumes | Preserved |
| Object storage volumes | Preserved |
Data on Cluster storage and Object storage volumes is unaffected by resource-limit violations. Store any data that must survive a violation on one of these volumes.
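Persistent volumes survive a violation, but a file that was being written when the container was killed can still be left incomplete. A minimal sketch, assuming a POSIX-style filesystem, writes to a temporary file in the same directory and then renames it over the target with `os.replace`, so readers see either the old file or the new one. Rename atomicity depends on the volume's filesystem and may not hold on an Object storage mount. The `save_atomically` helper is illustrative.

```python
import json
import os
import tempfile


def save_atomically(data: dict, path: str) -> None:
    """Write `data` as JSON to `path` so an interrupted write never replaces it with a partial file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Temporary file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to storage before the rename
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temporary file
        raise
```

For example, `save_atomically({"step": 1200}, path)` with `path` pointing into a Cluster storage mount keeps the previous state file intact if the container is killed mid-write.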
Where to check usage
Track resource usage in real time and review historical patterns: