A Job runs a command to completion on a specified GPU or CPU resource. Unlike workspaces, jobs are non-interactive — they execute, produce output, and terminate automatically. Jobs are ideal for:
  • Model training and fine-tuning
  • Batch inference and evaluation
  • Data preprocessing pipelines
  • Hyperparameter sweeps (submit multiple jobs in parallel)
[Screenshot: VESSL Cloud Jobs list page showing job name, status, requested resources, duration, and creator]

Jobs vs Workspaces

|             | Job                                     | Workspace                                   |
|-------------|-----------------------------------------|---------------------------------------------|
| Interaction | Non-interactive (runs a command)        | Interactive (SSH, JupyterLab)               |
| Lifecycle   | Starts → runs → completes automatically | Stays running until you pause or terminate  |
| Billing     | Only while running                      | While running or paused (at a reduced rate) |
| Best for    | Training, batch processing, sweeps      | Development, debugging, exploration         |

Creating a job

You can create jobs from the VESSL Cloud console or the CLI. From the console: Navigate to Jobs in the sidebar and click Create Job. Select a cluster, resource spec, container image, and enter the command to run. From the CLI:
vesslctl job create \
  --name my-training-job \
  --cluster <cluster-name> \
  --resource-spec <spec-name> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --cmd "python train.py --epochs 10"
Run vesslctl cluster list and vesslctl resource-spec list to see available clusters and GPU specs. See vesslctl job for the full CLI reference.
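Hyperparameter sweeps work the same way: submit one job per value with a small shell loop. A minimal sketch — `vesslctl` is replaced here by a dry-run stub so the loop can be tried anywhere, and the cluster name, resource spec, and learning-rate values are placeholders, not real identifiers:

```shell
# Dry-run stub: prints each command instead of submitting it.
# Delete this function to submit real jobs.
vesslctl() { echo "would run: vesslctl $*"; }

CLUSTER="my-cluster"   # placeholder cluster name
SPEC="gpu-1x"          # placeholder resource spec

submitted=0
for lr in 0.01 0.001 0.0001; do   # illustrative sweep values
  vesslctl job create \
    --name "sweep-lr-${lr}" \
    --cluster "$CLUSTER" \
    --resource-spec "$SPEC" \
    --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
    --cmd "python train.py --epochs 10 --lr ${lr}"
  submitted=$((submitted + 1))
done
echo "submitted $submitted jobs"
```

Each job in the sweep queues, runs, and bills independently, so the loop can submit all of them at once and let the scheduler place them as capacity allows.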

Monitoring jobs

Once a job is submitted, you can monitor its progress from the Jobs list page. Each job shows its status, resources (GPU type and count), duration, and creator. Click on a job to view detailed logs and resource utilization. From the CLI:
vesslctl job list --state running
vesslctl job show <job-slug>

Job statuses

| Status     | Meaning |
|------------|---------|
| scheduling | Waiting for resources to become available. The job shows a reason like Waiting for GPU capacity while it queues. |
| running    | Your command is actively executing on the allocated resources. |
| completed  | The command exited successfully (exit code 0). Output in mounted volumes is preserved. |
| failed     | The command exited with a non-zero code, or the container crashed (for example, OOMKilled). Check logs to debug. |
| terminated | You manually cancelled the job before it finished. |
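The completed/failed split follows directly from the command's exit code. A minimal sketch of that mapping, using `true` (exit code 0) and `false` (exit code 1) as stand-ins for a training command:

```shell
# Map a command's exit code to the job status it would produce:
# 0 -> completed, anything else -> failed.
status_for() {
  if "$@"; then echo completed; else echo failed; fi
}

ok=$(status_for true)     # exits 0 -> completed
bad=$(status_for false)   # exits 1 -> failed
echo "$ok / $bad"
```

This is why a training script should exit non-zero on errors rather than swallowing exceptions: a silently failing run that exits 0 is recorded as completed.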

Persist job output

Jobs run in ephemeral containers — anything written outside a mounted volume disappears when the job ends. Attach at least one persistent volume so your outputs survive.
  • Object storage (--object-volume): Shared across clusters, ideal for final artifacts like trained models and evaluation metrics. Mount at a dedicated path such as /output.
  • Cluster storage (--cluster-volume): Fast in-cluster storage, ideal for intermediate checkpoints during long training. Mount at /workspace or similar.
vesslctl job create \
  --name my-training-job \
  --resource-spec <spec-name> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --object-volume <output-volume-slug>:/output \
  --cmd "python train.py --output /output"
Temporary storage is cleared when a job ends, even one that finished with the completed status. If your training script writes to /tmp or the current directory without a mounted volume, the results are lost.
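To make the ephemeral-versus-mounted distinction concrete, here is a runnable sketch that simulates the two locations with local temp directories (stand-ins for the container's scratch space and a mounted /output volume):

```shell
SCRATCH=$(mktemp -d)   # stand-in for unmounted paths like /tmp
OUTPUT=$(mktemp -d)    # stand-in for a mounted /output volume

echo "step 100 loss 0.42" > "$SCRATCH/train.log"   # written outside the mount
echo "model weights"      > "$OUTPUT/model.bin"    # written to the mount

rm -rf "$SCRATCH"      # container teardown wipes everything unmounted
ls "$OUTPUT"           # model.bin survives the teardown
```

The same logic applies in reverse when debugging a failed job: anything your script logged only to the container filesystem is gone, so send diagnostics to stdout/stderr (captured as job logs) or to a mounted volume.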

View job logs

Stream logs while a job runs or after it finishes:
vesslctl job logs <job-slug> --follow   # stream in real time
vesslctl job logs <job-slug> --limit 500  # last 500 lines
Logs are also available in the Jobs detail page under the Logs tab.

Cancel a running job

Terminate a job from the console (kebab menu → Terminate) or from the CLI:
vesslctl job terminate <job-slug>
Cancellation stops compute billing immediately. Data already written to mounted volumes is preserved.

Next steps