

Overview

A job runs a single command in a container and exits when the command finishes. Use it for training, inference, and batch processing workloads. Jobs are submitted only through the CLI with vesslctl job create; the console shows status, logs, and metrics. New to the CLI? Start with the CLI Quickstart.

Prerequisites

  • Authenticated vesslctl: Run vesslctl auth login to complete the browser OAuth flow. See the CLI Quickstart for the full setup.
  • Available credit balance: Job creation is blocked when the balance is zero or negative. Add a payment method and top up from Billing.
  • (Optional) Persistent volume for outputs: Create an Object storage or Cluster storage volume ahead of time if you need to keep model checkpoints or other artifacts.
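Assuming vesslctl is already installed (see the CLI Quickstart), a quick preflight covering the prerequisites above might look like this:

```shell
# Confirm the CLI is on PATH before anything else
command -v vesslctl >/dev/null || { echo "vesslctl not found; install it first"; exit 1; }

vesslctl auth login          # completes the browser OAuth flow
vesslctl cluster list        # clusters you can schedule onto
vesslctl resource-spec list  # available resource specs
```

Credit balance and payment methods are managed from Billing in the console, not the CLI.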

Submit

Provide a cluster, resource spec, container image, command to run, and any volumes or environment variables you need.
vesslctl job create \
  --name my-training-job \
  --resource-spec <spec-slug> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --cmd "python train.py --epochs 10" \
  --env WANDB_API_KEY=<your-key> \
  --object-volume <volume-slug>:/output \
  --tag training-run-2026-04
Run vesslctl cluster list and vesslctl resource-spec list to discover available clusters and GPU specs. See vesslctl job create for the full flag reference.

Persist job output

Jobs run in ephemeral containers — anything written outside a mounted volume disappears when the job ends. Attach at least one persistent volume so your outputs survive.
  • Object storage (--object-volume): Shared across clusters, ideal for final artifacts like trained models and evaluation metrics. Mount at a dedicated path such as /output.
  • Cluster storage (--cluster-volume): Fast in-cluster storage, ideal for intermediate checkpoints during long training. Mount at /workspace or similar.
vesslctl job create \
  --name my-training-job \
  --resource-spec <spec-slug> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --object-volume <output-volume-slug>:/output \
  --cmd "python train.py --output /output"
Temporary storage is cleared when a job ends, even one that succeeds. If your training script writes to /tmp or the current working directory without a mounted volume, the results are lost.
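Combining the two flags above, a long training run can mount cluster storage for fast intermediate checkpoints and object storage for final artifacts. The volume slugs, spec slug, and train.py flags are placeholders:

```shell
vesslctl job create \
  --name my-training-job \
  --resource-spec <spec-slug> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --cluster-volume <scratch-volume-slug>:/workspace \
  --object-volume <output-volume-slug>:/output \
  --cmd "python train.py --checkpoint-dir /workspace --output /output"
```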

Reuse a job configuration

Export an existing job’s configuration as JSON and resubmit it later:
vesslctl job export my-job-abc123 > job-config.json
vesslctl job create --file job-config.json
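If you want to tweak the exported configuration before resubmitting, for instance to avoid a name collision, a small edit step between export and create can look like the sketch below. The "name" field is an assumption about the exported schema, so check your own job-config.json:

```shell
# Stand-in for `vesslctl job export my-job-abc123 > job-config.json`;
# the real export's schema may differ, and "name" below is an assumption.
echo '{"name": "my-training-job"}' > job-config.json

# Bump the job name so the resubmitted copy does not collide with the original.
python3 - <<'EOF'
import json

with open("job-config.json") as f:
    cfg = json.load(f)

cfg["name"] = cfg["name"] + "-v2"

with open("job-v2.json", "w") as f:
    json.dump(cfg, f, indent=2)
EOF

cat job-v2.json
# Then resubmit with: vesslctl job create --file job-v2.json
```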

Submit a job from inside a workspace

Workspaces ship with vesslctl pre-authenticated via a workload token, so you can iterate on a script in JupyterLab and submit the same code as a batch job from the same shell. See vesslctl workspace for details.
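For example, from a terminal inside the workspace you can submit the script you have been editing without running vesslctl auth login first. The job name, spec slug, and command below are illustrative:

```shell
# The workload token authenticates this call automatically.
vesslctl job create \
  --name train-from-workspace \
  --resource-spec <spec-slug> \
  --image quay.io/vessl-ai/torch:2.9.1-cuda13.0.1-py3.13-slim \
  --cmd "python train.py"
```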

See also