
Running a Pipeline

Pipelines run from a study context. SeqDesk generates the samplesheet automatically and executes Nextflow workflows either locally or on a SLURM cluster.

Prerequisites

Before running a pipeline:

  • Pipelines must be enabled in admin settings (pipelines.enabled: true)
  • The execution environment must be configured (local or SLURM)
  • Samples must have reads assigned (FASTQ files linked)
  • You need the FACILITY_ADMIN role

Launching a Pipeline Run

Open the study

Navigate to the study that contains your samples. Go to the Pipelines tab.

Select a pipeline

Choose from the available pipelines (e.g., MAG). Each pipeline shows its description and requirements.

Configure parameters

Adjust pipeline-specific settings:

MAG Pipeline options:

| Parameter | Default | Description |
|---|---|---|
| Stub Mode | false | Test mode; runs fast without performing the actual analysis |
| Skip MEGAHIT | false | Skip the MEGAHIT assembler |
| Skip SPAdes | true | Skip the SPAdes assembler |
| Skip Prokka | true | Skip gene annotation with Prokka |
| Skip CONCOCT | true | Skip CONCOCT binning |
| Skip Bin QC | false | Skip bin quality control |
| Skip GTDB-Tk | false | Skip taxonomic classification with GTDB-Tk |

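These toggles typically map to pipeline command-line flags. A minimal sketch of such a mapping; the flag names follow nf-core/mag conventions, but the exact flags SeqDesk passes are an assumption:

```python
# Hypothetical mapping from SeqDesk UI options to nf-core/mag flags.
MAG_FLAGS = {
    "stub_mode": "-stub-run",          # Nextflow-level stub execution
    "skip_megahit": "--skip_megahit",
    "skip_spades": "--skip_spades",
    "skip_prokka": "--skip_prokka",
    "skip_concoct": "--skip_concoct",
    "skip_binqc": "--skip_binqc",
    "skip_gtdbtk": "--skip_gtdbtk",
}

def flags_for(options: dict) -> list:
    """Return the CLI flags for every option set to True."""
    return [MAG_FLAGS[name] for name, on in options.items() if on]
```

For example, enabling only "Skip SPAdes" would yield a single `--skip_spades` flag.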
Select samples

Choose which samples from the study to include. All selected samples must have reads assigned.

Launch

Confirm and start the run. SeqDesk:

  1. Generates a samplesheet CSV from your samples and reads
  2. Creates a run directory (e.g., MAG-20240126-001/)
  3. Builds the Nextflow execution command
  4. Starts the pipeline (locally or via SLURM)
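Step 3 above can be sketched as assembling a Nextflow invocation. The pipeline name, `--input`/`--outdir` parameters, and helper name here are illustrative assumptions, not SeqDesk's actual command builder:

```python
def nextflow_command(run_dir: str, samplesheet: str, flags: list) -> str:
    """Assemble an illustrative nf-core/mag invocation string."""
    parts = [
        "nextflow", "run", "nf-core/mag",
        "--input", samplesheet,
        "--outdir", run_dir,
        *flags,  # pipeline-specific toggles, e.g. --skip_spades
    ]
    return " ".join(parts)
```

In a real deployment this command would be written to `script.sh` in the run directory and executed either directly or via the SLURM submitter.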

Run Number Format

Each run gets a unique number: {PIPELINE}-{YYYYMMDD}-{NNN} (e.g., MAG-20240126-001).
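A sketch of how such a run number can be formatted (the function name and sequence-counter argument are hypothetical):

```python
from datetime import date

def run_number(pipeline: str, seq: int, day: date) -> str:
    """Format a run number as {PIPELINE}-{YYYYMMDD}-{NNN}."""
    return f"{pipeline}-{day:%Y%m%d}-{seq:03d}"

print(run_number("MAG", 1, date(2024, 1, 26)))  # MAG-20240126-001
```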

Samplesheet Generation

SeqDesk auto-generates the samplesheet that Nextflow expects. For the MAG pipeline, each row contains:

| Column | Source |
|---|---|
| sample | Sample alias or ID |
| group | Sample group (from the study) |
| short_reads_1 | Path to the R1 FASTQ file |
| short_reads_2 | Path to the R2 FASTQ file |

The samplesheet is saved in the run directory as samplesheet.csv.

Execution Modes

Local

Nextflow runs directly on the SeqDesk server. Suitable for testing and small datasets.

SLURM

Nextflow submits jobs to a SLURM cluster. Configure in admin settings:

| Setting | Default | Description |
|---|---|---|
| Queue | default | SLURM partition name |
| Cores | 4 | CPUs per job |
| Memory | 16GB | Memory per job |
| Time Limit | 24h | Maximum run time |
| Additional Options | (none) | Extra SLURM flags |

The SLURM job ID is tracked in the queueJobId field for status monitoring.
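The settings above roughly correspond to a Nextflow process block for the SLURM executor. A sketch that renders such a configuration; this is illustrative, and the actual contents of `cluster_config.cfg` may differ:

```python
def slurm_config(queue: str = "default", cores: int = 4,
                 memory: str = "16GB", time: str = "24h",
                 extra: str = "") -> str:
    """Render a minimal Nextflow process block for the SLURM executor."""
    # Extra SLURM flags are commonly passed via clusterOptions.
    opts = f"\n    clusterOptions = '{extra}'" if extra else ""
    return (
        "process {\n"
        "    executor = 'slurm'\n"
        f"    queue = '{queue}'\n"
        f"    cpus = {cores}\n"
        f"    memory = '{memory}'\n"
        f"    time = '{time}'{opts}\n"
        "}\n"
    )
```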

Run Directory Structure

Each run creates a directory under the configured pipelineRunDir:

MAG-20240126-001/
├── script.sh            # Generated Nextflow command
├── samplesheet.csv      # Auto-generated input
├── cluster_config.cfg   # Nextflow configuration
├── nextflow.log         # Execution log
├── trace.txt            # Process trace (TSV)
├── output               # stdout
├── error                # stderr
├── Assembly/            # Assembled contigs
├── GenomeBinning/       # Genome bins
├── Taxonomy/            # Classification results
└── multiqc/             # QC reports