
Available Pipelines

SeqDesk ships with built-in support for study pipelines and order pipelines. Study pipelines run across the samples of a study. Order pipelines operate on linked sequencing files inside an order and are typically used for simulation, validation, and QC.

Study Pipelines

MAG (Metagenome-Assembled Genomes)

Pipeline: nf-core/mag v3.0.0
Purpose: Assembly and binning of metagenomes
Input: Paired-end FASTQ reads
Role required: FACILITY_ADMIN

What MAG Does

The MAG pipeline takes raw metagenomic sequencing reads and produces:

  1. Quality-controlled reads — trimmed and host-removed
  2. Assembled contigs — via MEGAHIT and/or SPAdes
  3. Genome bins — via MetaBAT2, MaxBin2, and/or CONCOCT
  4. Refined bins — via DAS Tool
  5. Quality scores — completeness and contamination via CheckM
  6. Taxonomy — classification via GTDB-Tk
  7. QC summary — MultiQC report
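The CheckM scores from step 5 are commonly read against the MIMAG draft-quality tiers. A minimal sketch of that classification, using thresholds from the MIMAG standard (not SeqDesk-specific logic):

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Classify a genome bin by MIMAG draft-quality tiers,
    given CheckM completeness/contamination percentages."""
    if completeness > 90 and contamination < 5:
        return "high-quality draft"
    if completeness >= 50 and contamination < 10:
        return "medium-quality draft"
    return "low-quality draft"
```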

Configuration

Parameter    Default  Description
stubMode     false    Test mode (fast, no real analysis)
skipMegahit  false    Skip MEGAHIT assembler
skipSpades   true     Skip SPAdes assembler
skipProkka   true     Skip Prokka gene annotation
skipConcoct  true     Skip CONCOCT binning
skipBinQc    false    Skip bin quality control
skipGtdb     false    Skip GTDB-Tk taxonomy
skipGunc     false    Skip GUNC contamination check
gtdbDb       (none)   Path to GTDB-Tk database (optional)

Outputs

Output      Location                     Description
Assemblies  Assembly/MEGAHIT/            Per-sample contig files (.contigs.fa.gz)
Bins        GenomeBinning/DASTool/bins/  Refined genome bins (.fa)
CheckM      GenomeBinning/QC/            Completeness and contamination TSV
GTDB-Tk     Taxonomy/GTDB-Tk/            Taxonomy classification TSV
MultiQC     multiqc/                     Aggregate QC HTML report

Assemblies and bins are automatically parsed and linked to samples in the database after the run completes.
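The automatic parsing step can be pictured as a directory scan over the output locations in the table above. A sketch, assuming those paths; the function name and return shape are illustrative, not SeqDesk's actual parser:

```python
from pathlib import Path

def collect_mag_outputs(outdir: str) -> dict[str, list[str]]:
    """Locate per-sample assemblies and refined bins in a MAG
    output directory, using the published output locations."""
    root = Path(outdir)
    return {
        "assemblies": sorted(
            str(p) for p in root.glob("Assembly/MEGAHIT/*.contigs.fa.gz")
        ),
        "bins": sorted(
            str(p) for p in root.glob("GenomeBinning/DASTool/bins/*.fa")
        ),
    }
```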


SubMG (ENA Submission Pipeline)

Pipeline: ttubb/submg v1.0.0
Purpose: Automated ENA submission of reads, assemblies, and bins
Input: Samples with reads, assemblies, and optionally bins
Role required: FACILITY_ADMIN

What SubMG Does

SubMG automates the submission of sequencing data to the European Nucleotide Archive (ENA). It handles the full submission workflow:

  1. Validate inputs — checks study/sample prerequisites and ENA credentials
  2. Generate config — creates SubMG YAML manifests and helper files
  3. Submit — executes submg submit for each manifest
  4. Parse receipts — reads ENA responses and stores accession numbers

Prerequisites

Before running SubMG, ensure:

  • The study has an ENA study accession (PRJEB...)
  • Samples have taxonomy IDs assigned
  • Reads are linked to samples
  • ENA credentials are configured (see ENA Credentials)
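The prerequisite checks above can be sketched as a simple validation pass. The field names used here (`ena_accession`, `taxonomy_id`, `reads`) are illustrative, not SeqDesk's actual schema:

```python
import re

def validate_submg_prerequisites(study: dict, samples: list[dict]) -> list[str]:
    """Return a list of problems that would block a SubMG run;
    an empty list means the prerequisites look satisfied."""
    problems = []
    # Study must carry an ENA project accession (PRJEB...)
    if not re.fullmatch(r"PRJEB\d+", study.get("ena_accession", "") or ""):
        problems.append("study has no ENA study accession (PRJEB...)")
    for s in samples:
        if not s.get("taxonomy_id"):
            problems.append(f"sample {s['name']} has no taxonomy ID")
        if not s.get("reads"):
            problems.append(f"sample {s['name']} has no linked reads")
    return problems
```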

Configuration

Parameter             Default   Description
skipChecks            true      Skip pre-submission validation
submitBins            true      Include genome bins in the submission
condaEnv              submg     Conda environment name
assemblySoftware      MEGAHIT   Assembler used (for ENA metadata)
completenessSoftware  CheckM    QC software used
binningSoftware       MetaBAT2  Binner used

Outputs

After a successful run, accession numbers are stored back in the database:

  • Sample accessions — ERS/SAMEA numbers
  • Read accessions — ERX/ERR numbers
  • Assembly accessions — linked to Assembly records
  • Bin accessions — linked to Bin records
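The receipt-parsing step can be sketched with the standard library. The XML shape below is a simplified version of ENA's receipt response; real receipts carry more detail (aliases, EXT_ID entries, info messages), so treat this as an assumption-laden sketch rather than SubMG's actual parser:

```python
import xml.etree.ElementTree as ET

def parse_ena_receipt(xml_text: str) -> dict[str, list[str]]:
    """Collect accession numbers from an ENA receipt,
    grouped by object type (SAMPLE, RUN, ...)."""
    root = ET.fromstring(xml_text)
    accessions: dict[str, list[str]] = {}
    for child in root:
        acc = child.get("accession")
        if acc:
            accessions.setdefault(child.tag, []).append(acc)
    return accessions
```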

MetaxPath (Pathogen Profiling)

Pipeline: hzi-bifo/MetaxPath v0.1.0
Purpose: Long-read clinical metagenomics for pathogen identification, virulence, and AMR detection
Input: Long-read FASTQ files (Oxford Nanopore or PacBio)
Role required: FACILITY_ADMIN

What MetaxPath Does

MetaxPath is designed for clinical metagenomics on long-read sequencing data. It performs:

  1. Human-read filtering — removes host contamination
  2. Taxonomic profiling — via Metax and Sylph
  3. Assembly — via Flye (or configurable assemblers)
  4. Virulence factor prediction — identifies pathogenic gene markers
  5. AMR detection — predicts antibiotic resistance genes
  6. Reporting — generates HTML reports and species abundance dotplots
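The top-N selection behind the reporting step can be sketched as a sort over a species-abundance map; the logic is illustrative, not MetaxPath's implementation (the default of 50 comes from the `topn` parameter below):

```python
def top_n_species(abundance: dict[str, float], n: int = 50) -> list[tuple[str, float]]:
    """Pick the n most abundant species for a report,
    most abundant first."""
    return sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)[:n]
```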

Supported Sequencers

  • Oxford Nanopore (MinION, GridION, PromethION)
  • PacBio (Sequel, Revio)

Configuration

Parameter      Default   Description
sequencer      Nanopore  Sequencing platform (Nanopore or PacBio)
assemblers     flye      Comma-separated assembler list
threads        20        CPU threads per process
topn           50        Number of top species in reports
skipSylph      false     Skip Sylph profiling
skipVirulence  false     Skip virulence factor prediction
skipAmr        false     Skip AMR detection

Database paths (must be configured by admin):

Parameter    Description
metaxDb      Metax database prefix (without .json)
metaxDmpDir  NCBI taxonomy dump directory
kraken2Db    Kraken2 database path
sylphDb      Sylph database path
refIndex     Host reference minimap2 index

Outputs

Output                 Scope       Description
Profile with VFs/AMRs  Per sample  Merged taxonomic profile with virulence and AMR annotations
Top-N HTML report      Per study   Combined species abundance report
Readcount stats        Per study   Combined readcount summary
Dotplots               Per study   Species abundance visualizations (PDF)

Reads QC (Quality Overview)

Pipeline: reads-qc v0.1.0
Purpose: Per-sample FASTQ statistics with an HTML summary report
Input: Linked sample reads (any scope; runs at study level)
Role required: FACILITY_ADMIN

Reads QC computes read count, base count, average quality, and GC content for each sample’s FASTQ files and rolls them up into a study-level HTML overview. It’s a lighter alternative to per-sample FastQC when all you need is a quick comparison across the samples in a study. macOS ARM local runs are supported.

Main outputs:

  • Per-sample read statistics (counts, bases, quality, GC%)
  • Study-level HTML summary report
  • Study-level TSV with per-sample metrics
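The four per-sample metrics can be computed directly from FASTQ records. A sketch assuming Phred+33 quality encoding; this is illustrative, not the pipeline's actual code:

```python
import gzip
import statistics

def fastq_stats(path: str) -> dict:
    """Compute read count, base count, mean Phred quality,
    and GC% for one (optionally gzipped) FASTQ file."""
    reads = bases = gc = 0
    quals: list[int] = []
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            qual = fh.readline().strip()
            reads += 1
            bases += len(seq)
            gc += sum(seq.upper().count(b) for b in "GC")
            quals.extend(ord(c) - 33 for c in qual)  # Phred+33 decoding
    return {
        "reads": reads,
        "bases": bases,
        "mean_quality": statistics.mean(quals) if quals else 0.0,
        "gc_percent": 100 * gc / bases if bases else 0.0,
    }
```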

Study Demo Report

Pipeline: study-demo-report v0.1.0
Purpose: Deterministic HTML, Markdown, and TSV outputs for testing pipeline integration
Input: Study + samples
Role required: FACILITY_ADMIN

Study Demo Report is a smoke-test pipeline that produces deterministic outputs without any real bioinformatics work. Use it to verify that pipeline execution, weblog ingestion, output parsing, and the Assemblies/Results UI all hang together end-to-end — without burning CPU on real analysis. Useful in CI and as a first run after configuring a new install. macOS ARM local runs are supported.

Main outputs:

  • Study-scope HTML report
  • Markdown summary
  • TSV per-sample table

Order Pipelines

Simulate Reads

Pipeline: simulate-reads v0.2.0
Purpose: Generate dummy FASTQ files for selected order samples
Input: Order samples
Role required: FACILITY_ADMIN

Simulate Reads generates synthetic FASTQ files and links them back to canonical Read records. It is mainly useful for demos, smoke tests, and exercising downstream order-scoped QC workflows.

Main outputs:

  • Generated FASTQ files linked to Read records
  • Read counts written back to canonical read fields
  • Run-level simulation summary TSV
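A toy version of the simulator's core step, emitting dummy FASTQ records with random bases and constant quality (illustrative only, not the pipeline's implementation):

```python
import random

def simulate_fastq(n_reads: int, read_len: int = 100, seed: int = 0) -> str:
    """Generate deterministic dummy FASTQ text: random ACGT
    sequences with a constant 'I' (Phred 40) quality string."""
    rng = random.Random(seed)  # seeded for reproducible output
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        records.append(f"@sim_read_{i}\n{seq}\n+\n{'I' * read_len}")
    return "\n".join(records) + "\n"
```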

FASTQ Checksum

Pipeline: fastq-checksum v0.1.0
Purpose: Compute MD5 checksums for linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN

FASTQ Checksum computes canonical MD5 checksums for linked read files and stores them back on the corresponding Read records for downstream validation and submission workflows.

Main outputs:

  • checksum1 / checksum2 on Read records
  • Run-level checksum summary TSV
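Computing a canonical MD5 in streaming fashion keeps memory flat even for large gzipped FASTQ files. A sketch of that step; the function name is illustrative, and the hex digest is what would land in the checksum1/checksum2 fields above:

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks and
    return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```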

FastQC

Pipeline: fastqc v0.1.0
Purpose: Run read quality control on linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN

FastQC runs per-sample QC against linked order reads, publishes HTML reports and zip archives, and stores selected summary metrics back onto the canonical Read record.

Main outputs:

  • Per-sample FastQC HTML reports
  • Per-sample FastQC zip archives
  • Read counts and average quality metrics on Read records
  • Run-level FastQC summary TSV
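Pulling summary metrics back out of FastQC output usually means reading the Basic Statistics module of `fastqc_data.txt`. A sketch of that parse, following FastQC's tab-separated module layout; how SeqDesk actually extracts its metrics is not shown here:

```python
def parse_fastqc_basic_stats(data_txt: str) -> dict[str, str]:
    """Pull key/value rows from the 'Basic Statistics' module
    of a fastqc_data.txt (tab-separated Measure/Value pairs)."""
    stats: dict[str, str] = {}
    in_module = False
    for line in data_txt.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
            continue
        if line.startswith(">>END_MODULE"):
            if in_module:
                break  # done with the module we care about
            continue
        if in_module and "\t" in line and not line.startswith("#"):
            key, value = line.split("\t", 1)
            stats[key] = value
    return stats
```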

Adding More Pipelines

SeqDesk supports adding custom pipelines through a package structure. See Adding Custom Pipelines for details on creating your own pipeline integrations.