Available Pipelines
SeqDesk ships with built-in support for three bioinformatics pipelines. Each pipeline is a Nextflow workflow designed for a specific analysis type.
MAG (Metagenome-Assembled Genomes)
- Pipeline: nf-core/mag v3.4.0
- Purpose: Assembly and binning of metagenomes
- Input: Paired-end FASTQ reads
- Role required: FACILITY_ADMIN
What MAG Does
The MAG pipeline takes raw metagenomic sequencing reads and produces:
- Quality-controlled reads — trimmed and host-removed
- Assembled contigs — via MEGAHIT and/or SPAdes
- Genome bins — via MetaBAT2, MaxBin2, and/or CONCOCT
- Refined bins — via DAS Tool
- Quality scores — completeness and contamination via CheckM
- Taxonomy — classification via GTDB-Tk
- QC summary — MultiQC report
Configuration
| Parameter | Default | Description |
|---|---|---|
| stubMode | false | Test mode (fast, no real analysis) |
| skipMegahit | false | Skip MEGAHIT assembler |
| skipSpades | true | Skip SPAdes assembler |
| skipProkka | true | Skip Prokka gene annotation |
| skipConcoct | true | Skip CONCOCT binning |
| skipBinQc | false | Skip bin quality control |
| skipGtdb | false | Skip GTDB-Tk taxonomy |
| skipGunc | false | Skip GUNC contamination check |
| gtdbDb | — | Path to GTDB-Tk database (optional) |
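As a rough illustration of how these defaults combine with user overrides, here is a minimal Python sketch. The parameter names and default values come from the table above. The helper names `resolve_mag_config` and `enabled_assemblers` are hypothetical and are not part of SeqDesk's API.

```python
# Defaults copied from the MAG configuration table.
MAG_DEFAULTS = {
    "stubMode": False,
    "skipMegahit": False,
    "skipSpades": True,
    "skipProkka": True,
    "skipConcoct": True,
    "skipBinQc": False,
    "skipGtdb": False,
    "skipGunc": False,
    "gtdbDb": None,  # optional path, no default
}


def resolve_mag_config(overrides=None):
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    overrides = overrides or {}
    unknown = set(overrides) - set(MAG_DEFAULTS)
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise KeyError(f"Unknown MAG parameter(s): {sorted(unknown)}")
    return {**MAG_DEFAULTS, **overrides}


def enabled_assemblers(config):
    """List the assemblers that are not skipped in the given config."""
    assemblers = []
    if not config["skipMegahit"]:
        assemblers.append("MEGAHIT")
    if not config["skipSpades"]:
        assemblers.append("SPAdes")
    return assemblers
```

With no overrides, only MEGAHIT is enabled, because `skipSpades` defaults to true. Passing `{"skipSpades": False}` would enable both assemblers.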
Outputs
| Output | Location | Description |
|---|---|---|
| Assemblies | Assembly/MEGAHIT/ | Per-sample contig files (.contigs.fa.gz) |
| Bins | GenomeBinning/DASTool/bins/ | Refined genome bins (.fa) |
| CheckM | GenomeBinning/QC/ | Completeness and contamination TSV |
| GTDB-Tk | Taxonomy/GTDB-Tk/ | Taxonomy classification TSV |
| MultiQC | multiqc/ | Aggregate QC HTML report |
Assemblies and bins are automatically parsed and linked to samples in the database after the run completes.
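To inspect a finished run by hand, the output locations in the table can be walked with a short script. This is only a sketch: the directory names and file extensions come from the table, while the function name `collect_mag_outputs` and the use of the file name prefix as a lookup key are assumptions. It does not describe how SeqDesk itself parses results.

```python
from pathlib import Path

ASSEMBLY_SUFFIX = ".contigs.fa.gz"


def collect_mag_outputs(run_dir):
    """Find assembly and refined-bin files under a MAG output directory."""
    root = Path(run_dir)
    assemblies = sorted((root / "Assembly" / "MEGAHIT").glob("*" + ASSEMBLY_SUFFIX))
    bins = sorted((root / "GenomeBinning" / "DASTool" / "bins").glob("*.fa"))
    return {
        # Assumption: the text before ".contigs.fa.gz" is a usable label;
        # real file names may carry an extra prefix.
        "assemblies": {p.name[: -len(ASSEMBLY_SUFFIX)]: p for p in assemblies},
        "bins": bins,
    }
```

A missing directory simply yields an empty result, which makes it easy to spot a run where assembly or binning was skipped.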
SubMG (ENA Submission Pipeline)
- Pipeline: ttubb/submg
- Purpose: Automated ENA submission of reads, assemblies, and bins
- Input: Samples with reads, assemblies, and optionally bins
- Role required: FACILITY_ADMIN
What SubMG Does
SubMG automates the submission of sequencing data to the European Nucleotide Archive (ENA). It handles the full submission workflow:
- Validate inputs — checks study/sample prerequisites and ENA credentials
- Generate config — creates SubMG YAML manifests and helper files
- Submit — executes `submg submit` for each manifest
- Parse receipts — reads ENA responses and stores accession numbers
Prerequisites
Before running SubMG, ensure:
- The study has an ENA study accession (`PRJEB...`)
- Samples have taxonomy IDs assigned
- Reads are linked to samples
- ENA credentials are configured (see ENA Credentials)
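The checklist above can be expressed as a small pre-flight check. The sketch below is illustrative only: the `Study` and `Sample` classes, their field names, and `check_submg_prerequisites` are hypothetical and do not reflect SeqDesk's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    name: str
    taxonomy_id: Optional[int] = None
    read_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    ena_accession: Optional[str] = None
    samples: List[Sample] = field(default_factory=list)


def check_submg_prerequisites(study, credentials_configured):
    """Return a list of problems; an empty list means the study looks ready."""
    problems = []
    if not (study.ena_accession or "").startswith("PRJEB"):
        problems.append("Study has no ENA study accession (PRJEB...)")
    for sample in study.samples:
        if sample.taxonomy_id is None:
            problems.append(f"Sample {sample.name} has no taxonomy ID")
        if not sample.read_files:
            problems.append(f"Sample {sample.name} has no linked reads")
    if not credentials_configured:
        problems.append("ENA credentials are not configured")
    return problems
```

Collecting every problem in one pass, rather than stopping at the first, lets an admin fix all of them before retrying.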
Configuration
| Parameter | Default | Description |
|---|---|---|
| skipChecks | true | Skip pre-submission validation |
| submitBins | true | Include genome bins in the submission |
| condaEnv | submg | Conda environment name |
| assemblySoftware | MEGAHIT | Assembler used (for ENA metadata) |
| completenessSoftware | CheckM | QC software used |
| binningSoftware | MetaBAT2 | Binner used |
Outputs
After a successful run, accession numbers are stored back in the database:
- Sample accessions — ERS/SAMEA numbers
- Read accessions — ERX/ERR numbers
- Assembly accessions — linked to Assembly records
- Bin accessions — linked to Bin records
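When reviewing stored accessions, it can help to tell the types apart by prefix. The sketch below covers only the prefixes named in this document (PRJEB, ERS/SAMEA, ERX, ERR). In ENA terms, ERX identifies an experiment and ERR a run. The function name `classify_accession` is hypothetical, and assembly and bin accessions are deliberately left out because their format is not described here.

```python
import re

# Prefix patterns for the accession types mentioned in this section.
ACCESSION_PATTERNS = {
    "study": re.compile(r"^PRJEB\d+$"),
    "sample": re.compile(r"^(ERS|SAMEA)\d+$"),
    "experiment": re.compile(r"^ERX\d+$"),
    "run": re.compile(r"^ERR\d+$"),
}


def classify_accession(accession):
    """Return the accession type, or None if the prefix is not recognised."""
    value = accession.strip()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(value):
            return kind
    return None
```

Anything that returns `None` is worth a second look before it is treated as a valid accession for one of these four types.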
MetaxPath (Pathogen Profiling)
- Pipeline: hzi-bifo/MetaxPath v0.1.0
- Purpose: Long-read clinical metagenomics for pathogen identification, virulence, and AMR detection
- Input: Long-read FASTQ files (Oxford Nanopore or PacBio)
- Role required: FACILITY_ADMIN
What MetaxPath Does
MetaxPath is designed for clinical metagenomics on long-read sequencing data. It performs:
- Human-read filtering — removes host contamination
- Taxonomic profiling — via Metax and Sylph
- Assembly — via Flye (or configurable assemblers)
- Virulence factor prediction — identifies pathogenic gene markers
- AMR detection — predicts antibiotic resistance genes
- Reporting — generates HTML reports and species abundance dotplots
Supported Sequencers
- Oxford Nanopore (MinION, GridION, PromethION)
- PacBio (Sequel, Revio)
Configuration
| Parameter | Default | Description |
|---|---|---|
| sequencer | Nanopore | Sequencing platform (Nanopore or PacBio) |
| assemblers | flye | Comma-separated assembler list |
| threads | 20 | CPU threads per process |
| topn | 50 | Number of top species in reports |
| skipSylph | false | Skip Sylph profiling |
| skipVirulence | false | Skip virulence factor prediction |
| skipAmr | false | Skip AMR detection |
Database paths (must be configured by an admin):
| Parameter | Description |
|---|---|
| metaxDb | Metax database prefix (without .json) |
| metaxDmpDir | NCBI taxonomy dump directory |
| kraken2Db | Kraken2 database path |
| sylphDb | Sylph database path |
| refIndex | Host reference minimap2 index |
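The two tables above suggest a simple validation pass before launching a run. The following sketch uses the documented parameter names and defaults. The function `validate_metaxpath_config` is hypothetical, and treating every database path as required is an assumption based on the "must be configured" note.

```python
# Defaults copied from the MetaxPath configuration table.
METAXPATH_DEFAULTS = {
    "sequencer": "Nanopore",
    "assemblers": "flye",
    "threads": 20,
    "topn": 50,
    "skipSylph": False,
    "skipVirulence": False,
    "skipAmr": False,
}
# Assumption: all five database paths are required for every run.
DATABASE_KEYS = ("metaxDb", "metaxDmpDir", "kraken2Db", "sylphDb", "refIndex")


def validate_metaxpath_config(overrides):
    """Merge overrides onto defaults and return (config, list_of_errors)."""
    config = {**METAXPATH_DEFAULTS, **overrides}
    errors = []
    if config["sequencer"] not in ("Nanopore", "PacBio"):
        errors.append("sequencer must be Nanopore or PacBio")
    # Split the comma-separated assembler list into clean names.
    assemblers = [a.strip() for a in str(config["assemblers"]).split(",") if a.strip()]
    if not assemblers:
        errors.append("assemblers must list at least one assembler")
    for key in ("threads", "topn"):
        if not isinstance(config[key], int) or config[key] < 1:
            errors.append(f"{key} must be a positive integer")
    for key in DATABASE_KEYS:
        if not config.get(key):
            errors.append(f"{key} is not configured")
    config["assemblers"] = assemblers
    return config, errors
```

Returning the errors as a list, instead of raising on the first one, lets an admin see every missing database path at once.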
Outputs
| Output | Scope | Description |
|---|---|---|
| Profile with VFs/AMRs | Per sample | Merged taxonomic profile with virulence and AMR annotations |
| Top-N HTML report | Per study | Combined species abundance report |
| Readcount stats | Per study | Combined readcount summary |
| Dotplots | Per study | Species abundance visualizations (PDF) |
Adding More Pipelines
SeqDesk supports adding custom pipelines through a package structure. See Adding Custom Pipelines for details on creating your own pipeline integrations.