
Available Pipelines

SeqDesk ships with built-in support for study pipelines and order pipelines. Study pipelines run across the samples of a study. Order pipelines operate on linked sequencing files inside an order and are typically used for simulation, validation, and QC.

Study Pipelines

MAG (Metagenome-Assembled Genomes)

Pipeline: nf-core/mag v3.0.0
Purpose: Assembly and binning of metagenomes
Input: Paired-end FASTQ reads
Role required: FACILITY_ADMIN

What MAG Does

The MAG pipeline takes raw metagenomic sequencing reads and produces:

  1. Quality-controlled reads — trimmed and host-removed
  2. Assembled contigs — via MEGAHIT and/or SPAdes
  3. Genome bins — via MetaBAT2, MaxBin2, and/or CONCOCT
  4. Refined bins — via DAS Tool
  5. Quality scores — completeness and contamination via CheckM
  6. Taxonomy — classification via GTDB-Tk
  7. QC summary — MultiQC report
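The CheckM scores from step 5 are commonly read against the MIMAG draft-quality tiers. A minimal sketch of that classification, using thresholds from the MIMAG standard (not SeqDesk-specific logic):

```python
def mimag_tier(completeness: float, contamination: float) -> str:
    """Classify a genome bin by MIMAG draft-quality tiers,
    given CheckM completeness/contamination percentages."""
    if completeness > 90 and contamination < 5:
        return "high-quality draft"
    if completeness >= 50 and contamination < 10:
        return "medium-quality draft"
    return "low-quality draft"
```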

Configuration

Parameter    Default  Description
stubMode     false    Test mode (fast, no real analysis)
skipMegahit  false    Skip MEGAHIT assembler
skipSpades   true     Skip SPAdes assembler
skipProkka   true     Skip Prokka gene annotation
skipConcoct  true     Skip CONCOCT binning
skipBinQc    false    Skip bin quality control
skipGtdb     false    Skip GTDB-Tk taxonomy
skipGunc     false    Skip GUNC contamination check
gtdbDb       (none)   Path to GTDB-Tk database (optional)

Outputs

Output      Location                     Description
Assemblies  Assembly/MEGAHIT/            Per-sample contig files (.contigs.fa.gz)
Bins        GenomeBinning/DASTool/bins/  Refined genome bins (.fa)
CheckM      GenomeBinning/QC/            Completeness and contamination TSV
GTDB-Tk     Taxonomy/GTDB-Tk/            Taxonomy classification TSV
MultiQC     multiqc/                     Aggregate QC HTML report

Assemblies and bins are automatically parsed and linked to samples in the database after the run completes.
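The automatic parsing step can be pictured as a directory scan over the output locations in the table above. A sketch, assuming those paths; the function name and return shape are illustrative, not SeqDesk's actual parser:

```python
from pathlib import Path

def collect_mag_outputs(outdir: str) -> dict[str, list[str]]:
    """Locate per-sample assemblies and refined bins in a MAG
    output directory, using the published output locations."""
    root = Path(outdir)
    return {
        "assemblies": sorted(
            str(p) for p in root.glob("Assembly/MEGAHIT/*.contigs.fa.gz")
        ),
        "bins": sorted(
            str(p) for p in root.glob("GenomeBinning/DASTool/bins/*.fa")
        ),
    }
```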


SubMG (ENA Submission Pipeline)

Pipeline: ttubb/submg v1.0.0
Purpose: Automated ENA submission of reads, assemblies, and bins
Input: Samples with reads, assemblies, and optionally bins
Role required: FACILITY_ADMIN

What SubMG Does

SubMG automates the submission of sequencing data to the European Nucleotide Archive (ENA). It handles the full submission workflow:

  1. Validate inputs — checks study/sample prerequisites and ENA credentials
  2. Generate config — creates SubMG YAML manifests and helper files
  3. Submit — executes submg submit for each manifest
  4. Parse receipts — reads ENA responses and stores accession numbers

Prerequisites

Before running SubMG, ensure:

  • The study has an ENA study accession (PRJEB...)
  • Samples have taxonomy IDs assigned
  • Reads are linked to samples
  • ENA credentials are configured (see ENA Credentials)
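The prerequisite checks above can be sketched as a simple validation pass. The field names used here (`ena_accession`, `taxonomy_id`, `reads`) are illustrative, not SeqDesk's actual schema:

```python
import re

def validate_submg_prerequisites(study: dict, samples: list[dict]) -> list[str]:
    """Return a list of problems that would block a SubMG run;
    an empty list means the prerequisites look satisfied."""
    problems = []
    # Study must carry an ENA project accession (PRJEB...)
    if not re.fullmatch(r"PRJEB\d+", study.get("ena_accession", "") or ""):
        problems.append("study has no ENA study accession (PRJEB...)")
    for s in samples:
        if not s.get("taxonomy_id"):
            problems.append(f"sample {s['name']} has no taxonomy ID")
        if not s.get("reads"):
            problems.append(f"sample {s['name']} has no linked reads")
    return problems
```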

Configuration

Parameter             Default   Description
skipChecks            true      Skip pre-submission validation
submitBins            true      Include genome bins in the submission
condaEnv              submg     Conda environment name
assemblySoftware      MEGAHIT   Assembler used (for ENA metadata)
completenessSoftware  CheckM    QC software used
binningSoftware       MetaBAT2  Binner used

Outputs

After a successful run, accession numbers are stored back in the database:

  • Sample accessions — ERS/SAMEA numbers
  • Read accessions — ERX/ERR numbers
  • Assembly accessions — linked to Assembly records
  • Bin accessions — linked to Bin records
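The receipt-parsing step can be sketched with the standard library. The XML shape below is a simplified version of ENA's receipt response; real receipts carry more detail (aliases, EXT_ID entries, info messages), so treat this as an assumption-laden sketch rather than SubMG's actual parser:

```python
import xml.etree.ElementTree as ET

def parse_ena_receipt(xml_text: str) -> dict[str, list[str]]:
    """Collect accession numbers from an ENA receipt,
    grouped by object type (SAMPLE, RUN, ...)."""
    root = ET.fromstring(xml_text)
    accessions: dict[str, list[str]] = {}
    for child in root:
        acc = child.get("accession")
        if acc:
            accessions.setdefault(child.tag, []).append(acc)
    return accessions
```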

MetaxPath (Pathogen Profiling)

Pipeline: hzi-bifo/MetaxPath v0.1.0
Purpose: Long-read clinical metagenomics for pathogen identification, virulence, and AMR detection
Input: Long-read FASTQ files (Oxford Nanopore or PacBio)
Role required: FACILITY_ADMIN

What MetaxPath Does

MetaxPath is designed for clinical metagenomics on long-read sequencing data. It performs:

  1. Human-read filtering — removes host contamination
  2. Taxonomic profiling — via Metax and Sylph
  3. Assembly — via Flye (or configurable assemblers)
  4. Virulence factor prediction — identifies pathogenic gene markers
  5. AMR detection — predicts antibiotic resistance genes
  6. Reporting — generates HTML reports and species abundance dotplots
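The top-N selection behind the reporting step can be sketched as a sort over a species-abundance map; the logic is illustrative, not MetaxPath's implementation (the default of 50 comes from the `topn` parameter below):

```python
def top_n_species(abundance: dict[str, float], n: int = 50) -> list[tuple[str, float]]:
    """Pick the n most abundant species for a report,
    most abundant first."""
    return sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)[:n]
```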

Supported Sequencers

  • Oxford Nanopore (MinION, GridION, PromethION)
  • PacBio (Sequel, Revio)

Configuration

Parameter      Default   Description
sequencer      Nanopore  Sequencing platform (Nanopore or PacBio)
assemblers     flye      Comma-separated assembler list
threads        20        CPU threads per process
topn           50        Number of top species in reports
skipSylph      false     Skip Sylph profiling
skipVirulence  false     Skip virulence factor prediction
skipAmr        false     Skip AMR detection

Database paths (must be configured by admin):

Parameter    Description
metaxDb      Metax database prefix (without .json)
metaxDmpDir  NCBI taxonomy dump directory
kraken2Db    Kraken2 database path
sylphDb      Sylph database path
refIndex     Host reference minimap2 index

Outputs

Output                 Scope       Description
Profile with VFs/AMRs  Per sample  Merged taxonomic profile with virulence and AMR annotations
Top-N HTML report      Per study   Combined species abundance report
Readcount stats        Per study   Combined readcount summary
Dotplots               Per study   Species abundance visualizations (PDF)

Reads QC (Quality Overview)

Pipeline: reads-qc v0.1.0
Purpose: Per-sample FASTQ statistics with an HTML summary report
Input: Linked sample reads (any scope; runs at study level)
Role required: FACILITY_ADMIN

Reads QC computes read count, base count, average quality, and GC content for each sample’s FASTQ files and rolls them up into a study-level HTML overview. It’s a lighter alternative to per-sample FastQC when all you need is a quick comparison across the samples in a study. macOS ARM local runs are supported.

Main outputs:

  • Per-sample read statistics (counts, bases, quality, GC%)
  • Study-level HTML summary report
  • Study-level TSV with per-sample metrics
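The four per-sample metrics can be computed directly from FASTQ records. A sketch assuming Phred+33 quality encoding; this is illustrative, not the pipeline's actual code:

```python
import gzip
import statistics

def fastq_stats(path: str) -> dict:
    """Compute read count, base count, mean Phred quality,
    and GC% for one (optionally gzipped) FASTQ file."""
    reads = bases = gc = 0
    quals: list[int] = []
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            seq = fh.readline().strip()
            fh.readline()  # '+' separator line
            qual = fh.readline().strip()
            reads += 1
            bases += len(seq)
            gc += sum(seq.upper().count(b) for b in "GC")
            quals.extend(ord(c) - 33 for c in qual)  # Phred+33 decoding
    return {
        "reads": reads,
        "bases": bases,
        "mean_quality": statistics.mean(quals) if quals else 0.0,
        "gc_percent": 100 * gc / bases if bases else 0.0,
    }
```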

Study Demo Report

Pipeline: study-demo-report v0.1.0
Purpose: Deterministic HTML, Markdown, and TSV outputs for testing pipeline integration
Input: Study + samples
Role required: FACILITY_ADMIN

Study Demo Report is a smoke-test pipeline that produces deterministic outputs without any real bioinformatics work. Use it to verify that pipeline execution, weblog ingestion, output parsing, and the Assemblies/Results UI all hang together end-to-end — without burning CPU on real analysis. Useful in CI and as a first run after configuring a new install. macOS ARM local runs are supported.

Main outputs:

  • Study-scope HTML report
  • Markdown summary
  • TSV per-sample table

Order Pipelines

Simulate Reads

Pipeline: simulate-reads v0.2.0
Purpose: Generate dummy FASTQ files for selected order samples
Input: Order samples
Role required: FACILITY_ADMIN

Simulate Reads generates synthetic FASTQ files and links them back to canonical Read records. It is mainly useful for demos, smoke tests, and exercising downstream order-scoped QC workflows.

Main outputs:

  • Generated FASTQ files linked to Read records
  • Read counts written back to canonical read fields
  • Run-level simulation summary TSV
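A toy version of the simulator's core step, emitting dummy FASTQ records with random bases and constant quality (illustrative only, not the pipeline's implementation):

```python
import random

def simulate_fastq(n_reads: int, read_len: int = 100, seed: int = 0) -> str:
    """Generate deterministic dummy FASTQ text: random ACGT
    sequences with a constant 'I' (Phred 40) quality string."""
    rng = random.Random(seed)  # seeded for reproducible output
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        records.append(f"@sim_read_{i}\n{seq}\n+\n{'I' * read_len}")
    return "\n".join(records) + "\n"
```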

FASTQ Checksum

Pipeline: fastq-checksum v0.1.0
Purpose: Compute MD5 checksums for linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN

FASTQ Checksum computes canonical MD5 checksums for linked read files and stores them back on the corresponding Read records for downstream validation and submission workflows.

Main outputs:

  • checksum1 / checksum2 on Read records
  • Run-level checksum summary TSV
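Computing a canonical MD5 in streaming fashion keeps memory flat even for large gzipped FASTQ files. A sketch of that step; the function name is illustrative, and the hex digest is what would land in the checksum1/checksum2 fields above:

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 in 1 MiB chunks and
    return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```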

FastQC

Pipeline: fastqc v0.1.0
Purpose: Run read quality control on linked FASTQ files
Input: Linked order FASTQ files
Role required: FACILITY_ADMIN

FastQC runs per-sample QC against linked order reads, publishes HTML reports and zip archives, and stores selected summary metrics back onto the canonical Read record.

Main outputs:

  • Per-sample FastQC HTML reports
  • Per-sample FastQC zip archives
  • Read counts and average quality metrics on Read records
  • Run-level FastQC summary TSV
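Pulling summary metrics back out of FastQC output usually means reading the Basic Statistics module of `fastqc_data.txt`. A sketch of that parse, following FastQC's tab-separated module layout; how SeqDesk actually extracts its metrics is not shown here:

```python
def parse_fastqc_basic_stats(data_txt: str) -> dict[str, str]:
    """Pull key/value rows from the 'Basic Statistics' module
    of a fastqc_data.txt (tab-separated Measure/Value pairs)."""
    stats: dict[str, str] = {}
    in_module = False
    for line in data_txt.splitlines():
        if line.startswith(">>Basic Statistics"):
            in_module = True
            continue
        if line.startswith(">>END_MODULE"):
            if in_module:
                break  # done with the module we care about
            continue
        if in_module and "\t" in line and not line.startswith("#"):
            key, value = line.split("\t", 1)
            stats[key] = value
    return stats
```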

Adding More Pipelines

SeqDesk supports adding custom pipelines through a package structure. See Adding Custom Pipelines for details on creating your own pipeline integrations.