Available Pipelines

SeqDesk ships with built-in support for three bioinformatics pipelines: MAG, SubMG, and MetaxPath. Each pipeline is a Nextflow workflow designed for a specific analysis type.

MAG (Metagenome-Assembled Genomes)

Pipeline: nf-core/mag v3.4.0
Purpose: Assembly and binning of metagenomes
Input: Paired-end FASTQ reads
Role required: FACILITY_ADMIN

What MAG Does

The MAG pipeline takes raw metagenomic sequencing reads and produces:

  1. Quality-controlled reads — trimmed and host-removed
  2. Assembled contigs — via MEGAHIT and/or SPAdes
  3. Genome bins — via MetaBAT2, MaxBin2, and/or CONCOCT
  4. Refined bins — via DAS Tool
  5. Quality scores — completeness and contamination via CheckM
  6. Taxonomy — classification via GTDB-Tk
  7. QC summary — MultiQC report

Configuration

Parameter      Default   Description
stubMode       false     Test mode (fast, no real analysis)
skipMegahit    false     Skip MEGAHIT assembler
skipSpades     true      Skip SPAdes assembler
skipProkka     true      Skip Prokka gene annotation
skipConcoct    true      Skip CONCOCT binning
skipBinQc      false     Skip bin quality control
skipGtdb       false     Skip GTDB-Tk taxonomy
skipGunc       false     Skip GUNC contamination check
gtdbDb         (none)    Path to GTDB-Tk database (optional)
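To make the interaction of the skip flags concrete, here is a minimal sketch that mirrors the defaults above in a Python dataclass and reports which assemblers would run. The class and function names (MagParams, active_assemblers) are hypothetical illustrations, not part of SeqDesk or nf-core/mag.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MagParams:
    """Hypothetical Python mirror of the MAG defaults in the table above."""
    stubMode: bool = False
    skipMegahit: bool = False
    skipSpades: bool = True
    skipProkka: bool = True
    skipConcoct: bool = True
    skipBinQc: bool = False
    skipGtdb: bool = False
    skipGunc: bool = False
    gtdbDb: Optional[str] = None  # no default path


def active_assemblers(p: MagParams) -> list[str]:
    """Return the assemblers that run under the given skip flags."""
    assemblers = []
    if not p.skipMegahit:
        assemblers.append("MEGAHIT")
    if not p.skipSpades:
        assemblers.append("SPAdes")
    return assemblers


defaults = MagParams()
with_spades = replace(defaults, skipSpades=False)
print(active_assemblers(defaults))     # ['MEGAHIT']
print(active_assemblers(with_spades))  # ['MEGAHIT', 'SPAdes']
```

With the defaults, only MEGAHIT runs; turning off skipSpades adds SPAdes alongside it.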

Outputs

Output       Location                      Description
Assemblies   Assembly/MEGAHIT/             Per-sample contig files (.contigs.fa.gz)
Bins         GenomeBinning/DASTool/bins/   Refined genome bins (.fa)
CheckM       GenomeBinning/QC/             Completeness and contamination TSV
GTDB-Tk      Taxonomy/GTDB-Tk/             Taxonomy classification TSV
MultiQC      multiqc/                      Aggregate QC HTML report

Assemblies and bins are automatically parsed and linked to samples in the database after the run completes.
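One way such linking can work is to match each contig file name against the known sample IDs. The sketch below assumes a file naming scheme of MEGAHIT-<sampleId>.contigs.fa.gz; that scheme and the link_assemblies helper are hypothetical, so check the actual file names produced by nf-core/mag v3.4.0 before relying on this.

```python
from pathlib import PurePosixPath

# Assumed naming scheme: <results>/Assembly/MEGAHIT/MEGAHIT-<sampleId>.contigs.fa.gz
PREFIX = "MEGAHIT-"
SUFFIX = ".contigs.fa.gz"


def link_assemblies(
    paths: list[str], sample_ids: list[str]
) -> tuple[dict[str, str], list[str]]:
    """Match contig files to known sample IDs; return (linked, unlinked)."""
    known = set(sample_ids)
    linked: dict[str, str] = {}
    unlinked: list[str] = []
    for path in paths:
        name = PurePosixPath(path).name
        if name.startswith(PREFIX) and name.endswith(SUFFIX):
            # Exact match on the middle part, so "S1" never claims "S10".
            sample = name[len(PREFIX):-len(SUFFIX)]
            if sample in known:
                linked[sample] = path
                continue
        unlinked.append(path)
    return linked, unlinked
```

Matching on the whole extracted ID, rather than on a substring, avoids assigning one sample's assembly to another sample whose ID happens to be a prefix of it.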


SubMG (ENA Submission Pipeline)

Pipeline: ttubb/submg
Purpose: Automated ENA submission of reads, assemblies, and bins
Input: Samples with reads, assemblies, and optionally bins
Role required: FACILITY_ADMIN

What SubMG Does

SubMG automates the submission of sequencing data to the European Nucleotide Archive (ENA). It handles the full submission workflow:

  1. Validate inputs — checks study/sample prerequisites and ENA credentials
  2. Generate config — creates SubMG YAML manifests and helper files
  3. Submit — executes submg submit for each manifest
  4. Parse receipts — reads ENA responses and stores accession numbers

Prerequisites

Before running SubMG, ensure:

  • The study has an ENA study accession (PRJEB...)
  • Samples have taxonomy IDs assigned
  • Reads are linked to samples
  • ENA credentials are configured (see ENA Credentials)
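The checklist above can be expressed as a small validation function. This is a minimal sketch: the Study and Sample classes and their fields are hypothetical stand-ins for SeqDesk's records, and the credentials check is reduced to a boolean.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    """Hypothetical stand-in for a SeqDesk sample record."""
    name: str
    taxonomy_id: Optional[int] = None
    read_files: list[str] = field(default_factory=list)


@dataclass
class Study:
    """Hypothetical stand-in for a SeqDesk study record."""
    ena_accession: Optional[str]
    samples: list[Sample]


def check_prerequisites(study: Study, has_credentials: bool) -> list[str]:
    """Return a list of problems; an empty list means SubMG can be started."""
    problems = []
    if not (study.ena_accession or "").startswith("PRJEB"):
        problems.append("study has no ENA study accession (PRJEB...)")
    for s in study.samples:
        if s.taxonomy_id is None:
            problems.append(f"{s.name}: missing taxonomy ID")
        if not s.read_files:
            problems.append(f"{s.name}: no reads linked")
    if not has_credentials:
        problems.append("ENA credentials are not configured")
    return problems
```

Collecting every problem instead of stopping at the first one lets an administrator fix all missing metadata in a single pass.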

Configuration

Parameter             Default   Description
skipChecks            true      Skip pre-submission validation
submitBins            true      Include genome bins in the submission
condaEnv              submg     Conda environment name
assemblySoftware      MEGAHIT   Assembler used (for ENA metadata)
completenessSoftware  CheckM    QC software used
binningSoftware       MetaBAT2  Binner used

Outputs

After a successful run, accession numbers are stored back in the database:

  • Sample accessions — ERS/SAMEA numbers
  • Read accessions — ERX/ERR numbers
  • Assembly accessions — linked to Assembly records
  • Bin accessions — linked to Bin records
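When receipts are parsed, each returned accession has to be routed to the right kind of record. The sketch below classifies accessions by the prefixes named on this page (PRJEB, ERS/SAMEA, ERX/ERR); the classify_accession function and its record-type labels are hypothetical. Assembly and bin accessions are left out because this page does not list their prefixes.

```python
import re

# Prefixes taken from this page; the record-type labels are hypothetical.
ACCESSION_KINDS = [
    ("PRJEB", "study"),
    ("SAMEA", "sample"),
    ("ERS", "sample"),
    ("ERX", "experiment"),
    ("ERR", "run"),
]


def classify_accession(acc: str) -> str:
    """Return the record type for an accession, or 'unknown'."""
    for prefix, kind in ACCESSION_KINDS:
        # Require the prefix followed only by digits.
        if re.fullmatch(prefix + r"\d+", acc):
            return kind
    return "unknown"
```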

MetaxPath (Pathogen Profiling)

Pipeline: hzi-bifo/MetaxPath v0.1.0
Purpose: Long-read clinical metagenomics for pathogen identification, virulence, and AMR detection
Input: Long-read FASTQ files (Oxford Nanopore or PacBio)
Role required: FACILITY_ADMIN

What MetaxPath Does

MetaxPath is designed for clinical metagenomics on long-read sequencing data. It performs:

  1. Human-read filtering — removes host contamination
  2. Taxonomic profiling — via Metax and Sylph
  3. Assembly — via Flye (or configurable assemblers)
  4. Virulence factor prediction — identifies pathogenic gene markers
  5. AMR detection — predicts antibiotic resistance genes
  6. Reporting — generates HTML reports and species abundance dotplots

Supported Sequencers

  • Oxford Nanopore (MinION, GridION, PromethION)
  • PacBio (Sequel, Revio)

Configuration

Parameter      Default    Description
sequencer      Nanopore   Sequencing platform (Nanopore or PacBio)
assemblers     flye       Comma-separated assembler list
threads        20         CPU threads per process
topn           50         Number of top species in reports
skipSylph      false      Skip Sylph profiling
skipVirulence  false      Skip virulence factor prediction
skipAmr        false      Skip AMR detection
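Because assemblers is a single comma-separated string, it helps to normalize it before starting a run. The following is a minimal sketch that mirrors the defaults above and applies a few basic sanity checks. MetaxPathParams and assembler_list are hypothetical names, and the exact accepted spellings of the sequencer values are an assumption.

```python
from dataclasses import dataclass

# Accepted platform names, taken from the sequencer description above.
SEQUENCERS = {"Nanopore", "PacBio"}


@dataclass(frozen=True)
class MetaxPathParams:
    """Hypothetical mirror of the MetaxPath defaults listed above."""
    sequencer: str = "Nanopore"
    assemblers: str = "flye"  # comma-separated list
    threads: int = 20
    topn: int = 50
    skipSylph: bool = False
    skipVirulence: bool = False
    skipAmr: bool = False


def assembler_list(p: MetaxPathParams) -> list[str]:
    """Validate basic settings and split `assemblers` into a clean list."""
    if p.sequencer not in SEQUENCERS:
        raise ValueError(f"unsupported sequencer: {p.sequencer!r}")
    if p.threads < 1 or p.topn < 1:
        raise ValueError("threads and topn must be positive")
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    names = [a.strip() for a in p.assemblers.split(",") if a.strip()]
    if not names:
        raise ValueError("at least one assembler is required")
    return names
```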

Database paths (must be configured by an administrator):

Parameter     Description
metaxDb       Metax database prefix (without .json)
metaxDmpDir   NCBI taxonomy dump directory
kraken2Db     Kraken2 database path
sylphDb       Sylph database path
refIndex      Host reference minimap2 index
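Before launching a run, an administrator may want to confirm that every database path is set and exists. The sketch below does that for the five parameters above. It assumes that metaxDb names a prefix whose file on disk is <prefix>.json, which is an interpretation of "without .json". The missing_databases helper is hypothetical, and the exists argument is only there so the check can be tested without real files.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional

DB_KEYS = ("metaxDb", "metaxDmpDir", "kraken2Db", "sylphDb", "refIndex")


def missing_databases(
    cfg: Mapping[str, Optional[str]],
    exists: Callable[[Path], bool] = Path.exists,
) -> list[str]:
    """Return the database parameters that are unset or point at nothing."""
    missing = []
    for key in DB_KEYS:
        value = cfg.get(key)
        if not value:
            missing.append(key)
            continue
        # Assumption: metaxDb is a prefix and the file on disk is <prefix>.json.
        path = Path(value + ".json") if key == "metaxDb" else Path(value)
        if not exists(path):
            missing.append(key)
    return missing
```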

Outputs

Output                  Scope       Description
Profile with VFs/AMRs   Per sample  Merged taxonomic profile with virulence and AMR annotations
Top-N HTML report       Per study   Combined species abundance report
Readcount stats         Per study   Combined readcount summary
Dotplots                Per study   Species abundance visualizations (PDF)
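The topn parameter limits the reports to the most abundant species. As an illustration, here is a minimal sketch of a top-N selection over a species-to-abundance mapping. The input shape and the top_n_species function are assumptions; MetaxPath's own profile format and ranking rules may differ.

```python
import heapq


def top_n_species(
    abundances: dict[str, float], n: int = 50
) -> list[tuple[str, float]]:
    """Return the n most abundant species, highest first, ties broken by name."""
    # nsmallest on (-abundance, name) yields descending abundance, ascending name.
    return heapq.nsmallest(n, abundances.items(), key=lambda kv: (-kv[1], kv[0]))


print(top_n_species({"A": 0.1, "B": 0.5, "C": 0.5, "D": 0.2}, n=3))
# [('B', 0.5), ('C', 0.5), ('D', 0.2)]
```

heapq.nsmallest avoids sorting the full profile when only the top entries are needed, which matters little for small profiles but keeps the selection proportional to n.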

Adding More Pipelines

SeqDesk supports adding custom pipelines through a package structure. See Adding Custom Pipelines for details on creating your own pipeline integrations.