
Results: Assemblies & Bins

After a pipeline completes, SeqDesk parses the outputs and links them to samples in the database. This page covers the result types and their data structure.

Assemblies

An assembly represents a set of assembled contigs for a sample.

| Field | Description |
| --- | --- |
| Assembly Name | Identifier (usually derived from sample name) |
| Assembly File | Path to the contig FASTA file |
| Assembly Accession | ENA accession if submitted (GCA…) |
| Sample | The sample this assembly belongs to |
| Pipeline Run | The run that produced this assembly |
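For readers who handle these records in scripts, the fields above can be mirrored as a small Python dataclass. This is only a sketch: the attribute names (`assembly_name`, `assembly_file`, and so on) and the `is_submitted` helper are hypothetical and are not claimed to match SeqDesk's actual database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    """Hypothetical mirror of an assembly record; field names are illustrative."""
    assembly_name: str                         # usually derived from the sample name
    assembly_file: str                         # path to the contig FASTA file
    sample_id: str                             # the sample this assembly belongs to
    pipeline_run_id: str                       # the run that produced this assembly
    assembly_accession: Optional[str] = None   # ENA accession (GCA...) once submitted

    @property
    def is_submitted(self) -> bool:
        # Treat an assembly as submitted once it carries an ENA accession.
        return self.assembly_accession is not None
```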

Preferred Assembly

Each sample can have a preferred assembly: the assembly used for downstream analysis and ENA submission. When multiple assemblies exist (e.g., from MEGAHIT and SPAdes), the facility admin can mark one as preferred; if none is marked, the latest available assembly is used.

Output Location

MAG pipeline assemblies are found at:

{runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz
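A minimal sketch of how a run's MEGAHIT outputs could be discovered and linked back to sample names, assuming every file follows the `{sampleName}.contigs.fa.gz` pattern exactly. The function names are hypothetical; SeqDesk's real parser may match files differently.

```python
from pathlib import Path
from typing import Dict

# Suffix taken from the documented layout {runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz
CONTIG_SUFFIX = ".contigs.fa.gz"


def megahit_assembly_path(run_dir: str, sample_name: str) -> Path:
    """Build the expected MEGAHIT contig path for one sample."""
    return Path(run_dir) / "Assembly" / "MEGAHIT" / f"{sample_name}{CONTIG_SUFFIX}"


def find_megahit_assemblies(run_dir: str) -> Dict[str, Path]:
    """Map sample name -> contig file for every MEGAHIT assembly in a run directory."""
    assembly_dir = Path(run_dir) / "Assembly" / "MEGAHIT"
    found: Dict[str, Path] = {}
    if not assembly_dir.is_dir():
        return found  # run produced no MEGAHIT assemblies (or path differs)
    for path in sorted(assembly_dir.iterdir()):
        if path.is_file() and path.name.endswith(CONTIG_SUFFIX):
            # Strip the suffix to recover the sample name used in the file name.
            sample_name = path.name[: -len(CONTIG_SUFFIX)]
            found[sample_name] = path
    return found
```

Matching on the full suffix rather than a glob keeps sample names that themselves contain dots intact.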

Genome Bins (MAGs)

Genome bins are individual genomes extracted from metagenomic assemblies through binning algorithms.

| Field | Description |
| --- | --- |
| Bin Name | Identifier (e.g., sample1.001) |
| Bin File | Path to the bin FASTA file |
| Bin Accession | ENA accession if submitted |
| Completeness | CheckM completeness score (0–100%) |
| Contamination | CheckM contamination score (%) |
| Sample | The sample this bin belongs to |
| Pipeline Run | The run that produced this bin |
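The bin fields can be sketched the same way. The names below are hypothetical, and `bin_index` assumes the `sample1.001` naming convention from the table (numeric suffix after the last dot); bins named differently simply return `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenomeBin:
    """Hypothetical mirror of a genome bin record; field names are illustrative."""
    bin_name: str                            # e.g. "sample1.001"
    bin_file: str                            # path to the bin FASTA file
    sample_id: str                           # the sample this bin belongs to
    pipeline_run_id: str                     # the run that produced this bin
    completeness: Optional[float] = None     # CheckM completeness, percent
    contamination: Optional[float] = None    # CheckM contamination, percent
    bin_accession: Optional[str] = None      # ENA accession once submitted

    def bin_index(self) -> Optional[int]:
        """Parse the numeric suffix of names like 'sample1.001' (assumed convention)."""
        _, sep, suffix = self.bin_name.rpartition(".")
        return int(suffix) if sep and suffix.isdigit() else None
```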

Quality Metrics

Bin quality is assessed by CheckM, which estimates:

  • Completeness — what percentage of a complete genome is present
  • Contamination — what percentage of the bin is from other organisms

Common quality thresholds:

| Category | Completeness | Contamination |
| --- | --- | --- |
| High quality | ≥ 90% | < 5% |
| Medium quality | ≥ 50% | < 10% |
| Low quality | < 50% | any |
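The thresholds can be applied mechanically. The helper below is hypothetical, not SeqDesk code; note that the table does not cover bins that are at least 50% complete but 10% or more contaminated, so the sketch returns `None` for those rather than guessing a category.

```python
from typing import Optional


def classify_bin(completeness: float, contamination: float) -> Optional[str]:
    """Apply the completeness/contamination thresholds from the quality table.

    Returns None when no row matches (>= 50% complete but >= 10% contaminated).
    """
    if completeness >= 90 and contamination < 5:
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if completeness < 50:
        return "low"   # contamination is ignored for low-quality bins
    return None
```

For example, `classify_bin(92.4, 3.1)` gives `"high"`, while `classify_bin(95.0, 7.0)` drops to `"medium"` because of contamination alone.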

Bin Sources

The MAG pipeline can produce bins from multiple binning tools:

  1. MetaBAT2 — default binning
  2. MaxBin2 — alternative binning
  3. CONCOCT — optional (disabled by default)
  4. DAS Tool — bin refinement, combines results from other tools

When present, DAS Tool refined bins (at GenomeBinning/DASTool/bins/) are preferred over the output of the individual binning tools.
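The preference for DAS Tool output can be expressed as an ordered list of candidate directories, checked relative to the run directory (assumed here by analogy with the assembly path). Only the DAS Tool path comes from this page; the MetaBAT2 and MaxBin2 fallback paths and the `pick_bin_dir` helper are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Preference order: DAS Tool refined bins first. The fallback directories are
# assumptions for illustration, not confirmed SeqDesk or pipeline paths.
BIN_DIR_PREFERENCE = (
    "GenomeBinning/DASTool/bins",
    "GenomeBinning/MetaBAT2/bins",   # hypothetical fallback
    "GenomeBinning/MaxBin2/bins",    # hypothetical fallback
)


def pick_bin_dir(run_dir: str) -> Optional[Path]:
    """Return the first existing bin directory in preference order, or None."""
    for rel in BIN_DIR_PREFERENCE:
        candidate = Path(run_dir) / rel
        if candidate.is_dir():
            return candidate
    return None
```

Keeping the order in a single tuple makes the preference explicit and easy to change if, for example, CONCOCT is enabled.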

Pipeline Artifacts

Beyond assemblies and bins, pipeline runs produce additional artifacts:

| Type | Description |
| --- | --- |
| reads | Processed/filtered reads |
| assembly | Assembled contigs |
| bins | Genome bins |
| qc_report | Quality control reports (MultiQC, FastQC) |
| alignment | BAM alignment files |

Each artifact tracks:

  • File path, size, and checksum
  • Which pipeline step produced it
  • Associated study and sample
  • Tool-specific metadata (JSON)
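To make the tracked attributes concrete, here is a sketch of building an artifact record from a file on disk. The SHA-256 checksum, the field names, and treating `sample_id` as optional for run-wide files are all assumptions; only the artifact types and the list of tracked attributes come from this page.

```python
import hashlib
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional

# Artifact types from the table above.
ARTIFACT_TYPES = {"reads", "assembly", "bins", "qc_report", "alignment"}


@dataclass
class Artifact:
    """Hypothetical artifact record; field names are illustrative, not SeqDesk's schema."""
    artifact_type: str
    path: str
    size: int                   # bytes
    checksum: str               # assumed SHA-256 hex digest
    step: str                   # pipeline step that produced the file
    study_id: str
    sample_id: Optional[str]    # assumed None for run-wide files such as a MultiQC report
    metadata: Dict[str, Any] = field(default_factory=dict)  # tool-specific JSON


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large BAM or FASTA files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_artifact(
    path: str,
    artifact_type: str,
    step: str,
    study_id: str,
    sample_id: Optional[str] = None,
    metadata: Optional[Dict[str, Any]] = None,
) -> Artifact:
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type}")
    p = Path(path)
    # Round-trip through JSON so only JSON-serializable metadata is accepted.
    meta = json.loads(json.dumps(metadata or {}))
    return Artifact(
        artifact_type=artifact_type,
        path=str(p),
        size=p.stat().st_size,
        checksum=sha256_of(p),
        step=step,
        study_id=study_id,
        sample_id=sample_id,
        metadata=meta,
    )
```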

Assemblies Viewer

The Assemblies page (/assemblies) provides a centralized view of all genome assemblies across studies. The table shows:

| Column | Description |
| --- | --- |
| Study | Study title and associated order number |
| Sample | Sample identifier |
| Final Assembly | The selected assembly with file path and pipeline run info |
| Selection Mode | How the final assembly was chosen (see below) |
| Available Count | Number of alternative assemblies for the sample |
| Download | Download the assembly FASTA file |

Assembly Selection Modes

| Mode | Meaning |
| --- | --- |
| Marked Final | Admin explicitly selected this assembly as preferred |
| Automatic | System selected the latest available assembly |
| Missing Preferred | Admin marked a preferred assembly, but it is no longer available |
| Unavailable | No assembly exists for this sample |
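The four modes follow from two inputs: whether an admin marked a preferred assembly, and which assemblies currently exist. This is a sketch under stated assumptions: "latest" is taken to mean the most recent `created_at`, a stale preference is reported as Missing Preferred without falling back to another assembly, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass
class AssemblyOption:
    assembly_id: str
    created_at: datetime   # assumed basis for "latest"


def select_final_assembly(
    available: Sequence[AssemblyOption], preferred_id: Optional[str]
) -> Tuple[str, Optional[AssemblyOption]]:
    """Return (selection mode, chosen assembly) following the mode table."""
    if preferred_id is not None:
        for option in available:
            if option.assembly_id == preferred_id:
                return "Marked Final", option
        # Assumption: a stale preference is reported rather than silently replaced.
        return "Missing Preferred", None
    if available:
        return "Automatic", max(available, key=lambda o: o.created_at)
    return "Unavailable", None
```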

By default, only facility admins can download assemblies. Enable the allowUserAssemblyDownload setting to let researchers download their own assembly files.
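The download rule reduces to a small permission check. In this sketch, the `allow_user_assembly_download` parameter stands in for the `allowUserAssemblyDownload` setting, and "their own" is assumed to mean the user is an owner of the study; the `User` type and this ownership model are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class User:
    user_id: str
    is_facility_admin: bool


def can_download_assembly(
    user: User, study_owner_ids: Set[str], allow_user_assembly_download: bool
) -> bool:
    """Admins may always download; researchers only their own, and only if enabled."""
    if user.is_facility_admin:
        return True
    return allow_user_assembly_download and user.user_id in study_owner_ids
```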

Viewing Results

Results are accessible from multiple places:

  • Assemblies page — centralized view of all assemblies across studies
  • Study page → Pipelines tab — overview of all runs and their results
  • Sample detail page — assemblies and bins for a specific sample
  • Pipeline run detail — all outputs from a single run