Results: Assemblies & Bins
After a pipeline completes, SeqDesk parses the outputs and links them to samples in the database. This page covers the result types and their data structure.
Assemblies
An assembly represents a set of assembled contigs for a sample.
| Field | Description |
|---|---|
| Assembly Name | Identifier (usually derived from sample name) |
| Assembly File | Path to the contig FASTA file |
| Assembly Accession | ENA accession if submitted (GCA…) |
| Sample | The sample this assembly belongs to |
| Pipeline Run | The run that produced this assembly |
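As a rough illustration of the record described in the table above, the fields could be modeled as follows. This is only a sketch: the class name, attribute names, and example values are hypothetical and do not reflect SeqDesk's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    """Illustrative record mirroring the fields in the table (names are assumptions)."""
    name: str                        # Assembly Name
    file_path: str                   # Assembly File (contig FASTA)
    sample_id: str                   # Sample
    pipeline_run_id: str             # Pipeline Run
    accession: Optional[str] = None  # Assembly Accession, empty until submitted to ENA

    @property
    def is_submitted(self) -> bool:
        # An assembly counts as submitted once an accession has been recorded.
        return self.accession is not None


# Hypothetical example values, for illustration only.
example = Assembly(
    name="sample1",
    file_path="/data/runs/run-001/Assembly/MEGAHIT/sample1.contigs.fa.gz",
    sample_id="sample1",
    pipeline_run_id="run-001",
)
print(example.is_submitted)
```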
Preferred Assembly
Each sample can have a preferred assembly, which is the one used for downstream analysis and ENA submission. When multiple assemblies exist (e.g., from MEGAHIT and SPAdes), the facility admin selects which one to use.
Output Location
MAG pipeline assemblies are found at:
{runDir}/Assembly/MEGAHIT/{sampleName}.contigs.fa.gz
Genome Bins (MAGs)
Genome bins are individual genomes extracted from metagenomic assemblies through binning algorithms.
| Field | Description |
|---|---|
| Bin Name | Identifier (e.g., sample1.001) |
| Bin File | Path to the bin FASTA file |
| Bin Accession | ENA accession if submitted |
| Completeness | CheckM completeness score (0–100%) |
| Contamination | CheckM contamination score (0–100%) |
| Sample | The sample this bin belongs to |
| Pipeline Run | The run that produced this bin |
Quality Metrics
Bin quality is assessed by CheckM, which estimates:
- Completeness — what percentage of a complete genome is present
- Contamination — what percentage of the bin is from other organisms
Common quality thresholds:
| Category | Completeness | Contamination |
|---|---|---|
| High quality | ≥ 90% | < 5% |
| Medium quality | ≥ 50% | < 10% |
| Low quality | < 50% | any |
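The thresholds in the table can be expressed as a small classification function. This is a minimal sketch, not SeqDesk's implementation. The function name is hypothetical, and the table does not say how to treat a bin with at least 50% completeness but 10% or more contamination; the sketch assumes such bins fall into the low-quality category.

```python
def classify_bin(completeness: float, contamination: float) -> str:
    """Map CheckM percentages to the quality categories in the table.

    Assumption: bins that miss the medium-quality contamination limit
    are treated as low quality, which the table does not state explicitly.
    """
    if completeness >= 90 and contamination < 5:
        return "High quality"
    if completeness >= 50 and contamination < 10:
        return "Medium quality"
    return "Low quality"


for scores in [(95.0, 2.0), (72.5, 6.1), (31.0, 0.4)]:
    print(scores, classify_bin(*scores))
```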
Bin Sources
The MAG pipeline can produce bins from multiple binning tools:
- MetaBAT2 — default binning
- MaxBin2 — alternative binning
- CONCOCT — optional (disabled by default)
- DAS Tool — bin refinement, combines results from other tools
Bins refined by DAS Tool (in GenomeBinning/DASTool/bins/) are preferred over the output of the individual binning tools.
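The preference for DAS Tool output could be sketched as a simple ordered lookup. Only "DAS Tool first" comes from the text above; the fallback order among the other tools, the function name, and the input shape (a mapping from tool name to bin file names) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# DAS Tool comes first, as stated above. The order of the remaining
# tools is an assumption, not documented behavior.
PREFERENCE = ["DAS Tool", "MetaBAT2", "MaxBin2", "CONCOCT"]


def pick_bin_set(bins_by_tool: Dict[str, List[str]]) -> Optional[Tuple[str, List[str]]]:
    """Return the first tool in PREFERENCE that produced any bins, with its bins."""
    for tool in PREFERENCE:
        bins = bins_by_tool.get(tool, [])
        if bins:
            return tool, bins
    return None  # no binning tool produced output


found = {
    "MetaBAT2": ["sample1.001.fa", "sample1.002.fa"],
    "DAS Tool": ["sample1.001.fa"],
}
print(pick_bin_set(found))
```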
Pipeline Artifacts
Beyond assemblies and bins, pipeline runs produce additional artifacts:
| Type | Description |
|---|---|
| reads | Processed/filtered reads |
| assembly | Assembled contigs |
| bins | Genome bins |
| qc_report | Quality control reports (MultiQC, FastQC) |
| alignment | BAM alignment files |
Each artifact tracks:
- File path, size, and checksum
- Which pipeline step produced it
- Associated study and sample
- Tool-specific metadata (JSON)
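To make the tracked attributes concrete, the sketch below builds an artifact record for a file on disk. It is illustrative only: the `Artifact` class, its attribute names, the `describe_file` helper, and the choice of SHA-256 as the checksum algorithm are all assumptions, since the checksum algorithm is not specified here.

```python
import hashlib
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Artifact:
    """Illustrative artifact record (attribute names are assumptions)."""
    artifact_type: str               # reads, assembly, bins, qc_report, alignment
    path: str
    size_bytes: int
    checksum: str                    # SHA-256 hex digest (assumed algorithm)
    step: str                        # pipeline step that produced the file
    study_id: Optional[str] = None
    sample_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)  # tool-specific, JSON-serializable


def describe_file(path: str, artifact_type: str, step: str, **extra: Any) -> Artifact:
    """Compute size and checksum for a file and wrap them in an Artifact."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large FASTA or BAM files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return Artifact(
        artifact_type=artifact_type,
        path=path,
        size_bytes=os.path.getsize(path),
        checksum=digest.hexdigest(),
        step=step,
        **extra,
    )


# Demonstration with a throwaway file standing in for a pipeline output.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "sample1.contigs.fa")
    with open(fasta, "wb") as handle:
        handle.write(b">contig1\nACGT\n")
    artifact = describe_file(
        fasta, "assembly", step="assembly",
        sample_id="sample1", metadata={"tool": "MEGAHIT"},
    )

print(artifact.size_bytes, artifact.checksum[:12], json.dumps(artifact.metadata))
```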
Assemblies Viewer
The Assemblies page (/assemblies) provides a centralized view of all
genome assemblies across studies. The table shows:
| Column | Description |
|---|---|
| Study | Study title and associated order number |
| Sample | Sample identifier |
| Final Assembly | The selected assembly with file path and pipeline run info |
| Selection Mode | How the final assembly was chosen (see below) |
| Available Count | Number of alternative assemblies for the sample |
| Download | Download the assembly FASTA file |
Assembly Selection Modes
| Mode | Meaning |
|---|---|
| Marked Final | Admin explicitly selected this assembly as preferred |
| Automatic | System selected the latest available assembly |
| Missing Preferred | Admin marked a preferred assembly, but it is no longer available |
| Unavailable | No assembly exists for this sample |
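The four modes in the table can be read as a small decision procedure. The following is a sketch under stated assumptions rather than SeqDesk's actual logic: the `AssemblyRef` type, the `created_at` ordering key used to find the "latest" assembly, and the choice to return no assembly in the Missing Preferred case are all hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class AssemblyRef(NamedTuple):
    id: str
    created_at: int  # hypothetical ordering key; larger means more recent


def resolve_final_assembly(
    available: List[AssemblyRef],
    preferred_id: Optional[str],
) -> Tuple[str, Optional[AssemblyRef]]:
    """Return (selection mode, final assembly) for one sample."""
    if preferred_id is not None:
        for assembly in available:
            if assembly.id == preferred_id:
                return "Marked Final", assembly
        # The admin's choice no longer exists. Returning no assembly here
        # is an assumption; the table does not say what is shown instead.
        return "Missing Preferred", None
    if available:
        return "Automatic", max(available, key=lambda a: a.created_at)
    return "Unavailable", None


assemblies = [AssemblyRef("megahit-1", 100), AssemblyRef("spades-1", 200)]
print(resolve_final_assembly(assemblies, "megahit-1"))
print(resolve_final_assembly(assemblies, None))
```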
By default, only facility admins can download assemblies. Enable the
allowUserAssemblyDownload setting to let researchers download their own
assembly files.
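The download rule above amounts to a short permission check. This is a hedged sketch: the role strings, the `owns_sample` flag, and the function name are hypothetical, and only the `allowUserAssemblyDownload` setting (written here in snake case) comes from the text.

```python
def can_download_assembly(
    role: str,
    owns_sample: bool,
    allow_user_assembly_download: bool,
) -> bool:
    """Decide whether a user may download an assembly file.

    Role names are placeholders, not SeqDesk's real identifiers.
    """
    if role == "facility_admin":
        return True  # admins can always download
    # Researchers need the setting enabled and may only fetch their own files.
    return allow_user_assembly_download and owns_sample


print(can_download_assembly("researcher", owns_sample=True, allow_user_assembly_download=False))
print(can_download_assembly("researcher", owns_sample=True, allow_user_assembly_download=True))
```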
Viewing Results
Results are accessible from multiple places:
- Assemblies page — centralized view of all assemblies across studies
- Study page → Pipelines tab — overview of all runs and their results
- Sample detail page — assemblies and bins for a specific sample
- Pipeline run detail — all outputs from a single run