File Discovery & Auto-Detect
SeqDesk scans a configurable data directory to discover sequencing files. It automatically pairs forward and reverse reads and matches files to samples.
How Scanning Works
The file scanner:
- Reads the configured base path (`site.dataBasePath`)
- Searches up to N levels deep (`sequencingFiles.scanDepth`, default: 2)
- Filters by allowed extensions (default: `.fastq.gz`, `.fq.gz`, `.fastq`, `.fq`)
- Skips directories matching ignore patterns (default: `**/tmp/**`, `**/undetermined/**`)
- Caches results for performance
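The scanning steps can be sketched roughly as follows. This is an illustrative approximation, not SeqDesk's actual implementation: the function names `scan` and `is_ignored` are hypothetical, and it assumes that a depth of 1 means "files directly in the base path" and that ignore patterns are matched case-sensitively against directory paths relative to the base path.

```python
import fnmatch
import os
from pathlib import Path

DEFAULT_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
DEFAULT_IGNORE = ("**/tmp/**", "**/undetermined/**")


def is_ignored(rel_dir, patterns):
    """Return True if a directory (relative to the base path) matches an ignore pattern."""
    # Pad with slashes so "**/tmp/**" also matches a top-level "tmp" directory.
    padded = "/" + rel_dir.strip("/") + "/"
    return any(fnmatch.fnmatchcase(padded, pattern) for pattern in patterns)


def scan(base_path, scan_depth=2, extensions=DEFAULT_EXTENSIONS,
         ignore_patterns=DEFAULT_IGNORE):
    """List matching files under base_path as sorted, POSIX-style relative paths."""
    base = Path(base_path)
    found = []
    for root, dirs, files in os.walk(base):
        rel = Path(root).relative_to(base)
        depth = len(rel.parts)  # 0 for the base path itself

        # Prune ignored directories in place so os.walk never enters them.
        dirs[:] = [d for d in dirs
                   if not is_ignored((rel / d).as_posix(), ignore_patterns)]

        # Assumption: files directly in the base path are at level 1.
        if depth + 1 >= scan_depth:
            dirs[:] = []

        for name in files:
            if name.endswith(tuple(extensions)):
                found.append((rel / name).as_posix())
    return sorted(found)
```

Pruning `dirs` in place keeps the walk from descending into ignored or too-deep directories at all, which matters on large sequencing volumes.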
Supported File Naming
The scanner recognizes these naming patterns for R1/R2 pairing:
| Pattern | Example |
|---|---|
| `{sample}_R1.fastq.gz` | `HG001_R1.fastq.gz` |
| `{sample}_R2.fastq.gz` | `HG001_R2.fastq.gz` |
| `{sample}_1.fastq.gz` | `HG001_1.fastq.gz` |
| `{sample}_2.fastq.gz` | `HG001_2.fastq.gz` |
| `{sample}_R1_001.fastq.gz` | `HG001_R1_001.fastq.gz` |
| `{sample}.R1.fastq.gz` | `HG001.R1.fastq.gz` |
| Illumina standard | `HG001_S1_L001_R1_001.fastq.gz` |
The scanner strips R1/R2 indicators, lane numbers (_L001), and sample indices
(_S1) to extract the sample identifier for matching.
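One way to express this stripping is a single regular expression applied to the filename after the extension is removed. The sketch below is an assumption about how such parsing could work, not the scanner's actual code; `READ_RE` and `parse_read_file` are hypothetical names, and the pattern only covers the naming schemes listed in the table above.

```python
import re

EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# sample id, optional sample index (_S1), optional lane (_L001),
# read indicator (_R1, .R1, _1, ...), optional trailing chunk number (_001).
READ_RE = re.compile(
    r"^(?P<sample>.+?)"
    r"(?:_S\d+)?"
    r"(?:_L\d{3})?"
    r"[._]R?(?P<read>[12])"
    r"(?:_\d{3})?$"
)


def parse_read_file(filename):
    """Return (sample_id, read_number), or None if the name is not recognized."""
    stem = next(
        (filename[: -len(ext)] for ext in EXTENSIONS if filename.endswith(ext)),
        None,
    )
    if stem is None:
        return None
    match = READ_RE.match(stem)
    if match is None:
        return None
    return match.group("sample"), int(match.group("read"))


print(parse_read_file("HG001_S1_L001_R1_001.fastq.gz"))  # ('HG001', 1)
print(parse_read_file("HG001_2.fastq.gz"))               # ('HG001', 2)
```

The sample group is non-greedy, so the optional `_S1` and `_L001` parts are stripped from the identifier rather than absorbed into it.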
Pairing Logic
Files are grouped by their extracted sample identifier:
- If both R1 and R2 are found → paired-end
- If only one file → single-end (if `allowSingleEnd` is true)
- Files that do not match a pairing pattern are listed as unmatched
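The grouping rules above could be sketched like this. The function `group_reads` and its input shape are hypothetical. Two behaviors are assumptions that the documentation does not specify: lone files are reported as unmatched when `allow_single_end` is false, and multiple files for the same sample and read number (for example, several lanes) simply overwrite each other.

```python
from collections import defaultdict


def group_reads(parsed_files, allow_single_end=True):
    """Classify files as paired, single-end, or unmatched.

    parsed_files: iterable of (filename, sample_id, read_number) tuples,
    where sample_id is None for files that matched no naming pattern.
    """
    by_sample = defaultdict(dict)
    result = {"paired": {}, "single": {}, "unmatched": []}

    for filename, sample, read in parsed_files:
        if sample is None:
            result["unmatched"].append(filename)
        else:
            by_sample[sample][read] = filename

    for sample, reads in by_sample.items():
        if 1 in reads and 2 in reads:
            result["paired"][sample] = (reads[1], reads[2])
        elif allow_single_end:
            result["single"][sample] = next(iter(reads.values()))
        else:
            # Assumption: lone reads count as unmatched when single-end is disabled.
            result["unmatched"].extend(reads.values())
    return result


files = [
    ("HG001_R1.fastq.gz", "HG001", 1),
    ("HG001_R2.fastq.gz", "HG001", 2),
    ("HG002_R1.fastq.gz", "HG002", 1),
    ("readme.fastq", None, None),
]
print(group_reads(files)["paired"])  # {'HG001': ('HG001_R1.fastq.gz', 'HG001_R2.fastq.gz')}
```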
Configuration
File discovery settings can be configured through:
- Config file: the `sequencingFiles` section in `seqdesk.config.json`
- Environment variables: `SEQDESK_FILES_*` variables
- Admin UI: under Data Storage settings
| Setting | Default | Description |
|---|---|---|
| `extensions` | `.fastq.gz`, `.fq.gz`, `.fastq`, `.fq` | File types to include |
| `scanDepth` | 2 | Directory levels to search (1–10) |
| `allowSingleEnd` | true | Include unpaired files |
| `ignorePatterns` | `**/tmp/**`, `**/undetermined/**` | Glob patterns to skip |
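As an illustration of how these settings combine with their defaults, the sketch below reads a `sequencingFiles` section from config text, fills in the defaults from the table, and enforces the 1–10 range for `scanDepth`. The helper `load_file_settings` is hypothetical, and representing `extensions` and `ignorePatterns` as JSON arrays is an assumption; check your own `seqdesk.config.json` for the actual value types.

```python
import json

# Defaults taken from the settings table; list-valued types are an assumption.
DEFAULTS = {
    "extensions": [".fastq.gz", ".fq.gz", ".fastq", ".fq"],
    "scanDepth": 2,
    "allowSingleEnd": True,
    "ignorePatterns": ["**/tmp/**", "**/undetermined/**"],
}


def load_file_settings(config_text):
    """Merge the sequencingFiles section of a JSON config over the defaults."""
    config = json.loads(config_text)
    settings = {**DEFAULTS, **config.get("sequencingFiles", {})}

    depth = settings["scanDepth"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 10:
        raise ValueError(f"scanDepth must be an integer from 1 to 10, got {depth!r}")
    return settings


example = '{"sequencingFiles": {"scanDepth": 3, "allowSingleEnd": false}}'
settings = load_file_settings(example)
print(settings["scanDepth"], settings["allowSingleEnd"])  # 3 False
```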
Testing the Configuration
In the admin settings, you can test your data path configuration:
- Validate path — checks that the directory exists and is readable
- Count files — shows how many matching files are found
- Simulate discovery — previews what the scanner would find
This helps verify the configuration before using it in production.
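Outside the admin UI, a comparable path check can be approximated in a few lines. This is a minimal sketch of what "exists and is readable" might mean on the server, not the check SeqDesk itself performs; `validate_data_path` is a hypothetical helper and must run as the same user as the SeqDesk process to be meaningful.

```python
import os


def validate_data_path(path):
    """Return (ok, message) describing whether path is a readable directory."""
    if not os.path.isdir(path):
        return False, f"{path} is not an existing directory"
    # Listing a directory needs read permission; entering it needs execute.
    if not os.access(path, os.R_OK | os.X_OK):
        return False, f"{path} is not readable by the current user"
    return True, "ok"
```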
Scan Caching
Scan results are cached to avoid repeated filesystem access. The cache is invalidated when:
- The data path setting changes
- File extension settings change
- A manual rescan is triggered from the admin UI
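These invalidation rules amount to keying the cached result on the data path and extension settings, plus an explicit reset. The class below is a simplified sketch under that assumption; `ScanCache` and the `scan_fn` callback signature are hypothetical and do not reflect SeqDesk's internal cache.

```python
class ScanCache:
    """Cache one scan result, keyed by data path and extension settings."""

    def __init__(self, scan_fn):
        self._scan_fn = scan_fn  # callable: (base_path, extensions) -> result
        self._key = None
        self._result = None

    def get(self, base_path, extensions):
        # Sort extensions so reordering the same set does not force a rescan.
        key = (str(base_path), tuple(sorted(extensions)))
        if key != self._key:
            self._result = self._scan_fn(base_path, extensions)
            self._key = key
        return self._result

    def invalidate(self):
        """Manual rescan: drop the cached result so the next get() rescans."""
        self._key = None
        self._result = None
```

A second `get()` with the same path and extensions returns the stored result without calling `scan_fn` again; changing either argument, or calling `invalidate()`, triggers a fresh scan.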