File Discovery & Auto-Detect

SeqDesk scans a configurable data directory to discover sequencing files. It automatically pairs forward and reverse reads and matches files to samples.

How Scanning Works

The file scanner:

  1. Reads the configured base path (site.dataBasePath)
  2. Searches up to N levels deep (sequencingFiles.scanDepth, default: 2)
  3. Filters by allowed extensions (default: .fastq.gz, .fq.gz, .fastq, .fq)
  4. Skips directories matching ignore patterns (default: **/tmp/**, **/undetermined/**)
  5. Caches the results so that repeated requests do not rescan the filesystem (see Scan Caching)
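Steps 1 to 4 can be sketched with Python's standard library. This is a minimal illustration, not SeqDesk's implementation. The function names `scan_sequencing_files` and `is_ignored` are hypothetical. The sketch assumes that the base directory counts as depth 0, so a `scan_depth` of 2 reaches files two directory levels below it. It also matches ignore patterns with `fnmatch`, which only approximates `**` globstar semantics. Caching (step 5) is left out.

```python
import os
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Iterable, List

# Defaults copied from the documented settings.
DEFAULT_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")
DEFAULT_IGNORE = ("**/tmp/**", "**/undetermined/**")


def is_ignored(rel_dir: str, patterns: Iterable[str]) -> bool:
    # Hypothetical helper. Wrap in slashes so "**/tmp/**" also matches a top-level "tmp".
    # fnmatch's "*" also matches "/", so this only approximates globstar rules.
    candidate = "/" + rel_dir.strip("/") + "/"
    return any(fnmatchcase(candidate, pattern) for pattern in patterns)


def scan_sequencing_files(
    base_path: str,
    scan_depth: int = 2,
    extensions: Iterable[str] = DEFAULT_EXTENSIONS,
    ignore_patterns: Iterable[str] = DEFAULT_IGNORE,
) -> List[str]:
    """Return sorted paths (relative to base_path) of files with an allowed extension."""
    base = Path(base_path)
    exts = tuple(extensions)
    patterns = tuple(ignore_patterns)
    found = []
    for root, dirs, files in os.walk(base):
        rel_root = Path(root).relative_to(base)
        depth = len(rel_root.parts)  # the base directory itself is depth 0
        # Prune ignored subdirectories in place so os.walk never enters them.
        dirs[:] = [d for d in dirs if not is_ignored((rel_root / d).as_posix(), patterns)]
        if depth >= scan_depth:
            dirs[:] = []  # stop descending once scan_depth levels are reached
        for name in files:
            if name.endswith(exts):
                found.append((rel_root / name).as_posix())
    return sorted(found)
```

Pruning `dirs` in place is the standard `os.walk` idiom. It means ignored or too-deep directories are never read at all, rather than being read and then filtered.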

Supported File Naming

The scanner recognizes these naming patterns for R1/R2 pairing:

| Pattern                  | Example                       |
|--------------------------|-------------------------------|
| {sample}_R1.fastq.gz     | HG001_R1.fastq.gz             |
| {sample}_R2.fastq.gz     | HG001_R2.fastq.gz             |
| {sample}_1.fastq.gz      | HG001_1.fastq.gz              |
| {sample}_2.fastq.gz      | HG001_2.fastq.gz              |
| {sample}_R1_001.fastq.gz | HG001_R1_001.fastq.gz         |
| {sample}.R1.fastq.gz     | HG001.R1.fastq.gz             |
| Illumina standard        | HG001_S1_L001_R1_001.fastq.gz |

The scanner strips R1/R2 indicators, lane numbers (_L001), and sample indices (_S1) to extract the sample identifier for matching.
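One way to express this stripping is a single regular expression applied after the extension is removed. The pattern and the `parse_read_name` function below are hypothetical and written only to cover the example names in the table. SeqDesk's actual matching rules may differ. A name with no R1/R2 indicator returns `None`, which corresponds to an unmatched file.

```python
import re
from typing import Optional, Tuple

# Extension list taken from the documented defaults.
EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")

# Hypothetical pattern covering the naming conventions in the table above.
READ_NAME_RE = re.compile(
    r"(?P<sample>.+?)"        # sample identifier (shortest prefix that lets the rest match)
    r"(?:_S\d+)?"             # optional Illumina sample number, e.g. _S1
    r"(?:_L\d{3})?"           # optional lane number, e.g. _L001
    r"[._]R?(?P<read>[12])"   # read indicator: _R1, .R1 or _1 (and the 2 forms)
    r"(?:_\d{3})?"            # optional Illumina _001 suffix
)


def parse_read_name(filename: str) -> Optional[Tuple[str, int]]:
    """Return (sample_id, read_number), or None if the name has no R1/R2 indicator."""
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        return None  # not an allowed sequencing file extension
    match = READ_NAME_RE.fullmatch(stem)
    if match is None:
        return None  # no recognizable read indicator: treat as unmatched
    return match.group("sample"), int(match.group("read"))
```

Because the sample group is non-greedy and the optional groups are greedy, `HG001_S1_L001_R1_001` resolves to `HG001` instead of keeping `_S1_L001` in the identifier.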

Pairing Logic

Files are grouped by their extracted sample identifier:

  • If both R1 and R2 are found → paired-end
  • If only one file → single-end (if allowSingleEnd is true)
  • Files that do not match a pairing pattern are listed as unmatched
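These grouping rules can be sketched as follows. The input is assumed to have already been parsed into `(filename, sample_id, read_number)` tuples, with `None` for names that matched no pairing pattern. The function `pair_files` is hypothetical. The documentation does not say what happens to a lone file when `allowSingleEnd` is false, so this sketch simply reports such a file as unmatched.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (filename, sample_id, read_number); sample_id is None when no pattern matched.
Parsed = Tuple[str, Optional[str], Optional[int]]


def pair_files(parsed: List[Parsed], allow_single_end: bool = True):
    """Split parsed files into paired-end, single-end and unmatched groups."""
    groups: Dict[str, Dict[int, str]] = defaultdict(dict)
    unmatched: List[str] = []
    for filename, sample, read in parsed:
        if sample is None or read not in (1, 2):
            unmatched.append(filename)
        else:
            # A later file with the same sample and read overwrites an earlier one;
            # multi-lane runs would need an explicit policy here.
            groups[sample][read] = filename

    paired: Dict[str, Tuple[str, str]] = {}
    single: Dict[str, str] = {}
    for sample, reads in sorted(groups.items()):
        if 1 in reads and 2 in reads:
            paired[sample] = (reads[1], reads[2])
        elif allow_single_end:
            single[sample] = next(iter(reads.values()))
        else:
            unmatched.extend(reads.values())  # assumption: lone files become unmatched
    return paired, single, unmatched
```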

Configuration

File discovery settings can be configured through:

  • Config file: the sequencingFiles section in seqdesk.config.json
  • Environment variables: SEQDESK_FILES_* variables
  • Admin UI: under Data Storage settings
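
| Setting        | Default                             | Description                      |
|----------------|-------------------------------------|----------------------------------|
| extensions     | .fastq.gz, .fq.gz, .fastq, .fq      | File types to include            |
| scanDepth      | 2                                   | Directory levels to search (1–10) |
| allowSingleEnd | true                                | Include unpaired files           |
| ignorePatterns | **/tmp/**, **/undetermined/**       | Glob patterns to skip            |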
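As an illustration of how the config-file values combine with the defaults, the sketch below merges a `sequencingFiles` section over the documented defaults and enforces the 1–10 range for `scanDepth`. It assumes that the section uses exactly the setting names in the table. The function `resolve_file_settings` is hypothetical. The sketch does not model the environment variables or the Admin UI, because their names and precedence are not documented here.

```python
import copy
import json
from typing import Any, Dict

# Documented defaults for the sequencingFiles section.
DEFAULTS: Dict[str, Any] = {
    "extensions": [".fastq.gz", ".fq.gz", ".fastq", ".fq"],
    "scanDepth": 2,
    "allowSingleEnd": True,
    "ignorePatterns": ["**/tmp/**", "**/undetermined/**"],
}


def resolve_file_settings(config_text: str) -> Dict[str, Any]:
    """Merge the sequencingFiles section of a config JSON string over the defaults."""
    config = json.loads(config_text)
    settings = copy.deepcopy(DEFAULTS)  # never hand out the shared default lists
    settings.update(config.get("sequencingFiles", {}))
    depth = settings["scanDepth"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or not 1 <= depth <= 10:
        raise ValueError("scanDepth must be an integer from 1 to 10")
    return settings
```

For example, a config containing `{"sequencingFiles": {"scanDepth": 4}}` yields a depth of 4 and keeps every other default.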

Testing the Configuration

In the admin settings, you can test your data path configuration:

  • Validate path — checks that the directory exists and is readable
  • Count files — shows how many matching files are found
  • Simulate discovery — previews what the scanner would find

This helps verify the configuration before using it in production.
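The first two checks can be approximated with standard-library calls. Both functions below are hypothetical stand-ins for the admin actions, not SeqDesk code. `validate_data_path` checks existence, directory type and read/traverse permission. `count_matching_files` counts files with an allowed extension within `scan_depth` levels of the base directory, using the same depth convention as a scan that treats the base directory as level 0. For brevity it ignores `ignorePatterns`.

```python
import os
from pathlib import Path
from typing import Iterable, Tuple


def validate_data_path(path: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a candidate data directory."""
    if not os.path.exists(path):
        return False, "path does not exist"
    if not os.path.isdir(path):
        return False, "path is not a directory"
    # R_OK lets us list entries; X_OK lets us enter the directory.
    if not os.access(path, os.R_OK | os.X_OK):
        return False, "directory is not readable"
    return True, "ok"


def count_matching_files(base_path: str, extensions: Iterable[str], scan_depth: int = 2) -> int:
    """Count files with an allowed extension at most scan_depth levels below base_path."""
    base = Path(base_path)
    exts = tuple(extensions)
    count = 0
    for p in base.rglob("*"):
        # Directory levels below base = relative path parts minus the file name.
        level = len(p.relative_to(base).parts) - 1
        if level <= scan_depth and p.is_file() and p.name.endswith(exts):
            count += 1
    return count
```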

Scan Caching

Scan results are cached to avoid repeated filesystem access. The cache is invalidated when:

  • The data path setting changes
  • File extension settings change
  • A manual rescan is triggered from the admin UI
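The invalidation rules above map naturally onto a cache keyed by the settings that affect the result. The `ScanCache` class below is a hypothetical sketch. It keys a single cached result on the data path and the (order-insensitive) extension list, and it exposes `rescan()` for the manual trigger. Because the list above names only these two settings, the sketch keys on them alone. Whether other settings also invalidate SeqDesk's cache is not stated.

```python
from typing import Callable, Iterable, List, Optional, Tuple

CacheKey = Tuple[str, Tuple[str, ...]]


class ScanCache:
    """Hypothetical single-entry cache around a scan function."""

    def __init__(self, scan_fn: Callable[[str, Tuple[str, ...]], List[str]]):
        self._scan_fn = scan_fn
        self._key: Optional[CacheKey] = None
        self._result: Optional[List[str]] = None

    def get(self, data_path: str, extensions: Iterable[str]) -> List[str]:
        # Sorting makes the key independent of the order extensions are listed in.
        key = (data_path, tuple(sorted(extensions)))
        if self._key != key or self._result is None:
            self._result = self._scan_fn(data_path, key[1])  # cache miss: hit the filesystem
            self._key = key
        return self._result

    def rescan(self) -> None:
        """Manual rescan: drop the cached result so the next get() scans again."""
        self._key = None
        self._result = None
```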