Nextflow's dataflow model, in which each process runs as soon as its input channel delivers data, is a good fit for bioinformatics workflows. Let's build an RNA-seq quality control pipeline with detailed explanations of each component.

1. Pipeline Architecture
Our pipeline will follow this structure:
my_pipeline/
├── main.nf            # Workflow logic
├── nextflow.config    # System configuration
└── data/              # Input FASTQs (create this)
2. Understanding the Main Workflow (main.nf)
// Define input parameters
// Assumes paired files named <sample>_1.fastq.gz and <sample>_2.fastq.gz
params.reads = "data/*_{1,2}.fastq.gz"
// Process definition
process FastQC {
    tag "FASTQC $sample_id"                    // Log identifier
    publishDir "results/fastqc", mode: 'copy'  // Output directory

    input:
    tuple val(sample_id), path(reads)          // Sample ID plus its read files

    output:
    path "*_fastqc.*"                          // Capture all FastQC outputs

    script:
    // -q runs FastQC in quiet mode
    """
    fastqc -q $reads
    """
}
// Workflow definition
workflow {
    // Create input channel of [sample_id, [read files]] tuples
    samples = Channel.fromFilePairs(params.reads)
    // Execute process
    FastQC(samples)
}
// Optional: completion hook, registered outside the workflow block
workflow.onComplete {
    log.info "Pipeline completed"
}
Key Improvements:
- Used fromFilePairs for paired-end readiness
- Added tuple input with sample identifiers
- Included execution hooks for monitoring
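To see what fromFilePairs hands to a process, it can help to print the channel before wiring it into anything. The following is a minimal sketch, not part of the pipeline itself: the file name inspect_pairs.nf and the `_{1,2}` naming pattern are assumptions made for illustration.

```groovy
// inspect_pairs.nf (hypothetical file name)
// Assumes paired files named <sample>_1.fastq.gz and <sample>_2.fastq.gz in data/
params.reads = "data/*_{1,2}.fastq.gz"

workflow {
    Channel
        // checkIfExists makes the run fail early when the pattern matches no files
        .fromFilePairs(params.reads, checkIfExists: true)
        // Prints one item per sample: the shared prefix and the list of matching files
        .view()
}
```

Running `nextflow run inspect_pairs.nf` should print one item per sample, each holding the sample ID followed by its list of read files. That is the structure the `tuple val(sample_id), path(...)` input declaration unpacks.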
3. Configuration Deep Dive (nextflow.config)
profiles {
    docker {
        docker.enabled = true
        process.container = 'staphb/fastqc:0.11.9'
    }
    singularity {
        singularity.enabled = true
        singularity.autoMounts = true
    }
}
// Default parameters
params {
    max_memory = '8.GB'
    max_cpus = 4
    max_time = '2.h'
}
// Execution policy
executor {
    queueSize = 100
}
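Why This Matters:
- Profiles offer multiple containerization options, selected at run time with -profile
- The max_memory, max_cpus, and max_time parameters declare resource caps; they limit a process only once a process directive references them
- executor.queueSize caps how many tasks are queued at once (100 here), which helps with large datasets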
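The max_* parameters only take effect when process directives use them. One way to wire them in, sketched here under the assumption that every process may use the full allowance, is a process scope placed after the params block in nextflow.config:

```groovy
// Sketch for nextflow.config, placed after the params block.
// Assumption: a single setting for all processes is acceptable.
process {
    cpus   = params.max_cpus     // 4 by default, overridable with --max_cpus
    memory = params.max_memory   // '8.GB' by default
    time   = params.max_time     // '2.h' by default
}
```

With this in place, a command-line override such as `--max_cpus 2` changes what each task requests. Settings for individual processes could still be layered on top with process selectors.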
4. Execution with Advanced Options
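# Test run with the Docker profile, overriding params.max_cpus
nextflow run main.nf -profile docker --max_cpus 2
# Resume after interruption (keep the same profile so cached tasks are reused)
nextflow run main.nf -profile docker -resume
# List task name, status, and duration for a past run
# (set run_id to a run name shown by `nextflow log`)
nextflow log -f name,status,duration $run_id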
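Pro Tips:
- Use -resume to continue failed runs from cached results
- Monitor resources with -with-report, which writes an HTML execution report
- Test parts of a complex pipeline with -entry, which selects a named workflow to run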
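The -entry option needs a named workflow to point at. As a rough sketch, a workflow called QC_ONLY (a hypothetical name) could sit in main.nf next to the FastQC process:

```groovy
// Hypothetical named workflow added to main.nf alongside the FastQC process.
// Run only this part with: nextflow run main.nf -profile docker -entry QC_ONLY
workflow QC_ONLY {
    // checkIfExists fails early when params.reads matches no files
    FastQC(Channel.fromFilePairs(params.reads, checkIfExists: true))
}
```

The unnamed `workflow { }` block remains the default when -entry is not given, so the named workflow does not change normal runs.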
5. Extending the Pipeline
process TrimGalore {
    container 'quay.io/biocontainers/trim-galore:0.6.7--0'

    input:
    tuple val(id), path(reads)

    output:
    tuple val(id), path("*val*.fq.gz"), emit: trimmed

    script:
    """
    trim_galore --paired ${reads} -o .
    """
}
// Connect processes (this block replaces the earlier workflow block)
workflow {
    raw_data = Channel.fromFilePairs(params.reads)
    trimmed_data = TrimGalore(raw_data)
    FastQC(trimmed_data)
}
Added Value:
- Chained quality control steps
- Demonstrated process communication
- Showed Biocontainer integration
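Because channels can be reused and combined, the same FastQC process could also report on the raw reads for a before-and-after comparison. This is one possible variant of the entry workflow, assuming the FastQC and TrimGalore processes are defined as shown in this guide:

```groovy
// Sketch of an alternative entry workflow for main.nf.
// Assumes the FastQC and TrimGalore processes from this guide.
workflow {
    raw_data = Channel.fromFilePairs(params.reads)
    TrimGalore(raw_data)
    // mix merges both channels so FastQC runs over raw and trimmed reads
    FastQC(raw_data.mix(TrimGalore.out.trimmed))
}
```

The `emit: trimmed` label on the TrimGalore output is what makes `TrimGalore.out.trimmed` available here.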
Essential Resources: NF-Core Pipelines · Bioconda Packages · BioContainers Registry
This pipeline is a starting point with containerized execution, configurable resource parameters, and a clear path for expansion. Always validate with test datasets before production use!