Digital Embryo — Bioinformatics Command-Line Tutorial
Welcome to the Digital Embryo bioinformatics tutorial! This hands-on course will take you from command-line basics to analyzing real RNA-seq data, with no prior coding knowledge required.
Learning Path
graph TD
A[Module 0: Setup] --> B[Module 1: Directory Tree]
B --> C[Module 2: Working with Files]
C --> D[Module 3: Manipulating Files]
D --> E[Module 4: Wildcards]
E --> F[Module 5: Pipes & Filters]
F --> G[Module 6: Regular Expressions]
G --> H[Module 7: Process Management]
H --> I[Module 8: Editing & Compression]
I --> J[Module 9: TSV Wrangling]
J --> K[Module 10: RNA-seq Primer]
K --> L[Module 11: FASTQ Analysis]
L --> M[Module 12: Conda & QC]
M --> N[Module 13: Download Data]
N --> O[Module 14: View-Run-View]
O --> P[Module 15: Capstone]
style A fill:#e1f5fe
style I fill:#e1f5fe
style J fill:#fff3e0
style P fill:#c8e6c9
Unix Fundamentals (Modules 0-8)
Build a rock-solid foundation in Unix command-line skills with comprehensive, beginner-friendly modules:
- Module 0: Setup & Understanding Your Environment — Learn what terminals and shells are, understand command structure (command + options + arguments), and set up your workspace.
- Module 1: The Directory Tree & Navigation — Master the Unix filesystem tree, understand absolute vs relative paths, and navigate with pwd, cd, and ls.
- Module 2: Working with Files — Create, view, and inspect files with cat, less, head, tail, and wc. Learn when to use each tool.
- Module 3: Manipulating Files (Safely!) — Copy, move, rename, and delete files while learning critical safety practices and the "interactive mode" flag.
- Module 4: Wildcards & Pattern Matching — Use *, ?, and [...] to work with multiple files efficiently. Essential for batch processing genomic data.
- Module 5: Pipes, Redirects & Filters — Chain commands together with pipes and master grep, cut, sort, uniq, and wc for data wrangling.
- Module 6: Regular Expressions for Bioinformatics — Learn powerful pattern matching for validating sample IDs, finding sequence motifs, and parsing file formats.
- Module 7: Process Management & Job Control — Monitor programs, rescue frozen terminals, manage background jobs, and handle long-running analyses.
- Module 8: Text Editing & File Compression — Edit files with nano, work with compressed genomic data (gzip), and verify file integrity with checksums.
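As a preview of how these pieces fit together, here is a minimal sketch that combines command structure (Module 0), a wildcard (Module 4), and a pipe (Module 5). The manifest file and its sample IDs are made up for illustration and are not part of the course data.

```shell
# Throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Hypothetical manifest (made-up sample IDs): sample ID <TAB> tissue.
printf 'S1\tbrain\nS2\theart\nS3\tbrain\n' > "$workdir/samples.tsv"

# Command structure: wc is the command, -l the option, the path the argument.
wc -l "$workdir/samples.tsv"

# Wildcard: * matches every file ending in .tsv in that directory.
ls "$workdir"/*.tsv

# Pipe: extract column 2, sort it, then count each distinct tissue.
tissue_counts=$(cut -f2 "$workdir/samples.tsv" | sort | uniq -c)
echo "$tissue_counts"
```

The `sort` step matters because `uniq` only collapses adjacent duplicate lines.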
Bioinformatics Applications (Modules 9-15)
Apply your Unix skills to real bioinformatics workflows:
- Module 9: Advanced TSV Data Wrangling — Chain commands to wrangle tabular sample manifests.
- Module 10: RNA-seq Primer — Preview the RNA-seq workflow and vocabulary.
- Module 11: FASTQ Analysis — Inspect sequencing reads and compute QC statistics.
- Module 12: Conda & QC Tools — Build conda environments and run FastQC/MultiQC.
- Module 13: Download Real Data — Retrieve sequencing runs from public archives (SRA/ENA).
- Module 14: View-Run-View Loop — Iterate on analysis pipelines with sanity checks.
- Module 15: Capstone Script — Automate the complete QC workflow end-to-end.
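To give a flavor of the QC statistics in Module 11, the sketch below counts reads in a gzip-compressed FASTQ file. It assumes every record occupies exactly four lines (the usual unwrapped layout); the file name and the two reads are invented for the demonstration.

```shell
workdir=$(mktemp -d)

# Two hypothetical reads; each FASTQ record is header, sequence, "+", qualities.
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n' | gzip > "$workdir/demo.fastq.gz"

# Decompress to a stream (the file on disk stays compressed), count lines,
# and divide by four to get the number of reads.
line_count=$(gzip -dc "$workdir/demo.fastq.gz" | wc -l)
read_count=$((line_count / 4))
echo "reads: $read_count"
```

Streaming with `gzip -dc` avoids writing a decompressed copy, which matters once the files are several gigabytes.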
Course Philosophy
Type it, don't paste it
This tutorial emphasizes muscle memory. Type commands first, then copy/paste to check your work. Your future self will thank you.
Look before you loop
Always examine data with head, tail, less, or zless before writing scripts that process many files.
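One way this habit might look in practice is sketched below: inspect a single file first, then loop over all of them. The file names and contents are hypothetical stand-ins for real per-sample tables.

```shell
workdir=$(mktemp -d)

# Create two small hypothetical count tables, each with a header line.
for name in sampleA sampleB; do
    printf 'gene\tcount\nG1\t10\nG2\t5\n' > "$workdir/$name.tsv"
done

# Look: confirm the header and delimiter on one file before touching the rest.
head -n 2 "$workdir/sampleA.tsv"

# Loop: now process every file, skipping the header line in each.
total_rows=0
for f in "$workdir"/*.tsv; do
    rows=$(tail -n +2 "$f" | wc -l)
    total_rows=$((total_rows + rows))
done
echo "data rows: $total_rows"
```

Had the quick look revealed no header line, the `tail -n +2` step would have silently dropped a real data row from every file.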
Help first, experiment second
Always run command --help (or man command) the first time you encounter a new tool. Understanding the options saves time and prevents mistakes.
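Not every tool accepts `--help` (some BSD/macOS builds of common utilities do not), so a small check like the following sketch can be useful. It uses `sort` purely as an example; substitute whichever tool you are meeting for the first time.

```shell
# Capture the help output; the exit status tells us whether --help is supported.
if help_text=$(sort --help 2>&1); then
    # Show just the first few lines; pipe to less for the full text.
    echo "$help_text" | head -n 5
else
    echo "sort --help is not supported here; try: man sort"
fi
```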
What You'll Learn
- Command-line fundamentals: navigation, file operations, text processing
- Data wrangling: pipes, redirects, grep, awk, and cut
- Process management: background jobs, monitoring, and safe termination
- Bioinformatics tools: conda environments, FastQC, MultiQC, seqtk
- Real data analysis: downloading from SRA/ENA, quality control workflows
- Scripting: building robust, reusable analysis scripts
Prerequisites
- A computer with WSL2 (Windows), Terminal (macOS), or Linux
- VS Code (recommended)
- Willingness to type commands and learn by doing
Time Commitment
- Total: ~8-10 hours
- Per module: 30-90 minutes
- Format: Self-paced with email exit tickets
Getting Started
- Start with Module 0: Setup & Understanding Your Environment
- Keep the Cheat Sheet handy for quick reference
- Type commands first, copy/paste second
- Submit exit tickets as you complete each module
Ready to begin? Let's build those command-line skills! 🧬