Development
This section covers the development environment setup and testing procedures for contributing to discoal.
Recent Improvements
Memory Optimizations
The codebase has undergone significant memory optimizations:
Ancestry tracking: Replaced fixed-size ancSites arrays with dynamic ancestry segment trees
Sample size limit increased: Maximum sample size increased from 254 to 65,535
Memory usage reduced: 15-99% memory savings across different simulation scenarios
Dynamic allocation: All major data structures now use dynamic memory allocation
These changes eliminate the need for the -DBIG compilation flag, which is now obsolete.
Development Environment
We use conda to manage the development environment. This ensures all developers have consistent tools and dependencies.
Setting Up the Environment
Install conda or miniforge if you haven’t already
Clone the repository:
git clone https://github.com/kern-lab/discoal.git
cd discoal
Create the conda environment:
conda env create -f environment.yml
Activate the environment:
conda activate discoal_dev
The environment includes:
Python 3.12
msprime (for comparison testing)
Sphinx and sphinx-rtd-theme (for documentation)
Building discoal
With the development environment activated:
make clean
make discoal
For testing, you’ll need both optimized and legacy versions:
make test_binaries
This builds:
discoal_edited: The optimized version with all memory improvements
discoal_legacy_backup: The reference version from the master-backup branch
Testing
discoal has a comprehensive testing framework to ensure code changes maintain correctness and performance.
Unit Tests
The unit tests are built on the Unity test framework and cover all major components of the codebase.
Running Unit Tests
To run all unit tests individually:
make run_tests
To run all tests using the unified test runner:
make run_all_tests
To run a specific test suite:
make test_node # Test node operations
make test_event # Test event handling
# ... etc
Test Coverage
The unit test suite includes 77 tests across 9 test files:
Node Operations (test_node.c - 3 tests):
Node initialization and property setting
Creation of new rooted nodes
Basic node structure validation
Event Handling (test_event.c - 2 tests):
Event structure initialization
Event property manipulation
Node Operations (test_node_operations.c - 4 tests):
Creating and destroying nodes
Adding and removing nodes from active set
Node selection by population
Population size tracking
Mutation Tracking (test_mutations.c - 3 tests):
Basic node creation with mutations
Mutation array access and manipulation
Manual mutation addition
Ancestry Segment Trees (test_ancestry_segment.c - 13 tests):
Segment creation and validation
Reference counting (retain/release)
Shallow vs deep copying
Tree merging and splitting operations
Ancestry count queries
NULL safety checks
Active Material Segments (test_active_segment.c - 12 tests):
Active material initialization
Site activity queries
Fixed region removal
Segment coalescing
AVL tree integration
Verification functions
Trajectory Handling (test_trajectory.c - 12 tests):
Trajectory capacity management
File cleanup for rejected trajectories
Memory-mapped file operations
Large file handling
File persistence and cleanup
Concurrent trajectory management
Coalescence and Recombination (test_coalescence_recombination.c - 11 tests):
Basic coalescence operations
Ancestry merging during coalescence
Recombination with ancestry splitting
Gene conversion functionality
Mutation collection for output
Population-specific operations
Memory Management (test_memory_management.c - 17 tests):
Dynamic array initialization and cleanup
Capacity growth for breakpoints, nodes, and mutations
Stress testing with large allocations
Reinitialization handling
NULL pointer safety
Integrated memory usage scenarios
Building Individual Tests
Each test suite can be built separately:
make test_ancestry_segment
make test_memory_management
# etc.
Test Development
When adding new functionality:
Create a new test file in test/unit/ following the naming convention test_<component>.c
Include the Unity framework headers
Write setUp() and tearDown() functions for test fixtures
Add test functions following the pattern test_<functionality>_<scenario>()
Update the Makefile with build rules for the new test
Add the test to test_runner.c for unified execution
Debugging Tests
To debug a failing test:
# Build with debug symbols
gcc -g -O0 -I. -I./test/unit -o test_name test/unit/test_name.c test/unit/unity.c \
discoalFunctions.c ranlibComplete.c alleleTraj.c ancestrySegment.c \
ancestrySegmentAVL.c ancestryVerify.c activeSegment.c -lm -fcommon
# Run with gdb
gdb ./test_name
Unity Test Framework
The tests use the official Unity test framework (https://github.com/ThrowTheSwitch/Unity) which provides:
Rich assertion macros (TEST_ASSERT_EQUAL, TEST_ASSERT_FLOAT_WITHIN, etc.)
Automatic test discovery and execution
Clear failure messages with file and line information
Test fixtures with setUp/tearDown support
The framework files are located in test/unit/:
unity.h - Main header file
unity.c - Implementation
unity_internals.h - Internal definitions
Quick Testing Reference
Common testing commands during development:
# Run all unit tests
make run_all_tests
# Run specific test suite
make test_memory_management && ./test_memory_management
# Clean and rebuild all tests
make clean && make run_tests
# Quick validation during development
cd testing/ && ./focused_validation_suite.sh
# Full validation before commits
cd testing/ && ./comprehensive_validation_suite.sh
# Statistical validation (if needed)
cd testing/ && ./statistical_validation_suite.sh
# Run comprehensive tests (optimized vs legacy from master-backup)
make test_comprehensive
# Run comprehensive tests (current working dir vs HEAD of branch)
make test_comprehensive_head
Make Targets for Comprehensive Testing
The Makefile provides convenient targets that build the required binaries and run the comprehensive test suite:
make test_comprehensive:
Builds discoal_edited (optimized version from current working directory)
Builds discoal_legacy_backup from the master-backup branch
Runs the comprehensive validation suite comparing these two versions
Use this to ensure your optimizations maintain compatibility with the original implementation
make test_comprehensive_head:
Builds discoal_edited (optimized version from current working directory)
Builds discoal_legacy_backup from HEAD of the current branch
Runs the comprehensive validation suite comparing working changes against the last commit
Use this to measure performance improvements of uncommitted changes
These targets automatically handle the complex process of building from different sources and are the recommended way to run comprehensive tests during development.
Comprehensive Validation Suite
The primary testing framework compares the optimized version against the legacy version to ensure identical output:
cd testing/
./comprehensive_validation_suite.sh
This suite:
Runs 27 test cases covering all documented features
Compares output between optimized and legacy versions
Profiles memory usage and performance
Reports any differences or regressions
Test categories include:
Basic coalescent simulations
Recombination and gene conversion
Multiple populations with migration
Selection (hard/soft/partial sweeps)
Complex demographic scenarios
Stress tests with extreme parameters
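The output comparison at the heart of this suite can be illustrated with a short Python sketch. This is not the suite's actual implementation, which lives in the shell script. It assumes ms-style output whose first line echoes the command line, so that line is dropped before comparing because the two binaries have different names. The function names and the sample strings are hypothetical.

```python
def strip_command_line(output: str) -> list[str]:
    """Drop the first line (assumed to echo the command line) and trailing whitespace."""
    lines = [line.rstrip() for line in output.splitlines()]
    return lines[1:]


def outputs_match(optimized: str, legacy: str) -> bool:
    """Return True when two simulator outputs agree apart from the echoed command."""
    return strip_command_line(optimized) == strip_command_line(legacy)


# Hypothetical outputs from the two binaries run with identical seeds.
optimized_out = (
    "./discoal_edited 2 1 100 -t 2\n123 456\n\n//\n"
    "segsites: 1\npositions: 0.5\n1\n0\n"
)
legacy_out = (
    "./discoal_legacy_backup 2 1 100 -t 2\n123 456\n\n//\n"
    "segsites: 1\npositions: 0.5\n1\n0\n"
)

print(outputs_match(optimized_out, legacy_out))
```

An exact comparison like this is only meaningful when both binaries are run with the same random seeds; distributional agreement is the job of the statistical suite.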
Focused Validation Suite
For rapid testing during development:
cd testing/
./focused_validation_suite.sh
This runs a subset of critical tests for quick feedback.
Statistical Validation Suite
To ensure optimizations don’t introduce statistical biases:
cd testing/
./statistical_validation_suite.sh # 100 replicates, auto mode
./statistical_validation_suite.sh parallel 50 # 50 replicates, parallel mode
./statistical_validation_suite.sh 200 # 200 replicates
This suite:
Runs multiple replicates of each test case
Extracts segregating sites statistics
Performs Kolmogorov-Smirnov tests
Verifies distributions are statistically equivalent
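The two central steps above, extracting segregating-site counts and comparing their distributions, can be sketched in plain Python. This is an illustration under stated assumptions rather than the suite's own code: it assumes ms-style output with one "segsites:" line per replicate, and the function names are hypothetical. It computes only the two-sample Kolmogorov-Smirnov statistic; converting that to a p-value would need something like scipy.stats.ks_2samp.

```python
from bisect import bisect_right


def segsites_counts(ms_output: str) -> list[int]:
    """Collect the value of every 'segsites:' line in ms-style output."""
    counts = []
    for line in ms_output.splitlines():
        if line.startswith("segsites:"):
            counts.append(int(line.split(":", 1)[1]))
    return counts


def ks_statistic(sample_a: list[int], sample_b: list[int]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical two-replicate output fragment.
example_output = "//\nsegsites: 3\npositions: 0.1 0.2 0.3\n//\nsegsites: 5\n"
print(segsites_counts(example_output))
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A statistic near 0 means the two empirical distributions are nearly identical; a value of 1 means the samples do not overlap at all.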
msprime Comparison Suite
To validate discoal against the well-established msprime coalescent simulator:
cd testing/
./msprime_comparison_suite.sh
This suite compares discoal and msprime across:
Neutral models with and without recombination
Various sample sizes and mutation rates
Selection models (hard sweeps with different strengths and ages)
The comparison includes runtime performance metrics and statistical tests to ensure equivalent output distributions.
Parameter Scaling for msprime Comparisons
When comparing discoal with msprime, careful parameter conversion is required due to different conventions:
Population Size: discoal uses scaled parameters assuming Ne=1. For msprime, we use Ne=0.5 with diploid samples (n_samples/2) and ploidy=2 to match discoal’s haploid output.
Mutation Rate:
discoal: θ = 4 × Ne × μ × L (over whole locus)
msprime: mutation_rate = θ / (4 × Ne × L) (per base pair)
Recombination Rate:
discoal: ρ = 4 × Ne × r × L
msprime: recombination_rate = ρ / (4 × Ne × L)
Selection Coefficient (for sweeps):
discoal: α = 2 × Ne × s
msprime: s = (α / (2 × Ne)) × 2 (the extra factor of 2 accounts for msprime’s fitness model)
Sweep Timing:
When τ > 0 in discoal, we rescale to Ne=0.25 in msprime for consistent time units
Allele frequencies use the original Ne to ensure valid [0,1] bounds
These scaling conventions ensure that both simulators produce statistically equivalent results, as validated by the comparison suite.
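The conversions listed above can be written as small Python helpers. This is a minimal sketch that transcribes the formulas as stated, with Ne defaulting to the 0.5 used for msprime. The function names are hypothetical, and the sketch deliberately does not call msprime itself or cover the Ne=0.25 rescaling used when τ > 0.

```python
def msprime_mutation_rate(theta: float, n_sites: int, ne: float = 0.5) -> float:
    """Per-base-pair mutation rate from the locus-wide theta = 4 * Ne * mu * L."""
    return theta / (4 * ne * n_sites)


def msprime_recombination_rate(rho: float, n_sites: int, ne: float = 0.5) -> float:
    """Per-base-pair recombination rate from the locus-wide rho = 4 * Ne * r * L."""
    return rho / (4 * ne * n_sites)


def msprime_selection_coefficient(alpha: float, ne: float = 0.5) -> float:
    """Selection coefficient from alpha = 2 * Ne * s, with the extra factor of 2."""
    return alpha / (2 * ne) * 2


# Hypothetical discoal parameters: theta=10, rho=20, alpha=100 over 10,000 sites.
print(msprime_mutation_rate(10, 10_000))
print(msprime_recombination_rate(20, 10_000))
print(msprime_selection_coefficient(100))
```

Keeping Ne as an explicit argument makes it easy to see which quantities change if a different reference population size is used.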
Development Workflow
Create a feature branch from the main development branch
Make changes to the code
Run focused tests frequently during development:
cd testing/ && ./focused_validation_suite.sh
Run comprehensive tests before committing:
cd testing/ && ./comprehensive_validation_suite.sh
Document performance improvements in commit messages
Submit pull request with test results
Code Organization
Key source files:
discoal_multipop.c: Main program entry and command-line parsing
discoalFunctions.c: Core simulation functions
alleleTraj.c: Allele trajectory calculations for sweeps
ancestrySegment.c: Memory-efficient ancestry tracking
activeSegment.c: Active material tracking
discoal.h: Main header with data structures
Memory Optimizations
Recent optimizations have achieved significant memory reductions:
Dynamic allocation for all major arrays
Segment trees for ancestry tracking (80% reduction)
Reference counting for segment sharing (10-16% additional reduction)
AVL tree indexing for high-recombination scenarios
Memory-mapped files for sweep trajectories
When developing, maintain these optimizations and ensure new features don’t regress memory usage.
Documentation
To build the documentation locally:
cd docs/
make html
View the built documentation:
open _build/html/index.html # macOS
xdg-open _build/html/index.html # Linux
Before submitting changes, ensure documentation is updated for any new features or parameter changes.