Basic Usage
Command Line Structure
At its most basic, a discoal command line looks like:
./discoal sampleSize numReplicates nSites -t theta
Where:
sampleSize: The number of chromosomes to sample (maximum: 65,535)numReplicates: The number of independent samples to generatenSites: The number of discrete sites in the sequencetheta: The population mutation rate (4N₀μ) where N₀ is the current population size and μ is the mutation rate per generation for the entire locus
Example
$ ./discoal 3 2 100 -t 2
./discoal 3 2 100 -t 2
1665047201 686400060
//
segsites: 4
positions: 0.01835 0.09557 0.46556 0.72880
1000
0110
0111
//
segsites: 1
positions: 0.07594
0
1
0
Output Format
The output follows Hudson’s ms format:
Header: Command line and random number seeds
Sample blocks: Each starting with
//Segregating sites: Number of variable sites
Positions: Relative positions along the sequence (0-1)
Haplotypes: Binary strings where 0 is ancestral and 1 is derived
This format is compatible with existing ms analysis software.
Common Parameters
Basic mutation and recombination:
# With recombination rate rho = 4Nr
./discoal 10 5 10000 -t 10 -r 20
# With gene conversion
./discoal 10 5 10000 -t 10 -r 20 -g 10 300
Population size changes:
# Bottleneck: size drops to 10% at time 0.5, recovers to 80% at time 1.2
./discoal 10 5 10000 -t 10 -en 0.5 0 0.1 -en 1.2 0 0.8
Random Number Seeds
By default, seeds are taken from /dev/urandom, making discoal suitable for cluster computing. You can specify seeds manually:
./discoal 10 5 10000 -t 10 -d 12345 67890
Time Units
discoal measures time in units of 4N₀ generations (coalescent time units), where N₀ is the current effective population size. This is the same convention used by ms.