How to combine analyses across multiple loci¶
Single locus analyses only provide a narrow view of the evolutionary history of a group.
After assembling individual gene data sets and phylogenies using Physcraper, it is straigtforward to combine results (alignments and data) from those analyses to obtain species tree estimates.
The multi_loci.py
script combines results from multiple single locus Physcraper runs and generates concatenated and astral input files.
Usage:
multi_loci.py [-h] [-d MULTIPLE_RUNS_FOLDER] [-o OUTPUT] [-f {concatenate,astral}]
[-s {fasta,nexus}] [-m {False,True}]
Arguments:
- -h , --help
Show the help message and exit.
- -d DIRECTORY_NAME, --locus_runs_folder DIRECTORY_NAME
A name (and path) of a directory containing at least two output directories from different individual Phsycraper runs.
- -o DIRECTORY_NAME, --output DIRECTORY_NAME
A name (and path) for a directory to write the combined results of multiple Physcraper runs. If it exists, it will be overwritten.
- -f {concatenate,astral}, --format {concatenate,astral}
Format of combined output file.
- -s {fasta,nexus}, --schema {fasta,nexus}
Combined output file alignment schema.
- -m {False, True}, --include_missing {False, True}
Where uneven numbers of sequences are available, concatenate with gaps. default to
False
.
Astral¶
To generate input files for an ASTRAL species tree analysis, (https://github.com/smirarab/ASTRAL) use -f astral.
This will generate two files in the output directory.
genetrees.new
, a concatenation of all of the genetrees produced in individual analyses,
and mapping.txt
, a text file linking the tip lables in each of the gene trees to taxon names.
e.g.
multi_loci.py -d tests/data/precooked/multi_loc/ -f astral -o mini_species_tree
You can run Astral diretcly on these files e.g.
java -jar astral.5.7.5.jar -i mini_species_tree/genetrees.new -a mini_species_tree/mappings.txt
Concatenation¶
To concatenate multiple loci into a single alignment use -f concatenate. Default settings only generate concatenated loci for taxa where there is a sequence at each locus .
- e.g.
- multi_loci.py -d tests/data/precooked/multi_loc/ -f concatenate -s fasta -o mini_concat
To generate concatenated taxa with missing loci use -m (for include missing data).
multi_loci.py -d tests/data/precooked/multi_loc/ -f concatenate -s nexus -m -o mini_concat_gaps
This will generate a concatenated alignment in the output directory with the name ‘concat.aln’ in the schema selected using -s (either fasta or nexus). Each concatenated sequences is labeled with the taxon name and an integer.
The sequences from each individual run comprising the concatenated sequence are described in “concat_info.txt” in the output directory.
SVD quartets¶
To write out a concatenated Nexus file with a taxon partitions block linking sequences for the same taxa, for use in SVD quartets analyses (tutorial at http://evomics.org/learning/phylogenetics/svdquartets/) use -f svdq
This will generate a Nexus file of concatenated sequences linked together by their taxon assignment in a taxon block. The sequences from each individual run comprising the concatenated sequence are described in “concat_info.txt” in the output directory, as above. e.g.
multi_loci.py -d tests/data/precooked/multi_loc/ -f svdq -m -o svdq_out
This file can be used to run SVDQ in Paup e.g.
paup4a168_ubuntu64 mini_concat2/svdq.nex
svdq evalq=all taxpartition=species nthreads=ncpus;