Introduction to Physcraper

The Physcraper framework

Although genome-scale data are accumulating rapidly, large quantities of single-locus nucleotide sequence data are still being uploaded to GenBank, the database of the US National Center for Biotechnology Information (NCBI). These data are often appropriate for inferring phylogenetic relationships, and have the advantage of being orthologous to genetic sequences that have been used to construct existing phylogenetic trees.

If you have a single-gene or multilocus nucleotide alignment and a phylogenetic tree, Physcraper automates adding nucleotide sequences from new lineage samples to your tree. It uses Open Tree of Life tools to reconcile taxonomy, and the BLAST algorithm to search the GenBank database for loci that are likely to be locally similar to sequences in the initial DNA alignment.

By starting from an existing alignment and tree, Physcraper takes advantage of DNA locus alignments as homology hypotheses (ideally orthology, see FAQs) that previous researchers have assessed, curated, and deemed appropriate for the phylogenetic scope. The sequences added during a BLAST search are limited either to a user-specified taxon or monophyletic group, or to the taxonomic scope of the ingroup of the starting tree.
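The scope restriction described above can be illustrated with a minimal sketch. This is not Physcraper's implementation: the toy taxonomy, the accession labels, and the function names (lineage, ingroup_scope, filter_hits) are all hypothetical, chosen only to show the idea of keeping BLAST hits that fall inside either a user-specified taxon or the least inclusive taxon containing the ingroup.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENTS: Dict[str, Optional[str]] = {
    "Eukaryota": None,
    "Fungi": "Eukaryota",
    "Ascomycota": "Fungi",
    "Basidiomycota": "Fungi",
    "Saccharomyces": "Ascomycota",
    "Aspergillus": "Ascomycota",
    "Agaricus": "Basidiomycota",
    "Metazoa": "Eukaryota",
    "Drosophila": "Metazoa",
}


def lineage(taxon: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the path from a taxon up to the root, taxon included."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parents[node]
    return path


def ingroup_scope(ingroup: List[str], parents: Dict[str, Optional[str]]) -> str:
    """Return the least inclusive taxon that contains every ingroup taxon."""
    if not ingroup:
        raise ValueError("ingroup must contain at least one taxon")
    shared = set(lineage(ingroup[0], parents))
    for taxon in ingroup[1:]:
        shared &= set(lineage(taxon, parents))
    # Walking up from the first taxon, the first shared node is the deepest one.
    for node in lineage(ingroup[0], parents):
        if node in shared:
            return node
    raise ValueError("ingroup taxa share no common ancestor")


def filter_hits(hits: List[dict], scope: str,
                parents: Dict[str, Optional[str]]) -> List[dict]:
    """Keep only the hits whose taxon lies within the scope taxon."""
    return [h for h in hits if scope in lineage(h["taxon"], parents)]


# Hypothetical BLAST hits; the accession labels are placeholders.
HITS = [
    {"accession": "hit_1", "taxon": "Aspergillus"},
    {"accession": "hit_2", "taxon": "Agaricus"},
    {"accession": "hit_3", "taxon": "Drosophila"},
    {"accession": "hit_4", "taxon": "Saccharomyces"},
]

# Scope inferred from the ingroup of the starting tree.
inferred = ingroup_scope(["Saccharomyces", "Aspergillus"], PARENTS)
kept_inferred = [h["accession"] for h in filter_hits(HITS, inferred, PARENTS)]

# Scope supplied explicitly by the user instead.
kept_user = [h["accession"] for h in filter_hits(HITS, "Fungi", PARENTS)]
```

In this toy case the inferred scope is the smallest group containing both ingroup taxa, so hits from outside that group are dropped, while a broader user-specified taxon admits more hits.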

These automated, reproducible trees can provide a quick inference of potential phylogenetic relationships, and can flag problems with the taxonomic assignment of sequences, issues of paralogy and orthology, and areas of potential systematic interest.


Figure 1 from Sanchez-Reyes et al. 2021: The Physcraper framework consists of 4 general steps. The methodology is further described in the Implementation section of this documentation.



The Open Tree of Life

The Open Tree of Life (OpenTree) is a project that unites expert, peer-reviewed phylogenetic inferences and taxonomy to generate a synthetic tree estimate of species relationships across all life.


OpenTree synthetic tree. Figure 1 from Hinchliff et al. 2015. For more information on the OpenTree project, go to https://opentreeoflife.github.io


OpenTree aims to construct a comprehensive, dynamic, and digitally available tree of life by synthesizing published phylogenetic trees along with taxonomic data. Currently the tree comprises 2.3 million tips. However, only around 90,000 of those taxa (roughly 4%) are represented by phylogenetic estimates; the rest are placed in the tree based on their taxonomic names.

To achieve this, OpenTree constructs a reference taxonomy for taxonomic reconciliation, the OpenTree Taxonomy (OTT), through an algorithmic combination of several source taxonomies, such as: