The Physcraper results

Files generated by a Physcraper run

Within the output directory defined by the user, a Physcraper run generates various subdirectories which are labeled with a TAG name that corresponds to the file name of the input alignment.

The subdirectories consist of:

  • Input files

Within the inputs_TAG directory, Physcraper writes tree and alignment files used in a Physcraper run for the sake of reproducibility, taxon name matching, and taxon reconciliation. It also writes the “.config” file down if none was provided, as well as the results of the mapping of the tree taxon labels, saved as “otu_info.csv”

  • Run files

Within the run_TAG directory, all run files are also automatically written down: Blast runs, alignments, RAxML trees, bootstrap results, etc.

Intermediate processing files, and the json formatted otu information are also stored here. Many fo these files are re-used in the event that the analysis crashes and is restarted. Make sure you use a new output directory or otherwise empty this folder if you want to modify the initial run parameters.

The trees are reconstructed using RAxML, with tip labels corresponding to local ids (e.g., otu42009, otuPS1) and not taxon names (e.g., Ceiba), nor taxonomic ids (e.g., ott or ncbi). Branch lengths are proportional to relative substitution rates.

  • BLAST files

The blast_TAG folder contains XML files with BLAST results for each sequence in the input alignment.

  • Output files

In the output_TAG folder, the updated tree with taxon names as tip labels is saved as “updated_taxonname.tre”. A version of the updated tree in nexson format containing several types of tip labels is also saved here. From the nexson tree, a tree with any label can be produced. See section relabeling the trees below for more info on how to generate trees with different types of labels.

The “seqlen_mismatch.txt” file contains the acession numbers, taxa, and sequence lengths of BLAST matches that didn’t meet the sequence length cutoffs.

Finally, the CSV file “otu_info_TAG.csv” contains a summary of information about original and newly added sequences.

Visualizing the Physcraper results

There are several tree visualization tools available. For reproducibility purposes, we will use some handy functions from the R language to see our results.

updated_tree_otus <- ape::read.tree(file = "docs/examples/pg_55/run_pg_55tree5864_ndhf/RAxML_bestTree.2020-06-18")
ape::plot.phylo(ape::ladderize(updated_tree_otus), type = "phylogram", cex = 0.25, label.offset = 0.001, edge.width = 0.5)
ape::add.scale.bar(cex = 0.3, font = 1, col = "black")
mtext("Updated tree - otu tags as labels", side = 3)

plot of chunk unnamed-chunk-7

updated_tree_taxonname <- ape::read.tree(
  file = "docs/examples/pg_55/outputs_pg_55tree5864_ndhf/updated_taxonname.tre")
ape::plot.phylo(ape::ladderize(updated_tree_taxonname), type = "phylogram", cex = 0.2, label.offset = 0.001, edge.width = 0.5)
ape::add.scale.bar(cex = 0.3, font = 1, col = "black")
mtext("Updated tree - Taxon names as labels", side = 3)

plot of chunk unnamed-chunk-10

Compare the original tree with the pruned updated tree

original_tree_otus <- ape::read.tree(file = "docs/examples/pg_55/inputs_pg_55tree5864_ndhf/physcraper_pg_55tree5864_ndhf.tre")
updated_tree_otus_pruned <- ape::read.tree(
  file = "../data/pg_55/pruned_updated.tre"
)

Now plot them face to face.

We can prune the updated tree, so it is a straight forward comparison:

cotree <-  phytools::cophylo(original_tree_otus, updated_tree_otus_pruned, rotate.multi =TRUE)

Rotating nodes to optimize matching… Done.

phytools::plot.cophylo(x = cotree, fsize = 0.5, lwd = 0.5, mar=c(.1,.2,2,.3), ylim=c(-.1,1), scale.bar=c(0, .05))
title("Original tr.   Pruned updated tr.", cex = 0.1)

plot of chunk cotree-plot1

But it is more interesting to plot it with all the new tips, so we see exactly where the new things are:

original_tree_taxonname <- ape::read.tree(file = "docs/examples/pg_55/inputs_pg_55tree5864_ndhf/taxonname.tre")
cotree2 <-  phytools::cophylo(
  original_tree_taxonname,
  updated_tree_taxonname,
  rotate.multi =TRUE)

Rotating nodes to optimize matching… Done.

phytools::plot.cophylo(
  x = cotree2,
  fsize = 0.3,
  lwd = 0.4,
  mar=c(.1,.1,2,.5),
  ylim=c(-.1,1),
  scale.bar=c(0, .05),
  # link.type="curved",
  link.lwd=3,
  link.lty="solid",
  link.col=phytools::make.transparent("#8B008B",0.5))
title("Original tree      Updated tree", cex = 0.1)

plot of chunk cotree-plot2

We can also plot the updated tree against the synthetic subtree of Malvaceae, to visualize how it updates our current knowledge of the phylogeentic relationships within the family.

However, we are having some trouble with matching the tips right now! A fix will come soon:

tolsubtree <- rotl::tol_subtree(ott_id = 279960)
ape::Ntip(tolsubtree)
#> [1] 5898
grep("Pterygota_alata", tolsubtree$tip.label)
#> [1] 5714
updated_tree_taxonname$tip.label
#>  [1] "Fremontodendron_californicum_otuPS13"
#>  [2] "Quararibea_costaricensis_otuPS38"
#>  [3] "Matisia_cordata_otuPS39"
#>  [4] "Hibiscus_bojerianus_otuPS45"
#>  [5] "Macrostelia_laurina_otuPS29"
#>  [6] "Talipariti_tiliaceum_var._tiliaceum_otuPS48"
#>  [7] "Talipariti_hamabo_otuPS47"
#>  [8] "Papuodendron_lepidotum_otuPS46"
#>  [9] "Cephalohibiscus_peekelii_otuPS34"
#> [10] "Kokia_kauaiensis_otuPS32"
#> [11] "Kokia_drynarioides_otuPS31"
#> [12] "Kokia_cookei_otuPS30"
#> [13] "Ochroma_pyramidale_otuPS40"
#> [14] "Ochroma_pyramidale_otu376430"
#> [15] "Catostemma_fragrans_otuPS37"
#> [16] "Scleronema_micranthum_otuPS44"
#> [17] "Cavanillesia_platanifolia_otuPS50"
#> [18] "Spirotheca_rosea_otuPS42"
#> [19] "Spirotheca_rosea_otu376452"
#> [20] "Bombax_buonopozense_otuPS49"
#> [21] "Bombax_buonopozense_otu376420"
#> [22] "Ceiba_acuminata_otuPS36"
#> [23] "Ceiba_crispiflora_otuPS41"
#> [24] "Pachira_aquatica_otuPS35"
#> [25] "Pachira_aquatica_otu376439"
#> [26] "Septotheca_tessmannii_otuPS11"
#> [27] "Triplochiton_zambesiacus_otuPS43"
#> [28] "Heritiera_elata_otu376445"
#> [29] "Heritiera_littoralis_otu376454"
#> [30] "Heritiera_aurea_otu376446"
#> [31] "Heritiera_simplicifolia_otu376427"
#> [32] "Heritiera_aurea_otu376435"
#> [33] "Brachychiton_acerifolius_otu376453"
#> [34] "Brachychiton_acerifolius_otu376441"
#> [35] "Acropogon_bullatus_otu376442"
#> [36] "Acropogon_dzumacensis_otu376429"
#> [37] "Franciscodendron_laurifolium_otu376443"
#> [38] "Argyrodendron_peralatum_otu376431"
#> [39] "Sterculia_balanghas_otu376450"
#> [40] "Sterculia_tragacantha_otu376428"
#> [41] "Sterculia_tragacantha_otuPS28"
#> [42] "Sterculia_stipulata_otu376440"
#> [43] "Sterculia_coccinea_otu376436"
#> [44] "Sterculia_parviflora_otu376444"
#> [45] "Hildegardia_barteri_otu376432"
#> [46] "Firmiana_malayana_otu376449"
#> [47] "Firmiana_platanifolia_otu376425"
#> [48] "Hildegardia_populifolia_otu376448"
#> [49] "Scaphium_linearicarpum_otu376434"
#> [50] "Scaphium_macropodum_otu376426"
#> [51] "Pterocymbium_tinctorium_otu376433"
#> [52] "Scaphium_macropodum_otu376451"
#> [53] "Octolobus_spectabilis_otu376447"
#> [54] "Cola_acuminata_otu376437"
#> [55] "Pterygota_alata_otu376438"

Trick for the cophylo titles and margins from https://cran.r-project.org/web/packages/phangorn/vignettes/IntertwiningTreesAndNetworks.html

Analysing the Physcraper results

Under construction

The functionalities described in the following sections are still under development. Even in this beta form, we think they have the potential to be useful, so we decided to document them here.

The first two functionalities are not accessible through the command line yet. This means that you can access them only through Python for the moment. The last functionality can be accessed through the command line, but it has not been fully tested yet. Use with caution.

Relabeling the trees

For downstream analyses and figure making, it can be handy to swap labels on tips of the updated phylogeny from alternative taxonomies or taxon id numbers.

Do that with the write_labelled function. For example, to change e.g.,

from physcraper import treetaxon
pg55 = treetaxon.generate_TreeTax_from_run('docs/examples/pg_55_web')
pg55.write_labelled(label='^ot:ottTaxonName', norepeats=False, path='tests/tmp/pg_55_repeats.tre')

Rerooting the trees

A correctly rooted phylogeny is needed to compare relationships between two or more phylogenetic hypotheses. Automatic rooting of phylogenies is not straightforward. Physcraper’s root_tree_from_synth function places a suggested root based on relationships in OpenTree’s synthetic tree or in its taxonomic tree.

To root a Physcraper tree using either the OpenTree taxonomy, or the OpenTree synthetic tree. First load the tree object:

from physcraper import treetaxon
pg55 = treetaxon.generate_TreeTax_from_run('docs/examples/pg_55_web')

Then, to root based on the OpenTree taxonomy, set base = "ott":

from physcraper import opentree_helpers
ott_rooted_tree = opentree_helpers.root_tree_from_synth(pg55.tre, pg55.otu_dict, base='ott')
pg55.tre = ott_rooted_tree  # set tree rooted based on taxonomy as the tree object
pg55.write_labelled(label="^ot:ottTaxonName", path="tests/tmp/pg_55_ott_root.tre")

And, to root based on phylogenetic relationships in the OpenTree synthetic tree, set base = "synth":

from physcraper import opentree_helpers
synth_rooted_tree = opentree_helpers.root_tree_from_synth(pg55.tre, pg55.otu_dict, base='synth')
pg55.tre = synth_rooted_tree  # set tree rooted based on synth as the tree object
pg55.write_labelled(label="^ot:ottTaxonName", path="tests/tmp/pg_55_synth_root.tre")

In this example both trees are the same even though they use the MRCA of different pairs of taxa, because those MRCA’s map to the same node on the output tree.

However rooting based on OTT can be unreliable, especially if taxonomy is a poor fit to true evolutionary relationships. So whenever possible, the root should be specified by the user, for example by choosing from tips in the “otu_info.csv” file in the outputs folder of a Physcraper run, e.g.,

pg55 = treetaxon.generate_TreeTax_from_run('docs/examples/pg_55_web')
outgroup = ['otu376436','otu376444']
mrca = pg55.tre.mrca(taxon_labels=outgroup)
pg55.tre.reroot_at_node(mrca, update_bipartitions=True)
pg55.write_labelled(label="^ot:ottTaxonName", path="tests/tmp/pg_55_manual_root.tre")

Tree comparison with Robinson-Foulds (RF) distance

The tree_comparison.py script takes as an argument the output directory of a Physcraper run, and compares the relationships in the final tree to the relationships in the input tree.

Usage:

tree_comparison.py  [-h] [-d DIRECTORY_NAME] [-t1 FILE_NAME] [-t2 FILE_NAME] [-otu FILE_NAME] [-og OUTGROUP] [-o DIRECTORY_NAME]

Arguments:

-h , --help

Show the help message and exit.

-d DIRECTORY_NAME, --dir DIRECTORY_NAME

A name (and path) to a Physcraper output directory.

-o DIRECTORY_NAME, --output DIRECTORY_NAME

Name (and path) for a directory to write the results of the comparison analysis to. If it exists, it will be overwritten.

-otu FILE_NAME, --otu_info FILE_NAME

Name (and path) of JSON file containing taxon information. Stored in the Physcraper output directory run folder.

-t1 FILE_NAME, --tree1 FILE_NAME

File name (and path) of original tree from a Physcraper output directory inputs folder. Alternatively, any file containing a tree to compare to -t2.

-t2 FILE_NAME, --tree2 FILE_NAME

File name (and path) of an updated tree from a Physcraper output directory outputs folder. Alternatively, any file containing a tree to compare to -t1.

This is the simplest command line for comparison of two trees:

tree_comparison.py  [-h] [-d DIRECTORY_NAME] [-o DIRECTORY_NAME]

For example:

tree_comparison.py -d docs/examples/pg_55_web/ -o pg_55_comparison

It compares the original tree from the inputs folder and the updated tree from the outputs folder. It uses the rooting functions described above to ensure the two trees have the same root. By default, it will root trees based on the OpenTree Taxonomy.

Alternatively, you can pass in OpenTree taxonomic ids (OTT ids) of two or more taxa from the input tree to use as outgroups to root both trees.

For example:

tree_comparison.py -d docs/examples/pg_55_web/ -og otu376420 otu376439 otu376452 -o pg_55_comparison

If the comparison between the two trees is possible (outgroup-wise), the script will print the results to screen, including:

  • The number of new tips
  • The number of new taxa
  • Whether the taxa in the tree are included synthesis phylogenies currently in OpenTree
  • Which taxa phylogenetic information is not currently incorporated into the synthetic tree
  • The RF distance and weighted RF distance between the relationships of tips that are in both trees
  • How the estimates of phylogenetic relationships of taxa included in the OpenTree taxonomy from both trees have conflict with the monophyly of the OpenTree synthetic tree.