Installing Physcraper

While Physcraper can be installed via pip, we recommend downloading the Physcraper repository from GitHub and installing it locally following the instructions below, so that you can easily access the example data and ancillary files. This process will also install the following Python packages:

Downloading Physcraper

The first step is to download Physcraper to your computer.

You can do this with Git:

git clone https://github.com/McTavishLab/physcraper.git

Alternatively, you can download the repository from https://github.com/McTavishLab/physcraper.git

Now, move to the newly created “physcraper” directory with cd physcraper to continue.

The next step is to create a virtual environment to run Physcraper in. You can do this using either Anaconda or Virtualenv.

Anaconda virtual environment

For this option you will need Anaconda installed. You can follow the installation instructions on Anaconda’s documentation website.

Now you can create a “conda virtual environment” with:

conda env create -f cond_env.yml
conda activate physcraper_env
pip install -r requirements.txt
pip install -e .

Note the “dot” at the end of the last command. Physcraper should now be ready to use.

Virtualenv virtual environment

For this option you will need Virtualenv installed.

Now you can go ahead and create a “Python virtual environment”.

Remember that you need to be in the “physcraper” folder (go there with cd physcraper). Once there, run:

virtualenv -p python3 venv-physcraper

This will create a Python 3 virtual environment named “venv-physcraper”.

Activate the virtual environment with:

source venv-physcraper/bin/activate

Finally, install Physcraper inside the virtual environment:

pip install -r requirements.txt
pip install -e .

Do not miss the “dot” at the end of that last command!

The virtual environment remains active even if you change directories, so Physcraper will run from anywhere while the virtual environment is activated.

Note that you will have to activate the virtual environment with source venv-physcraper/bin/activate every time you want to run Physcraper.

When you are finished working with Physcraper, deactivate the virtual environment with:

deactivate

Checking for dependencies

Currently, complete phylogenetic updating with Physcraper requires raxmlHPC and MUSCLE to be installed and on your PATH.

You can check whether they are already installed with the following commands; each prints the location of the tool if it is installed:

which muscle
which raxmlHPC
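The two which checks can also be wrapped in a small function that reports every missing tool at once and returns a non-zero status if any is absent. This is a minimal sketch: check_deps is a hypothetical helper, not part of Physcraper, and it uses the shell's built-in command -v instead of which.

```shell
#!/usr/bin/env bash
# check_deps is a hypothetical helper (not part of Physcraper):
# it reports each tool as found or missing and returns 1 if any is missing.
check_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Check the external tools needed for a complete phylogenetic update.
check_deps muscle raxmlHPC || echo "Install the missing tools and add them to your PATH."
```

Because the function returns a status, it can also be used in a script as a guard before starting a long run.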

Checking installation success on remote searches

To test a full run with pre-downloaded BLAST results, copy the example results using:

cp -r docs/examples/pg_55_web pg_55_test

and then run:

physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 --output pg_55_test

More information on all parameter settings is in the Run section of the documentation. Briefly, this command gets a tree (tree5864) from study pg_55 on OpenTree, pulls the alignment from TreeBASE, BLASTs the sequences, and runs 10 bootstrap replicates on the final phylogeny.
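The same test run can be written with one flag per line, which makes it easier to see what each option does. This sketch only restates the command above as a bash array; the comments paraphrase the description above, and the check for physcraper_run.py on the PATH is an added convenience, not part of Physcraper.

```shell
#!/usr/bin/env bash
# Arguments for the pre-downloaded pg_55 example, one flag per line.
args=(
  --study_id pg_55      # OpenTree study to start from
  --tree_id tree5864    # tree within that study
  --treebase            # pull the alignment from TreeBASE
  --bootstrap_reps 10   # bootstrap replicates on the final phylogeny
  --output pg_55_test   # output folder holding the copied example results
)

# Run only if the Physcraper environment is active.
if command -v physcraper_run.py >/dev/null 2>&1; then
  physcraper_run.py "${args[@]}"
else
  echo "physcraper_run.py not found; activate your Physcraper environment first." >&2
fi
```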

This example tests all the components except the actual remote BLAST searches (because they can be very slow). To check whether remote searches work with your installation, run a full analysis:

physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 --output pg_55_new

This run will take a while. Once it starts BLASTing, it is working! You can press Ctrl-C to cancel.

Local Databases

The BLAST tool can be run using local databases, which can be downloaded and updated from the National Center for Biotechnology Information (NCBI).

Installing BLAST command line tools

To BLAST locally you will need to install the BLAST command line tools first. If you performed the Physcraper installation using Anaconda, the BLAST command line tools will already be installed.

Find general instructions in BLAST’s command line applications user manual. For example, to install the BLAST command line tools on Linux:

wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.11.0+-x64-linux.tar.gz

This link will break whenever NCBI releases a new version of the BLAST executables. If it does, check https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ for the newest file name.

The executables will be in the bin folder of the extracted directory (ncbi-blast-2.11.0+/bin in this example). Add that folder to your PATH to run them from anywhere.
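Because the file name inside LATEST changes with every release, you can look it up from the directory listing instead of hard-coding 2.11.0. This is a sketch that assumes the listing page links files named like ncbi-blast-<version>+-x64-linux.tar.gz, as it did for 2.11.0; latest_blast_tarball and install_latest_blast are hypothetical helper names.

```shell
#!/usr/bin/env bash
BLAST_LATEST_URL="https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/"

# Read a directory listing on stdin and print the Linux x64 tarball name.
# sort -u collapses repeats (the name appears in links, and in the .md5 link).
latest_blast_tarball() {
  grep -o 'ncbi-blast-[0-9.]*[+]-x64-linux\.tar\.gz' | sort -u | head -n 1
}

# Download and unpack the newest release into the current directory.
install_latest_blast() {
  local tarball
  tarball=$(curl -s "$BLAST_LATEST_URL" | latest_blast_tarball)
  if [ -z "$tarball" ]; then
    echo "Could not find a Linux tarball at $BLAST_LATEST_URL" >&2
    return 1
  fi
  wget "$BLAST_LATEST_URL$tarball" && tar -xzvf "$tarball"
}
```

Run install_latest_blast in the folder where you want BLAST installed, then add the extracted ncbi-blast-<version>+/bin folder to your PATH.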

Installing the BLAST command line tools on macOS is easy with the installer. Note, however, that the BLAST executables will be installed in /usr/local/ncbi/blast, and that to run them you will have to add their bin folder to your PATH by adding export PATH=$PATH:"/usr/local/ncbi/blast/bin" to your .bash_profile.

If your terminal uses zsh instead of bash, make sure the .bash_profile settings are loaded there too, for example by adding source ~/.bash_profile to your ~/.zshrc.
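Editing the profile by hand works, but a small function can append the export line only if it is not already there, so running it twice does no harm. This is a sketch: add_to_path_in_profile is a hypothetical helper, and using ~/.zshrc for zsh is the usual shell convention rather than anything Physcraper requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append 'export PATH="$PATH:<dir>"' to a profile file,
# skipping it if that exact line is already present.
add_to_path_in_profile() {
  local dir="$1" profile="$2"
  local line="export PATH=\"\$PATH:$dir\""
  touch "$profile"
  if ! grep -qxF "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}

# Usage for the macOS installer location (bash, then zsh):
#   add_to_path_in_profile /usr/local/ncbi/blast/bin "$HOME/.bash_profile"
#   add_to_path_in_profile /usr/local/ncbi/blast/bin "$HOME/.zshrc"
```

Open a new terminal afterwards (or source the profile) so the updated PATH takes effect.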

Downloading the NCBI database

If you want to download the NCBI BLAST database and taxonomy for faster local searches, note that the download can take several hours, depending on your internet connection.

This is what you should do:

mkdir local_blast_db  # create the folder to save the database
cd local_blast_db  # move to the newly created folder
update_blastdb.pl nt  # download the NCBI nucleotide databases
cat *.tar.gz | tar -xvzf - --ignore-zeros  # unzip the nucleotide databases
update_blastdb.pl taxdb  # download the NCBI taxonomy database
gunzip -cd taxdb.tar.gz | (tar xvf - )  # unzip the taxonomy database
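Before pointing Physcraper at the folder, you can check that the extraction produced the taxonomy files and, if blastdbcmd from the BLAST command line tools is on your PATH, that the nt database opens. A minimal sketch: check_local_blast_db is a hypothetical helper, and it assumes the taxonomy files are named taxdb.btd and taxdb.bti, as in the taxdb.tar.gz archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check a local BLAST database folder.
check_local_blast_db() {
  local dir="${1:-local_blast_db}" f
  # The taxonomy files must exist and be non-empty.
  for f in taxdb.btd taxdb.bti; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  # If available, blastdbcmd -info prints a summary of the nt database.
  if command -v blastdbcmd >/dev/null 2>&1; then
    blastdbcmd -db "$dir/nt" -info || return 1
  fi
  echo "local BLAST database in $dir looks usable"
}
```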

Downloading the nodes and names into the physcraper/taxonomy directory

cd physcraper/taxonomy
wget 'ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
gunzip -f -cd taxdump.tar.gz | (tar xvf - names.dmp nodes.dmp)
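names.dmp and nodes.dmp are plain-text tables whose fields are separated by a tab, a pipe and a tab. A quick way to confirm the download worked is to look up a known taxon ID. This sketch assumes the standard names.dmp row layout (tax ID, name, unique name, name class); sci_name_for_taxid is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scientific name for a taxon ID from names.dmp.
# Rows look like: tax_id<TAB>|<TAB>name_txt<TAB>|<TAB>unique name<TAB>|<TAB>name class<TAB>|
sci_name_for_taxid() {
  local taxid="$1" names="${2:-names.dmp}"
  awk -F '\t[|]\t?' -v id="$taxid" \
    '$1 == id && $4 == "scientific name" { print $2; exit }' "$names"
}
```

For example, sci_name_for_taxid 9606 names.dmp, run inside physcraper/taxonomy, should print Homo sapiens.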

Updating an existing BLAST database

cd local_blast_db  # move to the nucleotide database folder
update_blastdb nt  # download the NCBI nucleotide databases
# update_blastdb.pl nt  # on Mac OS
cat *.tar.gz | tar -xvzf - --ignore-zeros  # unzip the nucleotide databases
update_blastdb taxdb  # download the NCBI taxonomy database
# update_blastdb.pl taxdb  # on Mac OS
gunzip -cd taxdb.tar.gz | (tar xvf - )  # unzip the taxonomy database
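Since the update script is called update_blastdb on some installations and update_blastdb.pl on others (including macOS), the update steps can be wrapped in a function that uses whichever name is available. A sketch: pick_update_blastdb and refresh_local_blast_db are hypothetical helper names, and the steps inside mirror the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the available update script.
pick_update_blastdb() {
  local cmd
  for cmd in update_blastdb update_blastdb.pl; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: re-download and unpack nt and taxdb in the given folder.
refresh_local_blast_db() {
  local dir="${1:-local_blast_db}" cmd
  cmd=$(pick_update_blastdb) || { echo "update_blastdb not found on PATH" >&2; return 1; }
  (
    cd "$dir" || exit 1
    "$cmd" nt || exit 1
    cat *.tar.gz | tar -xvzf - --ignore-zeros || exit 1
    "$cmd" taxdb || exit 1
    gunzip -cd taxdb.tar.gz | tar xvf -
  )
}
```

The work runs in a subshell, so your current directory is unchanged when the update finishes.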

Checking install success of local BLAST database

physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 -db local_blast_db --output pg_55_local

This should start running a query using your local BLAST database.

Setting up an AWS BLAST database

To run BLAST searches without NCBI’s required time delays, you can set up your own server on AWS (for a fee). See the instructions at AWS Marketplace NCBI BLAST.

Create an NCBI API key

Generating an NCBI API key will speed up downloading full sequences after BLAST searches. See NCBI API keys for details.

You can add your API key to your config using

Entrez.api_key = <apikey>

or pass it to physcraper_run.py with the --api_key flag.
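To avoid pasting the key into every command, you can keep it in an environment variable and pass it through the --api_key flag. This sketch assumes --api_key takes the key as its value; NCBI_API_KEY and run_with_api_key are hypothetical names, and Physcraper is not assumed to read the variable on its own.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: forward all arguments to physcraper_run.py and add
# the key stored in NCBI_API_KEY (assumes --api_key takes the key as a value).
run_with_api_key() {
  if [ -z "${NCBI_API_KEY:-}" ]; then
    echo "Set NCBI_API_KEY first, e.g. export NCBI_API_KEY=<apikey>" >&2
    return 1
  fi
  physcraper_run.py "$@" --api_key "$NCBI_API_KEY"
}
```

After export NCBI_API_KEY=<apikey>, call run_with_api_key followed by the usual options, such as --study_id pg_55 --tree_id tree5864 --treebase.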