Installing Physcraper
Although Physcraper can be installed via pip, we recommend downloading the Physcraper repository from GitHub and installing it locally, following the instructions below, so that you can easily access the example data and ancillary files. This process will also install the following Python packages:
- DendroPy https://pythonhosted.org/DendroPy/
- Peyotl https://github.com/OpenTreeOfLife/peyotl (currently needs to be on the Physcraper branch)
- Biopython http://biopython.org/wiki/Download
- ConfigParser
Downloading Physcraper
The first step is to download Physcraper to your computer.
You can do this with Git:
git clone https://github.com/McTavishLab/physcraper.git
Alternatively, you can download the repository from https://github.com/McTavishLab/physcraper.git
Now, move to the newly created “physcraper” directory to continue:
cd physcraper
The next step is to create a virtual environment in which to run Physcraper. You can do this using Anaconda or Virtualenv.
Anaconda virtual environment
For this option you will need Anaconda installed. You can follow the installation instructions on Anaconda’s documentation website.
Now you can create a “conda virtual environment” with:
conda env create -f cond_env.yml
conda activate physcraper_env
pip install -r requirements.txt
pip install -e .
Note the “dot” at the end of the last command. Once it finishes, Physcraper should be ready to use!
Virtualenv virtual environment
For this option you will need Virtualenv installed.
Now you can go ahead and create a “Python virtual environment”.
Remember that you need to be in the “physcraper” folder (go there with cd physcraper).
Once there, run:
virtualenv -p python3 venv-physcraper
This will create a Python 3 virtual environment named “venv-physcraper”.
Activate the virtual environment with:
source venv-physcraper/bin/activate
Finally, install Physcraper inside the virtual environment:
pip install -r requirements.txt
pip install -e .
Do not miss the “dot” at the end of that last command!
The virtual environment remains active even if you change directories, so Physcraper will run from anywhere while the virtual environment is activated.
Note that you will have to activate the virtual environment with source venv-physcraper/bin/activate each time you open a new terminal to run Physcraper.
When you have finished working with Physcraper, deactivate the virtual environment with:
deactivate
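If you are unsure whether an environment is currently active, a small check like the one below can help. It is only a sketch: it relies on the VIRTUAL_ENV variable that Virtualenv's activate script sets and the CONDA_DEFAULT_ENV variable that conda sets, and the function name active_env is made up for this example.

```shell
# Report which kind of environment, if any, is active in this shell.
# VIRTUAL_ENV is set by Virtualenv's activate script;
# CONDA_DEFAULT_ENV is set by "conda activate".
active_env() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "virtualenv: $VIRTUAL_ENV"
    elif [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "conda: $CONDA_DEFAULT_ENV"
    else
        echo "none"
    fi
}

active_env
```

If this prints "none", activate the environment again before running Physcraper.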
Checking for dependencies
Currently, complete phylogenetic updating with Physcraper requires raxmlHPC and MUSCLE to be installed and on your path.
You can check if they are already installed with:
which muscle
which raxmlHPC
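The same check can be wrapped in a small function that reports every missing tool at once. This is a minimal sketch using the standard command -v lookup. The function name check_tools is an assumption made for this example and is not part of Physcraper.

```shell
# Report whether each named tool can be found on the PATH.
# Returns 0 if all tools are found, 1 if any is missing.
check_tools() {
    tools_missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            echo "found: $tool -> $location"
        else
            echo "missing: $tool"
            tools_missing=1
        fi
    done
    return "$tools_missing"
}

check_tools muscle raxmlHPC || echo "Install the missing tools or add them to your PATH."
```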
Checking installation success on remote searches
To test a full run with pre-downloaded BLAST results, copy the example results using:
cp -r docs/examples/pg_55_web pg_55_test
and then run:
physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 --output pg_55_test
More information on all the parameter settings is available in the Run section of the documentation. Briefly, this command gets a tree (tree5864) from study pg_55 on OpenTree, pulls the alignment from TreeBASE, BLASTs the sequences, and does 10 bootstrap replicates on the final phylogeny.
This example tests all the components except for the actual remote BLAST searches (because they can be very slow). To check if your installation was successful for remote searches, try running a full analysis:
physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 --output pg_55_new
This run will take a while. Once it starts BLASTing, you know it is working! You can use Ctrl-C to cancel.
Local Databases
The BLAST tool can be run using local databases, which can be downloaded and updated from the National Center for Biotechnology Information (NCBI).
Installing BLAST command line tools
To BLAST locally, you will first need to install the BLAST command line tools. If you performed the Physcraper installation using Anaconda, the BLAST command line tools will already be installed.
Find general instructions in the BLAST command line applications user manual and at the index of BLAST executables.
For example, to install the BLAST command line tools on Linux:
wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
tar -xzvf ncbi-blast-2.11.0+-x64-linux.tar.gz
This link may break when NCBI updates the BLAST executables. If so, check https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ for the newest version.
The executables will be unpacked into the bin folder of the extracted ncbi-blast-2.11.0+ directory.
Installing the BLAST command line tools on macOS is easy with the installer. Note, however, that the BLAST executables will be installed under /usr/local/ncbi/blast, and that you will have to add its bin folder to your path to be able to run them. To do so, add the following line to your .bash_profile:
export PATH=$PATH:"/usr/local/ncbi/blast/bin"
If your terminal uses zsh instead of bash, make sure the same line takes effect there too, for example by adding it to your .zshrc.
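If the profile file is loaded more than once, a plain export line keeps appending the same folder to the path. A guarded version avoids that. The sketch below works in both bash and zsh. The helper name add_to_path is made up for this example, and the folder is the macOS install location mentioned above.

```shell
# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # otherwise append it
    esac
}

add_to_path /usr/local/ncbi/blast/bin
export PATH
```

After opening a new terminal, which blastn should then print the location of the executable.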
Downloading the NCBI database
If you want to download the NCBI BLAST database and taxonomy for faster local searches, note that the download can take several hours, depending on your internet connection.
This is what you should do:
mkdir local_blast_db # create the folder to save the database
cd local_blast_db # move to the newly created folder
update_blastdb.pl nt # download the NCBI nucleotide databases
cat *.tar.gz | tar -xvzf - --ignore-zeros # unzip the nucleotide databases
update_blastdb.pl taxdb # download the NCBI taxonomy database
gunzip -cd taxdb.tar.gz | (tar xvf - ) # unzip the taxonomy database
Downloading the nodes and names into the physcraper/taxonomy directory
cd physcraper/taxonomy
wget 'ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz'
gunzip -f -cd taxdump.tar.gz | (tar xvf - names.dmp nodes.dmp)
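To confirm that the extraction produced both files, a quick check along these lines can be run from the taxonomy directory. It is a sketch only. The function name check_taxonomy_files is hypothetical, and it tests nothing beyond the two files being present and non-empty.

```shell
# Check that names.dmp and nodes.dmp exist and are non-empty
# in the given directory (defaults to the current directory).
check_taxonomy_files() {
    taxdir=${1:-.}
    tax_missing=0
    for name in names.dmp nodes.dmp; do
        if [ -s "$taxdir/$name" ]; then
            echo "ok: $taxdir/$name"
        else
            echo "missing: $taxdir/$name"
            tax_missing=1
        fi
    done
    return "$tax_missing"
}

check_taxonomy_files . || echo "Re-run the download and extraction steps."
```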
Updating an existing BLAST database
cd local_blast_db # move to the nucleotide database folder
update_blastdb nt # download the NCBI nucleotide databases
# update_blastdb.pl nt # on Mac OS
cat *.tar.gz | tar -xvzf - --ignore-zeros # unzip the nucleotide databases
update_blastdb taxdb # download the NCBI taxonomy database
# update_blastdb.pl taxdb # on Mac OS
gunzip -cd taxdb.tar.gz | (tar xvf - ) # unzip the taxonomy database
Checking install success of local BLAST database
physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 10 -db local_blast_db --output pg_55_local
This should start running a query using your local BLAST database.
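A quicker sanity check that does not involve Physcraper is to ask BLAST itself to describe the database, using the -info option of blastdbcmd from the BLAST command line tools. The wrapper below is a sketch: the function name db_info is made up here, and the path local_blast_db/nt assumes the folder created in the download step above.

```shell
# Print summary information for a local BLAST database,
# or explain why that is not possible.
db_info() {
    if ! command -v blastdbcmd >/dev/null 2>&1; then
        echo "blastdbcmd not found; install the BLAST command line tools first"
        return 1
    fi
    blastdbcmd -db "$1" -info
}

db_info local_blast_db/nt || echo "Database check did not succeed."
```

If the database was downloaded and unpacked correctly, blastdbcmd should print its title, number of sequences, and date.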
Setting up an AWS BLAST database
To run BLAST searches without NCBI’s required time delays, you can set up your own server on AWS (for a fee). See the instructions at AWS Marketplace NCBI BLAST.
Create an NCBI API key
Generating an NCBI API key will speed up downloading full sequences following BLAST searches. See NCBI API keys for details.
You can add your API key to your config using
Entrez.api_key = <apikey>
or pass it to the physcraper_run.py script with the --api_key flag.
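One way to avoid typing the key into every command is to keep it in an environment variable. The sketch below assumes a variable named NCBI_API_KEY, a name chosen here for illustration rather than one Physcraper prescribes, and a hypothetical helper, require_api_key, that only checks that the variable is set.

```shell
# Hypothetical helper: succeed only if NCBI_API_KEY
# (an assumed variable name) is set and non-empty.
require_api_key() {
    if [ -z "${NCBI_API_KEY:-}" ]; then
        echo "NCBI_API_KEY is not set" >&2
        return 1
    fi
}

if require_api_key; then
    echo "API key found"
fi
```

If the check passes, the key could then be supplied as "$NCBI_API_KEY" wherever the --api_key flag is used, assuming the flag takes the key as its value.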