These are some notes on how to install the software and fetch the data required for the rMLST comparison in Acinetobacter.
Here are the steps for setting up conda to manage your software installations.
    wget https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
    chmod +x ./Miniconda3-py38_4.12.0-Linux-x86_64.sh
    ./Miniconda3-py38_4.12.0-Linux-x86_64.sh
    ~/miniconda3/bin/conda init
    source ~/.bashrc
    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda
    conda create -y -c conda-forge -n rmlst mamba
    conda activate rmlst
Using conda makes it easy to install bioinformatics software.
    mamba install -y -c bioconda rapidnj cgmlst-dists mashtree
    mamba install -y -c conda-forge pip notebook nb_conda_kernels jupyter_contrib_nbextensions
    pip install grapetree
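Once the installs above have finished, it can be worth confirming that the tools are actually visible from the active environment. The following is a minimal sketch, not part of the original workflow: it assumes each executable has the same name as its package (`rapidnj`, `cgmlst-dists`, `mashtree`, `grapetree`), which may not hold on every system.

```python
import shutil

# Assumption: each executable is named after the package that provides it.
TOOLS = ["rapidnj", "cgmlst-dists", "mashtree", "grapetree"]


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

If anything is reported as missing, check that the rmlst environment is activated before re-running the install commands.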
You can fetch genome assemblies from NCBI using the datasets tool, which is available at https://www.ncbi.nlm.nih.gov/datasets/docs/v1/download-and-install/
To use it, as I have done below, you need a text file of all the accession codes you wish to fetch (I have called it get_ass.txt).
    wget https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets
    chmod +x ./datasets
    ./datasets download genome accession --inputfile get_ass.txt --exclude-protein --exclude-rna --include-gbff --exclude-genomic-cds --exclude-seq
    unzip ncbi_dataset.zip
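Before starting a large download, a quick sanity check of the accession file can catch typos early. This is a rough sketch under two assumptions: that get_ass.txt holds one accession per line, and that every entry is an assembly accession of the form `GCA_` or `GCF_`, nine digits, a dot and a version number. The helper name `check_accession_lines` is made up for illustration.

```python
import re
from pathlib import Path

# Assumed shape of an assembly accession, e.g. GCA_000580355.1
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def check_accession_lines(lines):
    """Split lines into (valid, invalid) accessions, ignoring blank lines."""
    valid, invalid = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if ACCESSION_RE.fullmatch(entry):
            valid.append(entry)
        else:
            invalid.append(entry)
    return valid, invalid


if __name__ == "__main__":
    acc_file = Path("get_ass.txt")
    if acc_file.exists():
        valid, invalid = check_accession_lines(acc_file.read_text().splitlines())
        print(f"{len(valid)} accessions look fine")
        for entry in invalid:
            print("Check this line:", entry)
```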
For the Acinetobacter dataset I am using, some of the assemblies are not available ... for reasons. The datasets tool reports:
Some of the assemblies provided ('GCA_000580355.1', 'GCA_000580435.1') are valid NCBI Assembly Accessions,but are not in scope for NCBI Datasets.
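If you want to keep track of which assemblies were skipped, the accessions can be pulled out of that message with a regular expression. This is only a sketch: the pattern assumes the `GCA_`/`GCF_` accession format, and `extract_accessions` and `drop_accessions` are illustrative helper names, not part of the datasets tool.

```python
import re

# Assumed shape of an assembly accession, e.g. GCA_000580355.1
ACCESSION_RE = re.compile(r"GC[AF]_\d{9}\.\d+")


def extract_accessions(message):
    """Return every accession-like string found in `message`, in order."""
    return ACCESSION_RE.findall(message)


def drop_accessions(accessions, skipped):
    """Return `accessions` without any entry listed in `skipped`."""
    skipped = set(skipped)
    return [acc for acc in accessions if acc not in skipped]


message = (
    "Some of the assemblies provided ('GCA_000580355.1', 'GCA_000580435.1') "
    "are valid NCBI Assembly Accessions,but are not in scope for NCBI Datasets."
)
skipped = extract_accessions(message)
print(skipped)  # ['GCA_000580355.1', 'GCA_000580435.1']
```

Passing the contents of get_ass.txt and `skipped` to `drop_accessions` gives the list of assemblies you should expect to find in the download.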
You can copy the assemblies out of the unzipped download to wherever you want. By default, they will be in ncbi_dataset/data.
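    from os import mkdir, path, listdir, getcwd
    import shutil

    # Show the working directory (handy when running in a notebook)
    getcwd()

    # Create the output folder if it does not exist yet
    if not path.exists('gen_fasta'):
        mkdir('gen_fasta')

    # Each assembly sits in its own folder named after its accession
    for fasta_path, name in [(path.join('ncbi_dataset/data', x), x) for x in listdir('ncbi_dataset/data') if x.startswith('GCA')]:
        fasta_file = [path.join(fasta_path, x) for x in listdir(fasta_path) if x.endswith('.fna')]
        # Copy the first .fna file found, renamed to <accession>.fasta
        if fasta_file:
            shutil.copy(fasta_file[0], f'gen_fasta/{name}.fasta')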
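After copying, you may want to confirm which of the requested accessions ended up with a FASTA file. Here is a minimal sketch, assuming get_ass.txt lists one accession per line and the files in gen_fasta are named `<accession>.fasta` as in the copy step above. `find_missing` is a hypothetical helper name.

```python
from pathlib import Path


def find_missing(requested, fasta_names):
    """Return requested accessions that have no '<accession>.fasta' file name."""
    suffix = ".fasta"
    copied = {name[: -len(suffix)] for name in fasta_names if name.endswith(suffix)}
    return [acc for acc in requested if acc not in copied]


if __name__ == "__main__":
    acc_file = Path("get_ass.txt")
    fasta_dir = Path("gen_fasta")
    # Only run the check when both inputs are present
    if acc_file.exists() and fasta_dir.is_dir():
        requested = [
            line.strip() for line in acc_file.read_text().splitlines() if line.strip()
        ]
        names = [p.name for p in fasta_dir.iterdir()]
        for acc in find_missing(requested, names):
            print("No FASTA copied for", acc)
```

Any accession printed here was either skipped by the download or had no .fna file in its folder.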
Questions or comments? @ me on Twitter @happy_khan
The banner image is an AI generated picture (Midjourney) with prompt; 'bioinformatics'. You can share and adapt this image following a CC BY-SA 4.0 licence.