The University of Melbourne

ApicomplexanDB

dataset
posted on 2023-03-24, 03:47, authored by Lucas Huggins
<p>Construction of Filarial Worm and Apicomplexan Haemoparasite Databases for NanoCLUST:</p>
<p>The filarial worm COI gene database (Db) was constructed in NCBI Nucleotide using the search terms:</p>
<p>(((((((((((cytochrome c oxidase subunit 1[Title]) OR cytochrome c oxidase subunit I) OR cytochrome oxidase subunit 1) OR cytochrome oxidase subunit I) OR COX1) OR CO1) OR COI)) AND txid6295[Organism:exp])) AND 100:100000[Sequence Length])</p>
<p>The NCBI accession NR_029255.1 (<em>Aliivibrio fischeri</em>) was also included, as it is required to identify our positive control.</p>
<p>Additionally, a second filarial worm Db was constructed from the same sequences, downloaded using the search terms above, with the addition of the dog genome GCF_014441545.1 (<em>Canis lupus familiaris</em>).</p>
<p>For construction of the apicomplexan 18S rRNA gene Db, the search terms used were:</p>
<p>((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR ribosomal RNA[Title]) OR SSU rRNA[Title]) OR SSU ribosomal RNA[Title]) AND txid5794[Organism]) AND 200:10000[Sequence Length]</p>
<p>NR_029255.1 (<em>Aliivibrio fischeri</em>) was again added, as it is required for positive control identification.</p>
<p>The specific sequences chosen were downloaded from NCBI as a FASTA file.</p>
<p>Accession numbers were extracted from the FASTA headers to produce a single-column text file.</p>
<p>The large NCBI accession2taxid database, a text file, was downloaded from:</p>
<p>ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz</p>
<p>A mapping table of each accession to its taxonomy ID was created from the decompressed nucl_gb.accession2taxid file using the awk command:</p>
<p>awk -F"\t" 'BEGIN{while(getline<"accession_ids.txt") hash[$1]=1} {if ($2 in hash) print $2,$3}' nucl_gb.accession2taxid > [Db_name]_map.txt</p>
<p>This takes the list of accession numbers ("accession_ids.txt") and the downloaded accession2taxid database and produces a two-column mapping file called [Db_name]_map.txt.</p>
<p>The BLAST database was then built with the makeblastdb command, downloadable from https://blast.ncbi.nlm.nih.gov/Blast.cgi:</p>
<p>makeblastdb -in Filaria_AllCOI_species.fasta -parse_seqids -blastdb_version 5 -taxid_map [Db_name]_map.txt -title "[Db_name] database" -out [Db_name] -dbtype nucl</p>
<p>This produces the BLAST database, consisting of the 10 files required by NanoCLUST.</p>
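<p>The accession extraction and mapping steps described above can be illustrated on toy data. The following is a minimal sketch, not the exact commands used for this dataset. The file names toy.fasta, toy.accession2taxid and toy_map.txt, and all records in them, are hypothetical. It assumes FASTA headers of the form ">ACCESSION.VERSION description" and the four tab-separated columns of nucl_gb.accession2taxid (accession, accession.version, taxid, gi).</p>

```shell
# Toy FASTA standing in for the file downloaded from NCBI (hypothetical records).
cat > toy.fasta <<'EOF'
>AB000001.1 Hypothetical organism A 18S ribosomal RNA gene
ACGTACGT
>XY123456.2 Hypothetical organism B 18S ribosomal RNA gene
GGCCTTAA
EOF

# Toy stand-in for the decompressed nucl_gb.accession2taxid:
# tab-separated columns accession, accession.version, taxid, gi.
{
  printf 'accession\taccession.version\ttaxid\tgi\n'
  printf 'AB000001\tAB000001.1\t1111\t1\n'
  printf 'ZZ999999\tZZ999999.1\t2222\t2\n'
  printf 'XY123456\tXY123456.2\t3333\t3\n'
} > toy.accession2taxid

# Step 1: keep the first whitespace-delimited token of each header line
# and drop the leading ">" to get one accession.version per line.
awk '/^>/ {print substr($1, 2)}' toy.fasta > accession_ids.txt

# Step 2: read the accession list first (NR == FNR is true only for the
# first file), then print accession.version and taxid for matching rows.
awk -F'\t' 'NR == FNR {wanted[$1] = 1; next}
            ($2 in wanted) {print $2, $3}' \
    accession_ids.txt toy.accession2taxid > toy_map.txt

cat toy_map.txt
```

<p>In a real run the toy files would be replaced by the downloaded FASTA file and the decompressed nucl_gb.accession2taxid. Comparing the line counts of accession_ids.txt and the resulting map file is a quick way to spot accessions that found no taxonomy ID.</p>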

    2520 - Veterinary Science
