Nemabiome ITS Database
18S, ITS1 and ITS2, 28S Full Nematode Database:
Building a NanoCLUST Db for Parasitic Nematodes using 18S rRNA, 28S rRNA, ITS1, 5.8S and ITS2.
Nematoda NCBI Taxonomy ID: 6231 (hence the search must use txid6231[Organism]).
Key words:
18S ribosomal RNA
18S rRNA
18S
28S ribosomal RNA
28S rRNA
28S
5.8S ribosomal RNA
5.8S rRNA
5.8S
Ribosomal RNA
SSU rRNA
LSU rRNA
SSU ribosomal RNA
LSU ribosomal RNA
Internal transcribed spacer
Internal transcribed spacer 1
Internal transcribed spacer 2
ITS
ITS1
ITS2
Final NCBI GenBank search term:
(((((((((((((((((((((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR 18S[Title]) OR 28S ribosomal RNA[Title]) OR 28S rRNA[Title]) OR 28S[Title]) OR 5.8S ribosomal RNA[Title]) OR 5.8S rRNA[Title]) OR 5.8S[Title]) OR ribosomal RNA[Title]) OR SSU rRNA[Title]) OR LSU rRNA[Title]) OR SSU ribosomal RNA[Title]) OR LSU ribosomal RNA[Title]) OR Internal transcribed spacer[Title]) OR Internal transcribed spacer 1[Title]) OR Internal transcribed spacer 2[Title]) OR ITS[Title]) OR ITS1[Title]) OR ITS2[Title]) AND txid6231[Organism])) AND 200:10000[Sequence Length])) AND nuccore pubmed[Filter]) NOT unverified[Keyword]
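A long query like this is easy to mistype by hand, so one option is to generate it from the keyword list. The sketch below assumes the keywords are saved one per line in a text file; build_query is a hypothetical helper name, not part of any NCBI tool. It emits a flat form of the query without the nested brackets, which should be equivalent because Entrez processes Boolean operators left to right.

```shell
#!/usr/bin/env bash
# build_query: hypothetical helper that turns a keyword list (one keyword
# per line) into a flat version of the GenBank search term shown above.
build_query() {
    local keywords_file="$1"
    local title_terms
    # Tag each non-empty keyword with [Title] and join them with OR
    title_terms=$(awk 'NF { printf "%s%s[Title]", sep, $0; sep = " OR " }' "$keywords_file")
    # Append the organism, length and filter restrictions used in the original search
    printf '(%s) AND txid6231[Organism] AND 200:10000[Sequence Length] AND nuccore pubmed[Filter] NOT unverified[Keyword]\n' "$title_terms"
}
```

If NCBI Entrez Direct is installed, the result could be passed straight to a search, e.g. esearch -db nuccore -query "$(build_query keywords.txt)" | efetch -format fasta > search_results.fasta, where keywords.txt and search_results.fasta are placeholder file names.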
The search results were downloaded as a FASTA file.
Next, sequences for a list of clade III and V parasitic nematodes (i.e. the ascarids, ancylostomatids, etc.) were obtained and also downloaded as a FASTA file.
Next, the sequence titles in this FASTA file were changed to ‘sham’ titles with nondescript accession numbers, e.g. "Unidentified nematode 18S ribosomal RNA, partial sequence".
# Simplify the headers of your database fasta file
$ awk '{if($0~/^>/){print $1} else {print $0}}' Nemabiome_rRNA_fasta_v5_sequences.fasta > Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta
# Make a text file of all the accession numbers in the database fasta file
$ awk '{if ($1~/^>/) print substr($1,2)}' Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta > Nematoda_rRNA_v5_accession_ids.txt
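Because the FASTA file was assembled from more than one download, it may be worth checking the accession list for duplicates before building the database, as duplicate sequence IDs are likely to cause problems when -parse_seqids is used. A minimal sketch, where check_accession_ids is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# check_accession_ids: hypothetical helper that reports how many accessions
# are in the list and prints any that occur more than once.
check_accession_ids() {
    local ids_file="$1"
    local total dups
    total=$(awk 'END { print NR }' "$ids_file")
    # uniq -d prints one copy of each line that is repeated in sorted input
    dups=$(sort "$ids_file" | uniq -d)
    echo "accessions: $total"
    if [ -n "$dups" ]; then
        echo "duplicated accessions:"
        echo "$dups"
        return 1
    fi
}
```

For example, check_accession_ids Nematoda_rRNA_v5_accession_ids.txt returns a non-zero status if any accession is repeated.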
# Create a mapping table of each accession to its taxid. This takes about 10 minutes, as awk has to read each of the 300 million lines of nucl_gb.accession2taxid
# The "> 0" test stops the getline loop cleanly if the accession list cannot be read
$ awk -F"\t" 'BEGIN{while((getline<"Nematoda_rRNA_v5_accession_ids.txt")>0) hash[$1]=1} {if ($2 in hash) print $2,$3}' nucl_gb.accession2taxid > Nematode_rRNA_v5_tax_map.txt
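Accessions that are absent from nucl_gb.accession2taxid (for example, records newer than the downloaded mapping file) will silently drop out of the mapping table. The sketch below lists any accession without a taxid; list_unmapped_ids is a hypothetical helper name, and it assumes the accession is the first whitespace-separated field in both files.

```shell
#!/usr/bin/env bash
# list_unmapped_ids: hypothetical helper that prints every accession in the
# ID list that has no entry in the accession-to-taxid mapping table.
list_unmapped_ids() {
    local ids_file="$1"
    local map_file="$2"
    # While reading the first file (the map), remember each mapped accession;
    # while reading the second file (the ID list), print the ones not seen.
    awk 'FILENAME == ARGV[1] { mapped[$1] = 1; next }
         !($1 in mapped)' "$map_file" "$ids_file"
}
```

For example, list_unmapped_ids Nematoda_rRNA_v5_accession_ids.txt Nematode_rRNA_v5_tax_map.txt prints nothing when every sequence has a taxid.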
# Make the blast database using the database fasta file for example:
$ makeblastdb -in Nematoda_rRNA-ITS-5.8S_v5_30.04.24.fasta -parse_seqids -blastdb_version 5 -taxid_map Nematode_rRNA_v5_tax_map.txt -title "Nemabiome_rRNA database_v5" -out Nemabiome_rRNA_v5_db -dbtype nucl
This produces 10 database files, for example Nemabiome_rRNA_v5_db.ndb, Nemabiome_rRNA_v5_db.nhr and Nemabiome_rRNA_v5_db.nin.
These can be used by NanoCLUST, e.g. in the command:
nextflow run main.nf -profile docker --reads '/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/100_Sample_Comparison_96-well_Trial-4/pass/barcodes01-96/ITS-reads-amended-filtered/barcode30.tmp.inverse.pblat.fix.fastq-filt.fastq.gz' --db "db/Nemabiome_rRNA_v5_db" --tax "db" --min_read_length 700 --max_read_length 1800 --min_cluster_size 100 --polishing_reads 100 --cluster_sel_epsilon 1 --max_memory '84.GB' --max_cpus 12 --outdir ./Nemabiome_trial
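Before running the pipeline, it can help to see how many reads in a barcode fall inside the chosen length window, since a barcode with few reads in range may not reach --min_cluster_size. A minimal sketch, assuming a gzipped FASTQ with standard four-line records and treating both limits as inclusive (whether NanoCLUST applies them inclusively is not checked here); count_reads_in_window is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# count_reads_in_window: hypothetical helper that counts the reads in a
# gzipped FASTQ whose length lies between min and max (inclusive).
count_reads_in_window() {
    local fastq_gz="$1"
    local min_len="$2"
    local max_len="$3"
    # In a four-line FASTQ record the sequence is the second line
    gzip -dc "$fastq_gz" | awk -v min="$min_len" -v max="$max_len" '
        NR % 4 == 2 {
            n++
            len = length($0)
            if (len >= min && len <= max) kept++
        }
        END { printf "%d of %d reads within %d-%d bp\n", kept, n, min, max }'
}
```

For example, count_reads_in_window barcode30.tmp.inverse.pblat.fix.fastq-filt.fastq.gz 700 1800 reports the number of reads in the 700 to 1800 bp window used in the command above.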