<p dir="ltr">18S database for clade I and IV nematodes:</p>
<p dir="ltr">Trichinellida (NCBI:txid6329), which encompasses Capillariidae (Capillaria, Pearsonema, etc.), Trichuridae (Trichuris) and Trichinellidae (Trichinella).</p>
<p dir="ltr">Strongyloidoidea (NCBI:txid2082224), which encompasses Rhabdias, Steinernema, Strongyloides and Parastrongyloides.</p>
<p><br></p>
<p dir="ltr">Search terms:</p>
<p dir="ltr">For Trichinellida (total reads 914)</p>
<p dir="ltr">(((((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR 18S[Title]) OR Ribosomal RNA[Title]) OR SSU rRNA[Title]) OR SSU ribosomal RNA[Title]) AND txid6329[Organism]) AND 200:10000[Sequence Length]) AND nuccore pubmed[Filter]) NOT unverified[Keyword]</p>
<p dir="ltr">For Strongyloidoidea (total reads 627)</p>
<p dir="ltr">(((((((((18S ribosomal RNA[Title]) OR 18S rRNA[Title]) OR 18S[Title]) OR Ribosomal RNA[Title]) OR SSU rRNA[Title]) OR SSU ribosomal RNA[Title]) AND txid2082224[Organism]) AND 200:10000[Sequence Length]) AND nuccore pubmed[Filter]) NOT unverified[Keyword]</p>
<p><br></p>
<p dir="ltr">The sequences returned by the two searches were concatenated.</p>
<p><br></p>
<p dir="ltr">Next, a list of clade III and V parasitic nematodes (i.e. the ascarids, ancylostomatids, etc.) was obtained and downloaded as a FASTA file.</p>
<p dir="ltr">The sequence titles in this FASTA file were then changed so that each accession number carries a nondescript ‘sham’ title, e.g. 
Unidentified nematode 18S ribosomal RNA, partial sequence.</p>
<p><br></p>
<p dir="ltr"># Simplify the headers of the database FASTA file (keep only the accession, i.e. the first field of each header line)</p>
<p dir="ltr">$ awk '{if($0~/^>/){print $1} else {print $0}}' Final_Trich_Strongy_with_Sham_reads_v2.fasta > Trich_Strongy_with_unidentified_v2.fasta</p>
<p dir="ltr"># Make a text file of all the accession numbers in the database FASTA file</p>
<p dir="ltr">$ awk '{if ($1~/^>/) print substr($1,2)}' Trich_Strongy_with_unidentified_v2.fasta > Trich_Strongy_with_unidentified_v2_accession_ids.txt</p>
<p dir="ltr"># Create a mapping table of each accession to its taxon ID (takes about 10 minutes, as it has to read each of the 300 million lines of nucl_gb.accession2taxid)</p>
<p dir="ltr">$ awk -F"\t" 'BEGIN{while(getline<"Trich_Strongy_with_unidentified_v2_accession_ids.txt") hash[$1]=1} {if ($2 in hash) print $2,$3}' nucl_gb.accession2taxid > Trich_Strongy_with_unidentified_v2_tax_map.txt</p>
<p dir="ltr"># Make the BLAST database from the database FASTA file, for example:</p>
<p dir="ltr">$ makeblastdb -in Trich_Strongy_with_unidentified_v2.fasta -parse_seqids -blastdb_version 5 -taxid_map Trich_Strongy_with_unidentified_v2_tax_map.txt -title "Trich_Strongy_with_unidentified_v2_db" -out Trich_Strongy_with_unidentified_v2_db -dbtype nucl</p>
<p><br></p>
<p dir="ltr">This produces 10 database files, for example Trich_Strongy_with_unidentified_v2_db.ndb, Trich_Strongy_with_unidentified_v2_db.nhr and Trich_Strongy_with_unidentified_v2_db.nin.</p>
<p><br></p>
<p dir="ltr">These can be used by NanoCLUST, e.g. 
in the command:</p>
<p><br></p>
<p dir="ltr">nextflow run main.nf -profile docker --reads '/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/MinION_Nemabiome_Trial-2_Trich-Capi-Strongy/pass/barcodes01-75/barcode14-allfiles.fastq.gz' --db "db/Trich_Strongy_with_unidentified_db" --tax "db" --min_read_length 600 --max_read_length 1200 --min_cluster_size 100 --cluster_sel_epsilon 1 --max_memory '84.GB' --max_cpus 12 --outdir ./Nemabiome_Trial</p>
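<p dir="ltr">Before building the database, it may be worth checking that every accession in the FASTA file has a row in the taxid map. The sketch below is an optional sanity check, not part of the original workflow. The file names, accessions and taxids in it are hypothetical demo values; substitute the real FASTA and tax map files.</p>

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo inputs (not from the original protocol);
# replace with the real FASTA file and tax map.
workdir=$(mktemp -d)
fasta="$workdir/demo.fasta"
taxmap="$workdir/demo_tax_map.txt"

# Three demo sequences, only two of which have a taxid row.
printf '>DEMO0001.1\nACGT\n>DEMO0002.1\nGGCC\n>SHAM0001.1\nTTAA\n' > "$fasta"
printf 'DEMO0001.1 111\nDEMO0002.1 222\n' > "$taxmap"

# First pass (tax map): remember every accession that has a taxid.
# Second pass (FASTA): print header accessions that were never seen.
missing=$(awk 'NR==FNR {seen[$1]=1; next}
               /^>/ {id=substr($1,2); if (!(id in seen)) print id}' \
          "$taxmap" "$fasta")

echo "Accessions without a taxid: ${missing:-none}"
```

<p dir="ltr">Any accession listed by this check has no entry in the map, so it is worth deciding how those sequences should be handled before running makeblastdb.</p>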