18S and ITS read separation file and database
PBlat Read Separation
Sequencing data were demultiplexed using MinKNOW, and the fastq files for each barcode were concatenated prior to downstream analysis.
Because the 18S rDNA and ITS1-to-ITS2 amplicons of each sample were pooled and given the same barcode, the reads first had to be separated using pblat (M. Wang & Kong, 2019) and seqtk (https://github.com/lh3/seqtk).
To bin the 18S rDNA and ITS reads into separate files, a database of nematode 18S rDNA sequences was built using only the region of the 18S rDNA targeted by our primers.
Using a pblat minimum score of 50, all reads in a barcode's sequencing file were compared to our pblat 18S rDNA database. Reads that matched were extracted to form an 18S rDNA sequence file, whilst the remaining reads were used to form an ITS1-to-ITS2 sequence file.
The code used for this step was:
conda activate pblat
# Input directory and pblat 18S reference database (set once, outside the loop)
DIR=/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/MinION_Nemabiome_Trial-2_Trich-Capi-Strongy/pass/looping_test_on_nemabiome_trial-2
REF=18S_ref_seqs_for_Pblat_v2.fasta
for i in {01..75}
do
# Convert the barcode's fastq to fasta, as input for pblat
conda run -n seqtk seqtk seq -A "${DIR}/barcode${i}-allfiles.fastq.gz" > tmp.fa
# Align all reads against the 18S rDNA database
conda run -n pblat pblat -noHead -minScore=50 -threads=48 "${REF}" tmp.fa barcode${i}.tmp.pblat.psl
# Count the unique reads with at least one hit (PSL column 10 = read name)
awk -F"\t" '{print $10}' barcode${i}.tmp.pblat.psl | sort -u | wc -l > barcode${i}.count.txt
# List the names of reads with >= 50 matching bases (PSL column 1 = matches)
awk -F"\t" '$1 >= 50 {print $10}' barcode${i}.tmp.pblat.psl | sort -u > barcode${i}.tmp.pblat.header.txt
# Extract the listed reads as the 18S fastq file
conda run -n seqtk seqtk subseq "${DIR}/barcode${i}-allfiles.fastq.gz" barcode${i}.tmp.pblat.header.txt > barcode${i}.tmp.pblat.fastq
# Write every read not in the 18S file (read from the working directory, where it was just written)
seqkit grep -v -f <(seqkit seq -ni barcode${i}.tmp.pblat.fastq) "${DIR}/barcode${i}-allfiles.fastq.gz" > barcode${i}.tmp.inverse.pblat.fastq
done
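As a small illustration of how the awk filter in the loop above picks out 18S reads, the sketch below applies the same test to a toy PSL file. The read names, reference name and alignment values are invented for the example, and `psl_row` is a hypothetical helper that only pads each row out to the 21 tab-separated PSL columns.

```shell
#!/usr/bin/env bash
# Toy example only: read names, reference name and numbers are made up.

# Hypothetical helper: print one 21-column PSL row for a given
# match count ($1) and read name ($2); other fields are filler values.
psl_row() {
  printf '%s\t0\t0\t0\t0\t0\t0\t0\t+\t%s\t900\t0\t%s\tref_18S\t850\t0\t%s\t1\t%s,\t0,\t0,\n' \
    "$1" "$2" "$1" "$1" "$1"
}

toy_psl=$(mktemp)
{
  psl_row 620 read_A
  psl_row 35  read_B   # fewer than 50 matching bases, so it is dropped
  psl_row 580 read_A   # second hit for the same read
  psl_row 75  read_C
} > "$toy_psl"

# Same test as in the loop: column 1 (matches) >= 50, print column 10 (read name),
# then collapse multiple hits per read into one name.
hits=$(awk -F"\t" '$1 >= 50 {print $10}' "$toy_psl" | sort -u)
echo "$hits"   # prints read_A and read_C, one per line

rm -f "$toy_psl"
```

Note that this test acts on the number of matching bases in column 1, which is applied in addition to the minimum score given to pblat itself.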
NOTE:
barcode[#].tmp.pblat.fastq = fastq file of 18S reads
barcode[#].tmp.inverse.pblat.fastq = fastq file of ITS reads and any other non-18S reads
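One optional sanity check, not part of the original workflow, is to confirm that the number of reads in the two output files adds up to the number in the input file. The sketch below assumes standard four-line fastq records; `count_reads` and `check_split` are hypothetical helper names, and the toy reads are invented so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Sketch of a read-count check; assumes every fastq record spans exactly 4 lines.

# Count the records in a fastq or gzipped fastq file.
count_reads() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'END {print int(NR / 4)}'
}

# Compare input read count with 18S + non-18S read counts.
# Arguments: input fastq(.gz), 18S fastq, non-18S fastq.
check_split() {
  local total s18 rest
  total=$(count_reads "$1")
  s18=$(count_reads "$2")
  rest=$(count_reads "$3")
  if [ "$((s18 + rest))" -eq "$total" ]; then
    echo "OK total=${total} 18S=${s18} non18S=${rest}"
  else
    echo "MISMATCH total=${total} 18S=${s18} non18S=${rest}"
    return 1
  fi
}

# Toy data (made up) using the same file naming as the loop outputs.
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' | gzip > "$tmp/barcode01-allfiles.fastq.gz"
printf '@r1\nACGT\n+\nIIII\n' > "$tmp/barcode01.tmp.pblat.fastq"
printf '@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n' > "$tmp/barcode01.tmp.inverse.pblat.fastq"

result=$(check_split "$tmp/barcode01-allfiles.fastq.gz" \
                     "$tmp/barcode01.tmp.pblat.fastq" \
                     "$tmp/barcode01.tmp.inverse.pblat.fastq")
echo "$result"   # OK total=3 18S=1 non18S=2

rm -rf "$tmp"
```

On real data the same function could be called once per barcode inside a loop over `{01..75}`; a mismatch would suggest an interrupted run or duplicated read names.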