18S and ITS read separation file and database

dataset

posted on 2024-12-12, 05:41 authored by NEIL YOUNG, Lucas Huggins

PBlat Read Separation

Sequencing data were demultiplexed using MinKNOW, and the fastq files for each barcode were concatenated prior to downstream analysis.
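As a minimal sketch of the concatenation step: gzipped fastq chunks can be joined with `cat`, because concatenated gzip streams still form a valid gzip file. The `fastq_pass/barcodeNN/` directory layout and the chunk file names below are assumptions for illustration (the source does not state them); only the `barcodeNN-allfiles.fastq.gz` output name follows the script on this page. The toy reads exist only to make the example self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one directory per barcode holding several fastq.gz chunks.
workdir=$(mktemp -d)
mkdir -p "$workdir/fastq_pass/barcode01"

# Two toy chunks with one read each, standing in for demultiplexed output.
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/fastq_pass/barcode01/chunk_0.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/fastq_pass/barcode01/chunk_1.fastq.gz"

# Concatenate every chunk of each barcode into a single file.
for dir in "$workdir"/fastq_pass/barcode*; do
    bc=$(basename "$dir")
    cat "$dir"/*.fastq.gz > "$workdir/${bc}-allfiles.fastq.gz"
done

# Count reads in the merged file (assumes 4 lines per fastq record).
n_reads=$(( $(gzip -dc "$workdir/barcode01-allfiles.fastq.gz" | wc -l) / 4 ))
echo "barcode01 reads: $n_reads"
```

Comparing the read count of the merged file with the sum over the chunks is a cheap way to confirm that no chunk was missed.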

Because the 18S rDNA and ITS1-to-ITS2 amplicons were pooled and given the same barcode for each sample analysed, the reads first had to be separated using pblat (M. Wang & Kong, 2019) and seqtk (https://github.com/lh3/seqtk).

To separate and bin the 18S rDNA and ITS sequences into different files, a database of nematode 18S rDNA sequences was built using only the region of the 18S rDNA targeted by our primers.

Using a pblat minimum score value of 50, the reads in each barcode sequencing file were compared against our pblat 18S rDNA database. Reads that matched were extracted to form an 18S rDNA sequence file, whilst the remaining reads were used to form an ITS1-to-ITS2 sequence file.
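To illustrate how read names can be pulled out of pblat output, here is a small sketch on a made-up, headerless PSL table. In the PSL format, column 1 is the number of matching bases and column 10 is the query (read) name; the filter on column 1 mirrors the awk step in the script on this page. The read and reference names are invented for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

psl=$(mktemp)
# Toy headerless PSL rows (tab-separated, 21 columns); all names are made up.
# Only columns 1 (matches) and 10 (qName) matter for this filter.
{
printf '120\t3\t0\t0\t0\t0\t0\t0\t+\tread_A\t900\t10\t133\tref18S_1\t800\t5\t128\t1\t123,\t10,\t5,\n'
printf '95\t2\t0\t0\t0\t0\t0\t0\t-\tread_A\t900\t200\t297\tref18S_2\t800\t40\t137\t1\t97,\t200,\t40,\n'
printf '35\t1\t0\t0\t0\t0\t0\t0\t+\tread_B\t1500\t0\t36\tref18S_1\t800\t300\t336\t1\t36,\t0,\t300,\n'
printf '60\t0\t0\t0\t0\t0\t0\t0\t+\tread_C\t850\t20\t80\tref18S_3\t800\t100\t160\t1\t60,\t20,\t100,\n'
} > "$psl"

# Keep names of reads with at least 50 matching bases, one line per read.
hits=$(awk -F'\t' '$1 >= 50 {print $10}' "$psl" | sort -u)
echo "$hits"
```

A read that aligns to several references (read_A here) appears once in the list, which is what a name-based extraction tool such as `seqtk subseq` expects.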

The code used is:

conda activate pblat

# Directory holding the concatenated per-barcode fastq files, and the 18S reference database.
# Output paths are relative, so the script is presumably run from within DIR.
DIR=/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/MinION_Nemabiome_Trial-2_Trich-Capi-Strongy/pass/looping_test_on_nemabiome_trial-2
REF=18S_ref_seqs_for_Pblat_v2.fasta

for i in {01..75}; do

# Convert this barcode's fastq reads to fasta for pblat
conda run -n seqtk seqtk seq -A "${DIR}/barcode${i}-allfiles.fastq.gz" > tmp.fa

# Align the reads against the 18S reference database
conda run -n pblat pblat -noHead -minScore=50 -threads=48 "${REF}" tmp.fa barcode${i}.tmp.pblat.psl

# Count the unique reads with at least one alignment
awk -F"\t" '{print $10}' barcode${i}.tmp.pblat.psl | sort | uniq | wc -l > barcode${i}.count.txt

# List the unique read names with at least 50 matching bases (PSL column 1)
awk -F"\t" '{if ($1>=50) print $10}' barcode${i}.tmp.pblat.psl | sort | uniq > barcode${i}.tmp.pblat.header.txt

# Extract the listed reads to form the 18S fastq file
conda run -n seqtk seqtk subseq "${DIR}/barcode${i}-allfiles.fastq.gz" barcode${i}.tmp.pblat.header.txt > barcode${i}.tmp.pblat.fastq

# Extract all remaining reads to form the ITS (and other non-18S) fastq file
seqkit grep -v -f <(seqkit seq -ni "${DIR}/barcode${i}.tmp.pblat.fastq") "${DIR}/barcode${i}-allfiles.fastq.gz" > barcode${i}.tmp.inverse.pblat.fastq

done

NOTE:

barcode[#].tmp.pblat.fastq = fastq file of 18S reads

barcode[#].tmp.inverse.pblat.fastq = fastq file of ITS reads and other non-18S reads
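Since the two output files should together account for every read of a barcode, a quick sanity check is to compare their read counts with the total. The sketch below is an illustration on toy files, not part of the original workflow; it assumes uncompressed fastq with exactly four lines per record, and `all.fastq` is a hypothetical stand-in for the barcode's full input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count fastq records in a plain-text fastq file (assumes 4 lines per record).
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

tmp=$(mktemp -d)
# Toy stand-ins for one barcode's full input and its two output files.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$tmp/all.fastq"
printf '@r1\nACGT\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$tmp/barcode01.tmp.pblat.fastq"
printf '@r2\nGGCC\n+\nIIII\n' > "$tmp/barcode01.tmp.inverse.pblat.fastq"

total=$(count_reads "$tmp/all.fastq")
n18s=$(count_reads "$tmp/barcode01.tmp.pblat.fastq")
nits=$(count_reads "$tmp/barcode01.tmp.inverse.pblat.fastq")

# The 18S and non-18S files should partition the input reads.
if [ $(( n18s + nits )) -eq "$total" ]; then
    status="ok"
else
    status="mismatch"
fi
echo "18S=$n18s other=$nits total=$total $status"
```

A mismatch would suggest that some read names were not matched consistently between the extraction and the inverse selection.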
