The University of Melbourne

18S and ITS read separation file and database

dataset
posted on 2024-12-12, 05:41 authored by NEIL YOUNG, Lucas Huggins

PBlat Read Separation


Sequencing data were demultiplexed using MinKNOW, and the fastq files for each barcode were concatenated prior to downstream analysis.
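The concatenation step can be sketched as below. This is a minimal illustration, not the authors' exact command: the `barcode01/` directory layout, the `part_*.fastq.gz` file names and the toy reads are assumptions, and only the `barcode01-allfiles.fastq.gz` output name follows the naming used in the code on this page. It relies on the fact that gzip files can be joined with `cat` and still decompress as one stream.

```shell
# Toy demonstration in a temporary directory (layout and file names are assumed).
demo=$(mktemp -d)
mkdir -p "$demo/barcode01"

# Two small gzipped fastq files standing in for the per-barcode output files.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$demo/barcode01/part_0.fastq.gz"
printf '@r2\nGGCC\n+\nIIII\n' | gzip -c > "$demo/barcode01/part_1.fastq.gz"

# Concatenate every file for the barcode into a single gzipped fastq.
cat "$demo"/barcode01/*.fastq.gz > "$demo/barcode01-allfiles.fastq.gz"

# Print the read headers (every 4th line, starting at line 1) to check the result.
gzip -dc "$demo/barcode01-allfiles.fastq.gz" | awk 'NR % 4 == 1'
```

For a real run, the same `cat` line would be wrapped in a loop over the barcode directories.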

Because the 18S rDNA and ITS1-to-ITS2 amplicons for each sample were pooled and given the same barcode, their reads first had to be separated using pblat (M. Wang & Kong, 2019) and seqtk (https://github.com/lh3/seqtk).

To separate and bin the 18S rDNA and ITS sequences into different files, a database of nematode 18S rDNA sequences was built using only the region of the 18S rDNA targeted by our primers.

Using a pblat minimum score value of 50, the reads in each barcode sequencing file were compared to our pblat 18S rDNA database. Reads matching the database were extracted to form an 18S rDNA sequence file, whilst the remaining reads were used to form an ITS1-to-ITS2 sequence file.
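As a rough illustration of how the alignment table becomes a list of 18S read names, the sketch below builds a toy headerless PSL file and filters it the same way as the code on this page: column 1 (matches) must be at least 50, and column 10 (query name) is printed. The `make_row` helper, the read names and the placeholder values in the other columns are all hypothetical.

```shell
# Hypothetical helper: write one 21-column, tab-separated PSL row.
# Only column 1 (matches) and column 10 (read name) are meaningful here.
make_row() {
    printf '%s\t0\t0\t0\t0\t0\t0\t0\t+\t%s\t0\t0\t0\tref\t0\t0\t0\t1\t0,\t0,\t0,\n' "$1" "$2"
}

tmpdir=$(mktemp -d)
{
    make_row 120 read_A   # strong hit to the 18S reference
    make_row 95  read_A   # second hit for the same read
    make_row 30  read_B   # too few matching bases, will be dropped
    make_row 50  read_C   # exactly at the threshold, kept
} > "$tmpdir/toy.psl"

# Keep the read names with at least 50 matching bases, one line per read.
awk -F"\t" '$1 >= 50 {print $10}' "$tmpdir/toy.psl" | sort -u > "$tmpdir/ids_18S.txt"
cat "$tmpdir/ids_18S.txt"
```

Reads named in this list go to the 18S file. Every other read in the barcode file, here `read_B`, ends up in the ITS and non-18S file.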


The code is:


conda activate pblat

# Input directory and 18S reference database (set once, outside the loop)
DIR=/home/Public/Ps1/Lucas_Workspace/MinION_Nemabiome_ECR_Project_SUPdata/MinION_Nemabiome_Trial-2_Trich-Capi-Strongy/pass/looping_test_on_nemabiome_trial-2
REF=18S_ref_seqs_for_Pblat_v2.fasta

for i in {01..75}
do
    READS="$DIR/barcode${i}-allfiles.fastq.gz"

    # Convert the barcode's fastq reads to fasta for pblat
    conda run -n seqtk seqtk seq -A "$READS" > tmp.fa

    # Align all reads against the 18S reference database
    conda run -n pblat pblat -noHead -minScore=50 -threads=48 "$REF" tmp.fa "barcode${i}.tmp.pblat.psl"

    # Count the unique reads with at least one hit (PSL column 10 = read name)
    awk -F"\t" '{print $10}' "barcode${i}.tmp.pblat.psl" | sort | uniq | wc -l > "barcode${i}.count.txt"

    # List the unique reads with at least 50 matching bases (PSL column 1 = matches)
    awk -F"\t" '{if ($1>=50) print $10}' "barcode${i}.tmp.pblat.psl" | sort | uniq > "barcode${i}.tmp.pblat.header.txt"

    # Extract the listed reads into the 18S fastq file
    conda run -n seqtk seqtk subseq "$READS" "barcode${i}.tmp.pblat.header.txt" > "barcode${i}.tmp.pblat.fastq"

    # Write every read not in the 18S file (ITS and other non-18S reads) to a second fastq file.
    # The 18S file is read from the working directory, where the previous step wrote it.
    seqkit grep -v -f <(seqkit seq -ni "barcode${i}.tmp.pblat.fastq") "$READS" > "barcode${i}.tmp.inverse.pblat.fastq"
done


NOTE:

barcode[#].tmp.pblat.fastq = fastq file of 18S reads

barcode[#].tmp.inverse.pblat.fastq = fastq file of ITS reads and other non-18S reads
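Because the two output files should split each barcode's reads between them, a quick check is that their read counts add up to the input total. The sketch below shows this on toy, uncompressed fastq files. The `count_reads` helper and the file names are hypothetical, and it assumes standard four-line fastq records. For the gzipped input files, the reads would be piped through `gzip -dc` first.

```shell
# Hypothetical helper: count fastq records on stdin (4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }'
}

workdir=$(mktemp -d)

# Toy input with three reads, and the two files it was split into.
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$workdir/all.fastq"
printf '@r1\nACGT\n+\nIIII\n' > "$workdir/18S.fastq"
printf '@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$workdir/non18S.fastq"

total=$(count_reads < "$workdir/all.fastq")
n18s=$(count_reads < "$workdir/18S.fastq")
nrest=$(count_reads < "$workdir/non18S.fastq")

# The 18S and non-18S counts should sum to the input total.
if [ "$((n18s + nrest))" -eq "$total" ]; then
    echo "OK: $n18s 18S + $nrest non-18S = $total reads"
else
    echo "MISMATCH: $n18s + $nrest != $total"
fi
```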



