microbiome_strain_species_nanopore_quantification_tarfiles
a. Code repository is https://github.com/cjwoodruff50/Nanopore_WEHI_cjw
A description of the code pipeline is provided there. Item b. below describes key data files.
b. 1. Input for database generation is in .../papercheck/workDB1, including the
sub-directories 16S and 23S, which contain the fasta files of each strain's rRNA
genes. The following tarred files hold the relevant data:
workDB1_16S_fasta_03022024.tar
workDB1_23S_fasta_03022024.tar
workDB1_16S_23S_blastn_databases.tar
2. Input for dataset generation is the split_zymo_hmw_r104_* files (aa to cx).
These have been tarred into
split_zymo_hmw_r104_aa_cx.tar
Processing of these files by extract_Seraika_16S23S_v10.R generates the primary
16S and 23S datasets.
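After untarring, it can be useful to confirm that every split file from aa to cx
is present before running the extraction script. The following is a minimal
sketch only. It assumes the suffixes follow the usual two-letter alphabetic
sequence (aa, ab, ..., az, ba, ...) and that the file names carry no extension;
neither is confirmed here. The function names are hypothetical and are not part
of the repository.

```python
from itertools import product
from pathlib import Path
from string import ascii_lowercase

PREFIX = "split_zymo_hmw_r104_"


def expected_split_names(first="aa", last="cx"):
    """Return the split file names from `first` to `last`, inclusive.

    Assumes two-letter alphabetic suffixes in lexicographic order.
    """
    suffixes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]
    return [PREFIX + s for s in suffixes if first <= s <= last]


def missing_split_files(directory):
    """List the expected split files that are not found in `directory`."""
    present = {p.name for p in Path(directory).iterdir()}
    return [name for name in expected_split_names() if name not in present]
```

Under these assumptions the range aa to cx covers 76 names, and
missing_split_files("path/to/untarred/files") returns an empty list when all of
them are present.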
3. Input for RAD denoising is 10 fastq files, separately for both 16S and 23S.
The primary datasets are in .../papercheck/in while the sub-sampled datasets are
in .../papercheck.
amplicon_16S_RADfastqinput_subsampledDatasets_05022024.tar and
amplicon_23S_RADfastqinput_subsampledDatasets_05022024.tar
hold the sub-sampled datasets' fastq files.
4. Input for ASV alignments, and profiling based on these, is the RAD outputs and
the reference databases. The RAD outputs are held in
amplicon_16S_RADfastqoutput_05022024.tar
amplicon_23S_RADfastqoutput_05022024.tar
denoised_amplicon_D6322_16S23S_05022024_10datasets_ASVs_fasta.tar
The reference databases (see 1. above) are also required.
The data needed to run the denoising and profiling are:
i. workDB1_16S_23S_blastn_databases.tar
ii. amplicon_D6322_16S_trimmed_05022024.fastq
iii. amplicon_D6322_23S_trimmed_05022024.fastq
iv. amplicon_16S_RADfastqinput_subsampledDatasets_05022024.tar
v. amplicon_23S_RADfastqinput_subsampledDatasets_05022024.tar
vi. denoised_amplicon_D6322_16S23S_05022024_10datasets_ASVs_fasta.tar
vii. amplicon_16S_RADfastqoutput_05022024.tar
viii. amplicon_23S_RADfastqoutput_05022024.tar
ix. RAD_amplicon_16S23S_05022024_10datasets_text_output_06022024.tar
Items i. to iii. allow RAD denoising of the primary 16S and 23S datasets.
Items iv. and v., together with item i., allow RAD denoising of the sub-sampled
datasets.
Items vi., vii., viii. and ix. allow identification and quantification of the sample
microbiota for all datasets.
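The item groupings above can be written down as a small lookup, so that a script
can report which files are still missing before a given step is run. This is an
illustrative sketch only: the task names (denoise_primary, denoise_subsampled,
profile) and the function missing_for_task are hypothetical labels introduced
here, and the check assumes all nine items sit in a single directory.

```python
from pathlib import Path

# Items i. to ix. as listed above.
ITEMS = {
    "i": "workDB1_16S_23S_blastn_databases.tar",
    "ii": "amplicon_D6322_16S_trimmed_05022024.fastq",
    "iii": "amplicon_D6322_23S_trimmed_05022024.fastq",
    "iv": "amplicon_16S_RADfastqinput_subsampledDatasets_05022024.tar",
    "v": "amplicon_23S_RADfastqinput_subsampledDatasets_05022024.tar",
    "vi": "denoised_amplicon_D6322_16S23S_05022024_10datasets_ASVs_fasta.tar",
    "vii": "amplicon_16S_RADfastqoutput_05022024.tar",
    "viii": "amplicon_23S_RADfastqoutput_05022024.tar",
    "ix": "RAD_amplicon_16S23S_05022024_10datasets_text_output_06022024.tar",
}

# Hypothetical task labels mapped to the items each one needs.
TASKS = {
    "denoise_primary": ["i", "ii", "iii"],
    "denoise_subsampled": ["i", "iv", "v"],
    "profile": ["vi", "vii", "viii", "ix"],
}


def missing_for_task(task, directory):
    """Return the file names required for `task` that are absent from `directory`."""
    base = Path(directory)
    return [ITEMS[key] for key in TASKS[task] if not (base / ITEMS[key]).is_file()]
```

For example, missing_for_task("profile", "data") lists whichever of items vi. to
ix. have not yet been placed in a directory named data.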