Posted on 2024-10-24, 04:15; authored by Jiashuai Zhu, M. Michelle Malmberg, Maiko Shinozuka, Renata Retegan, Noel O. I. Cogan, Khageswor Giri, Kevin F. Smith, Joe L. Jacobs
This dataset contains population genotyping results, expressed as reference allele frequencies, derived from the genomic analysis of multiple ryegrass (Lolium spp.) populations.
Data Collection
Sample Size: A minimum of 50 seeds per sample was collected to ensure accurate and representative genotyping. Seeds were sourced from the Australian Pastures Genebank (APG), from Australian seed companies that own the respective cultivars, or through commercial purchase.
Sequencing Method: Samples were sequenced on an Illumina NovaSeq 6000 platform (Illumina, San Diego, California, USA) following established target capture sequencing protocols (Twist Bioscience, South San Francisco, California, USA). Multiple individual samples from the same population were sequenced as a pool to capture the genetic complexity of ryegrass.
Data Processing and Quality Control
Raw sequencing data were processed through a series of filtering steps: loci filtering, sample filtering, and variant filtering.
Data Structure
Loci: The dataset includes information on 85,903 loci where biallelic SNPs were identified.
Chromosomes: The loci are distributed across 7 chromosomes, with additional loci mapped to "scaffold" and "contig" regions. Each locus identifier indicates its chromosomal location (e.g., "chr1_3491397" indicates a locus on chromosome 1 at position 3491397).
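As an illustration of the identifier convention described above, the following minimal sketch splits a locus identifier into its region and position. The function name `parse_locus_id` and the `Locus` type are hypothetical. The exact naming of "scaffold" and "contig" identifiers is not documented here, so the sketch assumes only that the position is the numeric part after the last underscore.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    region: str    # e.g. "chr1"; scaffold/contig naming is an assumption
    position: int  # position within the region


def parse_locus_id(locus_id: str) -> Locus:
    """Split an identifier such as "chr1_3491397" into region and position."""
    # Assumption: the position is everything after the LAST underscore,
    # so region names that themselves contain underscores stay intact.
    region, sep, position = locus_id.rpartition("_")
    if not sep or not region or not position.isdigit():
        raise ValueError(f"Unrecognised locus identifier: {locus_id!r}")
    return Locus(region, int(position))


locus = parse_locus_id("chr1_3491397")
print(locus.region, locus.position)
```

Splitting on the last underscore rather than the first is a deliberate choice: it keeps the sketch usable if scaffold or contig names turn out to contain underscores of their own.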
Allele Frequencies: For each locus, the frequency of the reference allele is provided. This is calculated by counting the occurrence of the reference allele in the population and expressing it as a proportion of the total alleles observed at that locus.
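The calculation described above can be sketched as follows for a biallelic locus, where the total is the sum of reference and alternate allele counts. This is an illustration of the stated definition, not the authors' pipeline; the function name and the example counts are hypothetical.

```python
import math


def reference_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Proportion of reference alleles among all alleles observed at a locus."""
    if ref_count < 0 or alt_count < 0:
        raise ValueError("Allele counts must be non-negative")
    total = ref_count + alt_count  # biallelic locus: only two alleles counted
    if total == 0:
        # Assumption: a locus with no observations has an undefined frequency.
        return math.nan
    return ref_count / total


# Hypothetical counts: 30 reference and 10 alternate observations.
print(reference_allele_frequency(30, 10))
```

A value of 1.0 means only the reference allele was observed in the population at that locus, and 0.0 means only the alternate allele was observed.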