This dataset supports the study of correlations between Trace-based Log Representativeness Approximation (TLRA) and two measures: log-system recall (alignment with the ground truth) and species-discovery-based coverage. The analysis spans event logs of 60 generative systems across varying log sizes and noise levels.
Version 1: Focuses on the correlation analysis between TLRA and species-discovery-based coverage (as presented in ieeexplore.ieee.org/document/10680679).
Version 2: Extends the analysis by incorporating a ground truth evaluation through log-system recall.
The systems and logs used for this analysis are available for download in our GitHub repository.
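As a rough illustration of the kind of correlation analysis this dataset supports, the sketch below computes a rank correlation between per-log TLRA values and a second measure such as log-system recall. It is a minimal sketch under stated assumptions: the choice of Spearman's coefficient, the function names, and the sample values are all hypothetical and are not taken from the dataset or the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical, made-up values for five logs (NOT taken from the dataset).
tlra = [0.10, 0.35, 0.50, 0.80, 0.95]
recall = [0.20, 0.30, 0.65, 0.70, 0.99]
rho = spearman(tlra, recall)
```

In practice the two lists would be replaced by values read from the dataset files, grouped by system, log size, or noise level as needed.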
We kindly request that you cite our work if you use this dataset in your research: A. Karunaratne, A. Polyvyanyy, and A. Moffat, “The role of log representativeness in estimating generalization in process mining,” in Int. Conf. Process Mining. IEEE, 2024, pp. 33-40.