Gaussian Process kernels comparison - Datasets and python code
Overview
Data used for publication in "Comparing Gaussian Process Kernels Used in LSG Models for Flood Inundation Predictions".
We investigate the impact of 13 Gaussian Process (GP) kernels, consisting of five single kernels and eight composite kernels, on the prediction accuracy and computational efficiency of the Low-fidelity, Spatial analysis, and Gaussian process learning (LSG) modelling approach.
The GP kernels are compared for three distinct case studies: Carlisle (United Kingdom), Chowilla floodplain (Australia), and Burnett River (Australia).
The high- and low-fidelity model simulation results are obtained from the data repository Fraehr, N. (2024, January 19). Surrogate flood model comparison - Datasets and python code (Version 1). The University of Melbourne. https://doi.org/10.26188/24312658.v1.
Dataset structure
The dataset is structured in 5 file folders:
- Carlisle
- Chowilla
- BurnettRV
- Comparison_results
- Python_data
The first three folders contain simulation data and analysis codes.
The "Comparison_results" folder contains plotting codes, figures and tables for comparison results.
The "Python_data" folder contains the LSG model functions and the Python environment requirements.
Carlisle, Chowilla, and BurnettRV
These folders contain high- and low-fidelity hydrodynamic modelling data for training and validation for each individual case study, as well as specific Python scripts for training and running the LSG model with different GP kernels in each case study. There are only small differences between the folders, depending on the hydrodynamic model simulation results and EOF analysis results.
Each case study folder contains the following folders and files:
Geometry_data
- DEM files
- .npz files containing the high-fidelity model's grid (XYZ-coordinates) and areas (the same data is available for the low-fidelity model used in the LSG model)
- .shp files indicating the location of boundaries and main flow paths
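The contents of a geometry .npz file can be inspected with NumPy before use. The following is a minimal sketch only: the file name `example_geometry.npz` and the array keys `x`, `y`, `z`, and `area` are placeholders created by the example itself, not the names actually used in this dataset. On the real files, `data.files` lists the stored arrays.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


# Build a small stand-in file so the example runs without the dataset.
# The keys below are hypothetical; check `data.files` on the real files.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "example_geometry.npz")
np.savez(
    demo_path,
    x=np.zeros(4),
    y=np.zeros(4),
    z=np.zeros(4),
    area=np.ones(4),
)

summary = summarize_npz(demo_path)
for name, (shape, dtype) in summary.items():
    print(name, shape, dtype)
```

To inspect a file from the dataset, pass its path to `summarize_npz` instead of `demo_path`.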
XXX_modeldata
Folder for storing the trained model data of each XXX kernel LSG model.
For example, EXP_modeldata stores the LSG model trained with the exponential Gaussian Process kernel.
ME3LIN means ME3 + LIN. ME3mLIN means ME3 x LIN.
EXPLow means the inducing-point percentage for the sparse GP is 5%.
EXPMid means the inducing-point percentage for the sparse GP is 15%.
EXPHigh means the inducing-point percentage for the sparse GP is 35%.
EXPFULL means the inducing-point percentage for the sparse GP is 100%.
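The difference between the sum kernel (ME3LIN) and the product kernel (ME3mLIN) can be illustrated with plain NumPy. This sketch assumes that ME3 denotes a Matern 3/2 kernel and LIN a linear kernel, which is inferred from the labels rather than stated here. The function names and hyperparameter values are illustrative and are not taken from the dataset's scripts, which use GPflow (where kernel objects can be combined with `+` and `*`).

```python
import numpy as np


def matern32(X1, X2, variance=1.0, lengthscale=1.0):
    """Matern 3/2 kernel matrix between the rows of X1 and X2."""
    diff = X1[:, None, :] - X2[None, :, :]
    r = np.sqrt(np.sum(diff**2, axis=-1)) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def linear(X1, X2, variance=1.0):
    """Linear kernel matrix between the rows of X1 and X2."""
    return variance * X1 @ X2.T


def me3_plus_lin(X1, X2):
    # Sum kernel: the "ME3LIN" naming pattern (ME3 + LIN).
    return matern32(X1, X2) + linear(X1, X2)


def me3_times_lin(X1, X2):
    # Product kernel: the "ME3mLIN" naming pattern (ME3 x LIN).
    return matern32(X1, X2) * linear(X1, X2)


X = np.array([[0.0], [1.0], [2.0]])
K_sum = me3_plus_lin(X, X)
K_prod = me3_times_lin(X, X)
print(K_sum)
print(K_prod)
```

A sum kernel adds the two covariance contributions, whereas a product kernel scales one by the other, so the product vanishes wherever the linear term is zero.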
HD_model_data
- High-fidelity simulation results for all flood events of that case study
- Low-fidelity simulation results for all flood events of that case study
- All boundary input conditions
HF_EOF_analysis
Stores the data used in the EOF analysis for the LSG model.
Results_data
Stores the results of evaluating the LSG models with different GP kernel candidates.
Train_test_split_data
The train-test-validation data split is the same for all LSG models with different GP kernel candidates. The specific split for each cross-validation fold is stored in this folder.
YYY_event_summary.csv, YYY_Extrap_event_summary.csv
Files containing an overview of all events and showing which events are connected between the low- and high-fidelity models for each YYY case study.
EOF_analysis_HFdata_preprocessing.py, EOF_analysis_HFdata.py
Preprocessing before EOF analysis and the EOF analysis of the high-fidelity data.
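As a rough illustration of what an EOF analysis does, the sketch below decomposes a (time, space) array into spatial patterns and temporal coefficients using a singular value decomposition. It is a generic textbook formulation run on synthetic data, not the procedure implemented in the scripts above. The function name `eof_analysis` and the array shapes are assumptions made for the example.

```python
import numpy as np


def eof_analysis(data, n_modes):
    """Decompose a (time, space) array into spatial EOFs and temporal coefficients."""
    mean = data.mean(axis=0)
    anomalies = data - mean
    # Economy-size SVD: anomalies = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = Vt[:n_modes]                     # spatial patterns, shape (n_modes, space)
    coeffs = U[:, :n_modes] * s[:n_modes]   # temporal coefficients, shape (time, n_modes)
    explained = s**2 / np.sum(s**2)         # fraction of variance per mode
    return mean, eofs, coeffs, explained[:n_modes]


rng = np.random.default_rng(0)
# Synthetic stand-in for simulation output: 20 time steps, 50 grid cells,
# built from two spatial patterns so that two modes reconstruct it exactly.
patterns = rng.normal(size=(2, 50))
amplitudes = rng.normal(size=(20, 2))
data = amplitudes @ patterns

mean, eofs, coeffs, explained = eof_analysis(data, n_modes=2)
reconstruction = mean + coeffs @ eofs
print("explained variance fractions:", explained)
```

Truncating to a few leading modes reduces the spatial field to a small number of temporal coefficients, which is what makes the decomposition useful as a preprocessing step.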
Evaluation.py, Evaluation_extrap.py
Scripts for evaluating the LSG model for that case study and saving the results for each cross-validation fold.
train_test_split.py
Script for splitting the flood datasets for each cross-validation fold, so all LSG models with different GP kernel candidates train on the same data.
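Fixing the split once and reusing it is what keeps the kernel comparison fair. The sketch below shows one way to generate reproducible cross-validation folds with a fixed random seed. It is a simplified train/test illustration only (the dataset also uses a validation subset), and the function name, event count, and fold count are assumptions rather than values from `train_test_split.py`.

```python
import numpy as np


def make_folds(n_events, n_folds, seed=0):
    """Return a list of (train_idx, test_idx) pairs, one per fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_events)
    parts = np.array_split(shuffled, n_folds)
    folds = []
    for k in range(n_folds):
        test_idx = np.sort(parts[k])
        train_idx = np.sort(
            np.concatenate([parts[j] for j in range(n_folds) if j != k])
        )
        folds.append((train_idx, test_idx))
    return folds


# Hypothetical sizes: 10 flood events split into 5 folds.
folds = make_folds(n_events=10, n_folds=5, seed=42)
for k, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {k}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Because the seed is fixed, calling `make_folds` again with the same arguments returns identical folds, so every model sees the same training events.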
XXX_training.py
Script for training each LSG model using the XXX GP kernel.
ME3LIN means ME3 + LIN. ME3mLIN means ME3 x LIN.
EXPLow means the inducing-point percentage for the sparse GP is 5%.
EXPMid means the inducing-point percentage for the sparse GP is 15%.
EXPHigh means the inducing-point percentage for the sparse GP is 35%.
EXPFULL means the inducing-point percentage for the sparse GP is 100%.
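To make the inducing-point percentages concrete, the sketch below converts each label into a number of inducing points and draws that many training inputs. It assumes the percentage is taken relative to the number of training samples and that the points are chosen at random. Neither assumption is confirmed here, and the function name and training-set size are illustrative.

```python
import numpy as np

# Percentages taken from the naming convention described above.
INDUCING_PERCENT = {"EXPLow": 5, "EXPMid": 15, "EXPHigh": 35, "EXPFULL": 100}


def select_inducing_points(X_train, label, seed=0):
    """Pick a random subset of the training inputs to serve as inducing points."""
    n_train = X_train.shape[0]
    n_inducing = max(1, int(round(n_train * INDUCING_PERCENT[label] / 100)))
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_train, size=n_inducing, replace=False)
    return X_train[np.sort(idx)]


# Hypothetical training set with 200 samples and one input dimension.
X_train = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
sizes = {
    label: select_inducing_points(X_train, label).shape[0]
    for label in INDUCING_PERCENT
}
print(sizes)
```

With 100% of the points selected, the inducing set coincides with the full training set, which is why EXPFULL serves as the reference case for the sparse variants.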
XXX_training.bat
Batch scripts for training all LSG models using different GP kernel candidates.
Comparison_results
Files used for comparing the LSG models with different GP kernel candidates and for generating the figures in the paper "Comparing Gaussian Process Kernels Used in LSG Models for Flood Inundation Predictions". The figures are also included.
Python_data
Folder containing Python scripts with utility functions for setting up, training, running, and evaluating the LSG models.
Python environment
This folder also contains two Python environment files listing all Python package versions and dependencies. You can install either the CPU version or the GPU version of the environment. The GPU version can use the GPU to speed up the GPflow training process and installs the CUDA and cuDNN packages.
You can install the environment online or offline. Offline installation reduces dependency issues, but it requires the same Windows 10 operating system that was used to pack the environment.
Online installation
- LSG_CPU_environment.yml: Python environment for running LSG models on the CPU
- LSG_GPU_environment.yml: Python environment for running LSG models on the GPU, mainly to speed up the GPflow training process. It requires the CUDA and cuDNN packages to be installed.
In the directory where the .yml file is located, enter one of the following commands in the console:
conda env create -f LSG_CPU_environment.yml -n myenv_name
or
conda env create -f LSG_GPU_environment.yml -n myenv_name
Offline installation
If you also use Windows 10, you can directly unpack the environment packed with conda-pack.
- LSG_CPU.tar.gz: Compressed archive containing all packages in the virtual environment for CPU only
- LSG_GPU.tar.gz: Compressed archive containing all packages in the virtual environment for GPU acceleration
On Windows, create a new LSG_CPU or LSG_GPU folder in the Anaconda environment folder and extract the packaged LSG_CPU.tar.gz or LSG_GPU.tar.gz file into that folder:
tar -xzvf LSG_CPU.tar.gz -C ./LSG_CPU
or
tar -xzvf LSG_GPU.tar.gz -C ./LSG_GPU
Change to the environment folder:
cd ./LSG_GPU
Activate the environment:
.\Scripts\activate.bat
Clean up the prefixes in the activated environment:
.\Scripts\conda-unpack.exe
Deactivate the environment:
.\Scripts\deactivate.bat
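After either installation route, a quick check from inside the activated environment can confirm that the key packages are importable. The sketch below is only a convenience check. The package list is an assumption based on the GPflow dependency mentioned above (GPflow runs on TensorFlow), and the authoritative list remains the .yml files.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed package list; adjust it to match the .yml file you installed from.
required = ["numpy", "gpflow", "tensorflow"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

An empty result suggests the environment was created correctly. Any listed package points to an incomplete installation or to the wrong environment being active.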
LSG_mods_and_func
Python scripts for using the LSG model.
Evaluation_metrics.py
Metrics used to evaluate the prediction accuracy and computational efficiency of the LSG models.
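As a hedged illustration of the two kinds of measurement named above, the sketch below computes a root-mean-square error for accuracy and a wall-clock time for efficiency. These are generic examples chosen for illustration. The metrics actually implemented in `Evaluation_metrics.py` are not listed here and may differ, and the function names `rmse` and `timed` are hypothetical.

```python
import time

import numpy as np


def rmse(predicted, observed):
    """Root-mean-square error between two arrays of equal shape."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def timed(func, *args, **kwargs):
    """Call `func` and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Toy values standing in for predicted and simulated water depths.
observed = np.array([1.0, 2.0, 5.0])
predicted = np.array([1.0, 2.0, 3.0])

error, seconds = timed(rmse, predicted, observed)
print(f"RMSE = {error:.4f}, computed in {seconds:.6f} s")
```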
Funding
China Scholarship Council (CSC) (No. 202306710125)
The University of Melbourne via the Melbourne Research Scholarship
National Key R&D Programme of China (2023YFC3006501)