The University of Melbourne

Field bounding boxes for primary specimen labels on herbarium specimen sheets

Download (785.62 MB)
Version 2 2025-03-05, 11:31
Version 1 2024-02-23, 05:22
dataset
posted on 2025-03-05, 11:31 authored by Robert Turnbull, Emily Fitzgerald, Karen Thompson, Joanne Birch

This dataset contains the bounding boxes for text fields on primary specimen labels (also known as 'institutional labels') on herbarium specimen sheets. The annotations are in YOLO format. It contains the following classes:

0. genus
1. species
2. year
3. month
4. day
5. family
6. collector
7. authority
8. locality
9. geolocation
10. collector_number
11. infrasp_taxon
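
As an illustration of how these class indices might be used, the sketch below converts one annotation line into a named pixel box. It assumes the conventional YOLO text layout, where each line holds a class index followed by the box centre, width and height, all normalised to the image size. The names `FieldBox` and `parse_yolo_line` are hypothetical and not part of the dataset or of hespi.

```python
from dataclasses import dataclass

# Class names in index order, as listed in the dataset description.
CLASS_NAMES = [
    "genus", "species", "year", "month", "day", "family",
    "collector", "authority", "locality", "geolocation",
    "collector_number", "infrasp_taxon",
]


@dataclass
class FieldBox:
    """A labelled bounding box in pixel coordinates (hypothetical helper type)."""
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_yolo_line(line: str, image_width: int, image_height: int) -> FieldBox:
    """Parse one annotation line, assuming 'class x_center y_center width height'
    with coordinates normalised to the range 0-1."""
    class_id, x_center, y_center, width, height = line.split()[:5]
    x_center, y_center = float(x_center), float(y_center)
    width, height = float(width), float(height)
    return FieldBox(
        label=CLASS_NAMES[int(class_id)],
        left=(x_center - width / 2) * image_width,
        top=(y_center - height / 2) * image_height,
        right=(x_center + width / 2) * image_width,
        bottom=(y_center + height / 2) * image_height,
    )
```

For example, `parse_yolo_line("8 0.5 0.5 0.25 0.125", 800, 400)` would yield a `locality` box spanning pixels 300 to 500 horizontally and 175 to 225 vertically.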


These classes were annotated on 3,642 images of institutional labels from 10 herbaria. Of these, 2,603 images are from the University of Melbourne's herbarium (MELU) and the remaining 1,039 are from the nine herbaria represented in the benchmark dataset described by Dillen et al. (doi:10.3897/BDJ.7.e31817). The images are organised into subdirectories named after the code of the respective herbarium. The institution corresponding to each code is:

MELU: The University of Melbourne
BR: Meise Botanic Garden
K: Royal Botanic Gardens, Kew
BM: Natural History Museum, London
B: Botanic Garden and Botanical Museum, Berlin
E: Royal Botanic Garden Edinburgh
P: National Museum of Natural History, Paris
TU: Natural History Museum, University of Tartu
L: Naturalis Biodiversity Center
H: Finnish Museum of Natural History LUOMUS, University of Helsinki


The images were split into 2,887 training images and 755 validation images, which are listed in train.txt and valid.txt respectively.
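
A minimal sketch of how the split files might be read and summarised per herbarium follows. It assumes that train.txt and valid.txt list one image path per line and that the herbarium code appears as a directory component of each path; neither detail is confirmed by the description above. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Herbarium codes and institutions as listed in the dataset description.
HERBARIA = {
    "MELU": "The University of Melbourne",
    "BR": "Meise Botanic Garden",
    "K": "Royal Botanic Gardens, Kew",
    "BM": "Natural History Museum, London",
    "B": "Botanic Garden and Botanical Museum, Berlin",
    "E": "Royal Botanic Garden Edinburgh",
    "P": "National Museum of Natural History, Paris",
    "TU": "Natural History Museum, University of Tartu",
    "L": "Naturalis Biodiversity Center",
    "H": "Finnish Museum of Natural History LUOMUS, University of Helsinki",
}


def read_split(list_file):
    """Return the non-empty lines of a split file such as train.txt
    (assumed to hold one image path per line)."""
    text = Path(list_file).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def herbarium_code(image_path):
    """Return the first path component that is a known herbarium code,
    or None if no component matches (assumed directory layout)."""
    for part in Path(image_path).parts:
        if part in HERBARIA:
            return part
    return None


def count_by_herbarium(image_paths):
    """Count the listed images per herbarium code."""
    return Counter(herbarium_code(path) for path in image_paths)
```

If the files follow this layout, `len(read_split("train.txt"))` should match the 2,887 training images and `len(read_split("valid.txt"))` the 755 validation images stated above.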

For more information, see https://github.com/rbturnbull/hespi
