TY - GEN
T1 - Volume Estimation of Travertine Blocks Using Keypoint Detection and Homography from Monocular Video
AU - Arostegui, Erik Vargas
AU - Cerna, Lourdes Ramirez
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - We present a monocular-vision pipeline for estimating the volume (m³) of travertine-marble blocks transported on trucks, using only RGB footage from fixed surveillance cameras. Unlike truck-scale systems that are costly and provide only aggregate weights, our approach estimates dimensions per block. A YOLOv8-Pose model trained on 598 annotated frames achieved mAP@0.5 = 95.9% for bounding boxes and mAP@0.5:0.95 = 95.97% for five keypoints. Metric conversion combines intrinsic calibration with a planar homography anchored to the truck platform, while block height is inferred from a single reference block. On 31 test frames, the system reached a mean absolute error of 0.97 m³ (≈14% relative, typical block ≈ 7 m³). For comparison, we evaluated a CNN regressor (ResNet18), tabular models (Ridge, Random Forest), and a MiDaS-based pseudo-depth height estimator. These alternatives showed higher errors (RF 16.4%, ResNet18 25.7%, Ridge 38.8%, MiDaS 18.4%), confirming the proposed homography-based method as the most accurate under quarry conditions.
KW - Computer Vision
KW - Keypoint Detection
KW - Planar Homography
KW - Single View Geometry
KW - Volume Estimation
UR - https://www.scopus.com/pages/publications/105024479624
U2 - 10.1007/978-3-032-09044-7_24
DO - 10.1007/978-3-032-09044-7_24
M3 - Article (Conference contribution)
AN - SCOPUS:105024479624
SN - 9783032090430
T3 - Lecture Notes in Computer Science
SP - 329
EP - 345
BT - Advances in Soft Computing - 24th Mexican International Conference on Artificial Intelligence, MICAI 2025, Proceedings
A2 - Martínez-Villaseñor, Lourdes
A2 - Vázquez, Roberto A.
A2 - Ochoa-Ruiz, Gilberto
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th Mexican International Conference on Artificial Intelligence, MICAI 2025
Y2 - 3 November 2025 through 3 November 2025
ER -