ArcGIS Pro Workflow and Python Code for Furtner et al., 2026 – J. Archaeol Method Theory “Evaluating Random Forest Model Performance for Cave and Sinkhole Prediction in the Cradle of Humankind, South Africa: Preliminary Analysis and Variable Importance Assessments”


About This Article

This is an AI-generated summary of a research paper. The original authors did not write or review this article.

Zenodo (CERN European Organization for Nuclear Research) · 2026-03-04

Overview

This software documentation release accompanies a methodological study evaluating Random Forest model performance for predicting cave and sinkhole locations within the Cradle of Humankind, South Africa. The documentation comprises two primary components: an ArcGIS Pro workflow detailing the generation of 48 raster layers representing landscape characteristics of the study region, and Python code implementing spatial 10-fold cross-validation for model training and permutation feature importance analysis. The workflow specifies exact tools and parameters employed to extract raster pixel values at training data point locations for subsequent model input. The Python implementation includes six custom functions designed to partition input data into spatial folds based on site clustering, compute averaged evaluation metrics across training iterations, generate averaged confusion matrices from the ensemble of trained models, and execute permutation feature importance procedures that account for multicollinearity among predictor variables.
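The extraction step can be illustrated outside ArcGIS Pro with a minimal sketch. It assumes a north-up raster held as a list of rows with square cells. The function names `sample_raster` and `build_table`, and the parameters `x_min`, `y_max`, and `cell_size`, are hypothetical and do not come from the release, which performs this step with ArcGIS Pro tools.

```python
import math

def sample_raster(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or None if outside the raster.

    grid is a list of rows, with row 0 at the northern (top) edge.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

def build_table(points, rasters, x_min, y_max, cell_size):
    """Build one output row per point, with one column per named raster layer."""
    return [
        {name: sample_raster(g, x_min, y_max, cell_size, x, y)
         for name, g in rasters.items()}
        for x, y in points
    ]
```

Applied to all 48 layers, a table of this shape would have one row per training point and one column per raster.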

Methods and approach

The ArcGIS Pro workflow documents how the 48 raster datasets characterizing the Cradle of Humankind landscape were generated, and how raster values were extracted at training data locations to create tabular input for predictive modeling. The Python code implements a spatial cross-validation framework that divides training data into 10 folds according to site cluster membership rather than random assignment. Six functions structure the analytical pipeline: data preparation functions split input data by spatial clusters, evaluation functions aggregate performance metrics across the 10 training iterations, matrix functions construct averaged confusion matrices from the model ensemble, and feature importance functions implement permutation-based variable assessment designed to handle multicollinear predictors. Keeping each site cluster within a single fold organizes the cross-validation around the geographic clustering present in the cave and sinkhole distribution data.
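The cluster-based fold assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the release's own code: the function names `assign_spatial_folds` and `spatial_splits` are hypothetical, and the greedy size-balancing rule is an assumption, since the summary does not say how clusters are distributed among folds.

```python
from collections import Counter

def assign_spatial_folds(cluster_ids, n_folds=10):
    """Map each cluster label to a fold so that no cluster is split across folds."""
    sizes = Counter(cluster_ids)
    if len(sizes) < n_folds:
        raise ValueError("need at least as many clusters as folds")
    fold_sizes = [0] * n_folds
    fold_of_cluster = {}
    # Assumed balancing rule: largest clusters first, each placed in the
    # fold that currently holds the fewest points.
    for cluster, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        fold_of_cluster[cluster] = target
        fold_sizes[target] += size
    return fold_of_cluster

def spatial_splits(cluster_ids, n_folds=10):
    """Yield (train_indices, test_indices) once per fold."""
    fold_of_cluster = assign_spatial_folds(cluster_ids, n_folds)
    folds = [fold_of_cluster[c] for c in cluster_ids]
    for k in range(n_folds):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test
```

Each yielded pair can index the rows used to fit and evaluate one model. scikit-learn's `GroupKFold` offers comparable grouped splitting for readers who prefer a library routine.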

Results

The documentation provides complete procedural specifications for replicating the raster generation and model evaluation workflows employed in the associated research article. The ArcGIS Pro workflow enumerates tool selections and parameter settings for all 48 landscape characteristic rasters, ensuring transparency in data preparation steps. The Python script delivers functional code for spatial cross-validation implementation and permutation feature importance analysis, with explicit handling of multicollinearity in the variable importance assessment. The averaged confusion matrices and evaluation metrics derived from the 10-fold spatial cross-validation represent aggregated model performance across spatially distinct subsets of the training data. The permutation feature importance results quantify individual variable contributions to predictive accuracy while accounting for correlations among landscape characteristics.
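Averaging across folds might look like the sketch below. It assumes binary labels (0 for absence, 1 for presence) and assumes accuracy, precision, and recall as the metrics. The summary does not list which metrics the release computes, and all function names here are hypothetical.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = absence, 1 = presence)."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

def fold_metrics(m):
    """Compute accuracy, precision, and recall from one 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = m
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def average_over_folds(fold_results):
    """fold_results: list of (y_true, y_pred) pairs, one per held-out fold."""
    matrices = [confusion_matrix_2x2(t, p) for t, p in fold_results]
    n = len(matrices)
    # Element-wise mean of the per-fold confusion matrices.
    mean_matrix = [[sum(m[r][c] for m in matrices) / n for c in range(2)]
                   for r in range(2)]
    per_fold = [fold_metrics(m) for m in matrices]
    mean_metrics = {k: sum(d[k] for d in per_fold) / n for k in per_fold[0]}
    return mean_matrix, mean_metrics
```

Because folds can differ in size, the mean of per-fold metrics is not generally equal to the metrics computed from the averaged matrix. Which of the two a study reports is a design choice worth stating.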

Implications

The documented workflows provide a reproducible framework for applying Random Forest classification to archaeological site prediction in karst terrain. Structuring folds by site cluster rather than by random partitioning suits archaeological data with inherent spatial structure, where random splits can place neighboring sites in both the training and test sets. The multicollinearity-aware permutation feature importance procedure addresses a common problem in landscape-based predictive modeling: environmental variables are frequently correlated. The detailed documentation of the ArcGIS Pro procedures and the Python implementation makes it easier to adapt these methods to other regions or other archaeological site prediction problems. The modular code, with separate components for data splitting, model evaluation, and feature importance assessment, offers a template for similar spatial prediction studies whose cross-validation must respect geographic clustering in the training data.
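The summary does not say how the release accounts for multicollinearity. One common approach, shown here purely as an assumption, is to permute correlated predictors together as a group so that their joint contribution is measured. The function `grouped_permutation_importance`, its `groups` argument, and the use of accuracy as the score are all hypothetical choices for this sketch.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean drop in accuracy when all columns of a group are shuffled together.

    predict: callable taking a list of rows and returning predicted labels.
    groups:  dict mapping a group name to a list of column indices.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)
            X_perm = [row[:] for row in X]
            # The same row permutation is applied to every column in the group,
            # so correlated predictors stay consistent with one another.
            for i, src in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[src][c]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances[name] = sum(drops) / n_repeats
    return importances
```

Any fitted classifier can be passed in through `predict`. Shuffling a single member of a correlated pair often understates its importance because the model can fall back on the other member, which is the motivation for permuting the group jointly.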

Disclosure

  • Research title: ArcGIS Pro Workflow and Python Code for Furtner et al., 2026 – J. Archaeol Method Theory "Evaluating Random Forest Model Performance for Cave and Sinkhole Prediction in the Cradle of Humankind, South Africa: Preliminary Analysis and Variable Importance Assessments"
  • Authors: Margaret Furtner
  • Publication date: 2026-03-04
  • DOI: https://doi.org/10.1007/s10816-025-09761-1
  • Disclosure: This post is an AI-generated summary of a research work. It was prepared by an editor. The original authors did not write or review this post.