In semi-arid regions, the timing and duration of the rainy season determine plant water availability, which directly affects food security. Rainy season metrics, which aim to define and, in some cases, predict the onset and end of seasonal rains, can support agricultural planning, such as scheduling planting dates and managing water resources. However, metrics based on precipitation time series do not always accurately reflect plant water availability, and the variety of available metrics complicates the selection of the most suitable one. Furthermore, a metric's ability to capture observed vegetation variability can indicate its applicability over larger spatial or temporal scales. This study introduces a new bucket-type metric that incorporates a simplified water balance, accounts for both accumulation and storage, and includes interannual legacy effects. We evaluate its performance against seven commonly used rainy season metrics, both calibrated and uncalibrated, using 18 years of satellite-derived Normalized Difference Vegetation Index (NDVI) data from the semi-arid Rio Santa basin in the Peruvian Andes. Our results demonstrate that calibrating metrics using vegetation data significantly enhances their ability to capture rainy season dynamics, with the bucket metric outperforming the others in both accuracy and robustness. Furthermore, we examine the sensitivity of all metrics to variations in rainfall intensity and frequency under future climate scenarios, using a previously published high-resolution dataset designed specifically for the Rio Santa basin. This dataset provides historical (1981–2018) rainfall data and future projections (2019–2100) based on 30 statistically downscaled CMIP5 models for the Representative Concentration Pathway (RCP) 4.5 and 8.5 scenarios.
While most rainy season metrics exhibit the expected correlations in response to climatic changes, some established metrics display physically inconsistent behavior due to methodological artifacts, highlighting their limitations in assessing hydroclimatic changes. In addition to the sensitivity analysis, we evaluate long-term trends in rainy season characteristics. Statistically downscaled CMIP5 ensemble projections for the future period suggest only a slight delay in the end of the rainy season, with no consistent trends in onset timing. Instead, interannual variability and ensemble spread remain the dominant influences. Our findings emphasize the need for careful calibration of metrics across diverse climate scenarios and different locations to ensure their reliability for agricultural planning, policymaking, and climate adaptation strategies. By providing a novel framework for evaluating rainy season metrics, this study offers a scalable approach that can be readily applied to other semi-arid regions.
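To illustrate the general idea of a bucket-type metric, the following is a minimal sketch only, not the formulation used in this study. It assumes a daily rainfall series in mm, a constant daily loss term, a fixed storage capacity, and a storage threshold that marks the rainy season; the function names `bucket_storage` and `season_bounds` and all parameter values are hypothetical. Passing the residual storage of one year as `initial` for the next is one simple way legacy effects could be represented.

```python
import numpy as np

def bucket_storage(precip, capacity=100.0, loss=3.0, initial=0.0):
    """Daily bucket storage (mm): add rainfall, subtract a constant loss,
    and clip the level to the range [0, capacity].
    All parameter values are illustrative assumptions."""
    storage = np.empty(len(precip), dtype=float)
    level = initial
    for i, p in enumerate(precip):
        level = min(capacity, max(0.0, level + p - loss))
        storage[i] = level
    return storage

def season_bounds(storage, threshold=20.0):
    """Return (onset, end) as the first and last day index on which storage
    is at or above the threshold, or None if the threshold is never reached."""
    wet = np.flatnonzero(storage >= threshold)
    if wet.size == 0:
        return None
    return int(wet[0]), int(wet[-1])

# Synthetic example: 30 dry days, 60 days with 5 mm of rain, 30 dry days.
precip = np.concatenate([np.zeros(30), np.full(60, 5.0), np.zeros(30)])
storage = bucket_storage(precip)
print(season_bounds(storage))  # (39, 115)
```

In this synthetic case the detected onset lags the first rain by several days because the bucket must fill first, and the detected end lags the last rain because stored water drains gradually. In a calibrated setting, the threshold, loss, and capacity would be tuned against vegetation data such as NDVI rather than fixed by hand.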