Designing better crop management strategies is crucial for improving the sustainability
and resilience of modern food systems. The increasing abundance of data
on agricultural systems suggests that future strategies could benefit from adapting
to environmental conditions, but how to design these adaptive policies poses a
new frontier. A natural technique for learning policies in these kinds of sequential
decision-making problems is reinforcement learning (RL). To obtain the large
number of samples required to learn effective RL policies, existing work has used
mechanistic crop growth models (CGMs) as simulators. These solutions focus on
single-year, single-crop simulations to learn strategies for a single agricultural
management practice. However, to learn sustainable long-term policies, we must
be able to train in multi-year environments with multiple crops and to consider
a wider array of management techniques. We introduce CYCLESGYM, an RL
environment based on the multi-year, multi-crop CGM Cycles. CYCLESGYM
allows for long-term planning in agroecosystems, provides modular state space and
reward constructors as well as weather generators, and allows for complex actions. For
RL researchers, this is a novel benchmark to investigate issues arising in real-world
applications. For agronomists, we demonstrate the potential of RL as a powerful
optimization tool for agricultural systems management through multi-year case studies
on nitrogen (N) fertilization and crop planning scenarios.
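To give a concrete sense of the interaction pattern an RL agent follows in a gym-style crop management environment such as the one described above, the sketch below implements a standard reset/step loop. It is a self-contained toy stand-in, not CyclesGym code: the environment class, its state variables, and its reward function are illustrative assumptions chosen only to show the interface.

```python
# Toy sketch of a Gym-style interface for a multi-year crop management
# environment. This is NOT the CyclesGym API; all names and dynamics here
# are hypothetical placeholders used purely for illustration.
import numpy as np


class ToyFertilizationEnv:
    """Illustrative multi-year environment with a Gym-like reset/step API."""

    def __init__(self, n_years=5, weeks_per_year=52, seed=0):
        self.n_years = n_years
        self.weeks_per_year = weeks_per_year
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.week = 0
        self.soil_n = 50.0  # toy soil nitrogen stock (kg/ha)
        return self._obs()

    def step(self, action):
        # action: nitrogen dose applied this week (kg/ha)
        dose = float(np.clip(action, 0.0, 150.0))
        self.soil_n += dose
        # toy crop uptake and reward: harvest value minus fertilizer cost
        uptake = max(min(self.soil_n, 5.0 + self.rng.normal(0.0, 0.5)), 0.0)
        self.soil_n -= uptake
        reward = 0.1 * uptake - 0.05 * dose
        self.week += 1
        done = self.week >= self.n_years * self.weeks_per_year
        return self._obs(), reward, done, {}

    def _obs(self):
        year, week_of_year = divmod(self.week, self.weeks_per_year)
        return np.array([year, week_of_year, self.soil_n], dtype=np.float32)


# Standard interaction loop: the agent (here, a random policy placeholder)
# observes the state, picks a fertilization action, and collects a reward.
env = ToyFertilizationEnv(n_years=2)
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = np.random.uniform(0.0, 30.0)
    obs, reward, done, _ = env.step(action)
    total_reward += reward
print(f"Episode return: {total_reward:.2f}")
```

In a real study, the random policy would be replaced by an RL algorithm trained against the simulator, and the episode would span the multi-year horizon of interest.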