Why is farmer yield so hard to predict spatially?
There is increasing interest in, and use of, spatial and temporal estimates of on-farm (farmer) yield. These data are most obviously needed for food security assessments and for national or regional planning (e.g. identifying where need, and the opportunity for investment, is greatest). Sustainable intensification and closing the yield gap are very much in vogue, and farmer yield is the baseline that defines the potential opportunity. Furthermore, many applications and models need this baseline to make, for example, nutrient recommendations. One could argue, and I certainly would, that we know technically how to close the yield gap in any given location. So, can we estimate farmer yields spatially and temporally with any degree of confidence, especially in sub-Saharan Africa (SSA)?
The first question might be: do we have enough data to make spatial and temporal estimates? In general, the answer is a resounding no. Few countries have monitoring systems in place that can collect credible crop yield data in a timely and reproducible manner. On-the-ground measurements can be made using crop cuts from farmers’ fields, but these have limited spatial coverage and a high cost per data point. Farmer recall data from nationally representative surveys might be an alternative, but these data are subject to bias and are often constrained by errors in both yield and area estimates. Remote sensing products are improving all the time (e.g. the Sentinel satellites, which can potentially provide 10 m resolution imagery every 5 days), but even the best analytical techniques, such as machine learning combining satellite data and models, can’t account for more than about 40% of the variation in yield in the USA, and much less than that in SSA.
This brings us to a second question: do we understand enough about the observed variation in farmer yield to be able to predict it? It seems not. Even in spatial datasets with high-resolution yield data and many weather, agronomy and soil variables, predictability is still low. Of course, as crop scientists we are disappointed, and this is made worse by my economist friends who say ‘duh, what do you expect?’ For economists, production functions are a more useful way of looking at yield; but that is another story. Aside from data quality issues, one can think of several reasons why predictions are poor. One reason may be that we are not measuring the right variables, especially crop management variables such as the timing of key practices. It is well known in drought studies, for example, that a difference of a few days in flowering date can significantly affect yield. The corollary would be that the timing and frequency of weeding or fertilizer applications also matter, yet these events are rarely well recorded in on-farm data. Another reason may be that we are not capturing the spatial dimension adequately. We assume that soils, weather and topography, for example, define this, but maybe they do not. This is borne out by a common observation in spatial datasets such as surveys: including dummy variables such as District or Ward improves the prediction! What this spatial dimension is I don’t know, but watch this space.
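The District-dummy observation can be illustrated with a minimal simulated sketch. Everything in it is an assumption for illustration only: the synthetic survey, the variable names, the effect sizes and the use of ordinary least squares via NumPy. It is not based on any real dataset. Yield is generated from two measured variables (rainfall and nitrogen rate) plus an unmeasured district-level effect. A regression on the measured variables alone leaves that effect in the residual, while adding District dummies absorbs it and raises the share of variation explained (R²).

```python
import numpy as np


def r_squared(y, y_hat):
    """Share of the variation in y accounted for by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Ordinary least squares fit; returns the in-sample predictions."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


def simulate_survey(n_farms=2000, n_districts=20, seed=1):
    """Hypothetical survey (synthetic data): yield depends on measured rainfall
    and N rate, plus an UNMEASURED district effect and farm-level noise."""
    rng = np.random.default_rng(seed)
    district = rng.integers(0, n_districts, size=n_farms)
    rainfall = rng.normal(800, 150, size=n_farms)   # mm per season (assumed)
    n_rate = rng.uniform(0, 60, size=n_farms)        # kg N/ha (assumed)
    # Hidden spatial factor shared by all farms in a district
    district_effect = rng.normal(0, 0.6, size=n_districts)[district]
    noise = rng.normal(0, 0.5, size=n_farms)
    yield_t_ha = 0.5 + 0.001 * rainfall + 0.02 * n_rate + district_effect + noise
    return district, rainfall, n_rate, yield_t_ha


district, rainfall, n_rate, y = simulate_survey()
n = len(y)

# Model 1: intercept plus the measured variables only
X_base = np.column_stack([np.ones(n), rainfall, n_rate])

# Model 2: add one dummy column per district, dropping district 0
# to avoid perfect collinearity with the intercept
dummies = (district[:, None] == np.arange(1, district.max() + 1)).astype(float)
X_dummy = np.column_stack([X_base, dummies])

r2_base = r_squared(y, fit_ols(X_base, y))
r2_dummy = r_squared(y, fit_ols(X_dummy, y))
print(f"R^2 without District dummies: {r2_base:.2f}")
print(f"R^2 with District dummies:    {r2_dummy:.2f}")
```

The dummies say nothing about what the district effect actually is; they only show that something shared within districts, and missing from the measured variables, is driving yield. Note also that in-sample R² can never fall when columns are added to a least-squares model, so with real data the gain should be checked on held-out farms or districts.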