Ready, Steady… Analyse Data

Using Earth Observation (EO) data has traditionally involved a steep learning curve, with many steps between downloading a product and extracting usable information. Seasoned EO data users have long since learned to handle this complexity, but the wealth of available data is attracting many new users who are keen to start applying it without studying it in depth. What’s more, not all of these users are human: machine learning algorithms need standardised data to identify patterns or anomalies.

For these reasons, the demand for Analysis Ready Data (ARD) is on the rise. The general concept is simple: all repetitive pre-processing should be done by the provider and not by the end-user. But a closer look shows that there is no single understanding of what constitutes ARD, leaving open questions such as: Ready for whom? To do what?

So, what counts as ARD? Loosely speaking, it means getting the data ready to be used by a non-expert (in this context, someone with a background in GIS but not in EO) or as input for machine learning. For optical imagery this typically means data that are clipped to standard tiles, co-registered, atmospherically corrected and radiometrically calibrated to allow cross-sensor comparison. Masks are provided for cloud, shadow and snow, all of which can confuse an algorithm, and the metadata are often supplied in a standard format to make them machine-readable. A similar process takes place for SAR data.

The aim of this processing is to eliminate sensor-induced variations (such as varying pixel size, viewing angle or daylight conditions) so that differences between two ARD products reflect real changes on the ground, though in practice some variation will always remain.
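As a minimal sketch of why this matters, the Python snippet below (using NumPy) compares two tiles that are assumed to have already been through ARD processing: same tile grid, co-registered surface reflectance values, and one boolean mask per scene flagging cloud, shadow or snow. The function name masked_change and the "True means flagged" mask convention are hypothetical; real products ship their own quality or flag layers, which would first need decoding into masks like these. Under those assumptions, a simple change comparison reduces to discarding any pixel flagged in either scene and subtracting what remains.

```python
import numpy as np


def masked_change(before, after, before_mask, after_mask):
    """Return after - before where neither scene is flagged, NaN elsewhere.

    Hypothetical helper for illustration only.
    before, after: 2-D reflectance arrays assumed to be on the same ARD tile
        grid (co-registered, atmospherically corrected).
    before_mask, after_mask: boolean arrays, True where the pixel is flagged
        as cloud, shadow or snow (assumed encoding).
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    # ARD tiles are expected to share a grid; refuse to compare otherwise.
    if before.shape != after.shape:
        raise ValueError("ARD tiles must share the same grid")
    # A pixel is usable only if it is unflagged in both scenes.
    flagged = np.asarray(before_mask, dtype=bool) | np.asarray(after_mask, dtype=bool)
    # Mark unusable pixels as NaN so they cannot be read as "no change".
    return np.where(~flagged, after - before, np.nan)


before = [[0.10, 0.20], [0.30, 0.40]]
after = [[0.10, 0.25], [0.30, 0.10]]
cloud_before = [[False, False], [True, False]]  # one cloudy pixel in the first scene
cloud_after = [[False, False], [False, False]]
print(masked_change(before, after, cloud_before, cloud_after))
```

Masked pixels come back as NaN rather than 0, so a cloudy pixel is not mistaken for "no change". Without ARD, the same comparison would first require resampling, co-registration and atmospheric correction of each scene.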

How important this is depends on the use; for services such as ship detection there are limited benefits to using ARD, but for services that depend on accurate mapping or change detection it is essential to be able to precisely relate different images.

Some of the earliest ARD work was done for Landsat, but the concept has been enthusiastically adopted by operators of large constellations such as Planet and Maxar. These companies operate many satellites, each carrying sensors with subtly different characteristics, yet their business model relies on selling data to a wide range of users who are not all EO experts. Providing their data as ARD means that more people can use it directly, which widens the potential market.

For Copernicus, Analysis Ready Data is available through Sentinel Hub, the Euro Data Cube and the Sentinel-2 Global Mosaic Service, although ARD has not been adopted fully or consistently across the programme. This remains very much a live issue within the community.

The advantages of ARD are obvious, but what are its limitations? Applying the required corrections takes time, which could be an issue for time-critical applications. There is also a risk that pre-processing removes real variations.

But the real difficulties become apparent when looking at interoperability across missions, as the pre-processing done by one provider may not be consistent with that done by another. How do we relate Sentinel-2 to Planet, or Sentinel-1 to ICEYE?

To address such issues, the Committee on Earth Observation Satellites (CEOS) has defined a strategy for standardising ARD, including a set of requirements that products must meet to be considered CEOS Analysis Ready Data for Land (CARD4L). Compliance is a two-stage process: ARD providers are expected first to reach a minimum (threshold) level before upgrading to full (target) compliance. So far, only free and open datasets have been assessed against the CEOS methodology, and only Landsat data are deemed CARD4L compliant; other missions, including the Sentinels, are still undergoing assessment. Commercial providers are involved in these initiatives, but it is not clear whether formal CARD4L compliance gives them a real commercial advantage.

In summary, by easing the EO learning curve, ARD has the potential to greatly increase the market for EO data, benefiting both users and providers. The real challenge, though, is making the different types of ARD consistent with each other. This is being addressed for publicly available data, although it remains to be seen whether an equivalent push will follow for the many sources of commercial data.