At Mutus Tech, we have developed a multimodal carbon flux dataset that integrates climate variables, remote sensing imagery, and soil properties to support high-resolution carbon cycle modelling and AI-powered environmental forecasting systems.
Multimodal Dataset Composition
Our dataset fuses multiple sources of environmental data:
- Climate Time Series: Hourly meteorological variables (temperature, precipitation, solar radiation, humidity, wind speed/direction).
- Remote Sensing Imagery: MODIS surface reflectance, vegetation indices (NDVI, EVI), and land cover classifications.
- Soil Characteristics: Derived from SoilGrids, covering pH, organic carbon, sand/silt/clay proportions, and water holding capacity.
- Carbon Flux Targets: Hourly eddy covariance tower fluxes: gross primary productivity (GPP), ecosystem respiration (Reco), and net ecosystem exchange (NEE).
Each sample is linked to its corresponding timestamp, geographic location, and ecosystem type (e.g., cropland, grassland, forest), which makes the dataset well suited to spatiotemporal modelling.
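As a rough illustration of how one aligned sample might be represented in code, the sketch below uses a Python dataclass. The class name `FluxSample`, its field names, and all example values are hypothetical; they do not describe the dataset's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class FluxSample:
    """One aligned sample (hypothetical layout, not the published schema)."""

    timestamp: datetime      # observation time (UTC assumed here)
    latitude: float          # decimal degrees
    longitude: float         # decimal degrees
    ecosystem_type: str      # e.g. "cropland", "grassland", "forest"
    climate: dict = field(default_factory=dict)         # meteorological inputs
    remote_sensing: dict = field(default_factory=dict)  # reflectance, indices
    soil: dict = field(default_factory=dict)            # soil properties
    targets: dict = field(default_factory=dict)         # GPP, Reco, NEE


# Illustrative values only; they are not taken from the dataset.
sample = FluxSample(
    timestamp=datetime(2020, 6, 1, 12, tzinfo=timezone.utc),
    latitude=41.0,
    longitude=-96.0,
    ecosystem_type="cropland",
    climate={"air_temp_c": 18.2},
    remote_sensing={"ndvi": 0.61},
    soil={"ph": 6.4},
    targets={"nee": -3.1},
)
print(sample.ecosystem_type, sample.timestamp.isoformat())
```

Keeping the modalities in separate fields is one possible design choice; it lets a model consume or drop a whole modality without touching the others.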
Our Structured Environmental Data Framework
Input Data Modalities
Level 1 – Core Climatic Inputs
- Temperature (2m, soil)
- Precipitation
- Radiation
- Wind (u/v components)
- Vapour Pressure Deficit (VPD)
- Surface pressure
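VPD is usually derived from air temperature and relative humidity rather than measured directly. The sketch below shows one common way to do this, using the Tetens-type approximation for saturation vapour pressure found in the FAO-56 guidelines. The function names are hypothetical, and the formulation actually used to build the dataset is not stated here, so this is an assumption.

```python
import math


def saturation_vapour_pressure_kpa(temp_c: float) -> float:
    """Saturation vapour pressure in kPa (Tetens-type approximation)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapour_pressure_deficit_kpa(temp_c: float, rel_humidity_pct: float) -> float:
    """VPD in kPa from air temperature (deg C) and relative humidity (%)."""
    if not 0.0 <= rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100 percent")
    es = saturation_vapour_pressure_kpa(temp_c)
    # Actual vapour pressure is es * RH/100, so the deficit is the remainder.
    return es * (1.0 - rel_humidity_pct / 100.0)


# Example call with illustrative inputs: 25 deg C at 50 % relative humidity.
print(round(vapour_pressure_deficit_kpa(25.0, 50.0), 3))
```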
Level 2 – Land and Soil Features
- Soil pH
- Soil moisture & texture
- Land use/land cover class
- Elevation & slope
- Vegetation indices and canopy variables (NDVI, LAI, FPAR)
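For readers unfamiliar with the vegetation indices, the sketch below computes NDVI and EVI from surface reflectance using their standard definitions (EVI with the coefficients used for MODIS products). The function names and input values are illustrative only; in practice these indices are often taken from ready-made satellite products rather than recomputed.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index from NIR and red reflectance."""
    denom = nir + red
    if denom == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    if denom == 0:
        return float("nan")
    return 2.5 * (nir - red) / denom


# Illustrative reflectance values for a vegetated pixel.
print(ndvi(0.5, 0.1), evi(0.5, 0.1, 0.05))
```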
Level 3 – Target and Historical Signals
- Net ecosystem exchange (NEE)
- Gross primary productivity (GPP)
- Ecosystem respiration (Reco)
- Historical flux trends
- Management practices (cropping, irrigation, fertilisation, where available)
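The three flux targets are linked: net exchange is the balance between respiration and photosynthetic uptake. The sketch below assumes the atmospheric sign convention common in flux tower data, where NEE = Reco - GPP and a negative NEE means net carbon uptake by the ecosystem. The sign convention used in this dataset is not stated, so treat it as an assumption; the function names are hypothetical.

```python
def nee_from_components(gpp: float, reco: float) -> float:
    """Net ecosystem exchange, assuming NEE = Reco - GPP (negative = uptake)."""
    return reco - gpp


def is_flux_consistent(nee: float, gpp: float, reco: float, tol: float = 1e-6) -> bool:
    """Check that a (NEE, GPP, Reco) triple satisfies the assumed identity."""
    return abs(nee - nee_from_components(gpp, reco)) <= tol


# Illustrative daytime values: uptake exceeds respiration, so NEE is negative.
print(nee_from_components(gpp=10.0, reco=4.0))
```

A check like this can flag records where the three targets were taken from inconsistent processing steps or use a different sign convention.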
Simulated and Real-World Carbon Flux Samples

To support robust carbon modelling in agricultural ecosystems, we curated a high-quality dataset comprising 39 real-world eddy covariance tower sites in cropland regions of North America and Europe. These sites span a wide range of climatic and soil conditions.
The dataset includes:
- 39 real-world agricultural flux tower sites
- Multimodal inputs at daily to hourly resolution (climate, satellite imagery, soil properties)
- Millions of aligned observations, supporting model training and temporal generalisation
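Aligning inputs that arrive at different temporal resolutions is a key preprocessing step. The sketch below shows one simple approach: forward-filling the most recent daily value onto each hourly timestamp. The function name and example values are hypothetical, and forward fill is only one option (interpolation is another); the alignment method used for this dataset is not described here.

```python
from bisect import bisect_right
from datetime import datetime


def align_daily_to_hourly(hourly_times, daily_times, daily_values):
    """Forward-fill daily values onto hourly timestamps.

    daily_times must be sorted ascending. Hourly timestamps that fall
    before the first daily record get None.
    """
    aligned = []
    for t in hourly_times:
        # Index of the latest daily record at or before t (-1 if none).
        idx = bisect_right(daily_times, t) - 1
        aligned.append(daily_values[idx] if idx >= 0 else None)
    return aligned


# Illustrative data: two daily NDVI values and four hourly timestamps.
daily_times = [datetime(2020, 6, 1), datetime(2020, 6, 2)]
daily_ndvi = [0.60, 0.62]
hourly_times = [
    datetime(2020, 5, 31, 23),
    datetime(2020, 6, 1, 0),
    datetime(2020, 6, 1, 13),
    datetime(2020, 6, 2, 0),
]
hourly_ndvi = align_daily_to_hourly(hourly_times, daily_times, daily_ndvi)
print(hourly_ndvi)
```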
This curated real-world dataset provides a scientifically grounded benchmark for evaluating multimodal spatiotemporal learning frameworks.
Applications and Future Work
This dataset enables the development of advanced spatiotemporal models for:
- Carbon flux forecasting under climate change
- Agricultural GHG emission assessment
- Remote sensing-based carbon cycle analytics
- AI-driven land management decision tools
We continue to expand the dataset by integrating new sources (e.g., ERA5-Land, Sentinel, drone imagery) and validating results through global collaborations and flux site campaigns.
Data Sample Access
A representative sample of our carbon flux dataset is available for research and testing. It includes formatted input features, aligned output labels, and metadata needed for model development.