This project uses satellite imagery on Google Earth Engine (GEE) to predict canopy height and estimate carbon content in the University of Malaya (Kuala Lumpur, Malaysia) area.
- Prerequisites: Obtain access to Google Earth Engine Code Editor and have a GEE account. If not, please register here.
- Installation: No installation required; access GEE platform and import necessary scripts.
- Example snippets to execute canopy height prediction and carbon estimation using GEE's JavaScript.
- Instructions on how to run the scripts within the GEE platform.
-
Sentinel-1 (C-band synthetic aperture radar SAR): Sentinel-1 is a space mission by the European Union (EU) and carried out by ESA under the Copernicus Program.
- VV and VH bands capture signals reflected from the Earth's surface, including vegetation and terrain.
- Use VV and VH as predictor variables. These bands offer radar backscatter values that can be related to vegetation structure, including canopy height, as validation with GEDI Data.
-
Sentinel-2: Sentinel-2 is a European multi-spectral imaging mission with high resolution.
- Included 12 spectral bands with resolutions of 10m, 20m, and 60m.
- Primary objective: monitoring variability in land surface conditions.
- Select bands 'B2’, 'B3’, 'B4’, 'B5’, 'B6’, 'B7’, 'B8’, 'B11’, 'B12’ as predictor variables. → Manual feature selection
- Take all the 12 bands from ’B1’ to ‘B12’ as independent variables. → Automatic feature selection
-
NASA SRTM Digital Elevation 30m: SRTM data is used to construct a global digital elevation model.
- SRTM was conducted aboard the space shuttle Endeavour from February 11-22, 2000.
- Extract ”Slope” and “Elevation” as predictor variables.
-
European Space Agency (ESA) WorldCover 10m: The ESA dataset was launched since there is a massive need for accurate, timely, and high-resolution information on land cover, land use, and earth surface change.
- Use value 10 (Tree Cover only) as a predictor variable.
-
GEDI L2A Raster Canopy Top Height: GEDI Light Detection and Ranging (Lidar) is an active remote sensing technology that uses a laser pulse to measure distance.
- Primarily used for measuring tree height, aboveground biomass, as well as leaf area index.
- Using 'rh98' (relative height metrics at 98%) to be the sampling point.
-
GEDI L4B Gridded Aboveground Biomass Density: This dataset encompasses gridded information on aboveground biomass density derived from GEDI observations. These data provide insights into vegetation growth and distribution, offering valuable information for understanding global vegetation health and the impacts of climate change.
- Using 'MU' (mean aboveground biomass density, which estimated mean AGBD for the 1 km grid cell, including forest and non-forest.) to be the sampling point.
Important Note: This example code illustrates the process of conducting canopy height regression modeling utilizing the Random Forest model along with manual feature selection.
// Choose boundary on GEE
var boundary = ee.Geometry.Polygon(/* Coordinates */);
// Create a polygon feature
var polygon = ee.Feature(ee.Geometry.Polygon(boundary.coordinates()));
// Create a feature collection from the polygon
var boundary_feature = ee.FeatureCollection([polygon]);
// Print the feature collection
print('Boundary:', boundary_feature);
// Load Sentinel-1 for the non-rainy season.
var S1_PRS = ee.ImageCollection('COPERNICUS/S1_GRD')
.filterDate('2021-04-01', '2021-06-30')
.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
.filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
.filter(ee.Filter.eq('instrumentMode', 'IW'))
.filter(ee.Filter.eq('orbitProperties_pass', 'ASCENDING'))
.filter(ee.Filter.bounds(boundary));
// Load Sentinel-2 spectral reflectance data.
var s2 = ee.ImageCollection('COPERNICUS/S2_SR');
// Load SRTM
var SRTM = ee.Image("USGS/SRTMGL1_003");
// Clip Elevation
var elevation = SRTM.clip(boundary);
// Load ESA World cover data
var dataset = ee.ImageCollection("ESA/WorldCover/v100").first();
// Clip the land cover to the boundary
var ESA_LC_2020 = dataset.clip(boundary);
// Extract forest areas from the land cover value = 10
var forest_mask = ESA_LC_2020.updateMask(
ESA_LC_2020.eq(10) // Only keep pixels where class equals 2
);
// Clip land cover to the boundary and visualize forests
// Display forests only from ESA imagery.
var visualization = {bands: ['Map'],};
Map.addLayer(forest_mask, visualization, "Trees");
// Import the "GEDI02_A_002_MONTHLY" dataset
var dataset = ee.ImageCollection('LARSE/GEDI/GEDI02_A_002_MONTHLY')
.map(qualityMask)
.select('rh98').filterBounds(boundary);
// Create a palette to visaulize the dataset
var gediVis = {
min: 1,
max: 60,
palette: 'darkred, red, orange, green, darkgreen',
};
// Set the map center and visualize the dataset
Map.setCenter(101.6556468417507, 3.127386383953448, 13);
Map.addLayer(dataset, gediVis, 'rh98');
// Prepare inter-quartile range (IQR)
var S1_PRS_pc = S1_PRS.reduce(ee.Reducer.percentile([25,50,75]));
// Convert to natural units (linear units, which can be averaged)
var S1_PRS_pc = ee.Image(10).pow(S1_PRS_pc.divide(10));
var S1_PRS_pc_Feats = S1_PRS_pc.select(['VH_p50', 'VV_p50']).clip(boundary);
// Reproject to MY zone
var S1_PRS_pc_Feats = S1_PRS_pc_Feats.reproject({crs: 'EPSG:32647', scale:30});
// Check projection information
print('Projection, crs, and crs_transform:', S1_PRS_pc_Feats.projection());
// Also, Sentinel-2 and SRTM images have to be reprojected.
//Reproject to MY zone
var S2_composite = composite.median().reproject({crs: 'EPSG:32647', scale:30});
// Check projection information
print('S2_Projection, crs, and crs_transform:', S2_composite.projection());
// Reproject 'elevation' to MY zone
var elevation = elevation.reproject({crs: 'EPSG:32647', scale:30});
// Check projection information
print('elevation_Projection, crs, and crs_transform:', elevation.projection());
// Derive slope from SRTM
var slope = ee.Terrain.slope(SRTM).clip(boundary);
// Reproject 'slope' to MY zone
var slope = slope.reproject({crs: 'EPSG:32647', scale:30});
// Check projection information
print('slope_Projection, crs, and crs_transform:', slope.projection());
// Calculate IQR for the VV and VH polarizations
// Calculate IQR for the VV polarization
var PRS_VV_iqr = S1_PRS_pc_Feats
.addBands((S1_PRS_pc.select('VV_p75')
.subtract(S1_PRS_pc.select('VV_p25')))
.rename('VV_iqr'));
// Calculate IQR for the VH polarization
var PRS_VH_iqr = S1_PRS_pc_Feats
.addBands((S1_PRS_pc.select('VH_p75')
.subtract(S1_PRS_pc.select('VH_p25')))
.rename('VH_iqr'));
// Print the image to the console
print('Post-rainy Season VV IQR', PRS_VV_iqr);
// Print the image to the console
print('Post-rainy Season VH IQR', PRS_VH_iqr);
// Display S1 IQR imagery
Map.addLayer(PRS_VV_iqr.clip(boundary), {'bands': 'VV_iqr', min: 0, max: 0.1}, 'Sentinel1 IW VV');
Map.addLayer(PRS_VH_iqr.clip(boundary), {'bands': 'VH_iqr', min: 0, max: 0.1}, 'Sentinel1 IW VH');
// Create masks, filter clouds from Sentinel-2, and prepare composite imagery
// Create a function to mask clouds using the Sentinel-2 QA band.
function maskS2clouds(image) {
var qa = image.select('QA60');
//Bits 10 and 11 are clouds and cirrus, respectively.
var cloudBitMask = ee.Number(2).pow(10).int();
var cirrusBitMask = ee.Number(2).pow(11).int();
//Both flags should be set to zero, indicating clear conditions.
var mask = qa.bitwiseAnd(cloudBitMask).eq(0).and(
qa.bitwiseAnd(cirrusBitMask).eq(0));
//Return the masked and scaled data.
return image.updateMask(mask).divide(10000);
}
// Filter clouds from Sentinel-2 for a given period.
var composite = s2.filterDate('2021-04-01', '2021-06-30')
// Pre-filter to get less cloudy granules.
.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
.map(maskS2clouds)
.select('B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12');
// Display a composite S2 imagery
Map.addLayer(S2_composite.clip(boundary), {bands: ['B11', 'B8', 'B3'], min: 0, max: 0.3}, 'S2_composite');
// Merge predictor variables
var mergedCollection = S2_composite
.addBands(PRS_VV_iqr)
.addBands(PRS_VH_iqr)
.addBands(elevation)
.addBands(slope)
.addBands(forest_mask);
// Clip to the output image to Uni Malaya study area boundary.
var clippedmergedCollection = mergedCollection.clipToCollection(boundary_feature);
print('clippedmergedCollection: ', clippedmergedCollection);
// Bands to include in the classification
var bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B11', 'B12', 'VV_iqr', 'VH_iqr', 'elevation', 'slope', 'Map'];
// Prepare training dataset from GEDI
// GEDI's Level 2A Geolocated Elevation and Height Metrics Product (GEDI02_A) is primarily composed of
// 100 Relative Height (RH) metrics, which collectively describe the waveform collected by GEDI.
// The original GEDI02_A product is a table of point with a spatial resolution (average footprint) of 25 meters.
// Sample 10,000 points from the dataset (Depending on the area in which GEDI received the points more or less)
var points = mosaic.sample({
region: boundary,
scale: scale,
numPixels: 10000,
projection: projection,
seed:66,
geometries: true});
// Check the number of the training points
print('Size: ', points.size());
print(points.limit(10));
// Display the training points
Map.addLayer(points, {}, 'Sampling points');
// Add a random column (named random) and specify seed value for repeatability.
var datawithColumn = points.randomColumn('random', 27);
// Seperate 70% for training, 30% for validation
var split = 0.7;
var trainingData = datawithColumn.filter(ee.Filter.lt('random', split));
// Print the training data
print('training data', trainingData);
var validationData = datawithColumn.filter(ee.Filter.gte('random', split));
// Print the testing (validation) data
print('validation data', validationData);
// Train a random forest regression model
// Collect training data
var training = clippedmergedCollection.select(bands).sampleRegions({
collection: trainingData,
properties: ['rh98'],
scale: 30 // Need to change the scale of training data to avoid the 'out of memory' problem
});
// Train a random forest classifier for regression
var classifier = ee.Classifier.smileRandomForest({numberOfTrees:50, seed:3}) // 50 trees
.setOutputMode('REGRESSION')
.train({
features: training,
classProperty: 'rh98',
inputProperties: bands
});
// Run the classification and clip it to the boundary
var regression = clippedmergedCollection.select(bands)
.classify(classifier, 'predicted')
.clip(boundary);
print("regression: ", regression);
// Visualize regression output and define a palette
var palette = ['#f7fcb9','#addd8e','#31a354'];
// Display the regression output.
// Get dictionaries of min & max predicted value
var regressionMin = (regression.reduceRegion({
reducer: ee.Reducer.min(),
scale: 30,
crs: 'EPSG:32647',
geometry: boundary,
bestEffort: true,
tileScale: 5
}));
var regressionMax = (regression.reduceRegion({
reducer: ee.Reducer.max(),
scale: 30,
crs: 'EPSG:32647',
geometry: boundary,
bestEffort: true,
tileScale: 5
}));
var regressionAvg = (regression.reduceRegion({
reducer: ee.Reducer.mean(),
scale: 30,
crs: 'EPSG:32647',
geometry: boundary,
bestEffort: true,
tileScale: 5
}));
// Add to map
var viz = {palette: palette, min:regressionMin.getNumber('predicted').getInfo(), max:regressionMax.getNumber('predicted').getInfo()};
Map.addLayer(regression, viz, 'Regression');
// Check model performance
print("Regression Min & Max: ", regressionMin.getNumber('predicted').getInfo(), regressionMax.getNumber('predicted').getInfo());
print("Regression Average: ", regressionAvg.getNumber('predicted').getInfo());
// Create model assessment statistics
// Get predicted regression points in same location as training data
var predictedTraining = regression.sampleRegions({collection: trainingData, geometries: true});
// Seperate the observed (rh98_GEDI) & predicted (regression) properties
var sampleTraining = predictedTraining.select(['rh98', 'predicted']);
// Compute Root Mean Squared Error (RMSE)
// Get array of observation and prediction values
var observationTraining = ee.Array(sampleTraining.aggregate_array('rh98'));
var predictionTraining = ee.Array(sampleTraining.aggregate_array('predicted'));
// Compute residuals
var residualsTraining = observationTraining.subtract(predictionTraining);
// Compute RMSE (Training) with equation and print the result
var rmseTraining = residualsTraining.pow(2).reduce('mean', [0]).sqrt();
print('Training RMSE', rmseTraining);
//Perform validation
// Get predicted regression points in same location as validation data
var predictedValidation = regression.sampleRegions({collection: validationData, geometries: true});
// Seperate the observed (rh98_GEDI) & predicted (regression) properties
var sampleValidation = predictedValidation.select(['rh98', 'predicted']);
// Compute RMSE (Validation)
// Get array of observation and prediction values
var observationValidation = ee.Array(sampleValidation.aggregate_array('rh98'));
var predictionValidation = ee.Array(sampleValidation.aggregate_array('predicted'));
// Compute residuals
var residualsValidation = observationValidation.subtract(predictionValidation);
// Compute RMSE with equation and print the result
var rmseValidation = residualsValidation.pow(2).reduce('mean', [0]).sqrt();
print('Validation RMSE', rmseValidation);
// Export the regression output
Export.image.toDrive({
image:regression,
description: 'UM_TCH_GEDI_2021', //Please change your description here.
scale: 20,
crs: 'EPSG:32647',
maxPixels: 6756353855,
region: boundary
});
Table 1 showcases the outcomes of various machine learning models in predicting canopy height based on the satellite imagery and geographical data utilized in this project:
No | Machine Learning Model | Min. Canopy Height (m) | Avg. Canopy Height (m) | Max. Canopy Height (m) |
---|---|---|---|---|
1 | Random Forest (RF) | 7.435 | 13.69 | 20.847 |
2 | RF with PCA | 5.81 | 11.714 | 23.469 |
3 | Gradient Boost Tree Regression | 1.466 | 9.182 | 22.049 |
4 | GBTR with PCA | 4.457 | 10.437 | 20.889 |
5 | Support Vector Machine (SVM) | 11.087 | 12.368 | 14.058 |
6 | SVM with PCA | 7.809 | 9.318 | 12.036 |
These results demonstrate the variation in minimum, average, and maximum canopy heights predicted by different machine learning models, highlighting each model's potential to effectively estimate canopy height.
Table 2 represents the assessment metrics of the machine learning models used for canopy height prediction:
No | Machine Learning Model | R² Training (70%) | R² Testing (30%) | RMSE (m) Training | RMSE (m) Testing |
---|---|---|---|---|---|
1 | Random Forest (RF) | 0.655 | 0.355 | 7.954 | 5.523 |
2 | RF with PCA | 0.731 | 0.721 | 6.814 | 4.05 |
3 | Gradient Boost Tree Regression | 0.389 | 0.377 | 8.654 | 6.686 |
4 | GBTR with PCA | 0.577 | 0.685 | 7.716 | 4.196 |
5 | Support Vector Machine (SVM) | 0.604 | 0.193 | 9.905 | 7.764 |
6 | SVM with PCA | 0.169 | 0.48 | 11.085 | 8.511 |
These metrics gauge the performance of each model in predicting canopy height, providing insights into their training and testing R² values along with the Root Mean Squared Error (RMSE). A higher R² value closer to 1 and lower RMSE values indicate better predictive accuracy.
Describe these tables to highlight the significance of each model's performance metrics in predicting canopy height and assessing the model's accuracy using R² and RMSE values.
Using the GEDI L4B dataset and employing the Random Forest modeling method, the aboveground biomass prediction results in the University of Malaya are as follows:
No | Machine Learning Model | Min. AGBD (Mg/ha) | Avg. AGBD (Mg/ha) | Max. AGBD (Mg/ha) |
---|---|---|---|---|
1 | Random Forest (RF) | 21.406 | 41.252 | 68.559 |
No | Machine Learning Model | R² Training (70%) | R² Testing (30%) | RMSE (Mg/ha) Training | RMSE (Mg/ha) Testing |
---|---|---|---|---|---|
1 | Random Forest (RF) | 0.73 | 0.209 | 29.637 | 27.751 |
These metrics showcase the effectiveness of the Random Forest model in predicting aboveground biomass density, indicating its potential to estimate biomass with considerable accuracy.
The total tree land area within the University of Malaya study area has been calculated:
// Tree Land area (hectare, ha) in the Study area (University of Malaya)
// Calculate the tree area in hectares
var treeArea = forest_mask.select('Map');
var areaImage = treeArea.multiply(ee.Image.pixelArea());
// Now that each pixel for the forest class in the image has the value
// equal to its area, we can sum up all the values in the region
// to get the total forested area.
var area = areaImage.reduceRegion({
reducer: ee.Reducer.sum(),
geometry: boundary,
scale: 30, // Specify the scale that matches your projection
maxPixels: 1e9 // Adjust as needed
});
// Convert the forested area to hectares
var treeAreaHectares = ee.Number(area.get('Map')).divide(10000); // 1 square meter = 0.0001 hectares
// Print the tree area in hectares
print('Tree Area (Hectares) in the University of Malaya:', treeAreaHectares);
Tree Area (Hectares) in the University of Malaya: 2109.434 hectares
The carbon storage estimation process involves the computation of aboveground carbon storage based on the calculated aboveground biomass density and the specified tree area in the University of Malaya study area:
// Aboveground Carbon Storage Calculation in Study Area (UM)
// Formula: Aboveground Carbon Storage (Mg, Megagram also known as tonnes) = Aboveground Biomass (Mg/ha) x Carbon Content Factor x Tree Area (ha)
// Define the inputs
var minAbovegroundBiomass = 21.406166582107545; // Min aboveground biomass (Mg/ha)
var maxAbovegroundBiomass = 68.5589729944865; // Max aboveground biomass (Mg/ha)
var avgAbovegroundBiomass = 41.252308736491585; // Avg aboveground biomass (Mg/ha)
var carbonContentFactor = 0.5; // Carbon content factor
var treeArea = 2109.4340347280445; // Tree area 2109.4340347280445 ha
// Calculate Aboveground Carbon Storage
var abovegroundCarbonStorageMin = minAbovegroundBiomass * carbonContentFactor * treeArea;
var abovegroundCarbonStorageMax = maxAbovegroundBiomass * carbonContentFactor * treeArea;
var abovegroundCarbonStorageAvg = avgAbovegroundBiomass * carbonContentFactor * treeArea;
// Print the results
print('Aboveground Carbon Storage (Mg, tonnes) in the University of Malaya:');
print('Minimum Estimate:', abovegroundCarbonStorageMin, 'Mg (tonnes)');
print('Maximum Estimate:', abovegroundCarbonStorageMax, 'Mg (tonnes)');
print('Average Estimate:', abovegroundCarbonStorageAvg, 'Mg (tonnes)');
Aboveground Carbon Stock Estimation Results
Table 5 summarizes the estimates for aboveground carbon stock in the University of Malaya study area:
No | Aboveground Carbon Stock Area | Min. Carbon (Mg) | Avg. Carbon (Mg) | Max. Carbon (Mg) |
---|---|---|---|---|
1 | University of Malaya | 22577.448 | 43509.512 | 72310.316 |
These estimates indicate the potential carbon storage within the specified study area, providing valuable insights into its carbon sequestration capacity and highlighting its significance in environmental assessments and management strategies.
- Canopy Height Prediction Random Forest Model without PCA
- Canopy Height Prediction Random Forest Model with PCA
- Aboveground Biomass Prediction Random Forest Model
This project leveraged satellite imagery and machine learning through Google Earth Engine to estimate canopy height and carbon content in the University of Malaya area. By analyzing diverse datasets like Sentinel-1, Sentinel-2, SRTM, ESA WorldCover, GEDI L2A, and GEDI L4B, we gained valuable insights.
-
Canopy Height Prediction: Various machine learning models were employed, showcasing the potential to estimate canopy height. Random Forest and RF with Principal Component Analysis emerged as promising models.
-
Aboveground Biomass Density: The Random Forest model proved valuable in estimating biomass density, providing insights into vegetation health and carbon sequestration potential.
-
Carbon Storage Assessment: The estimation of aboveground carbon stocks revealed significant reserves within the University of Malaya area, highlighting its role as a crucial carbon sink.
This study sheds light on the University of Malaya's ecosystem, emphasizing its importance in carbon sequestration. Understanding its capacity for storing carbon reinforces the significance of preserving such areas for environmental stability and biodiversity conservation.
The methodologies employed and findings derived from this research offer implications for environmental assessments and conservation strategies. The accuracy of predictive models and detailed carbon stock estimations could serve as a foundational framework for similar studies globally, aiding ecosystem management and informed decision-making.
In summary, this project underscores the University of Malaya's ecosystem's vital role in carbon sequestration, emphasizing the significance of satellite imagery and machine learning in environmental assessments and conservation endeavors.