Seqana Assignment

Soil data visualization

The following map displays soil organic carbon (SOC) stocks in Leeds a town in Columbia County, Wisconsin, United States from 2009.

The SOC stocks are colored according to the value of each sample’s metric ton of carbon per hectare \(t/ha\).

#read vector data
sample_data <- read_sf("seqana_gis_solutions_eng_challenge_existing_sample_data.gpkg")
#visualize with mapview
mapview(sample_data, map.types = "Esri.WorldImagery", alpha.regions = 0.5, zcol="soc_stock_t_ha")

Soil data statistical analysis

The SOC stocks mean value is 43.52 \(t/ha\), the median is 42.15 \(t/ha\), the standard deviation is 9.565.845 \(t/ha\), the max value is 72.20 \(t/ha\), and the minimum value is 9/10 \(t/ha\).

In the histogram we can see a normal distribution of the SOC stocks. Where most of the samples are between 30-50 \(t/ha\).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.10   37.50   42.15   43.52   48.58   72.20

## [1] 9.565845

SOC stock co-variates

The following code adds the co-variate raster layers to the script and extracts the values based on the SOC stock point layer.

#read raster layers
ph <- raster("open_land_map_soil_ph.tif")
dem <- raster("CGIAR_SRTM90_V4.tif")
ndvi <- raster("landsat7_c01_t1_annual_ndvi_2009.tif")
clay <- raster("open_land_map_soil_clay.tif")
#extract raster values and add to sample data
ph_p <- extract(ph, sample_data)
dem_p <- extract(dem, sample_data)
ndvi_p <- extract(ndvi,sample_data)
clay_p <-extract(clay, sample_data)
sample_data<-cbind(sample_data, ph_p,dem_p,ndvi_p,clay_p)
#check distribution of co-variates
hist(sample_data$ph_p)

hist(sample_data$dem_p)

hist(sample_data$ndvi_p)

hist(sample_data$clay_p)

summary(sample_data[,4:7])

##       ph_p           dem_p           ndvi_p            clay_p     
##  Min.   :58.00   Min.   :317.0   Min.   :0.08457   Min.   :17.00  
##  1st Qu.:59.00   1st Qu.:319.0   1st Qu.:0.25317   1st Qu.:18.00  
##  Median :60.00   Median :320.0   Median :0.28386   Median :21.00  
##  Mean   :59.94   Mean   :319.6   Mean   :0.29294   Mean   :19.86  
##  3rd Qu.:60.75   3rd Qu.:320.0   3rd Qu.:0.36445   3rd Qu.:21.00  
##  Max.   :62.00   Max.   :322.0   Max.   :0.45280   Max.   :21.00  
##             geom    
##  POINT        :194  
##  epsg:4326    :  0  
##  +proj=long...:  0  
##                     
##                     
##

Pearson correlation measures the strength of the linear relationship between two variables (x, y) and the product of their standard deviations. It is a normalized measurement of the covariance, such that the result always has a value between −1 and 1. Where a value of -1 meaning a total negative linear correlation, 0 being no correlation, and + 1 meaning a total positive correlation, and if the p-value is < 5%, then the correlation between x and y is significant.

#correlation between co-variantes
cor.test(sample_data$soc_stock_t_ha, sample_data$ph_p)

## 
##  Pearson's product-moment correlation
## 
## data:  sample_data$soc_stock_t_ha and sample_data$ph_p
## t = -1.5896, df = 192, p-value = 0.1136
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2508175  0.0273442
## sample estimates:
##        cor 
## -0.1139697

cor.test(sample_data$soc_stock_t_ha, sample_data$dem_p)

## 
##  Pearson's product-moment correlation
## 
## data:  sample_data$soc_stock_t_ha and sample_data$dem_p
## t = -0.76206, df = 192, p-value = 0.447
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.19428579  0.08663107
## sample estimates:
##         cor 
## -0.05491394

cor.test(sample_data$soc_stock_t_ha, sample_data$ndvi_p)

## 
##  Pearson's product-moment correlation
## 
## data:  sample_data$soc_stock_t_ha and sample_data$ndvi_p
## t = 2.1889, df = 192, p-value = 0.02981
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.01550011 0.29052300
## sample estimates:
##       cor 
## 0.1560342

cor.test(sample_data$soc_stock_t_ha, sample_data$clay_p)

## 
##  Pearson's product-moment correlation
## 
## data:  sample_data$soc_stock_t_ha and sample_data$clay_p
## t = 0.015655, df = 192, p-value = 0.9875
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1397673  0.1419820
## sample estimates:
##         cor 
## 0.001129806

According to the results of the Pearson correlation test the SOC stocks are slightly dependent on the NDVI of the soil.

Stratification approach

Each co-variate can affect the variability of the soil. For example, moisture, vegetation cover, hydrologic conditions, management history or other variables that might be affecting the topsoil of the SOC.

According to the methodology we can stratify the sample data at hand with the co-variate but we would need a satellite image or a GPS device with a precision of 4 meters.

To stratify the following steps would be taken -

Calculate the minimum number of sampling points to be collected based on 3 parameters- hectares of grassland in the AOI, satellite calibration points, and increase of 30% for data validation.
Decide on the stratifying factor - in this case I would choose to divide into stratum based on the NDVI since there is a correlation with the SOC.
Use raster data of NDVI for the plot to determine NDVI ranges - meaning that I would see which areas in the AOI can be grouped to low NDVI, medium NDVI and high NDVI.
In each NDVI boundary I will compute random points in QGIS or R.
The decision on how many times to compute random points will be based on the results of at least 2 compuations.
When sampling the guidelines provided by the Regen Network will be followed.
Lastly I will use standard sampling equations (see bellow) to calculate the SOC stocks potential mean and total based on the collected samples and compare between confidence intervals and estimated errors to conclude the most accurate results.

\(N\) is the total population size.
\(n\) is the sample size.
\(s^2\) is the sample variance.
\(\overline{y}\) is the estimated mean.
\(\hat{t}\) is the estimated total.
SEM/SET is the standard error of the mean/total.

Equations to calculate the estimated \(\overline{y}\).

\[\begin{equation} \hat{var}(\overline{y})= (\frac{N-n}{N})(\frac{s^2}{n}) \end{equation}\]

\[\begin{equation} SEM = \sqrt{\hat{var}(\overline{y})} \end{equation}\]

Equations to calculate the estimated \(\hat{t}\).

\[\begin{equation} \hat{t} = N{\overline{y}} \end{equation}\]

\[\begin{equation} \hat{var}(\hat{t})= N^2\hat{var}(\overline{y}) \end{equation}\]

\[\begin{equation} SET = \sqrt{\hat{var}(\hat{t})} \end{equation}\]

Seqana Assignment

Einav Grinberg

1/31/2022

Soil data visualization

Soil data statistical analysis

SOC stock co-variates

Stratification approach