1 Week 7: Time Series Analysis with Satellite Data

1.1 Introduction

Analyzing climate and environmental data requires statistical techniques built for time series: techniques that detect, model, and forecast change over time. These are generic methods that can be used for any time series data. We will cover three methods, two for establishing the presence of a trend (starting with a generic linear trend analysis) and one for modeling how several variables evolve together:

  • Linear Trend Analysis
  • Mann-Kendall Trend Analysis
  • Vector Autoregression (VAR) Model

Additionally, we will explore Granger causality testing, which extends the VAR model to determine whether one environmental variable helps predict another.

1.2 Linear Trend Analysis

Linear trend analysis is one of the simplest methods for identifying trends in time series data. It fits a straight line to data points, using the equation:

\[ Y_t = a + bt + \varepsilon_t \]

where:

  • \(Y_t\) is the dependent variable (e.g., temperature, CO2 concentration),
  • \(t\) is time,
  • \(a\) is the intercept,
  • \(b\) is the slope (representing the rate of change over time), and
  • \(\varepsilon_t\) is the error term.
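The fit of \(Y_t = a + bt + \varepsilon_t\) can be sketched with ordinary least squares in a few lines of plain Python. This is a minimal illustration, not a full analysis: the function name `linear_trend`, the choice of \(t = 0, 1, \dots, n-1\) as the time index, and the temperature values are all assumptions made for the example.

```python
def linear_trend(y):
    """Fit Y_t = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Slope b: co-variation of (t, Y) divided by the variation of t.
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    b = s_ty / s_tt
    # Intercept a: the fitted line passes through the point of means.
    a = y_mean - b * t_mean
    return a, b


# Hypothetical annual mean temperatures (degrees C), made up for illustration.
temps = [14.1, 14.3, 14.2, 14.6, 14.5, 14.9, 15.0, 14.8, 15.2, 15.4]
a, b = linear_trend(temps)
print(f"intercept a = {a:.3f}, slope b = {b:.3f} degrees C per year")
```

The slope \(b\) is in units of the variable per time step, which is what makes it directly readable as a trend magnitude.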

1.2.1 Pros of Linear Trend Analysis

  • Easy to interpret: The slope provides a direct measure of trend magnitude.
  • Computationally simple: Requires minimal computational power.
  • Applicable for long-term trends: Useful when changes over time follow a linear pattern.

1.2.2 Cons of Linear Trend Analysis

  • Assumes linearity: Not suitable for data with cyclical or non-linear trends.
  • Sensitive to outliers: Extreme values can heavily influence results.
  • Ignores seasonal and autocorrelated patterns: May not capture complex environmental fluctuations.

1.3 Mann-Kendall Trend Analysis

The Mann-Kendall (MK) test is a non-parametric method used to detect trends in time series without assuming a specific distribution. It is based on comparing the relative directions of data points over time. Many other non-parametric methods for time series are built upon, or are extensions of, the Mann-Kendall test.

The MK test statistic \(S\) is calculated by summing the signs of differences between all pairs of observations:

\[ S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{sign}(Y_j - Y_i) \]

where \(\text{sign}(x)\) is +1, 0, or -1 depending on whether \(x\) is positive, zero, or negative.
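The double sum for \(S\) translates almost directly into code. The sketch below is a plain-Python illustration with made-up data; the helper names are hypothetical, and the tau shown is the simple tau-a form (\(S\) divided by the number of pairs), which assumes there are no tied values.

```python
def sign(x):
    """Return +1, 0, or -1 for positive, zero, or negative x."""
    return (x > 0) - (x < 0)


def mann_kendall_s(y):
    """Sum sign(Y_j - Y_i) over all pairs of observations with i < j."""
    n = len(y)
    return sum(sign(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))


def kendall_tau(y):
    """Scale S by the number of pairs, n(n-1)/2 (tau-a, which ignores ties)."""
    n = len(y)
    return mann_kendall_s(y) / (n * (n - 1) / 2)


# Made-up series: mostly increasing, with one out-of-order pair (5 then 4).
series = [3, 5, 4, 7, 9]
print(mann_kendall_s(series), kendall_tau(series))  # prints: 8 0.8
```

Of the 10 pairs, 9 increase and 1 decreases, so \(S = 9 - 1 = 8\). A strictly increasing series of the same length would give \(S = 10\) and a tau of 1.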

1.3.1 Pros of Mann-Kendall Trend Analysis

  • Non-parametric: Does not assume normality or a specific trend shape.
  • Handles missing data well: Robust to gaps in time series.
  • Effective for detecting monotonic trends: Useful in analyzing gradual climate changes.

1.3.2 Cons of Mann-Kendall Trend Analysis

  • Less powerful for short time series: Requires a sufficiently long dataset to detect trends.
  • Cannot quantify trend magnitude directly: Only determines trend direction. The tau output is more analogous to a correlation coefficient than to a slope.
  • Sensitive to autocorrelation: Can produce misleading results if data points are highly dependent on previous values.
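A quick way to gauge the autocorrelation concern before running the test is to compute the lag-1 autocorrelation of the series. The following is a rough diagnostic sketch only; the function name and both example series are hypothetical, and what counts as "too high" is a judgment call that this snippet does not settle.

```python
def lag1_autocorrelation(y):
    """Correlation between the series and itself shifted by one time step."""
    n = len(y)
    mean = sum(y) / n
    # Numerator: co-variation of each value with the value just before it.
    num = sum((y[t] - mean) * (y[t - 1] - mean) for t in range(1, n))
    # Denominator: total variation of the series around its mean.
    den = sum((v - mean) ** 2 for v in y)
    return num / den


# Made-up series: one drifts slowly, the other flips sign at every step.
persistent = [0.2, 0.5, 0.7, 0.6, 0.3, -0.1, -0.4, -0.6, -0.3, 0.1]
choppy = [1, -1, 1, -1, 1, -1]
r_persistent = lag1_autocorrelation(persistent)
r_choppy = lag1_autocorrelation(choppy)
print(f"persistent: {r_persistent:.2f}, choppy: {r_choppy:.2f}")
```

Values near +1 mean each observation closely follows the previous one, which is the situation in which the MK test can overstate the evidence for a trend.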

1.4 Vector Autoregression (VAR) Model

While trend analysis helps in detecting changes over time, it does not capture relationships between multiple environmental variables. Vector Autoregression (VAR) is a multivariate time series model that accounts for interactions among variables. There are many types of autoregressive models, such as ARIMA, but VAR is (relatively) readily interpretable and allows us to add in Granger causality testing.

In a VAR model, each variable is expressed as a function of its past values and the past values of other variables:

\[ Y_t = c + \Phi_1 Y_{t-1} + \Phi_2 Y_{t-2} + \dots + \Phi_p Y_{t-p} + \varepsilon_t \]

where:

  • \(Y_t\) is a vector of environmental variables (e.g., temperature, CO2 levels, precipitation),
  • \(c\) is a vector of intercepts,
  • \(\Phi_1, \dots, \Phi_p\) are coefficient matrices showing the influence of past values,
  • \(p\) is the lag order, and
  • \(\varepsilon_t\) is the vector of error terms.
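To make the equation concrete, the sketch below simulates a two-variable VAR with \(p = 1\) and then recovers \(c\) and \(\Phi_1\) by least squares using NumPy. Everything specific here is an assumption for illustration: the function names, the "true" parameter values, the standard normal noise, and the random seed. A dedicated library (for example the VAR class in statsmodels) would normally be used for real analyses.

```python
import numpy as np


def simulate_var1(c, phi, n_steps, rng):
    """Simulate Y_t = c + Phi_1 Y_{t-1} + eps_t with standard normal noise."""
    k = len(c)
    y = np.zeros((n_steps, k))
    for t in range(1, n_steps):
        y[t] = c + phi @ y[t - 1] + rng.standard_normal(k)
    return y


def fit_var1(y):
    """Estimate c and Phi_1 by least squares, one regression per variable."""
    # Design matrix: a column of ones (for c) next to the lagged values Y_{t-1}.
    x = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(x, y[1:], rcond=None)
    c_hat = coef[0]        # first row holds the intercepts
    phi_hat = coef[1:].T   # transpose so that row i belongs to variable i
    return c_hat, phi_hat


# Hypothetical parameters: variable 2 feeds into variable 1, but not the reverse.
c_true = np.array([0.5, -0.2])
phi_true = np.array([[0.6, 0.2],
                     [0.0, 0.4]])

rng = np.random.default_rng(42)
data = simulate_var1(c_true, phi_true, n_steps=5000, rng=rng)
c_hat, phi_hat = fit_var1(data)
print("estimated c:", np.round(c_hat, 2))
print("estimated Phi_1:")
print(np.round(phi_hat, 2))
```

Row \(i\) of \(\Phi_1\) holds the coefficients in the equation for variable \(i\), so the off-diagonal entries are the ones that describe cross-variable influence.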

1.4.1 Pros of VAR Model

  • Captures interactions between multiple variables: Ideal for studying interdependencies in climate systems.
  • Accounts for past values: Considers historical influences when forecasting future changes.

1.4.2 Cons of VAR Model

  • Requires large datasets: Needs sufficient historical data to estimate parameters reliably.
  • Sensitive to lag selection: Choosing an inappropriate lag length can lead to poor model performance.
  • Difficult to interpret: Large coefficient matrices make complex models harder to analyze.

1.5 Granger Causality Testing in VAR Models

Granger causality testing, an extension of the VAR model, determines whether past values of one time series help predict another. It does not imply direct causation but helps in identifying lead-lag relationships between environmental variables.

For example, in climate studies, Granger causality can test whether rising CO2 levels “cause” temperature increases, meaning that past CO2 values help predict future temperature changes. In plain English, you are checking whether the prediction of a time series improves significantly when you include past values of another variable, rather than only the series' own past values.

1.5.1 How Granger Causality Works

The test compares two models:

  1. A model where variable \(X_t\) is predicted using only its past values.
  2. A model where \(X_t\) is predicted using its past values and past values of another variable \(Y_t\).

If including past values of \(Y_t\) significantly improves predictions of \(X_t\), then \(Y_t\) Granger-causes \(X_t\).
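The two-model comparison can be sketched as an F statistic built from the residual sums of squares of each fit. The code below is an illustrative NumPy version under stated assumptions: the function names, the simulated series, the coefficients, and the seed are all made up, and only the F statistic is computed (a p-value would come from an F distribution with \(p\) and `dof` degrees of freedom).

```python
import numpy as np


def rss(design, target):
    """Residual sum of squares from a least-squares fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(x, y, p=1):
    """F statistic for 'y Granger-causes x' using p lags of each series."""
    n = len(x)
    target = x[p:]
    ones = np.ones(n - p)
    # Column k holds the series shifted back by k + 1 time steps.
    x_lags = np.column_stack([x[p - k - 1 : n - k - 1] for k in range(p)])
    y_lags = np.column_stack([y[p - k - 1 : n - k - 1] for k in range(p)])
    restricted = np.column_stack([ones, x_lags])            # model 1: own past
    unrestricted = np.column_stack([ones, x_lags, y_lags])  # model 2: plus y's past
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = (n - p) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Simulated example in which past y drives x, but past x does not drive y.
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + 0.5 * y[t - 1] + rng.standard_normal()

f_y_to_x = granger_f(x, y)
f_x_to_y = granger_f(y, x)
print(f"F (y -> x) = {f_y_to_x:.1f}, F (x -> y) = {f_x_to_y:.1f}")
```

A large F in one direction and a small F in the other is the lead-lag pattern the test is designed to reveal. In practice, a library routine such as `grangercausalitytests` in statsmodels also reports the corresponding p-values.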

1.5.2 Limitations of Granger Causality

  • Correlation does not imply causation: The test only measures predictive relationships, not direct physical causation.
  • Dependent on lag selection: Choosing the wrong lag length can produce misleading results.
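  • Assumes stationarity: Series with a trend or changing variance can produce spurious results, so they are usually differenced or detrended before testing.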
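One common way to work toward the stationarity assumption is to difference a series before fitting the model. The sketch below shows first and second differencing in plain Python; the function name and the example series are hypothetical, and whether differencing is sufficient for a given dataset still needs to be checked with a formal stationarity test.

```python
def difference(y, order=1):
    """Apply first differencing `order` times: d_t = y_t - y_{t-1}."""
    for _ in range(order):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


# A made-up linear trend becomes constant after one round of differencing.
linear = [2 + 3 * t for t in range(6)]
print(difference(linear))               # prints: [3, 3, 3, 3, 3]

# A made-up quadratic trend needs two rounds.
quadratic = [t ** 2 for t in range(5)]
print(difference(quadratic, order=2))   # prints: [2, 2, 2]
```

Each round of differencing shortens the series by one observation, and results from a model fitted to differenced data describe changes in the variable rather than its level.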