Chapter 6 Marketing Analysis II

This page is based on Hal R. Varian’s paper (2016), Causal Inference in Economics and Marketing.

Problem: estimating the causal effect of ad spending on the dollar sales of a product.

A motivating example:

Suppose an analyst has a dataset of hockey equipment sales, \(y_c\), and television ad spending, \(x_c\), for two cities in two different countries, say Toronto, Canada and Cape Town, South Africa. Suppose further that the data show the advertiser spent $1 per capita on television advertising in Cape Town and observed $10 per capita in sales, whereas in Toronto the advertiser spent $10 per capita and observed $100 per capita in sales. The model \(y_c=10x_c\) therefore fits the data perfectly.
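As a quick check of the arithmetic, the Python sketch below fits a least-squares line through the origin (no intercept, matching \(y_c=10x_c\)) to the two per-capita observations. The function name `slope_through_origin` is illustrative and not from the source.

```python
# Per-capita data from the example: (ad spend x_c, sales y_c)
cities = {
    "Cape Town": (1.0, 10.0),
    "Toronto": (10.0, 100.0),
}

def slope_through_origin(pairs):
    """Least-squares slope for y = b * x with no intercept: sum(x*y) / sum(x*x)."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx

b = slope_through_origin(cities.values())
residuals = {city: y - b * x for city, (x, y) in cities.items()}
print(b)          # 10.0
print(residuals)  # {'Cape Town': 0.0, 'Toronto': 0.0}
```

A perfect in-sample fit like this does not, by itself, tell us what would happen if spending in Cape Town were raised to $10 per capita.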

However, the analyst may not be convinced that increasing per capita spend in Cape Town to $10 would raise hockey equipment sales there to $100 per capita. The analyst then starts to think about why this regression model might be wrong.

A motivating problem:

The problem with this regression model is an omitted variable, which we may call “interest in hockey”. Interest in hockey is high in Toronto and low in Cape Town. Moreover, the advertisers who set ad spend presumably know this, and they choose to advertise more where interest is high and less where it is low. This omitted variable therefore affects both \(y_{c}\) and \(x_{c}\).

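To express this point mathematically, think of \((y,x,e)\) as the population analogs of the sample \((y_{c}, x_{c},e_{c})\), where the true model is \(y=\beta_{1}x+e\) and \(e\) collects everything other than ad spend that affects sales. The regression coefficient estimated from the data is \(\hat{\beta}_{1}=cov(x,y)/cov(x,x)\). Substituting \(y=\beta_{1}x+e\), we have

\(\hat{\beta}_{1}=cov(x,\beta_{1}x+e)/cov(x,x)=\beta_{1}+cov(x,e)/cov(x,x)\).

The regression coefficient therefore recovers the true effect \(\beta_{1}\) only when \(cov(x,e)=0\); otherwise it is biased by \(cov(x,e)/cov(x,x)\).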
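A small simulation makes the algebra concrete. The sketch below is hypothetical: it assumes a linear world in which an unobserved “interest in hockey” variable raises both ad spend and sales, with made-up coefficients `beta1 = 2` and `gamma = 3`. Because interest is left out of the regression, it sits in the error term \(e\), and the slope estimated by regression equals \(\beta_{1}\) plus the bias term \(cov(x,e)/cov(x,x)\).

```python
import random

def cov(a, b):
    """Population-style covariance (divide by n) of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

random.seed(0)
n = 100_000
beta1 = 2.0   # hypothetical true effect of ad spend on sales
gamma = 3.0   # hypothetical effect of the omitted "interest in hockey" on sales

interest = [random.gauss(0, 1) for _ in range(n)]
# Advertisers spend more where interest is high, so x depends on interest.
x = [2.0 * z + random.gauss(0, 1) for z in interest]
# Interest is left out of the regression, so its effect ends up in the error term e.
e = [gamma * z + random.gauss(0, 1) for z in interest]
y = [beta1 * xi + ei for xi, ei in zip(x, e)]

beta1_hat = cov(x, y) / cov(x, x)   # simple-regression slope
bias = cov(x, e) / cov(x, x)        # omitted-variable bias term
# The two printed values match up to floating-point error.
print(beta1_hat, beta1 + bias)
```

With these made-up coefficients the population bias is \(cov(x,e)/cov(x,x)=6/5=1.2\), so the regression reports a slope of roughly 3.2 instead of the true 2: the omitted interest variable inflates the estimate.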

Key points:

The simple regression may be perfectly adequate if we only want to predict hockey equipment sales from hockey ad spend, assuming that the advertiser’s behaviour stays the same. In practice, however, we usually want to know how hockey equipment receipts would respond to a change in the advertiser’s behaviour.

The error term \(e\) in the setting above is meant to represent the many unobserved factors that affect only the outcome (sales). The problem in this example arises because “interest in hockey” affects both the outcome (sales) and the predictor (ads); as a result, the simple regression of sales on ads does not give a good estimate of the causal effect. The causal effect is the answer to the following question:

What would happen to sales if we explicitly intervened and changed ad expenditure across the board?

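Note that the simple regression of \(y_{c}\) on \(x_{c}\) would overestimate the effect of ads on sales, since cities with high interest in hockey tend to have both high ad expenditure and high hockey equipment sales receipts. In terms of the formula above, \(cov(x,e)>0\), so the bias term \(cov(x,e)/cov(x,x)\) is positive.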

There are many other potential confounding variables, such as seasonality and weather, that this regression model does not account for. It is well known that adding an extra predictor to a regression model typically changes the estimated coefficients on the other predictors, because relevant predictors are generally correlated with each other. Although this phenomenon is well understood, in many modeling exercises “analysts assume that the predictors we do not observe are magically orthogonal to the predictors we do observe”.

As Varian (2016) puts it, “the ideal data, from the viewpoint of the analyst, would be data from an incompetent advertiser who allocated expenditures randomly across cities. If ad expenditure is truly random, then we do not have to worry about confounding variables because the predictors will automatically be orthogonal to the error term”.
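The simulation below is a hypothetical sketch of this point, using made-up coefficients (true ad effect 2, interest effect 3). It compares a strategic advertiser, whose spend tracks the unobserved interest in hockey, with an “incompetent” one who draws spend at random. The helper names `ols_slope` and `simulate` are illustrative only.

```python
import random

def ols_slope(x, y):
    """Simple-regression slope cov(x, y) / cov(x, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(randomize, n=100_000, beta1=2.0, gamma=3.0, seed=1):
    """Hypothetical city-level data; interest in hockey is never seen by the analyst."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        interest = rng.gauss(0, 1)
        if randomize:
            spend = rng.gauss(0, 2.0)                  # random allocation ignores interest
        else:
            spend = 2.0 * interest + rng.gauss(0, 1)   # strategic allocation follows interest
        sales = beta1 * spend + gamma * interest + rng.gauss(0, 1)
        x.append(spend)
        y.append(sales)
    return ols_slope(x, y)

print("strategic allocation:", round(simulate(randomize=False), 2))
print("random allocation:   ", round(simulate(randomize=True), 2))
```

Under these assumptions the strategic allocation gives a slope near 3.2, while random allocation gives a slope near 2, the true effect, because randomized spend is orthogonal to the omitted interest variable.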

Varian (2016) continues: “However, statisticians are seldom lucky enough to have a totally incompetent client”.

The ideal way to estimate the causal effect of advertising on sales is to run a controlled experiment. In this design, the control group provides an estimate of the counterfactual: what would have happened without ad exposure.
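For a binary ad exposure, the logic of the experiment can be sketched as a difference in means. In this hypothetical Python example, exposure is assigned by a coin flip and the true lift is set to 5 (a made-up value); the unobserved interest variable still affects sales but, thanks to randomization, is balanced across the two groups. The function name `experiment` is illustrative.

```python
import random

def experiment(n=200_000, lift=5.0, seed=2):
    """Hypothetical randomized ad test: each unit is exposed with probability 0.5."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        interest = rng.gauss(0, 1)        # unobserved, but balanced by randomization
        exposed = rng.random() < 0.5      # coin-flip assignment
        sales = 20.0 + 4.0 * interest + (lift if exposed else 0.0) + rng.gauss(0, 2)
        (treated if exposed else control).append(sales)
    # The control group's mean stands in for the "no exposure" counterfactual.
    return sum(treated) / len(treated) - sum(control) / len(control)

print(round(experiment(), 2))
```

The estimate lands close to the assumed lift of 5 because the control group's average sales serve as the counterfactual for the exposed group.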

Another motivating problem:

Think about another classic example: “there are often more police in precincts with high crime”.

“If our data was generated by policymakers who assigned police to areas with high crime, then the observed relationship between police and crime rates could be highly predictive for the historical data, but not useful in predicting the causal impact of explicitly assigning additional police to a precinct.” This is because policymakers know the “need for more police” in high-crime precincts and allocate officers accordingly.
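The police example can be sketched the same way. In this hypothetical simulation, each extra officer causally reduces crime (a made-up effect of -0.5), but policymakers staff precincts in proportion to their underlying crime pressure. A regression of crime on police then produces a positive slope.

```python
import random

def ols_slope(x, y):
    """Simple-regression slope cov(x, y) / cov(x, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
true_effect = -0.5   # hypothetical: each extra officer prevents 0.5 crimes
police, crime = [], []
for _ in range(50_000):
    need = rng.gauss(100, 20)                  # underlying crime pressure in a precinct
    officers = 0.3 * need + rng.gauss(0, 3)    # policymakers staff high-need precincts more
    crimes = need + true_effect * officers + rng.gauss(0, 5)
    police.append(officers)
    crime.append(crimes)

print(round(ols_slope(police, crime), 2))
```

With these assumed numbers the population slope is \((0.3\cdot 400-0.5\cdot 45)/45\approx 2.17\): highly predictive of where crime is high, yet opposite in sign to the causal effect of adding police.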