Latent Profile Analysis

Bob Rietveld

28-08-2019

1 Notes Latent Profile Analysis

Latent Profile Analysis (LPA) is a variant of cluster analysis. Cluster analysis is a statistical technique for finding “clusters” of observations that have similar values on a set of variables.

In this sense it is the “classic” way of segmentation, i.e. finding groups that are homogeneous internally and heterogeneous compared with each other.

LPA is a model-based approach, which means each observation (= customer) obtains a probability of belonging to each class. The class itself is not observed, which is the “latent” part of LPA.

LPA handles continuous variables and allows the researcher to find an optimal solution (number of clusters) based on several fit criteria.
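To make the “probability of belonging to a class” concrete, here is a minimal sketch for a single variable and two profiles, using Bayes' rule on normal densities. The profile weights, means and standard deviations are made up for illustration and are not estimated from the data in these notes; a real LPA estimates them and works on several variables at once.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, profile_params):
    """Return P(profile | x) for each profile via Bayes' rule.

    profile_params: list of (weight, mean, sd) tuples; weights sum to 1.
    """
    joint = [w * normal_pdf(x, m, s) for w, m, s in profile_params]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-profile model for annual spending on one product group.
# These numbers are assumptions for illustration only.
profile_params = [
    (0.7, 5000.0, 2000.0),   # smaller spenders: weight, mean, sd
    (0.3, 15000.0, 4000.0),  # larger spenders: weight, mean, sd
]

for spend in (4000.0, 9000.0, 16000.0):
    probs = posterior_probabilities(spend, profile_params)
    print(spend, [round(p, 3) for p in probs])
```

A customer is usually assigned to the profile with the highest probability, but the probabilities themselves show how clear-cut that assignment is.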

1.1 Use Cases

LPA (or another clustering technique) can be used to:

1.2 Example

To illustrate LPA we use example data from a wholesale distributor. The data reflect the annual spending of businesses in Portugal. Below you see a sample of the data. Each row is a customer.

More information on the dataset can be found here.

fresh   milk  grocery  frozen  detergents_paper  delicassen
12669   9656     7561     214              2674        1338
 7057   9810     9568    1762              3293        1776
 6353   8808     7684    2405              3516        7844
13265   1196     4221    6404               507        1788
22615   5410     7198    3915              1777        5185
 9413   8259     5126     666              1795        1451

1.3 How many clusters should we define?

With any cluster analysis, this is a critical question. There are various analytical ways to answer it. LPA is an unsupervised technique based on probabilities. The algorithm can therefore determine how many clusters provide an “optimal” fit for the data.

We fit models for a range of cluster counts (1 to n) and model options, and let the fit statistics indicate the optimal number of clusters. Several information criteria can be used to select the number of clusters, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For both AIC and BIC, lower is better. In our example a 3 or 4 cluster solution provides the best fit to our data.

Model 1 and Model 6 refer to different settings of the variances and covariances.
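The information criteria mentioned above can be sketched as follows. AIC and BIC both combine the model's log-likelihood with a penalty for the number of free parameters, and that number depends on the variance and covariance settings. The two settings in `n_params` (shared variances with zero covariances, versus a full covariance matrix per profile) are common choices; whether they correspond exactly to Model 1 and Model 6 is an assumption. The log-likelihoods and the sample size are made up for illustration and are not results from the data.

```python
import math


def n_params(k, p, varying_covariance):
    """Free parameters of a Gaussian mixture with k profiles and p variables."""
    means = k * p
    weights = k - 1  # mixing proportions sum to 1
    if varying_covariance:
        # a full covariance matrix for every profile
        cov = k * p * (p + 1) // 2
    else:
        # one variance per variable shared by all profiles, covariances fixed at 0
        cov = p
    return means + weights + cov


def aic(log_lik, n_par):
    return 2 * n_par - 2 * log_lik


def bic(log_lik, n_par, n_obs):
    return n_par * math.log(n_obs) - 2 * log_lik


# Hypothetical inputs: assumed sample size and made-up log-likelihoods
# for models with 1 to 5 profiles on 6 variables.
n_obs = 440
p = 6
log_liks = {1: -27000.0, 2: -26400.0, 3: -26100.0, 4: -26080.0, 5: -26076.0}

scores = {}
for k, ll in log_liks.items():
    n_par = n_params(k, p, varying_covariance=False)
    scores[k] = {"aic": aic(ll, n_par), "bic": bic(ll, n_par, n_obs)}

best_by_aic = min(scores, key=lambda k: scores[k]["aic"])
best_by_bic = min(scores, key=lambda k: scores[k]["bic"])
print("best by AIC:", best_by_aic, "best by BIC:", best_by_bic)
```

With these made-up numbers AIC prefers 4 profiles and BIC prefers 3. The two criteria need not agree, because BIC penalizes each extra parameter by log(n) rather than 2, which is the heavier penalty for any sample of more than a handful of observations.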

We use a more advanced approach, defined by Akogul and Erisoglu (2017), to select the optimal number of clusters. In this case 3 clusters are most appropriate.

Akogul, Serkan, and Murat Erisoglu. 2017. “An Approach for Determining the Number of Clusters in a Model-Based Cluster Analysis.” Entropy 19 (9). Multidisciplinary Digital Publishing Institute: 452.

1.4 Visualize latent profiles

Once the model / researcher has decided on the number of clusters, meaning must be assigned. It is a matter of interpretation what type of labels are assigned to each cluster. Generally speaking the most obvious / interesting / useful assignments are made.

Size and distribution of the variables can be compared between clusters

The same data as before, shown as absolute values rather than relative percentages.

In our example one could assign the following labels to the clusters:

cluster         fresh   milk  grocery  frozen  detergents_paper  delicassen
Big spenders    22198  18952    22171    8687              8708        5356
Freshies        13209   2023     2565    3602               371         909
Small spenders   8568   7003    10638    1325              4303        1366
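One way to check whether labels like these fit is to look at each cluster's spending mix, i.e. the share of each product group in the cluster's total. The sketch below uses the values from the table above, assumed here to be average annual spending per cluster. The helper name `relative_shares` is only illustrative.

```python
categories = ["fresh", "milk", "grocery", "frozen", "detergents_paper", "delicassen"]

# Values copied from the cluster table above.
cluster_profiles = {
    "Big spenders":   [22198, 18952, 22171, 8687, 8708, 5356],
    "Freshies":       [13209, 2023, 2565, 3602, 371, 909],
    "Small spenders": [8568, 7003, 10638, 1325, 4303, 1366],
}


def relative_shares(values):
    """Convert absolute spending into shares of the row total."""
    total = sum(values)
    return [v / total for v in values]


shares = {name: relative_shares(vals) for name, vals in cluster_profiles.items()}

for name, row in shares.items():
    formatted = ", ".join(f"{c}={s:.0%}" for c, s in zip(categories, row))
    print(f"{name}: {formatted}")
```

In this table the Freshies cluster spends more than half of its total on fresh products, while the other two clusters spread their spending more evenly, which is consistent with the labels chosen.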

1.5 Using customer segments in subsequent analysis

Once clusters have been defined and labelled, the cluster assignment can be merged back into the original data set to obtain further insights. In our example we also have information on the type of business (channel) and the region.

The same data as before, shown as absolute values rather than relative percentages.

Horeca has relatively more Freshies, which makes sense.
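Merging the cluster labels back and comparing them with another attribute can be sketched as below. The customer records and assignments are invented for illustration and are not taken from the data set. The sketch joins labels to customers by id and computes, per channel, the share of customers in each cluster.

```python
from collections import Counter

# Hypothetical records, made up for illustration only.
customers = [
    (1, "Horeca"), (2, "Retail"), (3, "Horeca"), (4, "Horeca"), (5, "Retail"),
]
assignments = {
    1: "Freshies", 2: "Small spenders", 3: "Freshies",
    4: "Big spenders", 5: "Small spenders",
}

# Merge: attach the cluster label to each customer record.
merged = [(cid, channel, assignments[cid]) for cid, channel in customers]

# Cross-tabulate channel by cluster, then convert to shares within a channel.
counts = Counter((channel, cluster) for _, channel, cluster in merged)
channel_totals = Counter(channel for _, channel, _ in merged)
row_shares = {
    (channel, cluster): n / channel_totals[channel]
    for (channel, cluster), n in counts.items()
}

for (channel, cluster), share in sorted(row_shares.items()):
    print(f"{channel:<7} {cluster:<15} {share:.0%}")
```

Comparing shares within each channel, rather than raw counts, keeps a large channel from dominating the comparison.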

1.6 Visualize similarity between customers

A different way to check whether the clusters make sense, and to get a more in-depth understanding of the variability within the cluster solution, is to use a dimension reduction technique. In this case we use UMAP, which takes the original data and reduces it to two dimensions (without losing too much information).

You can see a grouping of customers which (to some degree) matches our cluster solution. The map also demonstrates that some clusters have a wide variety within the cluster (e.g. Big Spenders can vary in their purchase pattern).

This technique provides an individual-level estimate of similarity which can be used for downstream analysis.

Points represent our customers. Customers that have a more similar purchase pattern appear closer together.
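The idea of individual-level similarity can be illustrated without UMAP itself. The sketch below is not UMAP and produces no two-dimensional map. It only computes plain Euclidean distances between the purchase patterns (spending shares) of the six sample customers shown earlier, which is the kind of closeness a map like this tries to preserve. Using spending shares as the “purchase pattern” is an assumption made for this example.

```python
import math

# The six sample customers from the data sample shown earlier.
sample = [
    [12669, 9656, 7561, 214, 2674, 1338],
    [7057, 9810, 9568, 1762, 3293, 1776],
    [6353, 8808, 7684, 2405, 3516, 7844],
    [13265, 1196, 4221, 6404, 507, 1788],
    [22615, 5410, 7198, 3915, 1777, 5185],
    [9413, 8259, 5126, 666, 1795, 1451],
]


def pattern(row):
    """Purchase pattern: share of each product group in total spending."""
    total = sum(row)
    return [v / total for v in row]


def distance(a, b):
    """Euclidean distance between two purchase patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


patterns = [pattern(r) for r in sample]
dist = [[distance(a, b) for b in patterns] for a in patterns]

# For each customer, find the most similar other customer.
nearest = []
for i, row in enumerate(dist):
    j = min((c for c in range(len(row)) if c != i), key=lambda c: row[c])
    nearest.append(j)
    print(f"customer {i}: most similar is customer {j} (distance {row[j]:.3f})")
```

A distance matrix like `dist` is one form in which individual-level similarity could be passed to downstream analysis, for example to find look-alike customers.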