Explorative Faktorenanalyse – EFA

Übung – Lösungsvorschlag

Author

Prof. Dr. Armin Eichinger

Published

06.11.2024

1 Aufgabe 1: Beispiel Regionen aus der Vorlesung nachvollziehen

# Für den Bartlett-Test:
library(psych)

regionen_data <- read.csv("https://bookdown.org/Armin_E/efa_ex_1/data/regionen.csv",sep=";", dec=",")
# Test: Was passiert, wenn wir dec="." verwenden? Achten Sie auf die Datentypen
regionen_data
      ED   BIP   EL WBIP  GEB   WS
1  212.4 20116  9.8 53.0  8.4 -0.7
2  623.7 24966  3.4 73.1  6.1  3.4
3   93.1 19324 23.6 47.9 12.3 -1.9
4  236.8 23113  8.7 66.8  8.7  2.0
5  412.0 23076  8.9 46.9  8.0 -3.1
6  566.7 24516  6.1 44.3  8.6 -3.0
7  331.9 22187  7.4 57.6 10.3  4.7
8  111.4 20614 16.3 63.8 13.9  5.2
9  489.0 25006  5.7 49.4  6.7 -2.6
10 287.4 23136  8.8 59.4 12.4  1.7
11 166.2 20707 14.1 74.0 13.0  3.6
12 388.1 23624  9.6 54.3  6.9 -0.4
# Bartlett-Test; Achtung: erfordert library(psych)
cortest.bartlett(regionen_data)
$chisq
[1] 48.88061

$p.value
[1] 1.831945e-05

$df
[1] 15
# Matrix der Korrelationen; Achtung: erfordert library(psych)
cor.plot(regionen_data)

# Parallel-Test
fa.parallel(regionen_data, fa="fa")

Parallel analysis suggests that the number of factors =  2  and the number of components =  NA 
# Kaiser-Meyer-Olkin-Kriterium: MSA
kmo_result <- KMO(regionen_data)
kmo_result
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = regionen_data)
Overall MSA =  0.69
MSA for each item = 
  ED  BIP   EL WBIP  GEB   WS 
0.80 0.76 0.77 0.48 0.75 0.48 
#X  ? Faktoren (vgl. Parallel-Test), varimax-Rotation
efa_result <- fa(regionen_data, nfactors = 2, rotate = "varimax")

print(efa_result, digits=2, cut=0.3, sort=TRUE)
Factor Analysis using method =  minres
Call: fa(r = regionen_data, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
     item   MR1   MR2   h2     u2 com
ED      1 -0.95       0.92  0.080 1.1
BIP     2 -0.92       0.86  0.142 1.0
EL      3  0.92       0.86  0.139 1.0
GEB     5  0.78       0.70  0.299 1.3
WS      6        0.99 1.01 -0.009 1.1
WBIP    4        0.83 0.69  0.309 1.0

                       MR1  MR2
SS loadings           3.25 1.79
Proportion Var        0.54 0.30
Cumulative Var        0.54 0.84
Proportion Explained  0.64 0.36
Cumulative Proportion 0.64 1.00

Mean item complexity =  1.1
Test of the hypothesis that 2 factors are sufficient.

df null model =  15  with the objective function =  5.99 with Chi Square =  48.88
df of  the model are 4  and the objective function was  0.25 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  12 with the empirical chi square  0.12  with prob <  1 
The total n.obs was  12  with Likelihood Chi Square =  1.72  with prob <  0.79 

Tucker Lewis Index of factoring reliability =  1.331
RMSEA index =  0  and the 90 % confidence intervals are  0 0.296
BIC =  -8.22
Fit based upon off diagonal values = 1
# Ggf. Faktor-Scores berechnen
factor_scores <- factor.scores(regionen_data,f=efa_result) 
head(factor_scores$scores)
              MR1        MR2
[1,]  0.517827928 -0.6374872
[2,] -1.660317949  1.1297825
[3,]  1.897558319 -1.1397679
[4,]  0.009766179  0.5512555
[5,] -0.327046953 -1.2442510
[6,] -1.021926139 -1.1920421
# Diagramm der Faktorenladungen 
fa.diagram(efa_result)

# Weitere Diagramme

# Achsen festlegen
xlim = c(-2, 2)
ylim = c(-1.5, 1.5)

# Variablen im Faktorraum
plot(factor_scores$scores, xlim=xlim,ylim=ylim)
text(factor_scores$scores, labels = c(1:12), cex = 0.9, pos = 1, font = 1, col = "black")

# Achsen festlegen
xlim = c(-1, 1)
ylim = c(-1, 1.5)

# Regionen im Faktorraum (Faktorwerte)
plot(efa_result$loadings, xlim=xlim,ylim=ylim)
text(efa_result$loadings, labels = colnames(regionen_data), cex = 0.9, pos = 1, font = 1, col = "black")

2 Aufgabe 2: NASA-TLX

tlx_data <- read.csv("https://bookdown.org/Armin_E/explorativ-multivariat-efa/data/NASATLX.csv",sep=";")
head(tlx_data)
  Geistige.Anforderung Körperliche.Anforderung Zeitliche.Anforderung Leistung
1                    5                       1                     5        9
2                    5                       1                     6        9
3                    6                       2                     5        8
4                    4                       1                     4        9
5                    5                       1                     5        9
6                    1                       1                     4       10
  Anstrengung Frustration
1           5           5
2           6           3
3           6           6
4           3           2
5           5           5
6           2           1
# Bartlett-Test
cortest.bartlett(tlx_data)
$chisq
[1] 254.3024

$p.value
[1] 1.612454e-45

$df
[1] 15
# Matrix der Korrelationen
cor.plot(tlx_data)

# Parallel-Test
fa.parallel(tlx_data, fa="fa")

Parallel analysis suggests that the number of factors =  1  and the number of components =  NA 
kmo_result <- KMO(tlx_data)
kmo_result
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = tlx_data)
Overall MSA =  0.85
MSA for each item = 
   Geistige.Anforderung Körperliche.Anforderung   Zeitliche.Anforderung 
                   0.82                    0.90                    0.88 
               Leistung             Anstrengung             Frustration 
                   0.90                    0.78                    0.93 
#X  ? Faktoren (vgl. Parallel-Test), varimax-Rotation
efa_result <- fa(tlx_data, nfactors = 1, rotate = "varimax")

print(efa_result, digits=2, cut=0.3, sort=TRUE)
Factor Analysis using method =  minres
Call: fa(r = tlx_data, nfactors = 1, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                        V   MR1   h2   u2 com
Anstrengung             5  0.87 0.76 0.24   1
Geistige.Anforderung    1  0.85 0.72 0.28   1
Zeitliche.Anforderung   3  0.76 0.58 0.42   1
Leistung                4 -0.64 0.41 0.59   1
Frustration             6  0.58 0.34 0.66   1
Körperliche.Anforderung 2  0.47 0.22 0.78   1

                MR1
SS loadings    3.02
Proportion Var 0.50

Mean item complexity =  1
Test of the hypothesis that 1 factor is sufficient.

df null model =  15  with the objective function =  2.62 with Chi Square =  254.3
df of  the model are 9  and the objective function was  0.1 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.05 

The harmonic n.obs is  101 with the empirical chi square  4.46  with prob <  0.88 
The total n.obs was  101  with Likelihood Chi Square =  9.37  with prob <  0.4 

Tucker Lewis Index of factoring reliability =  0.997
RMSEA index =  0.017  and the 90 % confidence intervals are  0 0.115
BIC =  -32.17
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                   MR1
Correlation of (regression) scores with factors   0.94
Multiple R square of scores with factors          0.89
Minimum correlation of possible factor scores     0.78
# Ggf. Faktor-Scores berechnen
factor_scores <- factor.scores(tlx_data,f=efa_result) 
head(factor_scores$scores)
            MR1
[1,] -0.7428184
[2,] -0.5742252
[3,] -0.3118977
[4,] -1.4104555
[5,] -0.7428184
[6,] -2.0731607
# Diagramm der Faktorenladungen 
fa.diagram(efa_result)

# Weitere Diagramme

# Variablen im Faktorraum
plot(factor_scores$scores)
text(factor_scores$scores, labels = c(1:101), cex = 0.9, pos = 1, font = 1, col = "black")

# Items im Faktorraum (Faktorwerte)
plot(efa_result$loadings)
text(efa_result$loadings, labels = colnames(tlx_data), cex = 0.9, pos = 1, font = 1, col = "black")

3 Aufgabe 3: Soziale Medien

library(dplyr)

sozmed_data <- read.csv("../data/fa_sozmed.csv", sep=";", dec = ",")
sozmed_data <- read.csv("https://bookdown.org/Armin_E/explorativ-multivariat-efa/data/fa_sozmed.csv", sep=";", dec = ",")

head(sozmed_data)
  X1.Twitter.X X1.Instagram X1.TikTok X1.SnapChat X1.YouTube X1.WhatsApp
1            1            6         1           1          7           7
2            1            6         1           1          6           6
3            2            4         7           6          3           7
4            5            5         1           2          3           7
5            1            1         1           1          6           5
6            1            7         2           1          6           7
  X1.Facebook X2.Twitter.X X2.Instagram X2.TikTok X2.SnapChat X2.YouTube
1           1            1            6         1           5          7
2           2            4            5         2           2          6
3           1            1            3         5           5          4
4           2            1            4         1           1          3
5           1            2            3         1           2          6
6           1            1            7         2           5          5
  X2.WhatsApp X2.Facebook X3.Twitter.X X3.Instagram X3.TikTok X3.SnapChat
1           7           4            6            7         7           6
2           6           4            5            5         6           6
3           5           1            5            5         7           2
4           6           2            4            5         7           5
5           4           3            6            7         7           5
6           7           3            6            7         6           5
  X3.YouTube X3.WhatsApp X3.Facebook X4.Twitter.X X4.Instagram X4.TikTok
1          3           3           3           30           16        16
2          5           2           5           28           18        17
3          3           3           4           20           18        20
4          3           2           3           25           20        16
5          4           4           5           25           20        16
6          4           3           5           21           16        14
  X4.SnapChat X4.YouTube X4.WhatsApp X4.Facebook Wie.alt.sind.Sie.
1          14         20          30          40                18
2          16         25          30          35                19
3          16         18          18          50                22
4          15         25          30          40                21
5          16         20          20          50                24
6          15         20          30          41                19
# TikTok & Instagram & SnapChat alles (2F -- ggf 3)
# Ergebnis: zwei Faktoren 1+2 vesus 3 → warum?!
sozmed_data_1 <- sozmed_data %>% select(2,3,4,9,10,11,16,17,18)  


# Alte SozMed (2F - 5F)
# sozmed_data_1 <- sozmed_data %>% select(1,5,6,7,8,12,13,14,15,19,20,21)  

# Mentale Gesundheit + Alter
#sozmed_data_1 <- sozmed_data %>% select(1:21,29)    

# Nutzung 
# sozmed_data_1 <- sozmed_data %>% select(1:7) 

# Alter
#sozmed_data_1 <- sozmed_data %>% select(22:28,29)


# Parallel-Test
fa.parallel(sozmed_data_1, fa="fa")

Parallel analysis suggests that the number of factors =  2  and the number of components =  NA 
kmo_result <- KMO(sozmed_data_1)
kmo_result
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = sozmed_data_1)
Overall MSA =  0.54
MSA for each item = 
X1.Instagram    X1.TikTok  X1.SnapChat X2.Instagram    X2.TikTok  X2.SnapChat 
        0.64         0.50         0.51         0.64         0.55         0.53 
X3.Instagram    X3.TikTok  X3.SnapChat 
        0.45         0.56         0.45 
sozmed_data <- sozmed_data_1

#head(sozmed_data)

# Bartlett-Test
cortest.bartlett(sozmed_data)
$chisq
[1] 103.9484

$p.value
[1] 1.628163e-08

$df
[1] 36
# Matrix der Korrelationen
cor.plot(sozmed_data)

# X? Faktoren (vgl. Parallel-Test), varimax-Rotation
efa_result <- fa(sozmed_data, nfactors = 2, rotate = "varimax")

print(efa_result, digits=2, cut=0.5, sort=TRUE)
Factor Analysis using method =  minres
Call: fa(r = sozmed_data, nfactors = 2, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
             item   MR1   MR2   h2   u2 com
X1.TikTok       2  0.73       0.63 0.37 1.3
X1.Instagram    1  0.71       0.56 0.44 1.2
X2.TikTok       5  0.71       0.62 0.38 1.5
X2.Instagram    4  0.68       0.48 0.52 1.0
X1.SnapChat     3  0.59       0.40 0.60 1.3
X2.SnapChat     6  0.57       0.32 0.68 1.0
X3.Instagram    7        0.64 0.41 0.59 1.0
X3.TikTok       8        0.61 0.38 0.62 1.0
X3.SnapChat     9        0.61 0.37 0.63 1.0

                       MR1  MR2
SS loadings           2.69 1.49
Proportion Var        0.30 0.17
Cumulative Var        0.30 0.46
Proportion Explained  0.64 0.36
Cumulative Proportion 0.64 1.00

Mean item complexity =  1.2
Test of the hypothesis that 2 factors are sufficient.

df null model =  36  with the objective function =  5.15 with Chi Square =  103.95
df of  the model are 19  and the objective function was  2.48 

The root mean square of the residuals (RMSR) is  0.11 
The df corrected root mean square of the residuals is  0.15 

The harmonic n.obs is  25 with the empirical chi square  22.73  with prob <  0.25 
The total n.obs was  25  with Likelihood Chi Square =  46.79  with prob <  0.00038 

Tucker Lewis Index of factoring reliability =  0.138
RMSEA index =  0.238  and the 90 % confidence intervals are  0.158 0.337
BIC =  -14.37
Fit based upon off diagonal values = 0.9
Measures of factor score adequacy             
                                                   MR1  MR2
Correlation of (regression) scores with factors   0.92 0.85
Multiple R square of scores with factors          0.85 0.71
Minimum correlation of possible factor scores     0.69 0.43
# Ggf. Faktor-Scores berechnen
factor_scores <- factor.scores(sozmed_data,f=efa_result) 
head(factor_scores$scores)
            MR1        MR2
[1,] -0.0172652  1.5108935
[2,] -0.4951764  0.1595249
[3,]  0.3620374 -1.4246919
[4,] -0.9312932  0.1481435
[5,] -1.5964505  0.3644238
[6,]  0.4311773  1.0686129
# Diagramm der Faktorenladungen 
fa.diagram(efa_result)

# Weitere Diagramme -- sinnvoll für k=2

# Achsen festlegen
xlim = c(-2, 2)
ylim = c(-2, 2)

# VPs im Faktorraum
plot(factor_scores$scores, xlim=xlim,ylim=ylim)
text(factor_scores$scores, labels = c(1:25), cex = 0.9, pos = 1, font = 1, col = "black")

# Achsen festlegen

xlim = c(-1, 1)
ylim = c(-1, 1)

# SozMed im Faktorraum (Faktorwerte)
plot(efa_result$loadings, xlim=xlim,ylim=ylim)
text(efa_result$loadings, labels = colnames(sozmed_data), cex = 0.9, pos = 1, font = 1, col = "black")