Chapter 4 Statistical Analysis

Data were analyzed using R version 4.2.1 (2022-06-23) for macOS (RStudio, PBC). P values below 0.05 were considered statistically significant. Normality of the numeric variables was assessed with the Kolmogorov-Smirnov test.
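The normality check described above can be sketched as follows. This is a minimal illustration on simulated values, not the study data; the variable name `age` and the simulation parameters are assumptions made for the example. Note that estimating the mean and standard deviation from the same sample makes the Kolmogorov-Smirnov test conservative, so the p-value should be read as approximate.

```r
# Simulated ages; these are NOT the study data.
set.seed(1)
age <- rnorm(200, mean = 50, sd = 10)

# One-sample Kolmogorov-Smirnov test against a normal distribution whose
# mean and SD are estimated from the sample itself.
ks_result <- ks.test(age, "pnorm", mean = mean(age), sd = sd(age))

# Apply the 0.05 significance threshold used in this chapter.
alpha <- 0.05
normality_not_rejected <- ks_result$p.value >= alpha

print(ks_result)
```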

- Descriptive statistics were used to characterize the study participants by summarizing the frequency of responses to the demographic and MEBS questions.

- Results are presented as frequencies and percentages for categorical demographic variables, and as means and standard deviations for numeric demographic variables and MEBS subscale scores.
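A minimal sketch of these two kinds of summary, using base R on a small simulated data frame. The column names `sex` and `age` are hypothetical stand-ins and the values are not study data.

```r
# Simulated demographic data; NOT the study data.
set.seed(2)
demo <- data.frame(
  sex = sample(c("Female", "Male"), 100, replace = TRUE),
  age = round(rnorm(100, mean = 50, sd = 10))
)

# Categorical variable: frequency and percentage of each level.
freq <- table(demo$sex)
pct  <- round(100 * prop.table(freq), 1)
cat_summary <- data.frame(
  level   = names(freq),
  n       = as.vector(freq),
  percent = as.vector(pct)
)

# Numeric variable: mean and standard deviation.
num_summary <- c(mean = mean(demo$age), sd = sd(demo$age))

print(cat_summary)
print(num_summary)
```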

Similar descriptive analyses were conducted for participants grouped by their 3-hyper status, that is, whether they reported at least one of hypertension, dyslipidemia, or diabetes.

Characteristics of the participants are summarized in Supplementary Table S1 (categorical variables) and Supplementary Table S2 (numeric variables and MEBS subscale scores).

Supplementary Table S2 summarizes several numeric characteristics of the participants. A mean score was calculated for each of the four MEBS subscales.

We present the categorical socio-demographic characteristics of respondents by their 3-hyper status. Chi-square tests were used to examine the differences between participants with and without 3-hyper status on categorical socio-demographic variables. Results are presented in Table 1a.
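The chi-square comparison can be sketched as below. The data are simulated and the variable names `sex` and `status` are assumptions for illustration only. For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default.

```r
# Simulated group membership and one categorical variable; NOT the study data.
set.seed(3)
status <- factor(sample(c("No", "Yes"), 200, replace = TRUE))
sex    <- factor(sample(c("Female", "Male"), 200, replace = TRUE))

# Cross-tabulate the categorical variable against 3-hyper status.
tab <- table(sex, status)

# Pearson chi-square test of independence.
chi <- chisq.test(tab)

print(tab)
print(chi)
```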

Numeric characteristics of participants by their 3-hyper status are presented in Table 1b. A Kolmogorov-Smirnov normality test was first performed on the numeric variables to check the normality assumption required for t-tests. We used t-tests to examine the differences between participants without 3-hyper status and those with at least one 3-hyper condition on numeric socio-demographic characteristics such as age, height, and BMI (Table 1b.1). To avoid relying on a questionable normality assumption for the MEBS subscale scores, we applied the Mann-Whitney U test, a nonparametric approach, to test the differences in subscale scores between the two groups (Table 1b.2).
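A minimal sketch of the two group comparisons on simulated data. The column names `bmi` and `subscale` are hypothetical and the values are not study data. In R, `t.test()` runs Welch's unequal-variance t-test unless `var.equal = TRUE` is set, and the Mann-Whitney U test is provided by `wilcox.test()`.

```r
# Simulated data for two groups; NOT the study data.
set.seed(4)
dat <- data.frame(
  status   = factor(rep(c("No", "Yes"), each = 60)),
  bmi      = c(rnorm(60, mean = 23, sd = 3), rnorm(60, mean = 25, sd = 3)),
  subscale = sample(1:5, 120, replace = TRUE)
)

# Two-sample t-test on a numeric characteristic (Welch's test by default).
t_res <- t.test(bmi ~ status, data = dat)

# Mann-Whitney U test on a subscale score; exact = FALSE requests the
# normal approximation, which is appropriate when the scores contain ties.
mw_res <- wilcox.test(subscale ~ status, data = dat, exact = FALSE)

print(t_res)
print(mw_res)
```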

Univariate logistic regression was used for a preliminary assessment of the relationship between 3-hyper status and each socio-demographic variable of interest, as well as each of the four MEBS subscale scores. Results are presented in Table 2.
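A univariate model of this kind could be fitted as sketched below, with one explanatory variable at a time. The data are simulated, `age` is a hypothetical predictor, and the Wald confidence interval is one possible choice rather than necessarily the one used for Table 2.

```r
# Simulated data; NOT the study data.
set.seed(5)
n   <- 300
age <- rnorm(n, mean = 50, sd = 10)
p   <- plogis(-5 + 0.1 * age)

# With levels c("No", "Yes"), glm() models the probability of "Yes".
status <- factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("No", "Yes"))

# Logistic regression with a single explanatory variable.
fit <- glm(status ~ age, family = binomial)

# Odds ratios with Wald 95% confidence intervals.
odds_ratio <- exp(coef(fit))
wald_ci    <- exp(confint.default(fit))

print(cbind(OR = odds_ratio, wald_ci))
```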

Multivariable logistic regression analyses allow the examination of the association between the dependent/response variable and multiple independent/explanatory variables simultaneously.

Multivariable logistic regression was used to assess the association between the response variable, 3-hyper status, and the explanatory variables: the MEBS subscales of interest as well as the socio-demographic variables.
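The multivariable model can be sketched in the same way, with several explanatory variables entered together so that each odds ratio is adjusted for the others. Everything here is simulated: the predictors `age`, `sex`, and `subscale` and the coefficients used to generate the outcome are assumptions for illustration, not study results.

```r
# Simulated data; NOT the study data.
set.seed(6)
n <- 400
dat <- data.frame(
  age      = rnorm(n, mean = 50, sd = 10),
  sex      = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  subscale = runif(n, min = 1, max = 5)
)
lin <- -4 + 0.06 * dat$age + 0.3 * (dat$sex == "Male") - 0.2 * dat$subscale
dat$status <- factor(ifelse(runif(n) < plogis(lin), "Yes", "No"),
                     levels = c("No", "Yes"))

# Logistic regression with several explanatory variables at once.
fit <- glm(status ~ age + sex + subscale, data = dat, family = binomial)

# Adjusted odds ratios with Wald 95% confidence intervals and p-values.
est <- summary(fit)$coefficients
or_table <- data.frame(
  term  = rownames(est),
  OR    = exp(est[, "Estimate"]),
  lower = exp(est[, "Estimate"] - 1.96 * est[, "Std. Error"]),
  upper = exp(est[, "Estimate"] + 1.96 * est[, "Std. Error"]),
  p     = est[, "Pr(>|z|)"],
  row.names = NULL
)

print(or_table)
```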

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

library(jsmodule) # logistic.display2

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

#file.choose()

dataPath_ALL_Disease_group <- 
  "/Users/apple/Desktop/Xue Bookdown/bookdown-demo-main/BMI_Dlseases.xlsx"
dataPath_Healthy_group <- "/Users/apple/Desktop/Xue Bookdown/bookdown-demo-main/健康.xlsx"

All_Disease <- read_excel(dataPath_ALL_Disease_group)
Healthy <- read_excel(dataPath_Healthy_group)

Display the table (to do: try an alternative such as printing a tibble).

#knitr::kable(
  #head(All_Disease), caption = 'A glimpse of the xxx dataset',booktabs = TRUE
#)

Dataset mutation Begins

Create Variable: SanGaoScore

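# SanGaoScore: sum of the three "3-hyper" condition columns
# (高血压 = hypertension, 血脂异常 = dyslipidemia, 糖尿病（血糖高） = diabetes / high blood sugar)
Xue_All_Disease <-
  All_Disease |>
    mutate(SanGaoScore = 高血压 + 血脂异常 + `糖尿病（血糖高）`,
           .before = Q6open)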

SanGaoScore will be used to identify participants with at least one 3-hyper condition.

Modify Xue_All_Disease: create variable anyDiseaseScore

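# anyDiseaseScore: sum of all condition columns, i.e. the three 3-hyper conditions plus
# cancer or other malignant tumours (excluding minor skin cancer), chronic lung disease,
# liver disease, heart disease, stroke, kidney disease, stomach or digestive system disease,
# emotional or psychiatric problems, memory-related disease, arthritis or rheumatism,
# asthma, and sub-health (亚健康)
Xue_All_Disease <-
  Xue_All_Disease |>
    mutate(anyDiseaseScore = 高血压 + 血脂异常 + `糖尿病（血糖高）`
                        + `癌症等恶性肿瘤（不包含轻度皮肤癌）`
                        + 慢性肺部疾患 + 肝脏疾病 + 心脏病 + 中风 + 肾脏疾病 + 胃部疾病或消化道系统疾病
                        + 情感及精神方面疾病 + 与记忆相关的疾病 + 关节炎或风湿病 + 哮喘 + 亚健康,
           .before = Q6open)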

Useful mutate tips:
.keep = "unused" drops the original columns that were used to create the new variables.
select(newVar, everything()) moves the new variables to the front.

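# Convert age (年龄), height in cm (身高cm) and weight in kg (体重kg) to numeric,
# drop the original columns, and move the new ones to the front
Xue_All_Disease <-
  Xue_All_Disease |>
    mutate(Age = as.numeric(年龄), Height = as.numeric(身高cm), Weight = as.numeric(体重kg),
           .keep = "unused") |>
    select(Age, Height, Weight, everything())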

2nd MEBS subscale: rename the Chinese column name that causes many functions to throw errors

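# Rename `（依赖）饥饿和饱腹感提示进食` ("(reliance on) hunger and satiety cues for eating")
# to SatietyCues and place it after 对食物的关注度 ("focus on food")
Xue_All_Disease <-
  Xue_All_Disease |>
    mutate(SatietyCues = `（依赖）饥饿和饱腹感提示进食`, .keep = "unused", .after = 对食物的关注度)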

Key mutate: create the SanGao Status variable. Status = "Yes" if the participant has at least one SanGao condition, and "No" otherwise.

Xue_All_Disease <-
  Xue_All_Disease |>
    mutate(Status = case_when(SanGaoScore > 0 ~ "Yes", # 1: at least 1 SanGao
                              SanGaoScore == 0 ~ "No"), # 0: No SanGao
           .after = SanGaoScore
           )
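The score-to-status mapping above can be checked on a toy vector, sketched here in base R with made-up scores. One point worth noting is that a missing score stays missing: `case_when()` returns `NA` when no condition matches, and `ifelse()` behaves the same way for `NA` input.

```r
# Made-up scores, including a missing value; NOT the study data.
score <- c(0, 1, 2, 3, 0, NA)

# Base R equivalent of the case_when() rule: "Yes" if the score is above 0.
status <- ifelse(score > 0, "Yes", "No")

# Cross-tabulate score against status to confirm the mapping.
check <- table(score, status, useNA = "ifany")
print(check)
```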

Key step for logistic regression:
The response variable Status should be of type factor (see the Default example in ISLR), but at this point Status is of type character.

typeof(Xue_All_Disease$Status)

## [1] "character"

Convert Status to a factor using as.factor() with mutate_at().

Xue_All_Disease <-
  Xue_All_Disease |>
    mutate_at('Status', as.factor)

Now, verify the type of Status

typeof(Xue_All_Disease$Status)

## [1] "integer"

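Note: typeof() reports "integer" because a factor is stored internally as integer codes with a levels attribute. To confirm that Status is a factor, use class(Xue_All_Disease$Status) or is.factor(Xue_All_Disease$Status).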
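The difference between the storage type and the class of a factor can be seen on a small made-up vector. This sketch is independent of the study data.

```r
# Made-up character vector standing in for the Status column.
status_chr <- c("Yes", "No", "No", "Yes")
status_fct <- as.factor(status_chr)

storage_type <- typeof(status_fct)   # internal storage of a factor
object_class <- class(status_fct)    # what the object actually is
status_levels <- levels(status_fct)  # levels are sorted alphabetically

print(storage_type)
print(object_class)
print(status_levels)
```

Because "No" is the first level, a logistic regression fitted with glm() on such a factor would model the probability of the second level, "Yes".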

Dataset mutation ends