Chapter 11 Exploratory Data Analyses
The first phase of my statistical workflow is usually termed “Data Exploration” or “Exploratory Data Analysis”. The goal of this step is to gain insight into the data: to understand what is going on in it, which parts need to be cleaned, what new features can be built, and which hypotheses should be tested during the model creation/validation phase, or even just to learn some fun facts about the data (src).
A few of my favorite packages for getting a first glimpse of the data are:
- SmartEDA
- DataExplorer
- summarytools
- dataMaid
- janitor
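Of the packages listed above, janitor is the only one not demonstrated later in this chapter, so here is a minimal sketch of how it might be used for a first pass. The data frame `raw` and its column names are hypothetical and exist only for illustration; `clean_names()`, `tabyl()`, and `get_dupes()` are janitor functions.

```r
library(janitor)

# Hypothetical toy data with messy column names (not from this chapter)
raw <- data.frame(
  `Subject ID` = c(1, 2, 2, 3),
  `Group`      = c("A", "B", "B", "A"),
  check.names  = FALSE
)

# Standardise column names to snake_case
clean <- clean_names(raw)

# One-way frequency table of the grouping variable
group_counts <- tabyl(clean, group)
print(group_counts)

# Rows that are duplicated across all columns
dupes <- get_dupes(clean)
print(dupes)
```

Cleaning the names first means later calls can refer to columns without backticks.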
11.1 Creating Report with DataExplorer
The DataExplorer package allows you to get a preliminary look at your data. It will check for missing data.
library(DataExplorer)

create_report(
  df.fa,                                        # the name of your dataframe
  # y = 'heart_disease',                        # optional response variable
  output_dir = 'output',                        # where to save the report, relative to your project directory
  output_file = 'data_explorer_fa_report.html', # the filename for the report
  report_title = 'DTI (FA) Data Description'    # the title of your report
)
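If you only want the missing-data check and not the full HTML report, DataExplorer also exposes the individual pieces. The sketch below uses the built-in `airquality` data set purely as a stand-in (an assumption for illustration; substitute your own data frame, such as `df.fa` above).

```r
library(DataExplorer)

# airquality ships with base R and is used here only as a stand-in
# for your own data frame.
overview <- introduce(airquality)            # one-row summary: rows, columns, missing counts
missing_tbl <- profile_missing(airquality)   # per-column count and share of missing values

print(overview)
print(missing_tbl)

# Bar chart of the share of missing values in each column
plot_missing(airquality)
```

These calls are a quick way to decide which columns need cleaning before rendering the full report.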
SmartEDA::ExpNumStat(tbl.desc, round = 1)
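SmartEDA::ExpNumStat(
  tbl.desc,
  by = "GA",          # statistics by group and overall
  gp = "Group",       # grouping variable
  Qnt = c(.1, .9),    # additional quantiles to report
  Outlier = TRUE,     # include outlier statistics
  round = 1
)
SmartEDA::ExpNumViz(tbl.desc, target = 'Group')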
summarytools::dfSummary(
  tbl.desc,
  varnumbers = FALSE,
  round.digits = 2,
  plain.ascii = FALSE,
  style = "grid",
  graph.magnif = .33,
  valid.col = FALSE,
  tmp.img.dir = "img"
)