Install the required packages, then load the libraries.
library(Hmisc)
library(funModeling)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ───────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8     ✔ stringr 1.5.0
✔ purrr   1.0.1     ✔ forcats 1.0.0
── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ dplyr::src() masks Hmisc::src()
✖ dplyr::summarize() masks Hmisc::summarize()
    source          unique_domains      total_URLs
 Length:8           Min.   :  25.0   Min.   : 194.0
 Class :character   1st Qu.:  61.0   1st Qu.: 205.8
 Mode  :character   Median :  87.0   Median : 857.0
                    Mean   : 378.2   Mean   :2211.1
                    3rd Qu.: 451.8   3rd Qu.:2750.5
                    Max.   :1610.0   Max.   :9106.0
Basic Exploratory Data Analysis
Basic exploratory data analysis (EDA) is an initial, informal examination of a dataset to understand its main characteristics. It is often the first step in an analysis and helps identify potential problems or interesting patterns in the data.
Basic EDA typically involves looking at the distribution of the variables in the dataset, checking for missing values or outliers, identifying patterns or relationships between variables, and summarizing key statistics such as means, medians, and standard deviations.
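The checks listed above can be sketched in base R. This is only a minimal illustration on a made-up data frame: `toy_edges` and its values are hypothetical (the column names simply mirror the `edges` data described in this section), and the 1.5 * IQR rule is one common outlier convention, not necessarily the one used in this analysis.

```r
# Hypothetical toy edge list; values are made up for illustration.
toy_edges <- data.frame(
  source = c("a", "a", "b", "b", "c", "c"),
  target = c("x", "y", "x", "z", "y", NA),
  weight = c(1, 2, 1, 40, 3, 1),
  stringsAsFactors = FALSE
)

# Missing values per column
missing_counts <- colSums(is.na(toy_edges))

# Key summary statistics for the numeric variable
weight_stats <- c(
  mean   = mean(toy_edges$weight),
  median = median(toy_edges$weight),
  sd     = sd(toy_edges$weight)
)

# Distribution of a categorical variable
source_freq <- table(toy_edges$source)

# Flag potential outliers with the 1.5 * IQR rule (an assumed convention)
q <- unname(quantile(toy_edges$weight, c(0.25, 0.75)))
iqr <- q[2] - q[1]
is_outlier <- toy_edges$weight < q[1] - 1.5 * iqr |
  toy_edges$weight > q[2] + 1.5 * iqr
outliers <- toy_edges$weight[is_outlier]

print(missing_counts)
print(weight_stats)
print(source_freq)
print(outliers)
```

On real data, the same calls would be pointed at the actual data frame instead of `toy_edges`. Packages such as Hmisc and funModeling wrap similar checks in single functions.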
Warning: Skipping plot for variable 'target' (more than 100 categories)
[1] "Variables processed: target, weight"
edges
3 Variables 3026 Observations
-----------------------------------------------------------------------------------------------------------------------------------------------------
source
n missing distinct
3026 0 8
lowest : balkan_spy cyber_frontZ narodnapatrola orly_rs Prigozhin_hat
highest: orly_rs Prigozhin_hat rtbalkan_ru ruserbia russkeydomserbia
Value            balkan_spy     cyber_frontZ   narodnapatrola          orly_rs    Prigozhin_hat      rtbalkan_ru         ruserbia russkeydomserbia
Frequency                97             1610               25              354               63               55              745               77
Proportion            0.032            0.532            0.008            0.117            0.021            0.018            0.246            0.025
-----------------------------------------------------------------------------------------------------------------------------------------------------
target
n missing distinct
3026 0 2472
lowest : /c/1732054517/3452 /c/1732054517/4875 /c/1732054517/5727 /c/1732054517/9009 @GRIP_SoundLab
highest: Ztdk3FDbbhA1Nzdi zubovskiy4 zvezdalive zvezdalive.ru zvezdanews
-----------------------------------------------------------------------------------------------------------------------------------------------------
weight
        n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75      .90      .95
     3026        0       71    0.714    5.846    9.215        1        1        1        1        2        5        9
lowest : 1 2 3 4 5, highest: 587 720 755 1179 2664
-----------------------------------------------------------------------------------------------------------------------------------------------------
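library(DT)  # provides datatable(); not loaded with the libraries above

# Total edge weight per target and each target's share of all weight (in %)
df_edges <- data.frame(edges) %>%
  group_by(target) %>%
  summarise(sum = sum(weight)) %>%
  mutate(percent = sum / sum(sum) * 100) %>%
  arrange(desc(sum))

datatable(df_edges)  # interactive, sortable table
write.csv(df_edges, file = 'perc_target.csv')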
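To make the aggregation above easier to follow, here is a small worked example on a made-up edge list. The data frame `toy_edges`, its values, and the column name `total_weight` are hypothetical. The sketch names the summed column `total_weight` rather than `sum` only to avoid reusing the name of the base function, and it is otherwise meant to follow the same steps.

```r
library(dplyr)

# Hypothetical toy edge list; values are made up for illustration.
toy_edges <- data.frame(
  source = c("a", "a", "b", "c"),
  target = c("x", "y", "x", "z"),
  weight = c(3, 1, 5, 1)
)

toy_share <- toy_edges %>%
  group_by(target) %>%                       # one group per target
  summarise(total_weight = sum(weight)) %>%  # total weight per target
  mutate(percent = total_weight / sum(total_weight) * 100) %>%  # share of all weight
  arrange(desc(total_weight))                # heaviest targets first

print(toy_share)
```

Target `x` receives weight 3 + 5 = 8 out of a total of 10, so it ends up in the first row with an 80 percent share, and the `percent` column sums to 100.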