Chapter 2 First Republic

2.1 Electoral law

Camera
- 32 multi-member constituencies (circoscrizioni plurinominali), with seats varying according to population
- Pure proportional system (quota method with the Imperiali quota)
- Up to 4 preferences
Senato
- Regions divided into single-member districts (collegi uninominali)
- Proportional system (D’Hondt method), unless a candidate reaches a 65% quorum in the district, which triggers a majoritarian allocation of that seat
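
As a rough illustration of the two formulas named in the list, the sketch below computes the Imperiali quota (total votes divided by seats plus two) and a D'Hondt allocation for a made-up contest. The vote counts, the list names and the helper dhondt() are hypothetical; the treatment of remainders and the 65% rule are deliberately left out.

```r
# Hypothetical vote totals for four lists competing for 8 seats
votes   <- c(A = 100000, B = 80000, C = 30000, D = 20000)
n_seats <- 8

# Imperiali quota: total valid votes divided by (seats + 2).
# Each list receives one seat per full quota; remainders are ignored here.
imperiali_quota <- sum(votes) / (n_seats + 2)
quota_seats     <- floor(votes / imperiali_quota)

# D'Hondt: divide each list's votes by 1, 2, ..., n_seats and assign
# the seats to the n_seats largest quotients.
dhondt <- function(votes, n_seats) {
  quotients <- outer(votes, seq_len(n_seats), "/")
  top       <- order(quotients, decreasing = TRUE)[seq_len(n_seats)]
  winners   <- names(votes)[row(quotients)[top]]
  # Count seats per list, keeping lists that won nothing
  seats <- table(factor(winners, levels = names(votes)))
  setNames(as.integer(seats), names(votes))
}

dhondt_seats <- dhondt(votes, n_seats)
print(quota_seats)
print(dhondt_seats)
```

With these numbers both methods happen to give the same distribution (4, 3, 1 and 0 seats); in general they can differ.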

After the fall of the fascist regime in 1945, a new electoral law was needed.

During the fascist regime, elections in their democratic sense were abolished: they became plebiscites, with a single government-proposed list. Voters could only approve or reject it.

The National Council, a consultative body representing the major political forces of the Resistance, proposed a provisional proportional electoral law for the election of the Constituent Assembly, which had the demanding task of writing a new Constitution and acting as a provisional Parliament.

The electoral law proposed by the National Council for the election of the Constituent Assembly was, in fact, used almost unchanged for political elections throughout the First Republic. However, some adjustments were necessary to adapt it to the new institutional context. Two chambers (Camera and Senato) now had to be elected instead of one, and the law had to comply with Article 57 of the Constitution, which mandated the election of the Senate on a regional basis. The goal was to better reflect regional specificities within Parliament.

For this reason, the initial intention was to analyze only the Senate data.

However, Italy is not a federal state like Germany or the US, and the Italian Senate has no specific role in regional legislation. It essentially shares the same functions as the Chamber of Deputies (a system known as “symmetric” or “perfect” bicameralism, though “perfect” is debatable).

Moreover, after the 2020 constitutional reform, which reduced the number of elected senators from 315 to 200 (and deputies from 630 to 400), smaller regions now have less representation. The minimum number of senators per region was lowered from 7 to 3 (except for smaller regions such as Molise, which has 2, and Valle d’Aosta, which has 1).

Thus, with the 2020 reform, the Senate has further lost its function of representing regional differences.

Ignoring data for the Chamber of Deputies would have led to a very partial analysis for several reasons. In particular, young people between the ages of 18 and 24 could vote only for the Chamber and would therefore have been excluded. We will exploit this difference to analyze the electoral behavior of this age group. In line with the focus of the report, we will analyze regional differences in the youth vote.

2.2 Datasets overview

The function process_data(), which is used to load, manipulate and merge almost all the datasets, is defined in the following code chunk:

process_data <- function(file_paths, regex_pattern, a, b, wrangling_function = NULL) {
  
  # Filter file names based on the regex pattern
  regex_file_paths <- file_paths[grepl(regex_pattern, basename(file_paths), ignore.case = TRUE)]
  
  # Check if the regex pattern is correct
  if (length(regex_file_paths) == 0) {
    stop("No files match the provided regex pattern.")
  }
  
  # Initialize an empty list to store all datasets
  all_data <- list()
  
  # Loop through the files
  for (file_path in regex_file_paths) {
    # Read each file into a tibble, assuming UTF-8 encoded input
    data <- tibble::as_tibble(read.csv(file_path, sep = ";", fileEncoding = "UTF-8"))
    
    # Extract the year from the file name and add it as a new column
    year <- substr(basename(file_path), a, b)
    data$YEAR <- year
    
    # Apply wrangling
    if (!is.null(wrangling_function)) {
      data <- wrangling_function(data)
    }
    
    # Append the dataset to the list
    all_data[[length(all_data) + 1]] <- data
  }
  
  # Bind all datasets into one
  unified_data <- bind_rows(all_data)
  
  return(unified_data)
}
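
The two name-based steps inside process_data() (regex filtering and year extraction with substr()) can be tried in isolation. This is only a sketch: the paths are illustrative, no file is read, and the positions 8 and 11 assume that the year starts right after the seven-character prefix "camera-".

```r
# Illustrative paths (no files are read); names follow the patterns used in this chapter
paths <- c("data/camera-19480418.txt",
           "data/Camera-19870614.txt",
           "data/camera1948_preferenze.txt")

# Keep only the files whose base name matches the pattern, ignoring case
pattern <- "^camera-\\d{8}\\.txt$"
matched <- paths[grepl(pattern, basename(paths), ignore.case = TRUE)]

# Characters 8 to 11 of "camera-19480418.txt" are the election year
years <- substr(basename(matched), 8, 11)
print(years)
```

The arguments a and b therefore have to be chosen per file family, since the year sits at a different offset in each naming scheme.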

The names of the Camera files we will work with are listed below:

# Generate paths for all files in the folder "camera1"
file_paths_camera <- list.files("C:\\Users\\acer\\Desktop\\ElectoralDifferences\\data\\camera1", full.names = TRUE)

# Extract file names (without extensions) to use as variable names
file_names_camera <- tools::file_path_sans_ext(basename(file_paths_camera))

# Read all files and assign each to a variable named like the file
for (i in seq_along(file_paths_camera)) {
  assign(file_names_camera[i], tibble::as_tibble(read.csv(file_paths_camera[i], sep = ";", fileEncoding = "UTF-8")))

  # Print a glimpse of the dataset
  # cat("\n--- Glimpse of", file_names_camera[i], "---\n")
  # glimpse(get(file_names_camera[i]))
}

print(file_names_camera)

##  [1] "camera-19480418"       "camera-19530607"       "camera-19580525"       "camera-19630428"      
##  [5] "camera-19680519"       "camera-19720507"       "camera-19760620"       "camera-19790603"      
##  [9] "camera-19830626"       "Camera-19870614"       "camera-19920405"       "camera1948_preferenze"
## [13] "camera1953_preferenze" "camera1958_preferenze" "camera1963_preferenze" "camera1968_preferenze"
## [17] "camera1972_preferenze" "camera1976_preferenze" "camera1979_preferenze" "camera1983_preferenze"
## [21] "camera1987_preferenze" "camera1992_preferenze"

All the datasets camera-yyyymmdd have this form:

tibble::glimpse(`camera-19480418`)

## Rows: 4,392
## Columns: 10
## $ CIRCOSCRIZIONE  <chr> "Milano-Pavia", "Milano-Pavia", "Milano-Pavia", "Milano-Pavia", "Milano-Pavi…
## $ PROVINCIA       <chr> "MILANO", "MILANO", "MILANO", "MILANO", "MILANO", "MILANO", "MILANO", "MILAN…
## $ COMUNE          <chr> "VIGNATE", "VILLANOVA DEL SILLARO", "VILLASANTA", "VILLAVESCO", "VIMERCATE",…
## $ ELETTORI        <int> 1234, 1165, 4379, 1536, 8423, 2592, 2070, 487, 1467, 331, 1751, 831, 641, 14…
## $ ELETTORI_MASCHI <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ VOTANTI         <int> 1203, 1140, 4288, 1483, 8144, 2493, 2021, 476, 1431, 321, 1683, 814, 624, 14…
## $ VOTANTI_MASCHI  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ SCHEDE_BIANCHE  <int> 5, 13, 22, 18, 65, 20, 13, 1, 9, 0, 20, 7, 1, 10, 17, 1, 6, 4, 2, 4, 2, 1, 1…
## $ LISTA           <chr> "MSI", "MSI", "MSI", "MSI", "MSI", "MSI", "MSI", "MSI", "MSI", "MSI", "MSI",…
## $ VOTI_LISTA      <int> 2, 8, 7, 6, 28, 11, 5, 0, 12, 0, 3, 3, 2, 1, 5, 0, 3, 1, 2, 0, 1, 0, 18, 2, …

CIRCOSCRIZIONE (<chr>):
- Indicates the electoral district to which a municipality (COMUNE) belongs.
PROVINCIA (<chr>):
- The province within the electoral district where the data was collected.
COMUNE (<chr>):
- The municipality or town where the data was collected.
ELETTORI (<int>):
- The total number of registered voters in a municipality.
ELETTORI_MASCHI (<lgl>):
- The number of registered male voters. It contains “NA” (missing values). Thus, this column will be ignored.
VOTANTI (<int>):
- The total number of voters who participated in the election.
VOTANTI_MASCHI (<lgl>):
- The number of male voters who participated in the election. It contains “NA” (missing values) and 0s. Thus, this column will be ignored.
SCHEDE_BIANCHE (<int>):
- The number of blank ballots submitted during the election.
LISTA (<chr>):
- The name or abbreviation of the political party or electoral list. In the Appendix section you will find a complete list of the abbreviations used, along with the full name and a brief description of each party or electoral list.
VOTI_LISTA (<int>):
- The number of votes received by the specific political party or list in a municipality.

Due to their similar structure, the datasets will be merged into a single unified dataset, with the additional column YEAR indicating the year of the election.

library(dplyr)

# regex pattern for files like camera-yyyymmdd
camera_I_regex <- "^[cC]amera-\\d{8}\\.txt$"


wrangling_camera <- function(data) {
  #converting variables to more appropriate data types
  data <- data %>%
    select(-c(ELETTORI_MASCHI, VOTANTI_MASCHI)) %>%
    mutate(across(c(VOTANTI, ELETTORI, VOTI_LISTA, SCHEDE_BIANCHE), as.numeric)) %>%
    mutate(across(c(CIRCOSCRIZIONE, PROVINCIA, COMUNE, LISTA), factor))
  return(data)
}

#calling the function
unified_camera_I <- process_data(file_paths_camera, camera_I_regex, 8, 11, wrangling_camera)

#Add REGIONE column

# list Regioni -> Province
regioni_list <- list(
  "ABRUZZO" = c("CHIETI", "L'AQUILA", "PESCARA", "TERAMO"),
  "BASILICATA" = c("MATERA", "POTENZA"),
  "CALABRIA" = c("CATANZARO", "COSENZA", "CROTONE", "REGGIO CALABRIA", "VIBO VALENTIA"),
  "CAMPANIA" = c("AVELLINO", "BENEVENTO", "CASERTA", "NAPOLI", "SALERNO"),
  "EMILIA-ROMAGNA" = c("BOLOGNA", "FERRARA", "FORLÌ-CESENA", "FORLI'", "MODENA", "PARMA", "PIACENZA", "RAVENNA", "REGGIO EMILIA", "REGGIO NELL'EMILIA", "RIMINI"),
  "FRIULI-VENEZIA GIULIA" = c("GORIZIA", "PORDENONE", "TRIESTE", "UDINE"),
  "LAZIO" = c("FROSINONE", "LATINA", "RIETI", "ROMA", "VITERBO"),
  "LIGURIA" = c("GENOVA", "IMPERIA", "LA SPEZIA", "SAVONA"),
  "LOMBARDIA" = c("BERGAMO", "BRESCIA", "COMO", "CREMONA", "LECCO", "LODI", "MANTOVA", "MILANO", "MONZA E BRIANZA", "PAVIA", "SONDRIO", "VARESE"),
  "MARCHE" = c("ANCONA", "ASCOLI PICENO", "FERMO", "MACERATA", "PESARO E URBINO"),
  "MOLISE" = c("CAMPOBASSO", "ISERNIA"),
  "PIEMONTE" = c("ALESSANDRIA", "ASTI", "BIELLA", "CUNEO", "NOVARA", "TORINO", "VERBANO-CUSIO-OSSOLA", "VERCELLI"),
  "PUGLIA" = c("BARI", "BARLETTA-ANDRIA-TRANI", "BRINDISI", "FOGGIA", "LECCE", "TARANTO"),
  "SARDEGNA" = c("CAGLIARI", "CARBONIA-IGLESIAS", "MEDIO CAMPIDANO", "NUORO", "ORISTANO", "SASSARI"),
  "SICILIA" = c("AGRIGENTO", "CALTANISSETTA", "CATANIA", "ENNA", "MESSINA", "PALERMO", "RAGUSA", "SIRACUSA", "TRAPANI"),
  "TOSCANA" = c("AREZZO", "FIRENZE", "GROSSETO", "LIVORNO", "LUCCA", "MASSA CARRARA", "MASSA-CARRARA", "PISA", "PISTOIA", "PRATO", "SIENA"),
  "TRENTINO-ALTO ADIGE" = c("BOLZANO", "TRENTO"),
  "UMBRIA" = c("PERUGIA", "TERNI"),
  "VALLE D'AOSTA" = c("AOSTA"),
  "VENETO" = c("BELLUNO", "PADOVA", "ROVIGO", "TREVISO", "VENEZIA", "VERONA", "VICENZA")
)

get_region <- function(provincia) {
  for (region in names(regioni_list)) {
    if (provincia %in% regioni_list[[region]]) {
      return(region)
    }
  }
  return(NA)
}

#(s)apply get_region() to each PROVINCIA
unified_camera_I$REGIONE <- sapply(unified_camera_I$PROVINCIA, get_region)

#write unified dataset to an RDS file, for efficiency and reproducibility reasons
saveRDS(unified_camera_I, "unified_camera_I.rds")
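
Calling get_region() once per row can be slow on a dataset with many rows. A possible alternative, sketched here on a shortened and purely illustrative copy of the mapping (regioni_demo is not part of the original code), is to flatten the list into a named lookup vector and index it with all province names in one step.

```r
# Shortened, illustrative version of the region -> provinces mapping
regioni_demo <- list(
  "MOLISE" = c("CAMPOBASSO", "ISERNIA"),
  "UMBRIA" = c("PERUGIA", "TERNI")
)

# Flatten into a named vector: names are provinces, values are regions
lookup <- setNames(
  rep(names(regioni_demo), lengths(regioni_demo)),
  unlist(regioni_demo, use.names = FALSE)
)

# Vectorised lookup; provinces missing from the mapping yield NA
province <- c("TERNI", "ISERNIA", "MILANO")
regione  <- unname(lookup[province])
print(regione)
```

If the province column is a factor, it should be converted with as.character() first, because indexing a vector with a factor uses its integer codes rather than its labels.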

Results are presented for each year:

unified_camera_I_DT <- readRDS("unified_camera_I.rds") %>%
  group_by(YEAR, REGIONE, LISTA) %>%
  summarize(VOTI = sum(VOTI_LISTA), .groups = "drop_last") %>%
  mutate(PERCENTAGE = (VOTI / sum(VOTI))*100) %>%
  select(-VOTI)

saveRDS(unified_camera_I_DT, "unified_camera_I_DT.rds")

DT::datatable(unified_camera_I_DT, filter = "top", options = list(
  scrollX = TRUE, autoWidth = TRUE
)) %>%
  DT::formatRound("PERCENTAGE", digits = 2)
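
The percentage step relies on the fact that summarize() removes only the last grouping level, so the following mutate() still works within each YEAR and REGIONE pair. A minimal sketch with made-up numbers (the tibble toy and its values are hypothetical) shows the effect:

```r
library(dplyr)

# Hypothetical municipal results: two lists, two municipalities, one region and year
toy <- tibble::tibble(
  YEAR       = "1948",
  REGIONE    = "MOLISE",
  LISTA      = c("A", "A", "B", "B"),
  VOTI_LISTA = c(60, 40, 20, 80)
)

toy_pct <- toy %>%
  group_by(YEAR, REGIONE, LISTA) %>%
  # After summarize() the data are still grouped by YEAR and REGIONE
  summarize(VOTI = sum(VOTI_LISTA), .groups = "drop_last") %>%
  # so sum(VOTI) is the regional total for that year
  mutate(PERCENTAGE = VOTI / sum(VOTI) * 100) %>%
  ungroup()

print(toy_pct)
```

Each list collects 100 of the 200 votes, so both percentages are 50 and they add up to 100 within the region.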

The datasets camerayyyy_preferenze contain information about individual candidates, in particular their gender.

We will use this information to detect possible regional differences in the gender gap.

Datasets structure:

glimpse(camera1948_preferenze)

## Rows: 5,606
## Columns: 12
## $ DATAELEZIONE    <chr> "18/4/1948 00:00:00", "18/4/1948 00:00:00", "18/4/1948 00:00:00", "18/4/1948…
## $ CODTIPOELEZIONE <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "…
## $ CIRCOSCRIZIONE  <chr> "Udine-Belluno-Gorizia", "Udine-Belluno-Gorizia", "Udine-Belluno-Gorizia", "…
## $ descrlista      <chr> "PARTITO CRISTIANO SOCIALE", "PARTITO CRISTIANO SOCIALE", "FR.DEMOCR.POPOLAR…
## $ votiLista       <int> 6197, 6197, 144679, 144679, 144679, 144679, 144679, 144679, 144679, 144679, …
## $ cognome         <chr> "PAVAN", "VIANELLO", "BELTRAME", "PRATOLONGO", "BETTIOL", "LUZZATTO ", "SOLA…
## $ nome            <chr> "GIORGIO DOMENICO", "RICCARDO", "GINO", "GIORDANO", "FRANCESCO", "LUCIO MARI…
## $ datanascita     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ luogonascita    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ sesso           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CODTIPOELETTO   <chr> "", "", "E", "E", "E", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ PREFERENZE      <int> 14, 13, 32231, 29174, 24058, 13246, 9255, 7565, 7310, 5151, 4149, 1316, 1283…

DATAELEZIONE (<chr>):
- Represents the date of the election. The format is “DD/MM/YYYY HH:MM:SS”, but the time is not relevant, since it is always “00:00:00” (Example: “18/4/1948 00:00:00”).
CODTIPOELEZIONE (<chr>):
- A code indicating the type of election. It is always “C”, which stands for “Camera dei Deputati” (Chamber of Deputies).
CIRCOSCRIZIONE (<chr>):
- The electoral district where votes were cast.
descrlista (<chr>):
- The name of the political party or electoral list.
votiLista (<int>):
- The total number of votes received by the party or list in that electoral district.
cognome (<chr>):
- The surname of the candidate.
nome (<chr>):
- The first name of the candidate.
datanascita (<lgl>):
- The date of birth of the candidate. It is missing for all rows of all the datasets, except for the 1992 elections. Thus, it will be removed.
luogonascita (<lgl>):
- The place of birth of the candidate. It is missing for all rows of all the datasets, except for the 1992 elections. Thus, it will be removed.
sesso (<lgl>):
- The gender of the candidate, “M” for males and “F” for females. It is missing for the elections up to 1958; in later datasets it is of type <chr>.
CODTIPOELETTO (<chr>):
- A code denoting the status of the candidate. “E” indicates an elected candidate. Empty strings indicate unelected candidates.
PREFERENZE (<int>):
- The number of preference votes received by the individual candidate.

We will keep only the columns DATAELEZIONE (from which just the year will be extracted), CODTIPOELETTO and sesso. CIRCOSCRIZIONE will be used to create a new column REGIONE, containing the region to which the electoral district belongs. Datasets with no information about the sex of the candidates will be discarded.
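
The year extraction can be checked on a couple of values before applying it to whole files. The sketch shows the split-based approach used in this chapter next to a date-parsing variant; the latter is only an alternative, and the second input value is a hypothetical one built on the same day/month/year layout.

```r
# First value copied from the glimpse() output; the second is hypothetical
x <- c("18/4/1948 00:00:00", "7/6/1953 00:00:00")

# Split on "/" or space and take the third piece (the year)
year_split <- sapply(strsplit(x, "/| "), function(p) p[3])

# Alternative: parse the leading date and format it as a year
year_date <- format(as.Date(x, format = "%d/%m/%Y"), "%Y")

print(as.numeric(year_split))
```

Both approaches return the year as a character string, which is why a final as.numeric() is still needed.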

library(dplyr)

camera_I_preferenze_regex <- "^camera\\d{4}_preferenze\\.txt$"

# valid_file_paths <- file_paths_camera[grepl(camera_I_preferenze_regex, basename(file_paths_camera))]
# base::print(valid_file_paths)


wrangling_camera_preferenze <- function(data) {
  data <- data %>%
    # Keep only rows where the candidate's sex is recorded
    filter(!is.na(sesso)) %>%
    select(DATAELEZIONE, CIRCOSCRIZIONE, sesso, CODTIPOELETTO) %>%
    # The year is the third piece of "DD/MM/YYYY HH:MM:SS"
    mutate(YEAR = sapply(strsplit(DATAELEZIONE, "/| "), function(x) x[3])) %>%
    select(-DATAELEZIONE) %>%
    mutate(
      CODTIPOELETTO  = factor(CODTIPOELETTO),
      CIRCOSCRIZIONE = as.factor(CIRCOSCRIZIONE),
      sesso          = as.factor(sesso),
      YEAR           = as.numeric(YEAR)
    )

  return(data)
}

#calling the function
unified_camera_I_preferenze <- process_data(file_paths_camera, camera_I_preferenze_regex, 7, 10, wrangling_camera_preferenze)
#glimpse(unified_camera_I_preferenze)

# Mapping of circoscrizioni to regions
circoscrizione_to_regione <- data.frame(
  CIRCOSCRIZIONE = c(
    "TORINO-NOVARA-VERCELLI", "CUNEO-ALESSANDRIA-ASTI", "MILANO-PAVIA", 
    "COMO-SONDRIO-VARESE", "GENOVA-IMPERIA-LA SPEZIA-SAVONA", "BRESCIA-BERGAMO", 
    "VENEZIA-TREVISO", "UDINE-BELLUNO-GORIZIA", "BOLOGNA-FERRARA-RAVENNA-FORLI", 
    "VERONA-PADOVA-VICENZA-ROVIGO", "PARMA-MODENA-PIACENZA-REGGIO EMILIA", "AOSTA", 
    "TRIESTE", "FIRENZE-PISTOIA", "MANTOVA-CREMONA", "TRENTO-BOLZANO", 
    "PISA-LIVORNO-LUCCA-MASSA CARRARA", "ROMA-VITERBO-LATINA-FROSINONE", 
    "SIENA-AREZZO-GROSSETO", "ANCONA-PESARO-MACERATA-ASCOLI PICENO", 
    "PERUGIA-TERNI-RIETI", "NAPOLI-CASERTA", "L'AQUILA-PESCARA-CHIETI-TERAMO", 
    "BENEVENTO-AVELLINO-SALERNO", "BARI-FOGGIA", "LECCE-BRINDISI-TARANTO", 
    "POTENZA-MATERA", "CATANZARO-COSENZA-REGGIO CALABRIA", "CAMPOBASSO", 
    "CAGLIARI-SASSARI-NUORO", "CATANIA-MESSINA-SIRACUSA-RAGUSA-ENNA", 
    "PALERMO-TRAPANI-AGRIGENTO-CALTANISETTA", "ANCONA-PESARO-MACERATA-ASCOLI PICENO", 
    "PALERMO-TRAPANI-AGRIGENTO-CALTANISSETTA", "CAGLIARI-SASSARI-NUORO-ORISTANO", 
    "UDINE-BELLUNO-GORIZIA-PORDENONE", "CAMPOBASSO-ISERNIA", "BOLOGNA-FERRARA-RAVENNA-FORLI'", 
    "VALLE D'AOSTA"
  ),
REGIONE = c(
    "PIEMONTE", "PIEMONTE", "LOMBARDIA", "LOMBARDIA", "LIGURIA", "LOMBARDIA", "VENETO", 
    "FRIULI VENEZIA GIULIA", "EMILIA-ROMAGNA", "VENETO", "EMILIA-ROMAGNA", "VALLE D'AOSTA", 
    "FRIULI VENEZIA GIULIA", "TOSCANA", "LOMBARDIA", "TRENTINO-ALTO ADIGE/SÜDTIROL", "TOSCANA", 
    "LAZIO", "TOSCANA", "MARCHE", "UMBRIA", "CAMPANIA", "ABRUZZO", "CAMPANIA", "PUGLIA", 
    "PUGLIA", "BASILICATA", "CALABRIA", "MOLISE", "SARDEGNA", "SICILIA", "SICILIA", "MARCHE", 
    "SICILIA", "SARDEGNA", "FRIULI VENEZIA GIULIA", "MOLISE", "EMILIA-ROMAGNA", "VALLE D'AOSTA"
))


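unified_camera_I_preferenze <- unified_camera_I_preferenze %>%
  # Match district names regardless of case (the raw files may use mixed case)
  mutate(CIRCOSCRIZIONE = toupper(as.character(CIRCOSCRIZIONE))) %>%
  # distinct() drops the repeated Marche entry, which would otherwise duplicate rows in the join
  left_join(distinct(circoscrizione_to_regione), by = "CIRCOSCRIZIONE") %>%
  select(-CIRCOSCRIZIONE) %>%
  filter(CODTIPOELETTO == "E") %>%
  group_by(YEAR, REGIONE) %>%
  summarize(FEMALES = sum(sesso == "F"), MALES = sum(sesso == "M"), .groups = "drop_last") %>%
  # FEMALE_PERC is the number of elected women per 100 elected men
  mutate(FEMALE_PERC = (FEMALES / MALES) * 100)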

#head(unified_camera_I_preferenze)

DT::datatable(unified_camera_I_preferenze, filter = "top") %>%
  DT::formatRound("FEMALE_PERC", digits = 2)
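
Note that FEMALES / MALES is a ratio of women to men rather than the share of women among those elected, and that it becomes Inf when no man is elected. With made-up counts (all values below are hypothetical) the two measures compare as follows:

```r
# Hypothetical counts of elected candidates in three regions
females <- c(5, 0, 2)
males   <- c(20, 10, 0)

# Measure used in this chapter: women per 100 men
ratio_pct <- females / males * 100

# Alternative: women as a percentage of all elected candidates
share_pct <- females / (females + males) * 100

print(ratio_pct)  # 25 0 Inf
print(share_pct)  # 20 0 100
```

The ratio is kept in this chapter; the share is shown only to make the interpretation of FEMALE_PERC explicit.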

# Generate paths for all files in the folder "senato1"
file_paths_senato <- list.files("C:\\Users\\acer\\Desktop\\ElectoralDifferences\\data\\senato1", full.names = TRUE)

# Extract file names (without extensions) to use as variable names
file_names_senato <- tools::file_path_sans_ext(basename(file_paths_senato))

# Read all files and assign each to a variable named like the file
for (i in seq_along(file_paths_senato)) {
  assign(file_names_senato[i], dplyr::as_tibble(read.csv(file_paths_senato[i], sep = ";", fileEncoding = "UTF-8")))
}

base::print(file_names_senato)

##  [1] "senato-19480418"       "senato-19530607"       "senato-19580525"       "senato-19630428"      
##  [5] "senato-19680519"       "senato-19720507"       "senato-19760620"       "senato-19790603"      
##  [9] "senato-19830626"       "Senato-19870614"       "senato-19920405"       "Senato_1948_candlista"
## [13] "Senato_1953_candlista" "Senato_1958_candlista" "Senato_1963_candlista" "Senato_1968_candlista"
## [17] "Senato_1972_candlista" "Senato_1976_candlista" "Senato_1979_candlista" "Senato_1983_candlista"
## [21] "Senato_1987_candlista" "Senato_1992_candlista"

Structure of the senato-yyyymmdd files:

tibble::glimpse(`senato-19480418`)

## Rows: 32,942
## Columns: 10
## $ REGIONE         <chr> "MOLISE", "MOLISE", "MOLISE", "MOLISE", "MOLISE", "MOLISE", "SARDEGNA", "SAR…
## $ COLLEGIO        <chr> "LARINO", "LARINO", "LARINO", "LARINO", "LARINO", "LARINO", "IGLESIAS", "IGL…
## $ COMUNE          <chr> "GUGLIONESI", "GUGLIONESI", "GUGLIONESI", "GUGLIONESI", "GUGLIONESI", "GUGLI…
## $ ELETTORI        <int> 3780, 3780, 3780, 3780, 3780, 3780, 1170, 1170, 3296, 2795, 2795, 2795, 2795…
## $ ELETTORI_MASCHI <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ VOTANTI         <int> 3626, 3626, 3626, 3626, 3626, 3626, 1074, 1074, 3001, 2570, 2570, 2570, 2570…
## $ VOTANTI_MASCHI  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ SCHEDE_BIANCHE  <int> 23, 23, 23, 23, 23, 23, 63, 63, 27, 41, 41, 41, 41, 41, 49, 49, 49, 49, 49, …
## $ LISTA           <chr> "MSI", "P.NAZ.MONARCHICO", "PRI", "BLOCCO NAZIONALE", "IND", "SOCIALCOMUNIST…
## $ VOTI_LISTA      <int> 95, 106, 7, 665, 330, 733, 90, 440, 116, 70, 491, 281, 918, 657, 207, 404, 4…

REGIONE (<chr>):
- The region where the votes were cast.
COLLEGIO (<chr>):
- The electoral district (collegio) within the region.
COMUNE (<chr>):
- The municipality (comune) where the votes were cast.
ELETTORI (<int>):
- The total number of eligible voters in the given municipality (COMUNE). This represents the total potential electorate.
ELETTORI_MASCHI (<lgl>):
- The number of eligible male voters. This column is all NA (missing data).
VOTANTI (<int>):
- The total number of people who actually voted in the given municipality. This includes all voters, regardless of gender.
VOTANTI_MASCHI (<lgl>):
- The number of male voters who actually voted. This column is all NA (missing data).
SCHEDE_BIANCHE (<int>):
- The number of blank ballots (schede bianche) cast in the given municipality. Blank ballots indicate voters who participated but did not vote for any candidate or list.
LISTA (<chr>):
- The name of the political list or party that received votes. Each row represents the votes received by a specific party in a municipality.
VOTI_LISTA (<int>):
- The number of votes received by the specified party or list (LISTA) in the given municipality.

library(dplyr)
# Define the regex pattern for files like senato-yyyymmdd
senato_I_regex <- "^[sS]enato-\\d{8}\\.txt$"


wrangling_senato <- function(data) {
  #converting variables to more appropriate data types
  data <- data %>%
    mutate(across(c(VOTANTI, ELETTORI, VOTI_LISTA, SCHEDE_BIANCHE), as.numeric)) %>%
    mutate(across(c(REGIONE, COLLEGIO, COMUNE, LISTA), factor))
  return(data)
}

#calling the function
unified_senato_I <- process_data(file_paths_senato, senato_I_regex, 8, 11, wrangling_senato)
#glimpse(unified_senato_I)

#write unified dataset to an RDS file, for efficiency and reproducibility reasons
saveRDS(unified_senato_I, "unified_senato_I.rds")

Results are presented for each year:

unified_senato_I_DT <- readRDS("unified_senato_I.rds") %>%
  group_by(YEAR, REGIONE, LISTA) %>%
  summarize(VOTI = sum(VOTI_LISTA), .groups = "drop_last") %>%
  mutate(PERCENTAGE = (VOTI / sum(VOTI))*100) %>%
  select(-VOTI)

saveRDS(unified_senato_I_DT, "unified_senato_I_DT.rds")

DT::datatable(unified_senato_I_DT, filter = "top") %>%
  DT::formatRound("PERCENTAGE", digits = 2)

The datasets Senato_yyyy_candlista contain information about individual candidates, in particular their gender.

We will use this information to detect possible regional differences in the gender gap.

The structure of these datasets is very similar to that of camerayyyy_preferenze. Here a column REGIONE is already present, so we do not need to add it:

glimpse(Senato_1948_candlista)

## Rows: 1,093
## Columns: 12
## $ DATAELEZIONE    <chr> "18/4/1948 00:00:00", "18/4/1948 00:00:00", "18/4/1948 00:00:00", "18/4/1948…
## $ CODTIPOELEZIONE <chr> "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "S", "…
## $ REGIONE         <chr> "SICILIA", "SICILIA", "SICILIA", "SICILIA", "SICILIA", "SICILIA", "SICILIA",…
## $ COLLEGIO        <chr> "PALERMO II", "PALERMO II", "PALERMO II", "CORLEONE BAGHERIA", "CORLEONE BAG…
## $ descrlista      <chr> "P.NAZ.MONARCHICO", "FR.DEMOCR.POPOLARE", "UN.MOV.FEDERALISTI", "FR.DEMOCR.P…
## $ votiLista       <int> 17747, 7389, 3045, 10342, 33104, 10265, 3234, 4873, 7588, 11743, 2325, 2941,…
## $ cognome         <chr> "LANZA FILINGERI PATERNO'", "NASI", "PETRIGNI", "DELLA VOLPE", "TRAINA", "LA…
## $ nome            <chr> "STEFANO", "ROSARIO", "VINCENZO", "GALVANO", "GIUSEPPE", "STEFANO", "GIUSEPP…
## $ datanascita     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ luogonascita    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ sesso           <chr> "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "M", "…
## $ CODTIPOELETTO   <chr> "E", "", "", "", "", "", "", "", "", "", "", "", "E", "", "", "", "", "", ""…

library(dplyr)

senato_I_preferenze_regex <- "^Senato_\\d{4}_candlista\\.txt$"

wrangling_senato_preferenze <- function(data) {
  data <- data %>%
    # Keep only rows where the candidate's sex is recorded
    filter(!is.na(sesso)) %>%
    select(DATAELEZIONE, REGIONE, sesso, CODTIPOELETTO) %>%
    # The year is the third piece of "DD/MM/YYYY HH:MM:SS"
    mutate(YEAR = sapply(strsplit(DATAELEZIONE, "/| "), function(x) x[3])) %>%
    select(-DATAELEZIONE) %>%
    mutate(
      CODTIPOELETTO = factor(CODTIPOELETTO),
      REGIONE       = as.factor(REGIONE),
      sesso         = as.factor(sesso),
      YEAR          = as.numeric(YEAR)
    )

  return(data)
}

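# The year occupies characters 8 to 11 of "Senato_yyyy_candlista.txt"; YEAR is then recomputed from DATAELEZIONE inside the wrangling function
unified_senato_I_preferenze <- process_data(file_paths_senato, senato_I_preferenze_regex, 8, 11, wrangling_senato_preferenze)
#glimpse(unified_senato_I_preferenze)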

unified_senato_I_preferenze <- unified_senato_I_preferenze %>%
  filter(CODTIPOELETTO == "E") %>%
  group_by(YEAR, REGIONE) %>%
  summarize(FEMALES = sum(sesso == "F"), MALES = sum(sesso == "M"), .groups = "drop_last") %>%
  # FEMALE_PERC is the number of elected women per 100 elected men
  mutate(FEMALE_PERC = (FEMALES / MALES) * 100)

unified_I_preferenze <- full_join(
  unified_camera_I_preferenze %>%
    rename(MALES_CAMERA = MALES, FEMALES_CAMERA = FEMALES, FEMALE_PERC_CAMERA = FEMALE_PERC),
  unified_senato_I_preferenze %>%
    rename(MALES_SENATO = MALES, FEMALES_SENATO = FEMALES, FEMALE_PERC_SENATO = FEMALE_PERC),
  by = c("YEAR", "REGIONE")
)

DT::datatable(unified_I_preferenze, filter = "top", options = list(
  scrollX = TRUE, autoWidth = TRUE
)) %>%
  DT::formatRound(columns = c("FEMALE_PERC_SENATO", "FEMALE_PERC_CAMERA"), digits = 2)

unified_I <- full_join(
  unified_camera_I_DT %>%
    rename(PERCENTAGE_CAMERA = PERCENTAGE),
  unified_senato_I_DT %>%
    rename(PERCENTAGE_SENATO = PERCENTAGE),
  by = c("YEAR", "REGIONE", "LISTA")
)

saveRDS(unified_I, "unified_I.rds")

DT::datatable(unified_I, filter = "top", options = list(
  scrollX = TRUE
)) %>%
  DT::formatRound(columns = c("PERCENTAGE_SENATO", "PERCENTAGE_CAMERA"), digits = 2)

#only_camera <- sum(is.na(unified_I$PERCENTAGE_SENATO))
#only_senato <- sum(is.na(unified_I$PERCENTAGE_CAMERA))
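
Since both joins use REGIONE as a key, any spelling difference between the sources produces unmatched rows with NA on one side. The two mappings defined earlier, for example, spell some regions differently. A quick check, sketched here on a hand-copied subset of those labels (the vectors regioni_a and regioni_b are illustrative), is to compare the sets of names before joining.

```r
# Region labels as spelled in regioni_list (subset)
regioni_a <- c("FRIULI-VENEZIA GIULIA", "TRENTINO-ALTO ADIGE", "VENETO")
# Region labels as spelled in circoscrizione_to_regione (subset)
regioni_b <- c("FRIULI VENEZIA GIULIA", "TRENTINO-ALTO ADIGE/SÜDTIROL", "VENETO")

# Labels present in one source but not in the other
only_a <- setdiff(regioni_a, regioni_b)
only_b <- setdiff(regioni_b, regioni_a)

print(only_a)
print(only_b)
```

Non-empty results suggest that the labels should be harmonised before the join. Whether the Senate files use either spelling is not known from this chapter and would have to be checked on the data.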

2.3 Shiny app

You can explore the data interactively with the Shiny app here