IR Tools test 2008-2014 data (2016 IR period of record)

Background

R packages & GitHub

DWQ packages

devtools::install_github('utah-dwq/wqTools')
devtools::install_github('ut-ir-tools/irTools')
library(wqTools)
library(irTools)

Download and import data

Data download

#setwd('C:\\Users\\jvander\\Documents\\R\\irTools-test-16')
downloadWQP(outfile_path='01-raw-data',start_date='10/01/2008', end_date='09/30/2014', zip=TRUE, unzip=TRUE, retrieve=c("narrowresult", "activity", "detquantlim"))
downloadWQP(outfile_path='01-raw-data', zip=FALSE, retrieve="sites")

Note: Downloading sites for the POR date range currently fails. The cause is unclear, but it seems to be associated with applying date query parameters to older sites. As a workaround, all sites are downloaded separately and then subset to just those included in narrowresult.

sites=read.csv(file='01-raw-data/sites-2019-04-04.csv')
nr=read.csv(file='01-raw-data/narrowresult-2019-04-04.csv')
sites=sites[sites$MonitoringLocationIdentifier %in% nr$MonitoringLocationIdentifier,]
write.csv(file='01-raw-data/sites-2019-04-04.csv', sites, row.names=FALSE)
rm(sites)

Data imports

Read raw data into R, remove duplicates and check for orphans

irdata <- readWQPFiles(file_select=FALSE,
            narrowresult_file = "01-raw-data\\narrowresult-2019-04-04.csv",
            sites_file = "01-raw-data\\sites-2019-04-04.csv",
            activity_file = "01-raw-data\\activity-2019-04-04.csv",
            detquantlim_file = "01-raw-data\\detquantlim-2019-04-04.csv",
            orph_check = TRUE)
## [1] "------------READING IN FILES--------------"
## [1] "----REMOVING EXACT DUPLICATES-----"
## [1] "-----PERFORMING ORPHAN RECORD CHECKS------"
## [1] "3 orphan records detected in sites file with no match to narrowresult."
## [1] "narrowresult_sites_orphans object created containing orphan records."
## [1] "Date forms between narrowresult and activity often cause erroneous orphans. Check date forms below. If date forms do not match, prior conversion using as.Date() is needed."
## [1] "narrowresult file:"
## [1] 2010-07-13 2010-07-13 2010-07-13 2010-07-13 2010-07-13 2010-07-13
## 1775 Levels: 2008-10-01 2008-10-02 2008-10-03 2008-10-04 ... 2014-09-30
## [1] "activity file:"
## [1] 2010-07-13 2010-07-13 2010-07-13 2010-07-13 2010-07-13 2010-07-13
## 1777 Levels: 2008-10-01 2008-10-02 2008-10-03 2008-10-04 ... 2014-09-30
## [1] "98 orphan records detected in activity file with no match to narrowresult."
## [1] "narrowresult_activity_orphans object created containing orphan records."
## [1] "NOTE: narrowresult will likely have many orphan records not represented in detquantlim. This occurs for a few reasons: (1) labs sometimes do not report detection quantitation limits, and (2) field measurements often do not report detection quantitation limits."
## [1] "684950 orphan records detected in narrowresult file with no match to detquantlim."
## [1] "narrowresult_detquantlim_orphans object created containing orphan records."
## [1] "Need to figure out non-numeric data in numeric columns conundrum."
## Warning in wqTools::facToNum(wqpdat$merged_results$ResultMeasureValue): NAs
## introduced by coercion
## [1] "----FILES SUCCESSFULLY ADDED TO R OBJECT LIST----"
objects(irdata)
## [1] "detquantlim"                      "merged_results"                  
## [3] "narrowresult_activity_orphans"    "narrowresult_detquantlim_orphans"
## [5] "narrowresult_site_orphans"        "sites"
attach(irdata)
## The following objects are masked from wqpdat:
## 
##     detquantlim, sites
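
The orphan checks and the date-form warning reported above can be approximated in base R. This is only a minimal sketch on toy data: it assumes records are matched on ActivityIdentifier and ActivityStartDate, which may not be the keys readWQPFiles actually uses, and all values shown are hypothetical.

```r
# Toy stand-ins for the narrowresult and activity tables (hypothetical values).
narrowresult_toy <- data.frame(
  ActivityIdentifier = c("A1", "A2", "A3"),
  ActivityStartDate  = c("2010-07-13", "2010-07-14", "2010-07-15"),
  stringsAsFactors   = FALSE
)
activity_toy <- data.frame(
  ActivityIdentifier = c("A1", "A2", "A4"),
  ActivityStartDate  = c("7/13/2010", "7/14/2010", "7/16/2010"),
  stringsAsFactors   = FALSE
)

# Convert both date columns to Date so that differing date forms
# do not produce erroneous orphans.
narrowresult_toy$ActivityStartDate <- as.Date(narrowresult_toy$ActivityStartDate, format = "%Y-%m-%d")
activity_toy$ActivityStartDate     <- as.Date(activity_toy$ActivityStartDate, format = "%m/%d/%Y")

# Build a composite key (assumed matching columns) for each table.
make_key <- function(d) paste(d$ActivityIdentifier, d$ActivityStartDate)

# Activity rows with no matching narrowresult record are orphans.
activity_orphans <- activity_toy[!make_key(activity_toy) %in% make_key(narrowresult_toy), ]
print(activity_orphans)
```

Without the as.Date() conversion, every activity row in this toy example would be flagged as an orphan, because the two date strings never match as text.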

Site and data validation

Auto site validation

autoValidateWQPsites(
    sites_object=sites,
    master_site_file="02-site-validation/IR_master_site_file.csv",
    waterbody_type_file = "lookup-tables/waterbody_type_domain_table.csv",
    polygon_path="02-site-validation/polygons",
    outfile_path="02-site-validation"
    )
## [1] "Reading in sites_file and master_sites_file and checking for new waterbody types..."
## [1] "No new monitoring location types detected."
## [1] "0 total master site records in file."
## [1] "2631 sites found in sites_file not present in master_site_file."
## Reading layer `UT_state_bnd_noTribal_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -114.0587 ymin: 36.98649 xmax: -109.0318 ymax: 42.01131
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## dist is assumed to be in decimal degrees (arc_degrees).
## Reading layer `AU_poly_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 900 features and 48 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -114.053 ymin: 36.99817 xmax: -109.0411 ymax: 42.00162
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `AU_poly_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 900 features and 48 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -114.053 ymin: 36.99817 xmax: -109.0411 ymax: 42.00162
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `Beneficial_Uses_All_2020IR_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 702 features and 11 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -114.053 ymin: 36.99791 xmax: -109.0411 ymax: 42.00162
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `SiteSpecific_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 30 features and 20 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -113.705 ymin: 36.99998 xmax: -109.2733 ymax: 42.00122
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `GSL_poly_wgs84' from data source `C:\Users\jvander\Documents\R\irTools-test-16\02-site-validation\polygons' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -113.0727 ymin: 40.66583 xmax: -111.9095 ymax: 41.70094
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## [1] "Performing attribute based site checks..."
## [1] "Attribute based site rejection reason count:"
## 
##      Attributes indicate dup, rep, blank, dummy, or QAQC site 
##                                                           127 
##                                      Horizontal datum unknown 
##                                                            20 
##      Aquifer name populated: associated with unassessed wells 
##                                                            10 
##    Formation type populated: associated with unassessed wells 
##                                                            14 
## Aquifer type name populated: associated with unassessed wells 
##                                                            14 
##                                        Non-assessed site type 
##                                                             9 
## [1] "Performing spatial site checks..."
## [1] "Spatial site rejection reason count:"
## 
##                                                 Undefined AU
##                                                          162
## Non-jurisdictional: out of state or within tribal boundaries
##                                                           57
##                        GSL assessed through separate program
##                                                           81
##                                  Non-assessed canal or ditch
##                                                           40
##            Stream or spring site type in non-River/Stream AU
##                                                           47
## [1] "Performing 100m/duplicate MLID/duplicate lat-long checks..."
## [1] "Spatial site review reason count:"
## 
##                                                           Duplicated MLID
##                                                                         8
##                                                    Duplicated lat or long
##                                                                        35
##                                              One or more sites w/in 100 m
##                                                                       298
## MLID type is lake/reservoir, but AU_Type is not - potential new AU needed
##                                                                        48
## [1] " Site validation complete and autovalidation resulted in 1803 accepted sites, 487 rejected sites, and 341 sites in need of review."
## [1] "Updated master site list has 2631 sites, with 2631 new sites added to original 0 sites in master list."
## [1] "Master site file updated and review/rejection reasons file created."
## [1] "02-site-validation/IR_master_site_file.csv"
## [1] "02-site-validation\\rev_rej_reasons.csv"
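
For illustration, the "one or more sites w/in 100 m" review flag can be sketched with a pairwise great-circle distance in base R. This is not the autoValidateWQPsites implementation. It is a rough approximation on toy data that assumes coordinates are in decimal degrees in LatitudeMeasure and LongitudeMeasure columns; the haversine_m helper is hypothetical.

```r
# Great-circle (haversine) distance in metres between points in decimal degrees.
# Hypothetical helper, written only for this sketch.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Toy sites: S1 and S2 are roughly 56 m apart, S3 is far from both.
sites_toy <- data.frame(
  MonitoringLocationIdentifier = c("S1", "S2", "S3"),
  LatitudeMeasure  = c(40.7600, 40.7605, 41.2000),
  LongitudeMeasure = c(-111.8900, -111.8900, -112.0000),
  stringsAsFactors = FALSE
)

# Pairwise distance matrix between all sites.
idx <- seq_len(nrow(sites_toy))
dist_m <- outer(idx, idx, function(i, j) {
  haversine_m(sites_toy$LatitudeMeasure[i], sites_toy$LongitudeMeasure[i],
              sites_toy$LatitudeMeasure[j], sites_toy$LongitudeMeasure[j])
})
diag(dist_m) <- Inf  # a site should not count as its own neighbour

# Flag sites whose nearest neighbour is within 100 m.
sites_toy$within_100m <- apply(dist_m, 1, min) <= 100
print(sites_toy)
```

A full pairwise matrix is fine for a few thousand sites, as here, but a spatial index would be a better choice for much larger site lists.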

Manual site validation

A test manual site validation was performed by accepting, rejecting, and merging a number of sites over multiple iterations of the site review application. See the ‘02-site-validation’ folder. Data from all sites still requiring review will be rejected at a later step. A ‘ReviewComment’ column was manually added to the master site list file. The site list and flat reasons files were then manually combined into a single .xlsx workbook (tab names ‘sites’ and ‘reasons’) to interface with the site review application.
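
The workbook assembly described above was done by hand, but it could also be scripted. The sketch below is one possible approach, assuming the openxlsx package is installed; the toy data frames stand in for the master site list and reasons files, and the output file name is hypothetical.

```r
# Toy stand-ins for the master site list and the review/rejection reasons file.
# In practice these would be read with read.csv() from the 02-site-validation folder.
sites_tab <- data.frame(
  MonitoringLocationIdentifier = c("S1", "S2"),
  stringsAsFactors = FALSE
)
sites_tab$ReviewComment <- NA_character_  # the manually added comment column

reasons_tab <- data.frame(
  MonitoringLocationIdentifier = "S2",
  Reason = "One or more sites w/in 100 m",
  stringsAsFactors = FALSE
)

# Hypothetical output location for the combined workbook.
wb_path <- file.path(tempdir(), "site_review_workbook.xlsx")

# Each named list element becomes one worksheet (tab) in the workbook.
if (requireNamespace("openxlsx", quietly = TRUE)) {
  openxlsx::write.xlsx(list(sites = sites_tab, reasons = reasons_tab),
                       file = wb_path, overwrite = TRUE)
  print(openxlsx::getSheetNames(wb_path))
}
```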

Site review application

runSiteValApp()

Data translations & processing

Identify translation workbook

translation_wb="C:\\Users\\jvander\\Documents\\R\\irTools-test-16\\lookup-tables\\ir_translation_workbook.xlsx"

Update detection condition / limit name tables

updateDetCondLimTables(results=merged_results, detquantlim=detquantlim, translation_wb=translation_wb,
                        detConditionTable_startRow=2, detLimitTypeTable_startRow=2)
## [1] "No new ResultDetectionConditionText value(s) identified"
## [1] "No new DetectionQuantitationLimitTypeName value(s) identified."
## [1] "Translation workbook updated & saved."
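
Conceptually, this step compares the values present in the data against those already listed in the translation workbook. A minimal base R sketch of that comparison follows, using toy data. The IR_DetCond column name is hypothetical, and the logic is an assumption about what updateDetCondLimTables does, not its actual code.

```r
# Toy results with one detection condition value missing from the lookup table.
results_toy <- data.frame(
  ResultDetectionConditionText = c("Not Detected",
                                   "Present Above Quantification Limit",
                                   NA,
                                   "Detected Not Quantified"),
  stringsAsFactors = FALSE
)

# Toy version of the detection condition translation table.
# IR_DetCond is a hypothetical column name used only for illustration.
det_condition_table <- data.frame(
  ResultDetectionConditionText = c("Not Detected",
                                   "Present Above Quantification Limit"),
  IR_DetCond = c("ND", "OD"),
  stringsAsFactors = FALSE
)

# Values observed in the data (ignoring NA) but absent from the table.
observed <- unique(results_toy$ResultDetectionConditionText)
observed <- observed[!is.na(observed)]
new_values <- setdiff(observed, det_condition_table$ResultDetectionConditionText)

if (length(new_values) == 0) {
  print("No new ResultDetectionConditionText value(s) identified")
} else {
  print(new_values)
}
```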

Fill masked/censored values in results

merged_results_filled=fillMaskedValues(results=merged_results, detquantlim=detquantlim, translation_wb=translation_wb,
                                       detLimitTypeTable_sheetname="detLimitTypeTable", detLimitTypeTable_startRow=2,
                                       unitConvTable_sheetname="unitConvTable", unitConvTable_startRow=1, unitConvTable_startCol=1,
                                       lql_fac=0.5, uql_fac=1)
## [1] "Checking for disparities in result units and limit units..."
## [1] "Unit conversion(s) needed between detection limit unit(s) and result unit(s). Checking for new unit conversions..."
## [1] "No new unit combinations detected. Proceeding to unit conversion..."
## Warning in fillMaskedValues(results = merged_results, detquantlim = detquantlim, : FYI: There are 33 records with both upper and lower quantitation limits and is.na(result values). These records have been assigned as 'ND's
## [1] "Detection condition counts:"
## 
##    DET     ND    NRV     OD 
## 717579 229705  14567    808
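
The lql_fac=0.5 and uql_fac=1 arguments suggest that non-detects are filled with half the lower quantitation limit and over-detects with the full upper limit. The sketch below illustrates that interpretation on toy data. It is an assumption rather than the fillMaskedValues source; the column names are hypothetical, and reading NRV as "no result value" is a guess.

```r
lql_fac <- 0.5  # multiplier applied to the lower limit for non-detects
uql_fac <- 1    # multiplier applied to the upper limit for over-detects

# Toy records (hypothetical column names): one detect, one non-detect,
# one over-detect, and one record with neither a value nor a limit.
toy <- data.frame(
  value       = c(2.3, NA, NA, NA),
  lower_limit = c(0.1, 0.2, NA, NA),
  upper_limit = c(NA,  NA,  50, NA)
)

# Classify records. A missing value with a lower limit is treated as ND first,
# so records with both limits are assigned ND, as in the warning above.
toy$det_cond <- ifelse(!is.na(toy$value), "DET",
                ifelse(!is.na(toy$lower_limit), "ND",
                ifelse(!is.na(toy$upper_limit), "OD", "NRV")))

# Fill masked values from the limits; detects keep their reported value.
toy$filled <- toy$value
nd <- toy$det_cond == "ND"
od <- toy$det_cond == "OD"
toy$filled[nd] <- toy$lower_limit[nd] * lql_fac
toy$filled[od] <- toy$upper_limit[od] * uql_fac

print(toy)
print(table(toy$det_cond))
```

This sketch omits the unit conversion between limit units and result units that the function reports performing before the fill.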