The goal of the script is to fix issues in NuSEDS and to merge the two constituent datasets all_areas_nuseds (referred to as NUSEDS) and conservation_unit_system_sites (referred to as CUSS).
The dataset version corresponding to this script is:
Fisheries and Oceans Canada. 2024. NuSEDS - New Salmon Escapement Database System. Pacific Biological Station. Retrieved January 22, 2024, from https://open.canada.ca/data/en/dataset/c48669a3-045b-400d-b730-48aafe8c5ee6.
The NuSEDS (all_areas_nuseds) and CUSS (conservation_unit_system_sites) datasets are imported and duplicated rows are removed. The columns considered for NuSEDS are related to the population, location, fish counts and year of assessment:
## [1] "SPECIES" "POP_ID"
## [3] "GFE_ID" "ANALYSIS_YR"
## [5] "NATURAL_ADULT_SPAWNERS" "NATURAL_JACK_SPAWNERS"
## [7] "NATURAL_SPAWNERS_TOTAL" "ADULT_BROODSTOCK_REMOVALS"
## [9] "JACK_BROODSTOCK_REMOVALS" "TOTAL_BROODSTOCK_REMOVALS"
## [11] "OTHER_REMOVALS" "TOTAL_RETURN_TO_RIVER"
The following 1 duplicated rows in NuSEDS when considering these columns are removed:
## SPECIES POP_ID GFE_ID ANALYSIS_YR NATURAL_ADULT_SPAWNERS
## 190541 Chum 51316 1174 2011 NA
## NATURAL_JACK_SPAWNERS NATURAL_SPAWNERS_TOTAL ADULT_BROODSTOCK_REMOVALS
## 190541 NA NA NA
## JACK_BROODSTOCK_REMOVALS TOTAL_BROODSTOCK_REMOVALS OTHER_REMOVALS
## 190541 NA NA NA
## TOTAL_RETURN_TO_RIVER
## 190541 NA
For CUSS, 0 duplicated rows are found:
## [1] MAP_LABEL GFE_ID SYSTEM_SITE GFE_TYPE
## [5] SPECIES_QUALIFIED Y_LAT X_LONGT FAZ_ACRO
## [9] MAZ_ACRO JAZ_ACRO CU_NAME CU_ACRO
## [13] CU_LAT CU_LONGT CU_TYPE CU_INDEX
## [17] FULL_CU_IN SBJ_ID POP_ID IS_INDICATOR
## [21] CMNTS EFFECTIVE_DT WATERSHED_CDE FWA_WATERSHED_CDE
## <0 rows> (or 0-length row.names)
The list of conservation units (CUs) as shown in the Pacific Salmon
Explorer (PSE) is imported.
The columns cu_name_pse
and cu_name_dfo
correspond to CU_NAME
in CUSS, cu_index
corresponds to FULL_CU_INDEX
.
## region species_name species_abbr cuid cu_name_pse cu_name_dfo
## 1 Skeena Lake sockeye SEL 171 Alastair Alastair
## 2 Skeena Lake sockeye SEL 174 Ecstall/Lower Ecstall/Lower
## 3 Skeena Lake sockeye SEL 175 Johnston Johnston
## 4 Skeena Lake sockeye SEL 176 Kitsumkalum Kitsumkalum
## 5 Skeena Lake sockeye SEL 177 Lakelse Lakelse
## 6 Skeena Lake sockeye SEL 172 Mcdonell/Dennis/Aldrich Aldrich
## cu_index
## 1 SEL-20-01
## 2 SEL-20-04
## 3 SEL-20-05
## 4 SEL-20-06
## 5 SEL-20-07
## 6 SEL-20-02
The DFO file of the stream locations and coordinates (emailed from Wu
Zhipeng, DFO, 09/04/2024) is imported. The columns NME
and
ID
correspond to SYSTEM_SITE
and
GFE_ID
in CUSS, respectively.
## # A tibble: 6 × 4
## NME ID X_LONGT Y_LAT
## <chr> <dbl> <chr> <chr>
## 1 STANLEY CREEK 1557 -132.691420767338 53.9746116925906
## 2 DAVIDSON CREEK 1558 -132.682181062338 53.9544135568635
## 3 NADEN RIVER 1559 -132.672499482506 53.9448571061959
## 4 LIGNITE CREEK 1560 -132.601901312242 53.97114945092
## 5 OTUN RIVER 1561 -132.260106013 54.07328672
## 6 BILL CREEK 1562 -132.395009112518 53.7494926521361
We remove from NUSEDS and CUSS data related to steelhead, Atlantic
salmon and Kokanee salmon, corresponding to 941 rows and 133 populations
(POP_ID
) in NUSEDS and 0 in CUSS.
We create the column IndexId
, which the the combination
of the species acronym (i.e., CO, CM, CN, PKO, PKE, SX) and
POP_ID
:
## [1] "CN_48442" "CO_590" "CO_1794" "CO_1844" "CO_2172" "PKE_51264"
We create the column MAX_ESTIMATE
which is the maximum
value of the fish count columns:
## [1] "NATURAL_ADULT_SPAWNERS" "NATURAL_JACK_SPAWNERS"
## [3] "NATURAL_SPAWNERS_TOTAL" "ADULT_BROODSTOCK_REMOVALS"
## [5] "JACK_BROODSTOCK_REMOVALS" "TOTAL_BROODSTOCK_REMOVALS"
## [7] "OTHER_REMOVALS" "TOTAL_RETURN_TO_RIVER"
There are multiple locations in CUSS that have
different SYSTEM_SITE
and GFE_ID
but exactly
the same geographic coordinates. This is because the coordinates are not
defined as the survey location but the “Location of the mouth of the
waterbody if flowing, or the centroid if not.”. There can be large
distances between the coordinates in CUSS and the real survey locations
for large water bodies. We consequently manually update the coordinates
of certain of these locations.
There are 21 unique coordinates with multiple
SYSTEM_SITE
and GFE_ID
(n = 41). We group
locations with identical coordinates and for each of the group
(identified with a letter in the table below), we manually modify the
coordinates of the locations relatively to each other using
SYSTEM_SITE
:
## SYSTEM_SITE GFE_ID Y_LAT X_LONGT group
## 1 FENNELL CREEK AND SASKUM CREEK 2746 51.352577 -119.723268 A
## 2 FENNELL CREEK 261 51.352577 -119.723268 A
## 3 CENTRE CREEK 82 49.103185 -121.536585 B
## 4 CENTRE CHANNEL 2533 49.103185 -121.536585 B
## 5 FOLEY CREEK 78 49.106138 -121.635958 C
## 6 FOLEY CREEK SIDE CHANNEL 381 49.106138 -121.635958 C
## 7 NORRISH CREEK 51 49.173172 -122.135356 D
## 8 HAWKINS CREEK 54 49.173172 -122.135356 D
## 9 BAXTER BEACH 11513 52.722503 -120.860114 E
## 10 BEAR BEACH - SHORE 11514 52.722503 -120.860114 E
## 11 BETTY FRANK'S - SHORE 11515 52.722503 -120.860114 E
## 12 LILLOOET RIVER - LOWER 177 49.747468 -122.145828 F
## 13 LILLOOET RIVER 2475 49.747468 -122.145828 F
## 14 CAYOOSH CREEK 131 50.668727 -121.972610 G
## 15 SETON AND CAYOOSH CREEKS 7990579 50.668727 -121.972610 G
## 16 BOISE CREEK 24 49.624229 -122.659958 H
## 17 BOISE CREEK - NORTH 26776 49.624229 -122.659958 H
## 18 FRASER RIVER 1 49.118876 -123.191011 I
## 19 FRASER RIVER AND TRIBUTARIES 7990634 49.118876 -123.191011 I
## 20 CHILCOTIN RIVER - LOWER 2463 51.739953 -122.401309 J
## 21 CHILCOTIN RIVER 285 51.739953 -122.401309 J
## 22 SOMASS RIVER 1261 49.246397 -124.819840 K
## 23 SOMASS-SPROAT-GC SYSTEM 11485 49.246397 -124.819840 K
## 24 CLAYOQUOT RIVER - LOWER 1310 49.183381 -125.537647 L
## 25 CLAYOQUOT RIVER 2487 49.183381 -125.537647 L
## 26 PINKUT CREEK 501255 54.446302 -125.460072 M
## 27 PINKUT CREEK - BELOW WEIR 2115 54.446302 -125.460072 M
## 28 FORESTRY CREEK 2678 52.376885 -126.584031 N
## 29 MARTY'S CREEK 2680 52.376885 -126.584031 N
## 30 CAYEGHLE CREEK 1450 50.342470 -127.439898 O
## 31 CAYEGHLE SYSTEM 2302 50.342470 -127.439898 O
## 32 TAKLA LAKE 2439 55.075216 -125.538582 P
## 33 TAKLA - SHORE 19719 55.075216 -125.538582 P
## 34 FULTON RIVER 501254 54.814315 -126.146194 Q
## 35 FULTON RIVER - BELOW WEIR 2111 54.814315 -126.146194 Q
## 36 JENNY BAY CREEKS 1798 52.224030 -127.598389 R
## 37 JENNY BAY SOUTH CREEK 2693 52.224030 -127.598389 R
## 38 FANNIE COVE LEFT HAND CREEK 2687 52.043911 -128.068542 S
## 39 FANNIE COVE RIGHT HAND CREEK 2708 52.043911 -128.068542 S
## 40 SEWELL INLET CREEK L.H. #2 1594 52.876978 -131.949569 T
## 41 THORSEN CREEK 1595 52.876978 -131.949569 T
For group A:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new
## 1 2746 FENNELL CREEK AND SASKUM CREEK 51.352577 -119.723268 NA
## 2 261 FENNELL CREEK 51.352577 -119.723268 51.344756
## X_LONGT_new
## 1 NA
## 2 -119.797289
For group B:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 3 82 CENTRE CREEK 49.103185 -121.536585 NA NA
## 4 2533 CENTRE CHANNEL 49.103185 -121.536585 49.090741 -121.531078
For group C:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 5 78 FOLEY CREEK 49.106138 -121.635958 49.107077 -121.636804
## 6 381 FOLEY CREEK SIDE CHANNEL 49.106138 -121.635958 NA NA
For group D:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 7 51 NORRISH CREEK 49.173172 -122.135356 NA NA
## 8 54 HAWKINS CREEK 49.173172 -122.135356 49.169178 -122.19174
For group E:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 9 11513 BAXTER BEACH 52.722503 -120.860114 NA NA
## 10 11514 BEAR BEACH - SHORE 52.722503 -120.860114 52.720154 -120.867586
## 11 11515 BETTY FRANK'S - SHORE 52.722503 -120.860114 52.746393 -120.837075
For group F:
For group F:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 12 177 LILLOOET RIVER - LOWER 49.747468 -122.145828 NA NA
## 13 2475 LILLOOET RIVER 49.747468 -122.145828 49.740213 -122.14739
For group G:
For group G:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 14 131 CAYOOSH CREEK 50.668727 -121.97261 50.680016 -121.929476
## 15 7990579 SETON AND CAYOOSH CREEKS 50.668727 -121.97261 NA NA
For group H:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 16 24 BOISE CREEK 49.624229 -122.659958 NA NA
## 17 26776 BOISE CREEK - NORTH 49.624229 -122.659958 49.626847 -122.671116
For group I:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new
## 18 1 FRASER RIVER 49.118876 -123.191011 NA
## 19 7990634 FRASER RIVER AND TRIBUTARIES 49.118876 -123.191011 49.111242
## X_LONGT_new
## 18 NA
## 19 -123.168505
For group J:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 20 2463 CHILCOTIN RIVER - LOWER 51.739953 -122.401309 NA NA
## 21 285 CHILCOTIN RIVER 51.739953 -122.401309 51.745035 -122.413504
For group K:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 22 1261 SOMASS RIVER 49.246397 -124.81984 NA NA
## 23 11485 SOMASS-SPROAT-GC SYSTEM 49.246397 -124.81984 49.294603 -124.879074
For group L:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 24 1310 CLAYOQUOT RIVER - LOWER 49.183381 -125.537647 NA NA
## 25 2487 CLAYOQUOT RIVER 49.183381 -125.537647 49.201198 -125.51245
For group M:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 26 501255 PINKUT CREEK 54.446302 -125.460072 54.445124 -125.458809
## 27 2115 PINKUT CREEK - BELOW WEIR 54.446302 -125.460072 NA NA
For group N:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 28 2678 FORESTRY CREEK 52.376885 -126.584031 52.378979 -126.581766
## 29 2680 MARTY'S CREEK 52.376885 -126.584031 NA NA
For group O:
For group O:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 30 1450 CAYEGHLE CREEK 50.34247 -127.439898 NA NA
## 31 2302 CAYEGHLE SYSTEM 50.34247 -127.439898 50.342682 -127.437795
For group P:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 32 2439 TAKLA LAKE 55.075216 -125.538582 NA NA
## 33 19719 TAKLA - SHORE 55.075216 -125.538582 55.08134 -125.560056
For group Q:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 34 501254 FULTON RIVER 54.814315 -126.146194 NA NA
## 35 2111 FULTON RIVER - BELOW WEIR 54.814315 -126.146194 54.81601 -126.163338
For group R:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new X_LONGT_new
## 36 1798 JENNY BAY CREEKS 52.22403 -127.598389 52.225501 -127.598108
## 37 2693 JENNY BAY SOUTH CREEK 52.22403 -127.598389 NA NA
For group S:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new
## 38 2687 FANNIE COVE LEFT HAND CREEK 52.043911 -128.068542 52.044541
## 39 2708 FANNIE COVE RIGHT HAND CREEK 52.043911 -128.068542 NA
## X_LONGT_new
## 38 -128.068576
## 39 NA
For group T:
## GFE_ID SYSTEM_SITE Y_LAT X_LONGT Y_LAT_new
## 40 1594 SEWELL INLET CREEK L.H. #2 52.876978 -131.949569 52.877146
## 41 1595 THORSEN CREEK 52.876978 -131.949569 NA
## X_LONGT_new
## 40 -132.0007
## 41 NA
The fields X_LONGT
and Y_LAT
are updated in
CUSS, and the new field coordinates_changed
is added to
indicate which locations are concerned.
# check is there still are different GFE_IDs with same coordinates:
coord_duplicated <- locations[,c("X_LONGT","Y_LAT")][duplicated(locations[,c("X_LONGT","Y_LAT")]),]
nrow(coord_duplicated) # 0
## [1] 0
This procedure removes 102845 rows in NUSEDS, corresponding to 24.9% of the original dataset.
The data removed concerns 4491 time series, which are referenced in
removed_all
:
## IndexId GFE_ID dataset comment
## 1 CM_46125 204 NUSEDS Only NAs for MAX_ESTIMATE
## 2 PKO_48406 427 NUSEDS Only NAs for MAX_ESTIMATE
## 3 PKO_50417 1061 NUSEDS Only NAs for MAX_ESTIMATE
## 4 CM_48735 765 NUSEDS Only NAs for MAX_ESTIMATE
## 5 CN_45966 415 NUSEDS Only NAs for MAX_ESTIMATE
## 6 SX_50720 835 NUSEDS Only NAs for MAX_ESTIMATE
Note that we do not remove these series in CUSS at this stage because they could be alternative series for the series in NUSEDS that do not appear in CUSS. It is only once series in NUSEDS absent in CUSS were found alternative series for, that we remove series in CUSS that have only NAs and/or 0s in NUSEDS.
Look for each time series in CUSS (i.e., unique
IndexId
& GFE_ID
association) and:
check if there are IndexId
associated to multiple
locations (i.e., GFE_ID
); if yes: trouble shoot manually
because this should not happen: IndexId
should be
associated to one unique location.
else look if there is a time series with the same
IndexId
& GFE_ID
in
NUSEDS; if yes: all good :-)
else: that could be due to the wrong attribution of either (i)
the IndexId
or (ii) the GFE_ID
. The rest of
the code looks for potential alternative series in
NUSEDS with either a different IndexId
(but with the same species) or a different GFE_ID
.
Alternative series identified NOT present in CUSS are
kept, the ones present are not considered.
The procedure returns a simple data frame with the
IndexId
& GFE_ID
series concerned and
associated comment and eventual potential alternative series that have
to be checked manually after:
## i IndexId GFE_ID in_cuss in_nused
## 1 23 PKE_40049 1211 yes no
## 2 32 PKE_52934 1143 yes no
## 3 60 PKE_40149 1221 yes no
## 4 62 PKE_50494 812 yes no
## 5 112 PKO_46644 262 yes no
## 6 176 CN_46842 285 yes no
## comment
## 1 There is no alternative series in NUSEDS
## 2 There is no alternative series in NUSEDS
## 3 There is no alternative series in NUSEDS
## 4 There is no alternative series in NUSEDS
## 5 There is no alternative series in NUSEDS
## 6 Alternative series: CN_46842 & 2463, CN_46841 & 285
There are 259 out of 7145 time series in CUSS that are not present in NUSEDS (i.e., 3.6%), and the vast majority of them do not have an alternative series in NUSEDS:
table(trackRecord$comment[cond])
##
## Alternative series: CN_46842 & 2463, CN_46841 & 285
## 1
## Alternative series: CO_46835 & 2463
## 1
## Alternative series: PKE_54793433 & 1490
## 1
## Alternative series: SX_47954 & 21
## 1
## In CUSS: there are multiple GFE_IDs for CN_7479 : 133, 2373
## 1
## There is no alternative series in NUSEDS
## 254
The goal here is to go over all the cases where alternative series are available. The title of the figures below shows the focal series that is in CUSS but not in NUSEDS. The series on the top left corner are potential alternative series (i.e., they are present in NUSEDS but not in CUSS).
comment <- "Alternative series:"
trackRecord_cuss_alternative <- trackRecord[grepl(comment,trackRecord$comment),]
## [1] 4
## i IndexId GFE_ID in_cuss in_nused
## 1 176 CN_46842 285 yes no
## 2 3489 PKE_42409 1490 yes no
## 3 4724 CO_46835 285 yes no
## 4 6685 SX_7763 21 yes no
## comment
## 1 Alternative series: CN_46842 & 2463, CN_46841 & 285
## 2 Alternative series: PKE_54793433 & 1490
## 3 Alternative series: CO_46835 & 2463
## 4 Alternative series: SX_47954 & 21
The focal CUSS time series CN_46842 & GFE_ID =
285 corresponds to the CU_NAME
:
## [1] "MIDDLE FRASER RIVER_SP_1.3"
The two alternative time series are noted as two different runs,
according to NUSEDS’s POPULATION
:
## [1] "Chilcotin River (Williams Lake Area) Chinook Run 2"
## [2] "Chilcotin River (Williams Lake Area) Chinook Run 1"
The distance between the two locations (i.e., the two
GFE_ID
) is 1.01 km.
Decision: It was decided to merge the two time series CN_46842 & GFE_ID = 2463 and CN_46841 & GFE_ID = 285 to the series CN_46842 & GFE_ID = 285 in NUSEDS because there is only one timing for this Middle Fraser River (Spring 5-2) Chinook CU and the two series are complementary.
Decision: we remove series PKE_42409 & GFE_ID = 1490 from CUSS and PKE_54793433 & GFE_ID = 1490 from NUSEDS.
These two locations are from the same river according the
CUSS’s SYSTEM_SITE
:
## [1] "CHILCOTIN RIVER - LOWER" "CHILCOTIN RIVER"
The distance between the two locations is 1.01 km (same as before).
Decision: Replace CO_46835 & GFE_ID = 2463 by CO_46835 & GFE_ID = 285 in NUSEDS.
The two time series have the same CU_NAME
in
CUSS:
## SX_7763 SX_47954
## "WIDGEON" "WIDGEON"
Decision: Replace SX_47954 & GFE_ID = 21 by SX_7763 & GFE_ID = 21 in NUSEDS.
The goal is to remove these InsexId
&
GFE_ID
references from CUSS and to specify
if they were simply not present in the orginal NUSEDS
or only contained NAs and/or 0s.
comment <- "There is no alternative series in NUSEDS"
trackRecord_cuss_noAlernative <- trackRecord[grepl(comment,trackRecord$comment),]
## [1] 254
## IndexId GFE_ID in_cuss in_nused comment
## 1 PKE_40049 1211 yes no There is no alternative series in NUSEDS
## 2 PKE_52934 1143 yes no There is no alternative series in NUSEDS
## 3 PKE_40149 1221 yes no There is no alternative series in NUSEDS
## 4 PKE_50494 812 yes no There is no alternative series in NUSEDS
## 5 PKO_46644 262 yes no There is no alternative series in NUSEDS
## 6 CM_3261 2637 yes no There is no alternative series in NUSEDS
253 of the 254 InsexId
& GFE_ID
references were removed previously from NUSEDS because
they contained only NAs and/or 0s. Only the following reference was not
in NUSEDS initially:
## GFE_ID SYSTEM_SITE SPECIES_QUALIFIED CU_NAME FULL_CU_IN
## 1 150 SEEBACH CREEK SER UPPER FRASER SER-06
There should not be any InsexId
associated to multiple
GFE_ID
(it is a MANY TO ONE relationship).
comment <- "In CUSS: there are multiple GFE_IDs for"
trackRecord_cuss_multi_fgeid <- trackRecord[grepl(comment,trackRecord$comment),]
This happens for 2 populations:
## i IndexId GFE_ID in_cuss in_nused
## 1 1169 CN_7479 133 yes no
## 2 1169 CN_7479 2373 yes yes
## 3 6288 SX_45525 2444 yes yes
## 4 6288 SX_45525 303 yes yes
## comment
## 1 In CUSS: there are multiple GFE_IDs for CN_7479 : 133, 2373
## 2 In CUSS: there are multiple GFE_IDs for CN_7479 : 133, 2373
## 3 In CUSS: there are multiple GFE_IDs for SX_45525 : 2444, 303
## 4 In CUSS: there are multiple GFE_IDs for SX_45525 : 2444, 303
The figure above only show the time series present in NUSEDS. The other time series is removed from CUSS.
These two time series of this CU (CU_NAME
=
NADINA/FRANCOIS-EARLY SUMMER TIMING, NADINA/FRANCOIS-EARLY SUMMER
TIMING) should have a different IndexId
(i.e. POP_ID
). We keep them as is and do not make any
change to NUSEDS and CUSS.
The goal is to look for each IndexId
&
GFE_ID
series in NUSEDS that are not in
CUSS and to:
Remove series with < 4 data points (too much work for not enough data points; eventually rescue this data in future)
Look if there are alternative series in NUSEDS that are also present in CUSS
if not, add the IndexId
& GFE_ID
reference in CUSS
if yes, see if the alternative series can be merged; if not, add
the IndexId
& GFE_ID
reference to
CUSS
There are 171 IndexId
& GFE_ID
references in NUSEDS that are not present in
CUSS, for instance:
## IndexId GFE_ID nb_dataPt alternative_IndexId alternative_GFE_ID
## 1 CM_1622 11 2 none none
## 2 CM_2434 4 11 none none
## 3 CM_297 14 1 CM_47925 none
## 4 CM_3305 11486 15 none 11485
## 5 CM_3305 11487 2 none 11485
## 6 CM_3305 11488 3 none 11485
There are 94 populations removed because they have < 4 data points; for instance:
The procedure removes 1387 rows in NUSEDS.
The section concerns focal IndexId
&
GFE_ID
time series in NUSEDS that are not
in CUSS and for which there are alternative series in
CUSS with the same IndexId
but a different
GFE_ID
. The goal is to compare the focal vs. the
alternative time series and take the following actions:
if the time series are 100% complementary: merge them (i.e., edit
IndexId
in NUSEDS)
if the time series are 100% duplicated: remove the focal series in NUSEDS
if the series are partially conflictual or if there are multiple potential alternative series: trubble shoot manually
There are 6 time series concerned:
## IndexId GFE_ID nb_dataPt alternative_IndexId
## 1 CN_3331 1156 13 CN_53069 ; CN_53068
## 2 CN_3334 1194 36 CN_51519 ; CN_51518
## 3 CN_39983 1204 7 CN_39984
## 4 CN_7809 442 17 CN_48442
## 5 CO_7776 2243 5 CO_3136
## 6 SX_46308 227 14 SX_46309
The figures below show for each of these cases the decision made. The
figures also show the CU_MAME
of focal and alternative
IndexId
. The i
at the bottom left of each plot
is used after to reference the figures when fixing time series manually.
Several cases are set aside to be manually fixed after.
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 CN_53069 1156 100 0 0 NA 13
## 2 CN_53068 1156 100 0 0 NA 13
## [1] "***"
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 CN_51519 1194 90 0 10 NA 36
## 2 CN_51518 1194 100 0 0 NA 36
## [1] "***"
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 CN_39984 1204 70 0 30 NA 7
## [1] "***"
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 CN_48442 442 100 0 0 NA 17
## [1] "***"
## [1] "Number of rows removed from dataframe = 5"
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 CO_3136 2243 0 100 0 NA 5
## [1] "***"
## IndexId GFE_ID complementary duplicate conflict decision nb_dataPt
## 1 SX_46309 227 40 40 20 NA 14
## [1] "***"
We now assess the cases set aside.
In the case i =
1 above, series CN_3331 - 1156 was not
merged to any of the alternative series because of its unknown run
timing (see below). It is consequently removed from
NUSEDS.
## IndexId type POPULATION
## 1 CN_3331 focal Puntledge River Chinook (unknown timing)
## 2 CN_53069 alternative Puntledge River Fall Chinook
## 3 CN_53068 alternative Puntledge River Spring Chinook
In the case i =
2 above, series CN_3334 - 1194 was
initially going to be merged to the alternative series with which is was
100% compatible. But is was decided to remove it from
NUSEDS because its run timing is unknown:
## IndexId type POPULATION
## 1 CN_3334 focal Nanaimo River Chinook (unknown timing)
## 2 CN_51519 alternative Nanaimo River Fall Chinook
## 3 CN_51518 alternative Nanaimo River Summer Chinook
In the case i =
3 above, series CN_39983 - 1204 was not
merged to the alternative series because of different run timings:
## IndexId type POPULATION
## 1 CN_39983 focal Chemainus River Summer Chinook
## 2 CN_39984 alternative Chemainus River Fall Chinook
It was decided instead to associate the population CN_39983 to the CU
East Vancouver Island-Georgia Strait (Summer 4-1) in the PSE
because of the same run timing and the fact that the Chemainus river is
connected to this CU. The corresponding CU_NAME
in
CUSS is:
## [1] "EAST VANCOUVER ISLAND-GEORGIA STRAIT_SU_0.3"
The reference of the CN_39983 - 1204 time series is added to
CUSS. The CU-related fields in CUSS
are filled with the values corresponding to the above
CU_NAME
. Here below is the row added to
CUSS:
## MAP_LABEL GFE_ID SYSTEM_SITE GFE_TYPE SPECIES_QUALIFIED Y_LAT
## 1 NA 1204 CHEMAINUS RIVER Stream CK 48.897926
## X_LONGT FAZ_ACRO MAZ_ACRO JAZ_ACRO
## 1 -123.673181 EVI GStr EVI+GStr
## CU_NAME CU_ACRO CU_LAT
## 1 EAST VANCOUVER ISLAND-GEORGIA STRAIT_SU_0.3 EVIGStr-sum 49.5622666
## CU_LONGT CU_TYPE CU_INDEX FULL_CU_IN SBJ_ID POP_ID IS_INDICATOR CMNTS
## 1 -125.201122 Current 83 CK-83 NA 39983 <NA> <NA>
## EFFECTIVE_DT WATERSHED_CDE
## 1 <NA> 920-303500-00000-00000-0000-0000-000-000-000-000-000-000
## FWA_WATERSHED_CDE
## 1 920-302830-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000-000000
## SPECIES species_acronym_ncc IndexId coordinates_changed
## 1 Chinook CN CN_39983 FALSE
In the case i =
4 above, series CN_7809 - 442 can be
merged to the alternative series CN_48442 - 442 with which it is 100%
compatible. The run timing indicated in POPULATION
in
NUSEDS are not incompatible:
## CN_7809
## "Okanagan River (Salmon Arm) Summer Chinook"
## CN_48442
## "Okanagan River (Salmon Arm) Chinook Run 1"
However, we decide to merge the alternative time series to the focal one (i.e., change CN_48442 - 442 to CN_7809 - 442) instead of the contrary because the latter is longer and more recent. Consequently both NUSEDS and CUSS are edited.
## [1] "1 rows were edited in conservation_unit_system_sites at the following fields: POP_ID, SPECIES, species_acronym_ncc, IndexId"
## [1] "4 rows were edited in all_areas_nuseds at the following fields: SPECIES, POPULATION, RUN_TYPE, POP_ID, species_acronym_ncc, IndexId"
In the case i =
5 above, the focal series CO_7776 - 2243
is removed from NUSEDS because it is 100% duplicated
with the alternative series CO_3136 - 2243.
In the case i =
6 above, the focal series SX_46308 - 227
is removed from NUSEDS because of a different run
timing (in POPULATION
) compared to the alternative series
(see below) and most values are 0s.
## SX_46308
## "Ross Creek (Salmon Arm) Early Summer Sockeye"
## SX_46309
## "Ross Creek (Salmon Arm) Late Sockeye"
The focal time series (i.e., a unique IndexId
-
GFE_ID
not in CUSS) are compared to
alternative series with a different GFE_ID
and potentially
a different IndexId
(but of the same species).
There are 34 time series concerned:
## IndexId GFE_ID nb_dataPt alternative_IndexId alternative_GFE_ID
## 1 CM_3305 11486 15 none 11485
## 2 CM_47925 15 8 CM_47928 14
## 3 CM_47925 31516 8 none 14
## 4 CM_47925 31740 8 none 14
## 5 CM_50616 2618 8 none 824
## 6 CN_2178 129 5 CN_47189 2476
## 7 CN_3306 11486 38 none 11485
## 8 CN_3306 11488 4 none 11485
## 9 CN_46891 2467 5 none 290
## 10 CN_46892 2466 22 none 290
## 11 CN_47277 2464 46 none 142
## 12 CN_50618 2618 8 none 824
## 13 CO_44539 129 5 CO_47183 2476
## 14 CO_44539 2451 12 none 2476
## 15 CO_46170 2461 7 none 213
## 16 CO_46170 54282 5 none 213
## 17 CO_46582 1754239945 14 none 256
## 18 CO_46582 2049268617 20 none 256
## 19 CO_46582 33045 20 none 256
## 20 CO_46582 57041 5 none 256
## 21 CO_46582 719256264 21 none 256
## 22 CO_46602 212716981 20 none 258
## 23 CO_46602 2746 6 CO_46632 258
## 24 CO_46632 261 7 none 2746
## 25 CO_46795 1921661712 15 none 281
## 26 CO_46795 33103 16 none 281
## 27 CO_50612 2618 6 none 824
## 28 PKE_50614 2618 6 none 824
## 29 PKO_3304 11486 4 none 11485
## 30 PKO_50614 2618 7 none 824
## 31 SX_3302 3416 23 SX_3310 11485
## 32 SX_3302 3444 23 SX_3325 11485
## 33 SX_3416 2746 41 none 261
## 34 SX_50610 2618 13 none 824
Notice in the table above that several time series have the same
IndexId
. Again this is problematic because normally only
one GFE_ID
can be associated to a given
IndexId
(i.e., POP_ID
). We show here after the
focal time series grouped with their alternative series. On top of each
figure is shown the POPULATION
and WATERBODY
associated with each IndexId
(i.e., POP_ID
)
and GFE_ID
, respectively. Belong each figure is shown a
summary table providing the number of data points
(nb_dataPt
) and the % of them that are
complementary
, duplicate
and in
conflict
compared with the alternative series. The decision
made in each case are provided after the figures, using the index
i
shown at the bottom left of the plots for reference.
## [1] "*** i = 1 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CM_3305 11486 NA 11485 15 100 0
## conflict
## 1 0
## [1] "*** i = 2 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_3306 11486 NA 11485 38 100 0
## 2 CN_3306 11488 NA 11485 4 100 0
## conflict
## 1 0
## 2 0
## [1] "*** i = 3 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 PKO_3304 11486 NA 11485 4 100 0
## conflict
## 1 0
## [1] "*** i = 4 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 SX_3302 3416 <NA> 11485 23 100 0
## 2 SX_3302 3444 <NA> 11485 23 100 0
## 3 SX_3302 3416 SX_3310 <NA> 23 100 0
## 4 SX_3302 3444 SX_3325 <NA> 23 100 0
## conflict
## 1 0
## 2 0
## 3 0
## 4 0
## [1] "*** i = 5 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CM_47925 15 <NA> 14 8 90 0
## 2 CM_47925 31516 <NA> 14 8 90 10
## 3 CM_47925 31740 <NA> 14 8 90 0
## 4 CM_47925 15 CM_47928 <NA> 8 100 0
## conflict
## 1 10
## 2 0
## 3 10
## 4 0
## [1] "*** i = 6 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_47277 2464 NA 142 46 100 0
## conflict
## 1 0
## [1] "*** i = 7 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CO_46170 2461 NA 213 7 100 0
## 2 CO_46170 54282 NA 213 5 100 0
## conflict
## 1 0
## 2 0
## [1] "*** i = 8 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_2178 129 <NA> 2476 5 80 0
## 2 CN_2178 129 CN_47189 <NA> 5 0 100
## conflict
## 1 20
## 2 0
## [1] "*** i = 9 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CO_44539 129 <NA> 2476 5 100 0
## 2 CO_44539 2451 <NA> 2476 12 90 0
## 3 CO_44539 129 CO_47183 <NA> 5 100 0
## conflict
## 1 0
## 2 10
## 3 0
## [1] "*** i = 10 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary
## 1 CO_46582 1754239945 NA 256 14 100
## 2 CO_46582 2049268617 NA 256 20 100
## 3 CO_46582 33045 NA 256 20 100
## 4 CO_46582 57041 NA 256 5 100
## 5 CO_46582 719256264 NA 256 21 100
## duplicate conflict
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## [1] "*** i = 11 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary
## 1 CO_46602 212716981 <NA> 258 20 70
## 2 CO_46602 2746 <NA> 258 6 80
## 3 CO_46602 2746 CO_46632 <NA> 6 100
## duplicate conflict
## 1 0 30
## 2 0 20
## 3 0 0
## [1] "*** i = 12 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 SX_3416 2746 NA 261 41 100 0
## conflict
## 1 0
## [1] "*** i = 13 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CO_46632 261 NA 2746 7 100 0
## conflict
## 1 0
## [1] "*** i = 14 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary
## 1 CO_46795 1921661712 NA 281 15 100
## 2 CO_46795 33103 NA 281 16 100
## duplicate conflict
## 1 0 0
## 2 0 0
## [1] "*** i = 15 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_46891 2467 NA 290 5 0 100
## conflict
## 1 0
## [1] "*** i = 16 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_46892 2466 NA 290 22 80 10
## conflict
## 1 0
## [1] "*** i = 17 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CM_50616 2618 NA 824 8 40 0
## conflict
## 1 60
## [1] "*** i = 18 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CN_50618 2618 NA 824 8 0 0
## conflict
## 1 100
## [1] "*** i = 19 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 CO_50612 2618 NA 824 6 0 0
## conflict
## 1 100
## [1] "*** i = 20 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 PKE_50614 2618 NA 824 6 0 0
## conflict
## 1 100
## [1] "*** i = 21 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 PKO_50614 2618 NA 824 7 10 0
## conflict
## 1 90
## [1] "*** i = 22 ***"
## IndexId GFE_ID IndexId_alter GFE_ID_alter nb_dataPt complementary duplicate
## 1 SX_50610 2618 NA 824 13 20 0
## conflict
## 1 80
In case i = 1: series CM_3305 - 11486 is not merged to alternative series CM_3305 - 11485 because these chum salmon would not be able to make it up the fall. Consequently, the reference of the focal series is added to CUSS.
Note that GFE_ID
= 11485 is present in DFO streams file
(under ID
) but coordinates are not:
## # A tibble: 1 × 4
## NME ID X_LONGT Y_LAT
## <chr> <dbl> <chr> <chr>
## 1 WATERSHED ABOVE STAMP FALLS 11486 <NA> <NA>
We consequently attribute coordinates to this location using Google Maps. We selecte the location just above Stamp Falls, on the left of the Stamp river fish ladder:
## Y_LAT X_LONGT
## 49.332224 -124.919717
In case i = 2: series CN_3306 - 11486 in green is not merged to alternative series CN_3306 - 11485 in red for the same reason as above; its reference are instead added to CUSS. The other focal series CN_3306 - 11488 in blue is deleted because it only has 4 data points.
In case i = 3: series PKO_3304 - 11486 in blue is not merged to alternative series PKO_3304 - 11485 in red for the same reason as above; its reference are instead added to CUSS.
In case i = 4: series PKO_3302 - 3416 in black (bottom plot) is merged to alternative series SX_3310 - 3416 in red and series PKO_3302 - 3444 in blue (bottom plot) is merged to alternative series SX_3325 - 3444 in green.
In case i = 5: the focal series CM_47925 - 15 in blue (bottom plot) is merged to the alternative series CM_47928 - 15 in red (bottom plot). The other two focal series CM_47925 - 31516 in black (top plot) and CM_47925 - 31740 in blue (top plot) are added to CUSS.
In case i = 6: the focal series CN_47277 - 2464 in blue is not merged to the alternative series CN_47277 - 142 in red ; it is instead added to CUSS.
The GFE_ID
is present in the DFO streams file but
coordinates are not:
## # A tibble: 1 × 4
## NME ID X_LONGT Y_LAT
## <chr> <dbl> <chr> <chr>
## 1 COTTONWOOD RIVER - LOWER 2464 <NA> <NA>
The coordinate for this location are consequently defined manually using Google Maps; they corresponds to the mouth of Cottonwood river:
## SYSTEM_SITE GFE_ID X_LONGT Y_LAT
## 1 COTTONWOOD RIVER - LOWER 2464 -122.610584 53.119965
In case i = 7: the focal series CO_46170 - 2461 in green is merged to the alternative series CO_46170 - 213 in red ; while the other focal series CO_46170 - 54282 in blue is instead added to CUSS.
In case i = 8: the focal series CN_2178 - 129 in blue is a duplicate of the atlernative series CN_47189 - 129 in red (bottom plot); it is consequently removed from NUSEDS.
In case i = 9: the focal series CO_44539 - 2451 in blue (top plot) is not merged to the alternative series CO_44539 - 2476 in red because of a conflicting point; its reference is instead added to CUSS. The second focal series CO_44539 - 129 in blue (bottom plot) is merged to the alternative series CO_47183 - 129 in red.
In case i = 10: all five focal series are added to
CUSS. Their GFE_ID
are present in the DFO
streams but the coordinates are not:
## # A tibble: 5 × 4
## NME ID X_LONGT Y_LAT
## <chr> <dbl> <chr> <chr>
## 1 BIRCH ISLAND CHANNEL 33045 <NA> <NA>
## 2 UPPER NORTH THOMPSON RIVER MAINSTEM 2049268617 <NA> <NA>
## 3 MIDDLE NORTH THOMPSON RIVER MAINSTEM 1754239945 <NA> <NA>
## 4 PIG CHANNEL COMBINED 719256264 <NA> <NA>
## 5 UPPER NORTH THOMPSON RIVER-MILEDGE CONFLUENCE 57041 <NA> <NA>
The coordinates are consequently defined manually using Google Maps and the map in page 31 of Arbeider et al. 2020. Interior Fraser Coho Salmon Recovery Potential Assessment:
## SYSTEM_SITE GFE_ID X_LONGT
## 1 BIRCH ISLAND CHANNEL 33045 -119.889197
## 2 UPPER NORTH THOMPSON RIVER MAINSTEM 2049268617 -119.177796
## 3 MIDDLE NORTH THOMPSON RIVER MAINSTEM 1754239945 -119.701668
## 4 PIG CHANNEL COMBINED 719256264 -119.810014
## 5 UPPER NORTH THOMPSON RIVER-MILEDGE CONFLUENCE 57041 -119.173514
## Y_LAT
## 1 51.598602
## 2 52.345236
## 3 51.593091
## 4 51.579465
## 5 52.281312
In case i = 11: the focal series CO_46602 - 2746 in blue (bottom plot) is merged to the alternative series CO_46632 - 2476 in red. The other focal series CO_46602 - 212716981 in green (top plot) is added to CUSS.
In case i = 12: the focal series SX_3416 - 2746 in blue (bottom plot) is merged to the alternative series SX_3416 - 261 in red.
In case i = 13: the focal series CO_46632 - 261 in blue (bottom plot) is merged to the alternative series CO_46632 - 2746 in red.
In case i = 14: the focal series CO_46795 - 33103 in blue is merged
to the alternative series CO_46795 - 281 in red. The other focal series
CO_46795 - 1921661712 in green is added to CUSS. Its
GFE_ID
is in the DFO streams file but coordinates are
not:
## # A tibble: 2 × 4
## NME ID X_LONGT Y_LAT
## <chr> <dbl> <chr> <chr>
## 1 BLUE RIVER - LOWER 33103 -119.27964147 52.104610605
## 2 BLUE RIVER - UPPER 1921661712 <NA> <NA>
The coordinates are consequently defined manually using Google Maps:
## SYSTEM_SITE GFE_ID Y_LAT X_LONGT
## 1 BLUE RIVER - UPPER 1921661712 52.112252 -119.289006
In case i = 15: the focal series CN_46891 - 2467 in blue is a duplicate of the alternative series CN_46891 - 290 in red; it is consequently removed from NUSEDS.
In case i = 16: the focal series CN_46892 - 2466 in blue has three duplicated points with the alternative time series CN_46892 - 290 in red, and a conflicting point. However, this conflicting point in the alternative time series is a duplicated point in the third time series CN_46891 - 290 in purple in the bottom plot below. We consequently (1) remove the duplicated point in the alternative series (in red) and (2) merge the focal series to the latter.
In the remaining cases (i = 17, 18, 19, 20, 21, 22): the focal series
have the same IndexId
(i.e. POP_ID
) but a
different GFE_ID
compared to their alternative series: the
location CLEAR WATER CREEK is not present in CUSS and
the location PHILLIPS RIVER is proposed as an alternative. We do not
merge the focal series to their alternative series and add them to
CUSS instead.
There remain 37 series in NUSEDS with no alternative
series, i.e., both their IndexId
(POP_ID
) and
SPECIES & GFE_ID
are absent from CUSS.
These series are removed from NUSEDS because we cannot
associate them to a CU_NAME
. It future, it would be worth
attempting to salvage the longest series in the list. The figures below
show these time series.
At this stage NUSEDS and CUSS have
the same 6911 time series and can consequently be merged using the
common fields IndexId
(i.e., POP_ID
) and
GFE_ID
.
col_common <- c("IndexId","POP_ID","GFE_ID")
col_nuseds <- c("SPECIES","WATERBODY","AREA","Year","MAX_ESTIMATE",
"ENUMERATION_METHODS", #
"ESTIMATE_CLASSIFICATION",
"ESTIMATE_METHOD",
"GAZETTED_NAME",
"LOCAL_NAME_1",
"LOCAL_NAME_2",
"ADULT_PRESENCE",
"JACK_PRESENCE")
col_cuss <- c("SPECIES_QUALIFIED","CU_NAME","CU_TYPE","FAZ_ACRO","JAZ_ACRO","MAZ_ACRO",
"FULL_CU_IN","SYSTEM_SITE","Y_LAT","X_LONGT","IS_INDICATOR",
"CU_LAT","CU_LONGT","coordinates_changed")
nuseds_final <- base::merge(x = all_areas_nuseds[,c(col_common,col_nuseds)],
y = conservation_unit_system_sites[,c(col_common,col_cuss)],
by = col_common,
all.x = T)
So far we have not been able to determine which zeros are actual 0s and those which should be NAs. We consequently implemented the option to replace all 0s by NAs (n = 2996, or 0.98% of the data points).
Decision made:
## [1] "Zeros are replaced by NAs."
There are many cases where a CU in a given location (i.e., a unique
GFE_ID
) is associated to multiple series
(i.e. IndexId
/POP_ID
), which should not
happen. Observing these cases reveals clear duplicated data points or
single data point that are not worth keeping. To fix these issues we
proceed as follow:
Case 1: one of the duplicated series has only one data point:
if it is complementary: merge to the other (longer) series
if it is in conflict or a duplicate: remove the focal series
Case 2: the shorter series is 100% duplicated: removed the focal series
Case 3: for the rest of the duplicated series,
points that are conflictual or duplicated are summed up
points that are complementary are merged
The figures below show the time series before and after correction with the action made (i.e., “MERGED”, “DELETED”, “SUMMED”) indicated
There are three instance where a special fix is done, as indicated below in the corresponding figures.
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "Special fix (above): the blue points before 1950 are deleted, the other ones are summed to the red series."
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "Special fix (above): the blue points is deleted, the green series is merged to the red one."
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "Special fix (above): the two blue points are deleted; one is a duplicate, there other one is very close to be one."
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
## [1] "***"
The following files are exported:
1_NuSEDS_escapement_data_collated_DATE.csv: the cleaned combined NUSEDS and CUSS datasets
1_series_inNUSEDS_noInCUSS_DATE.csv: information about the series in NUSEDS and not in CUSS
1_series_removed_DATE.csv: the series removed from either NUSEDS or CUSS and why
1_series_added_DATE.csv: the series added to either NUSEDS or CUSS and why
1_log_file.csv: the log file reporting the name of the original NUSEDS or CUSS files and the choices related to removing zeros or not.