You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There seems to be a discrepancy in the data coming from gfdata::get_survey_sets() and gfdata::get_ll_hook_data(). From get_ll_hook_data(), it looks like count_target_species and count_non_target_species are not always correct. And I am unclear of where bait counts come from for these fishing_event_id values that correspond with these incongruent values.
Setup data to combine hook counts (baited, empty, total) with survey data
1. Number of rows from get_survey_sets() and get_ll_hook_data() should be equal
2. total_hooks should equal hook_count
3. count_target_species should equal catch_count
Current results
More rows are returned from get_ll_hook_data() than get_survey_sets().
From what I can tell, I think that this is because of a difference in checking for usable sets in get_ll_hook_data.
nrow(ye_survey) - nrow(ye_hook_dat)
dat |> filter(is.na(catch_count))
db <- "GFBioSQL"
# check first 5 fishing_event_id values that are extra in ye_hook_dat
use_check <- gfdata::run_sql(db,
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID,
C.SPECIES_CODE, C.CATCH_COUNT,
LS.LGLSP_HOOK_COUNT, LS.LGLSP_HOOKS_SET_COUNT, LS.LGLSP_HOOKS_LOST_COUNT, LS.USABILITY_CODE,
U.USABILITY_DESC
FROM FISHING_EVENT_CATCH FEC
INNER JOIN CATCH C ON
FEC.CATCH_ID = C.CATCH_ID
INNER JOIN LONGLINE_SPECS LS ON
FEC.FISHING_EVENT_ID = LS.FISHING_EVENT_ID
INNER JOIN USABILITY U ON
LS.USABILITY_CODE = U.USABILITY_CODE
LEFT JOIN HOOK_SPECS HS ON
FEC.FISHING_EVENT_ID = HS.FISHING_EVENT_ID
WHERE FEC.FISHING_EVENT_ID IN (309481, 309483, 309514, 309516, 309519, 309521);")
# Includes usability codes not in (0, 1, 2, 6)
glimpse(use_check)
**2. total_hooks does not equal hook_count (from get_survey_sets())in some cases. hook_count looks like the correct value. **
3. catch_count which comes from get_survey_sets() looks like the correct value. While catch information from get_ll_hook_data() looks wrong.
Some examples of 2 and 3:
fishing_event_id == 2193813: count of target species (count_non_target_species) is 0, but should be 3
fishing_event_id == 309482: count of target species is 15 but should be 16
fishing_event_id == 1722030: count of target species is 19 but should be 18
fishing_event_id == 1722028: count of target species is 2 but should be 1
fishing_event_id == 4590043: count of non_target is 0, but 141 dogfish were caught, which looks like why total_hooks is incorrect
count_diffs <- dat |>
drop_na(catch_count) |> # ignore the usability code issue
mutate(c_diff = count_target_species - catch_count,
h_diff = total_hooks - hook_count) |>
select(fishing_event_id, count_non_target_species, count_target_species,
count_bait_only, count_empty_hooks, count_bent_broken, total_hooks,
fishing_event_id, catch_count, hook_count,
c_diff, h_diff) |> # simplify table view
rename(non_target = 'count_non_target_species', target = 'count_target_species',
bait = 'count_bait_only', empty = 'count_empty_hooks',
bent = 'count_bent_broken') # simplify table view
count_diffs |>
arrange(-abs(c_diff), -abs(h_diff)) |> # show big diffs at top of table
slice(1:5) |>
View()
catch_dat <- gfdata::run_sql(db,
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID,
C.SPECIES_CODE, C.CATCH_COUNT, C.CATCH_WEIGHT,
S.SPECIES_COMMON_NAME
FROM FISHING_EVENT_CATCH FEC
INNER JOIN CATCH C ON
FEC.CATCH_ID = C.CATCH_ID
LEFT OUTER JOIN SPECIES S ON
C.SPECIES_CODE = S.SPECIES_CODE
WHERE FEC.FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043)")
View(catch_dat)
4. When confirming that the bait and empty counts are correct, I cannot find the fishing_event_ids from the example above in HOOK_SPECS, so where is the count_bait_only coming from in get_ll_hook_data()? I might just be confused here. How do the other values in the FISHING_EVENT table come into play?
hs <- run_sql(db,
"SELECT *
FROM HOOK_SPECS
WHERE FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043);")
hs
The text was updated successfully, but these errors were encountered:
This is NOT for the IPHC survey, correct? (Then, sorry, I'm not sure I can help).
I know I had similar issues for IPHC survey (and maybe for data that was in gfbio, not just in gfiphc). Is this only happening occasionally? I'm guessing not, but sometimes a line breaks or something, and not all the hooks get fished. Someone dropped their recording sheet over the side of the boat once, and it took me half a day to track that problem down.
Aha - total_hooks might be total hooks deployed? And hence always <= hook_count.
Correct, I only checked this difference for HBLL INS and specifically only compared information using Yelloweye.
Interestingly... total_hooks is NOT always <= hook_count... And the total_hooks calculation subtracts out bent/broken hooks. But I hadn't thought about those other reasons for missing hooks 😆that is helpful.
There seems to be a discrepancy in the data coming from
gfdata::get_survey_sets()
andgfdata::get_ll_hook_data()
. Fromget_ll_hook_data()
, it looks likecount_target_species
andcount_non_target_species
are not always correct. And I am unclear of where bait counts come from for thesefishing_event_id
values that correspond with these incongruent values.Setup data to combine hook counts (baited, empty, total) with survey data
Expected results
1. Number of rows from
get_survey_sets()
andget_ll_hook_data()
should be equal2.
total_hooks
should equalhook_count
3.
count_target_species
should equalcatch_count
Current results
get_ll_hook_data()
thanget_survey_sets()
.From what I can tell, I think that this is because of a difference in checking for usable sets in
get_ll_hook_data
.**2.
total_hooks
does not equalhook_count
(fromget_survey_sets()
)in some cases.hook_count
looks like the correct value. **3.
catch_count
which comes fromget_survey_sets()
looks like the correct value. While catch information fromget_ll_hook_data()
looks wrong.Some examples of 2 and 3:
fishing_event_id == 2193813
: count of target species (count_non_target_species
) is 0, but should be 3fishing_event_id == 309482
: count of target species is 15 but should be 16fishing_event_id == 1722030
: count of target species is 19 but should be 18fishing_event_id == 1722028
: count of target species is 2 but should be 1fishing_event_id == 4590043
: count of non_target is 0, but 141 dogfish were caught, which looks like whytotal_hooks
is incorrect4. When confirming that the bait and empty counts are correct, I cannot find the
fishing_event_id
s from the example above inHOOK_SPECS
, so where is thecount_bait_only
coming from inget_ll_hook_data()
? I might just be confused here. How do the other values in theFISHING_EVENT
table come into play?The text was updated successfully, but these errors were encountered: