Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discrepancy in data from get_ll_hook_data() and get_survey_sets() #12

Open
jdunic opened this issue Jul 5, 2023 · 2 comments
Open

discrepancy in data from get_ll_hook_data() and get_survey_sets() #12

jdunic opened this issue Jul 5, 2023 · 2 comments

Comments

@jdunic
Copy link
Contributor

jdunic commented Jul 5, 2023

There seems to be a discrepancy in the data coming from gfdata::get_survey_sets() and gfdata::get_ll_hook_data(). From get_ll_hook_data(), it looks like count_target_species and count_non_target_species are not always correct. And I am unclear of where bait counts come from for these fishing_event_id values that correspond with these incongruent values.

Setup data to combine hook counts (baited, empty, total) with survey data

library(dplyr)

ye_survey <- gfdata::get_survey_sets(species = "yelloweye rockfish", ssid = c(39, 40))

ye_hook_dat <- 
  gfdata::get_ll_hook_data(species = "yelloweye rockfish", ssid = c(39, 40)) |>
    mutate(total_hooks = count_target_species + count_non_target_species +
            count_bait_only + count_empty_hooks - count_bent_broken)

dat <- left_join(ye_hook_dat, ye_survey)

Expected results

1. Number of rows from get_survey_sets() and get_ll_hook_data() should be equal

2. total_hooks should equal hook_count

3. count_target_species should equal catch_count

Current results

  1. More rows are returned from get_ll_hook_data() than get_survey_sets().

From what I can tell, I think that this is because of a difference in checking for usable sets in get_ll_hook_data.

nrow(ye_survey) - nrow(ye_hook_dat)
dat |> filter(is.na(catch_count))

db <- "GFBioSQL"

# check first 5 fishing_event_id values that are extra in ye_hook_dat
use_check <- gfdata::run_sql(db, 
  "SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID, 
    C.SPECIES_CODE, C.CATCH_COUNT, 
    LS.LGLSP_HOOK_COUNT, LS.LGLSP_HOOKS_SET_COUNT, LS.LGLSP_HOOKS_LOST_COUNT, LS.USABILITY_CODE,
    U.USABILITY_DESC
   FROM FISHING_EVENT_CATCH FEC
       INNER JOIN CATCH C ON
       FEC.CATCH_ID = C.CATCH_ID
       INNER JOIN LONGLINE_SPECS LS ON
       FEC.FISHING_EVENT_ID = LS.FISHING_EVENT_ID
       INNER JOIN USABILITY U ON
       LS.USABILITY_CODE = U.USABILITY_CODE
       LEFT JOIN HOOK_SPECS HS ON
       FEC.FISHING_EVENT_ID = HS.FISHING_EVENT_ID
   WHERE FEC.FISHING_EVENT_ID IN (309481, 309483, 309514, 309516, 309519, 309521);")

# Includes usability codes not in (0, 1, 2, 6)
glimpse(use_check)

**2. total_hooks does not equal hook_count (from get_survey_sets())in some cases. hook_count looks like the correct value. **

3. catch_count which comes from get_survey_sets() looks like the correct value. While catch information from get_ll_hook_data() looks wrong.

Some examples of 2 and 3:

  • fishing_event_id == 2193813: count of target species (count_non_target_species) is 0, but should be 3
  • fishing_event_id == 309482: count of target species is 15 but should be 16
  • fishing_event_id == 1722030: count of target species is 19 but should be 18
  • fishing_event_id == 1722028: count of target species is 2 but should be 1
  • fishing_event_id == 4590043: count of non_target is 0, but 141 dogfish were caught, which looks like why total_hooks is incorrect
count_diffs <- dat |>
  drop_na(catch_count) |> # ignore the usability code issue
  mutate(c_diff = count_target_species - catch_count, 
         h_diff = total_hooks - hook_count) |>
  select(fishing_event_id, count_non_target_species, count_target_species, 
         count_bait_only, count_empty_hooks, count_bent_broken, total_hooks, 
         fishing_event_id, catch_count, hook_count, 
         c_diff, h_diff) |> # simplify table view
  rename(non_target = 'count_non_target_species', target = 'count_target_species', 
         bait = 'count_bait_only', empty = 'count_empty_hooks', 
         bent = 'count_bent_broken') # simplify table view

count_diffs |>
  arrange(-abs(c_diff), -abs(h_diff)) |> # show big diffs at top of table
  slice(1:5) |>
  View()

catch_dat <- gfdata::run_sql(db, 
"SELECT FEC.TRIP_ID, FEC.FISHING_EVENT_ID, FEC.CATCH_ID, 
    C.SPECIES_CODE, C.CATCH_COUNT, C.CATCH_WEIGHT, 
    S.SPECIES_COMMON_NAME
 FROM FISHING_EVENT_CATCH FEC
 INNER JOIN CATCH C ON
    FEC.CATCH_ID = C.CATCH_ID
 LEFT OUTER JOIN SPECIES S ON
    C.SPECIES_CODE = S.SPECIES_CODE
 WHERE FEC.FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043)")

View(catch_dat)

4. When confirming that the bait and empty counts are correct, I cannot find the fishing_event_ids from the example above in HOOK_SPECS, so where is the count_bait_only coming from in get_ll_hook_data()? I might just be confused here. How do the other values in the FISHING_EVENT table come into play?

hs <- run_sql(db, 
"SELECT *
 FROM HOOK_SPECS 
 WHERE FISHING_EVENT_ID IN (2193813, 309482, 1722030, 1722028, 4590043);")
hs
@andrew-edwards
Copy link
Member

This is NOT for the IPHC survey, correct? (Then, sorry, I'm not sure I can help).

I know I had similar issues for IPHC survey (and maybe for data that was in gfbio, not just in gfiphc). Is this only happening occasionally? I'm guessing not, but sometimes a line breaks or something, and not all the hooks get fished. Someone dropped their recording sheet over the side of the boat once, and it took me half a day to track that problem down.

Aha - total_hooks might be total hooks deployed? And hence always <= hook_count.

@jdunic
Copy link
Contributor Author

jdunic commented Jul 13, 2023

Correct, I only checked this difference for HBLL INS and specifically only compared information using Yelloweye.

Interestingly... total_hooks is NOT always <= hook_count... And the total_hooks calculation subtracts out bent/broken hooks. But I hadn't thought about those other reasons for missing hooks 😆that is helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants