Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle a question whose type changes but name doesn't. #187

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

noliveleger
Copy link
Contributor

@noliveleger noliveleger commented Dec 14, 2018

Fixes #151.
Fixes kobotoolbox/kpi#2109
Covers #183 and #185.

@noliveleger noliveleger requested a review from jnm December 14, 2018 20:33
@coveralls
Copy link

coveralls commented Dec 14, 2018

Coverage Status

Coverage increased (+0.07%) to 82.839% when pulling 785ebda on 151-question-with-different-types into 29bd37f on master.

src/formpack/pack.py Outdated Show resolved Hide resolved
raw_value = entry.get(field.path)
field_contextual_name = "{}_{}_{}".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(self: revise this) i guess this is the expected contextual name, used for comparison not assignment?
possible to use the create_unique_name() method to avoid hard-coding this here?
https://github.com/kobotoolbox/formpack/pull/187/files#diff-7845025cfa4cd9a06d7a4399e1ba9c0bR58

or, maybe if field.contextual_name.startswith("{}_{}_v".format( below could be changed to something like if field.use_unique_name?

Copy link
Contributor Author

@noliveleger noliveleger Dec 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnm: I now remember why I did this comparison this way.

We have

                        if field.contextual_name.startswith("{}_{}_v".format(
                            field.name,
                            field.data_type
                        )):
                            # We have a match, we won't evaluate other fields with same name
                            # for this submission.
                            if field.contextual_name == field_contextual_name:
                                fields_to_skip.append(field.name)
                            else:
                                continue

Let's explain:

All fields that are added by get_fields_for_versions (because another type has been detected in the latest version) match the pattern name_data_type_version_id.

  • Trivial case: Pattern doesn't match. Data belongs to field. Skip first if statement.
  • Other case: We have a match with this pattern
    1. version_id is the same. Data belongs to this field. All other fields with same name will be skipped for this submission.
    2. version_id is not the same, we want to iterate over loop immediately because the data
      should be appended to the corresponding field (or column in the export) only - which is a further occurrence of the loop.

The {}_{}_v pattern is used to avoid matching another field name.
For example:
- Question 1: restaurant
- Question 2: restaurant_text

Adding the "_v" in the pattern decreases the risk of matching the wrong field.
Obviously if all these rules are true:

  • Form has several versions
  • Form has a question named restaurant has multiple types among versions
  • One of its types is text
  • Form has a question name restaurant_text_v

Export will have data shifted again. The comparison would be a regex to decrease the risk a little bit more.
Something like _v[\d\w]{21,}. What do you think?
Having this said: Your suggestion to use create_unique_name is really good idea.

About the idea of maybe if field.contextual_name.startswith("{}_{}_v".format( below could be changed to something like if field.use_unique_name?
Actually it can't.

In case the form has more than 3 versions and each version contains a question with the same name but 3 different types.
In that case fields should be:

[<FormField type="text" contextual_name="question_text_v123456">,
  <FormField type="integer" contextual_name="question_integer_v234567">,
  <FormField type="date_time" contextual_name="question">]

Let's say, we have 1 submission for each version where value of the question question would be

  • Submission 1: answer
  • Submission 2: 1
  • Submission 3: 2018-12-18 00:00:00

Using field.use_unique_name, submission 1 and 2 would be matched with field question_text_v123456 because both fields question_text_v123456 and question_integer_v234567 have use_unique_name property equals True. Using the string comparison submission 2 can't be matched with question_text_v123456 (because its version_id is not the same)

src/formpack/reporting/export.py Outdated Show resolved Hide resolved
src/formpack/schema/datadef.py Show resolved Hide resolved
self.has_stats = has_stats

def __repr__(self):
return "<%s name='%s'>" % (self.__class__.__name__, self.name)
return "<%s name='%s'>" % (self.__class__.__name__, self.contextual_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: everywhere else, this is changed to contextual_name='%s'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have hesitated between using self.name for FormDataDef or changing the representation string because contextual_name is more related to FormField than its parent class.
Still wondering...

tests/test_autoreport.py Outdated Show resolved Hide resolved
@jnm
Copy link
Member

jnm commented Dec 20, 2018

There's still a problem when disaggregating (using "group by") in the autoreport after I changed an integer field to a select_multiple:
image

ValueError: invalid literal for int() with base 10: '1000 xs'
  File "django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "rest_framework/viewsets.py", line 87, in view
    return self.dispatch(request, *args, **kwargs)
  File "rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "rest_framework/mixins.py", line 58, in retrieve
    return Response(serializer.data)
  File "rest_framework/serializers.py", line 239, in data
    self._data = self.to_representation(self.instance)
  File "kobo/apps/reports/serializers.py", line 27, in to_representation
    _list = report_data.data_by_identifiers(obj, vnames, split_by=split_by)
  File "kobo/apps/reports/report_data.py", line 179, in data_by_identifiers
    split_by=split_by)
  File "formpack/reporting/autoreport.py", line 238, in get_stats
    self.versions, lang, split_by_field)
  File "formpack/reporting/autoreport.py", line 178, in _disaggregate_stats
    for value in values:
  File "formpack/schema/fields.py", line 546, in parse_values
    yield int(raw_values)

Sentry has the details, but make sure to switch to "Full" view instead of "App Only": https://sentry.kbtdev.org/kobo/kpi-backend/issues/18374/

I'm guessing _disaggregate_stats() needs the same treatment you gave _calculate_stats()?

Bonus: what the heck causes No exception message supplied? It's a good thing Sentry captured the details of the exception because I have no idea.
image

@noliveleger
Copy link
Contributor Author

@jnm: Nice catch.

@noliveleger
Copy link
Contributor Author

@jnm: Sentry is savior but he is perhaps the bad guy as well.
Locally I don't activate Sentry and I do see the real error invalid literal for int() with base 10:

@noliveleger noliveleger self-assigned this Dec 20, 2018
jnm added a commit that referenced this pull request Feb 4, 2019
whitespace and `contextual_name` changes
@joshuaberetta
Copy link
Member

@noliveleger what's the way forward on this one?

@noliveleger
Copy link
Contributor Author

@noliveleger what's the way forward on this one?

@joshuaberetta last discussion about that with @jnm was, we wanted to think about it a little bit more.
We could have this discussion with the whole back-end team.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants