Handle a question whose type changes but name doesn't. #187

noliveleger · 2018-12-14T20:33:31Z

Fixes #151.
Fixes kobotoolbox/kpi#2109
Covers #183 and #185.


coveralls · 2018-12-14T20:40:13Z

Coverage increased (+0.07%) to 82.839% when pulling 785ebda on 151-question-with-different-types into 29bd37f on master.

src/formpack/pack.py

jnm · 2018-12-15T02:19:57Z

src/formpack/reporting/autoreport.py

                    raw_value = entry.get(field.path)
+                    field_contextual_name = "{}_{}_{}".format(


~~(self: revise this) i guess this is the expected contextual name, used for comparison not assignment?~~
possible to use the create_unique_name() method to avoid hard-coding this here?
https://github.com/kobotoolbox/formpack/pull/187/files#diff-7845025cfa4cd9a06d7a4399e1ba9c0bR58

or, maybe the `if field.contextual_name.startswith("{}_{}_v".format(` check below could be changed to something like `if field.use_unique_name`?

@jnm: I now remember why I did this comparison this way.

We have

    if field.contextual_name.startswith("{}_{}_v".format(
        field.name, field.data_type
    )):
        # We have a match, we won't evaluate other fields with same name
        # for this submission.
        if field.contextual_name == field_contextual_name:
            fields_to_skip.append(field.name)
        else:
            continue

Let me explain:

All fields that are added by get_fields_for_versions (because another type has been detected in the latest version) have a contextual name matching the pattern name_data_type_version_id.

- Trivial case: the pattern doesn't match. The data belongs to the field, and the first if statement is skipped.
- Other case: the pattern matches.
  - version_id is the same: the data belongs to this field. All other fields with the same name are skipped for this submission.
  - version_id is not the same: we move on to the next iteration of the loop immediately, because the data should be appended only to the corresponding field (or column in the export), which comes up in a later iteration.
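The cases above can be sketched as a small standalone loop. This is only an illustration, not the code from the PR: `Field` is a hypothetical stand-in for formpack's FormField, `match_fields` is a made-up helper, and the version ids (assumed to start with "v") are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for FormField; only the attributes used in the
# comparison are modelled here.
Field = namedtuple("Field", "name data_type contextual_name")

FIELDS = [
    Field("question", "text", "question_text_v123456"),
    Field("question", "integer", "question_integer_v234567"),
    Field("question", "date_time", "question"),
]


def match_fields(fields, version_id):
    """Return the contextual names that receive data for one submission."""
    matched = []
    fields_to_skip = []
    for field in fields:
        if field.name in fields_to_skip:
            continue
        # Contextual name this field would have if it belonged to the
        # submission's version (version id assumed to start with "v").
        expected = "{}_{}_{}".format(field.name, field.data_type, version_id)
        prefix = "{}_{}_v".format(field.name, field.data_type)
        if field.contextual_name.startswith(prefix):
            if field.contextual_name == expected:
                # Same version: data belongs here, skip same-named fields.
                fields_to_skip.append(field.name)
            else:
                # Other version: try the next field.
                continue
        matched.append(field.contextual_name)
    return matched
```

With these made-up fields, a submission from version v234567 is matched only with question_integer_v234567, and a submission from any other version (say v345678) falls through to the plain question field.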

The {}_{}_v pattern is used to avoid matching another field name.
For example:
- Question 1: restaurant
- Question 2: restaurant_text

Adding the "_v" in the pattern decreases the risk of matching the wrong field.
Obviously, if all of these conditions are true:

- The form has several versions
- The form has a question named restaurant that has multiple types across versions
- One of its types is text
- The form has a question named restaurant_text_v

then the export will have shifted data again. The comparison could use a regex to decrease the risk a little bit more, something like _v[\d\w]{21,}. What do you think?
That said, your suggestion to use create_unique_name is a really good idea.
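A minimal sketch of what the regex-based comparison could look like, assuming version ids are "v" followed by at least 21 word characters as the pattern above implies. The helper name `has_version_suffix` is hypothetical and not part of formpack.

```python
import re


def has_version_suffix(contextual_name, name, data_type):
    """True if contextual_name looks like name_data_type_v<long version id>."""
    # [\d\w]{21,} is kept as written in the comment above; \w already
    # covers digits, so the class is equivalent to \w{21,}.
    pattern = r"^{}_{}_v[\d\w]{{21,}}$".format(
        re.escape(name), re.escape(data_type)
    )
    return re.match(pattern, contextual_name) is not None
```

For example, `has_version_suffix("restaurant_text_v" + "a" * 21, "restaurant", "text")` is True, while a question literally named restaurant_text_v no longer matches, because nothing follows the "_v".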

About the idea that `if field.contextual_name.startswith("{}_{}_v".format(` could be changed to something like `if field.use_unique_name`: actually, it can't.

Take a form that has at least 3 versions, each containing a question with the same name, with 3 different types overall.
In that case the fields should be:

[<FormField type="text" contextual_name="question_text_v123456">, <FormField type="integer" contextual_name="question_integer_v234567">, <FormField type="date_time" contextual_name="question">]

Let's say we have 1 submission for each version, where the value of the question named question would be:

- Submission 1: answer
- Submission 2: 1
- Submission 3: 2018-12-18 00:00:00

Using field.use_unique_name, submissions 1 and 2 would both be matched with the field question_text_v123456, because the fields question_text_v123456 and question_integer_v234567 both have their use_unique_name property equal to True. Using the string comparison, submission 2 can't be matched with question_text_v123456 (because its version_id is not the same).
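To make the difference concrete, here is a rough sketch that contrasts the two checks. Everything in it is an assumption for illustration: `Field` is a hypothetical stand-in for FormField, both helpers are invented, and `first_match_by_flag` is only one possible reading of the use_unique_name suggestion.

```python
from collections import namedtuple

# Hypothetical stand-in for FormField.
Field = namedtuple("Field", "name data_type contextual_name use_unique_name")

FIELDS = [
    Field("question", "text", "question_text_v123456", True),
    Field("question", "integer", "question_integer_v234567", True),
    Field("question", "date_time", "question", False),
]


def first_match_by_flag(fields):
    # The flag carries no version information, so the first flagged
    # field always wins, whatever version the submission comes from.
    for field in fields:
        if field.use_unique_name:
            return field.contextual_name
    return None


def first_match_by_name(fields, version_id):
    # The string comparison includes the version id, so a flagged field
    # from another version is passed over.
    for field in fields:
        if field.use_unique_name:
            expected = "{}_{}_{}".format(
                field.name, field.data_type, version_id
            )
            if field.contextual_name != expected:
                continue
        return field.contextual_name
    return None
```

Here `first_match_by_flag(FIELDS)` always returns question_text_v123456, while `first_match_by_name(FIELDS, "v234567")` returns question_integer_v234567.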

src/formpack/reporting/export.py

src/formpack/schema/datadef.py

jnm · 2018-12-15T02:39:55Z

src/formpack/schema/datadef.py

        self.has_stats = has_stats

    def __repr__(self):
-        return "<%s name='%s'>" % (self.__class__.__name__, self.name)
+        return "<%s name='%s'>" % (self.__class__.__name__, self.contextual_name)


note: everywhere else, this is changed to contextual_name='%s'

I hesitated between keeping self.name for FormDataDef and changing the representation string, because contextual_name is more closely related to FormField than to its parent class.
Still wondering...
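One way to picture the option being weighed here is to keep name in the parent's representation and override it in the subclass. The classes below are heavily simplified stand-ins that only borrow the names FormDataDef and FormField; the constructor signatures are assumptions, and this is not necessarily what the PR ended up doing.

```python
class FormDataDef(object):
    """Simplified stand-in: the parent class only knows about name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<%s name='%s'>" % (self.__class__.__name__, self.name)


class FormField(FormDataDef):
    """Simplified stand-in: the subclass adds contextual_name."""

    def __init__(self, name, contextual_name=None):
        super(FormField, self).__init__(name)
        # Fall back to the plain name when no contextual name is given.
        self.contextual_name = contextual_name or name

    def __repr__(self):
        return "<%s contextual_name='%s'>" % (
            self.__class__.__name__, self.contextual_name)
```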

tests/test_autoreport.py

jnm · 2018-12-20T09:25:36Z

There's still a problem when disaggregating (using "group by") in the autoreport after I changed an integer field to a select_multiple:

ValueError: invalid literal for int() with base 10: '1000 xs'
  File "django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "django/views/decorators/csrf.py", line 58, in wrapped_view
    return view_func(*args, **kwargs)
  File "rest_framework/viewsets.py", line 87, in view
    return self.dispatch(request, *args, **kwargs)
  File "rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "rest_framework/views.py", line 463, in dispatch
    response = handler(request, *args, **kwargs)
  File "rest_framework/mixins.py", line 58, in retrieve
    return Response(serializer.data)
  File "rest_framework/serializers.py", line 239, in data
    self._data = self.to_representation(self.instance)
  File "kobo/apps/reports/serializers.py", line 27, in to_representation
    _list = report_data.data_by_identifiers(obj, vnames, split_by=split_by)
  File "kobo/apps/reports/report_data.py", line 179, in data_by_identifiers
    split_by=split_by)
  File "formpack/reporting/autoreport.py", line 238, in get_stats
    self.versions, lang, split_by_field)
  File "formpack/reporting/autoreport.py", line 178, in _disaggregate_stats
    for value in values:
  File "formpack/schema/fields.py", line 546, in parse_values
    yield int(raw_values)

Sentry has the details, but make sure to switch to "Full" view instead of "App Only": https://sentry.kbtdev.org/kobo/kpi-backend/issues/18374/

I'm guessing _disaggregate_stats() needs the same treatment you gave _calculate_stats()?

Bonus: what the heck causes No exception message supplied? It's a good thing Sentry captured the details of the exception because I have no idea.
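The traceback above comes from calling int() on a value that was recorded after the field became a select_multiple. A minimal sketch of treating such values as blank instead of raising, under the assumption that skipping them is acceptable for stats; `parse_int_or_blank` is a hypothetical helper, not formpack's parse_values().

```python
def parse_int_or_blank(raw_value):
    """Return raw_value as an int, or None when it cannot be parsed."""
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        # e.g. '1000 xs', a select_multiple answer stored in what used
        # to be an integer question
        return None
```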

noliveleger · 2018-12-20T13:54:24Z

@jnm: Nice catch.


noliveleger · 2018-12-20T15:18:17Z

@jnm: Sentry is a savior, but it is perhaps the bad guy as well.
Locally I don't enable Sentry and I do see the real error, invalid literal for int() with base 10.


joshuaberetta · 2021-07-20T21:00:08Z

@noliveleger what's the way forward on this one?

noliveleger · 2021-07-29T16:42:07Z

> @noliveleger what's the way forward on this one?

@joshuaberetta the last discussion about this with @jnm concluded that we wanted to think about it a little more.
We could have this discussion with the whole back-end team.

noliveleger added 8 commits December 13, 2018 15:10

Creates a unique_name when field is added to FormVersion

7fb1ad9

Merge branch '182-shifted-value-with-deleted-multiple-options' into 151-question-with-different-types

c15fc18

Exports supports data for same questions with different types

5af7218

Changed unittest to match new representation of FormField

9652455

Fixed stats calculation when multiple versions contains same question with different types

5953cde

Fixed typo

cc58e06

Used property 'contextual_name' instead of 'name'

f4809a4

Changed unittests for multiple versions with questions with same name but different type

dc69313

noliveleger requested a review from jnm December 14, 2018 20:33

Removed useless try/except when trying parsing values of NumField

3985489

jnm mentioned this pull request Dec 17, 2018

autoreport ignores non-numeric values in numeric field #186

Merged

Merge branch 'master' into 151-question-with-different-types

a042146

jnm requested changes Dec 17, 2018


noliveleger added 5 commits December 18, 2018 09:54

Merge branch 'master' into 151-question-with-different-types

3b0db4d

Applied requested changes for PR #187

dc05c1d

Changed FormDataDef representation string

50b5a2e

Applied PEP-8 guidance

16a896a

Removed useless import

d1c5429

Applied logic of '_calculate_stats()' to '_disaggregate_stats()' in order to match data with correct field

f0ddf7d

noliveleger self-assigned this Dec 20, 2018

jnm added a commit that referenced this pull request Feb 4, 2019

Copy @noliveleger's tests from #187, except for whitespace and `contextual_name` changes

d0a15d8

noliveleger added 2 commits July 8, 2019 10:05

Merge 'master' branch into 151-question-with-different-types

bd25c51

Restored try/except to handle unexpected values as blank when calculating stats

785ebda

noliveleger mentioned this pull request Oct 29, 2019

Uses FormPack.FormField new property contextual name when create reports kobotoolbox/kpi#2124

Open

noliveleger assigned jnm and unassigned noliveleger Nov 6, 2019

noliveleger added the bug-fix label Nov 19, 2019

jnm mentioned this pull request Mar 16, 2021

Warn when unnamed groups are uploaded via XLSForm kobotoolbox/kpi#3075

Open
