This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

refactor!: Remove block relationships
We've found that putting section/subsection/unit in
the blocks themselves is a better solution than
computing these relationships downstream.
bmtcril committed Jan 2, 2024
1 parent 8156417 commit cc9d074
Showing 5 changed files with 9 additions and 165 deletions.
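The trade-off the commit message describes can be sketched in a few lines. All field, table, and block names below are illustrative, not the plugin's actual schema: the old approach emits parent/child edges and forces downstream consumers to rebuild the course tree; the new approach walks the tree once at dump time and stamps each block row with its section/subsection/unit ancestors.

```python
# Illustrative sketch only -- field names and tree shape are hypothetical,
# not the plugin's real schema.

def build_relationships(blocks_by_location):
    """Old approach: emit (parent, child, order) edges; downstream
    consumers must rebuild the tree to answer hierarchy questions."""
    edges = []
    for location, block in blocks_by_location.items():
        for order, child in enumerate(block["children"]):
            edges.append((location, child, order))
    return edges


def annotate_hierarchy(blocks_by_location, root_location):
    """New approach: walk the tree once and stamp every block row with
    its section/subsection/unit ancestors, so no downstream join is needed."""
    rows = []

    def walk(location, lineage):
        # lineage holds the locations from the top-level section down to here
        rows.append({
            "location": location,
            "section": lineage[0] if len(lineage) > 0 else None,
            "subsection": lineage[1] if len(lineage) > 1 else None,
            "unit": lineage[2] if len(lineage) > 2 else None,
        })
        for child in blocks_by_location[location]["children"]:
            walk(child, lineage + [child])

    walk(root_location, [])
    return rows
```

With the hierarchy stored on each row, a query like "all blocks in section X" becomes a simple filter instead of a recursive join over an edge table.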
43 changes: 4 additions & 39 deletions README.rst
@@ -1,16 +1,13 @@
Event Sink ClickHouse
#####################

|pypi-badge| |ci-badge| |codecov-badge| |doc-badge| |pyversions-badge|
|license-badge| |status-badge|

Purpose
*******

This project acts as a plugin to the `Edx Platform`_, listens for
configured `Open edX events`_, and sends them to a `ClickHouse`_ database for
-analytics or other processing. This is being maintained as part of the Open
-Analytics Reference System (`OARS`_) project.
+analytics or other processing. This is being maintained as part of the
+`Aspects`_ project.

OARS consumes the data sent to ClickHouse by this plugin as part of data
enrichment for reporting, or capturing data that otherwise does not fit in
@@ -21,9 +18,7 @@ Sinks

Currently the only sink is in the CMS. It listens for the ``COURSE_PUBLISHED``
signal and serializes a subset of the published course blocks into one table
-and the relationships between blocks into another table. With those we are
-able to recreate the "graph" of the course and get relevant data, such as
-block names, for reporting.
+in ClickHouse.

Commands
********
@@ -44,7 +39,7 @@ Please see the command help for details:
.. _Open edX events: https://github.com/openedx/openedx-events
.. _Edx Platform: https://github.com/openedx/edx-platform
.. _ClickHouse: https://clickhouse.com
-.. _OARS: https://docs.openedx.org/projects/openedx-oars/en/latest/index.html
+.. _Aspects: https://docs.openedx.org/projects/openedx-aspects/en/latest/index.html

Getting Started
***************
@@ -212,33 +207,3 @@ Reporting Security Issues
*************************

Please do not report security issues in public. Please email security@openedx.org.

.. |pypi-badge| image:: https://img.shields.io/pypi/v/openedx-event-sink-clickhouse.svg
:target: https://pypi.python.org/pypi/openedx-event-sink-clickhouse/
:alt: PyPI

.. |ci-badge| image:: https://github.com/openedx/openedx-event-sink-clickhouse/workflows/Python%20CI/badge.svg?branch=main
:target: https://github.com/openedx/openedx-event-sink-clickhouse/actions
:alt: CI

.. |codecov-badge| image:: https://codecov.io/github/openedx/openedx-event-sink-clickhouse/coverage.svg?branch=main
:target: https://codecov.io/github/openedx/openedx-event-sink-clickhouse?branch=main
:alt: Codecov

.. |doc-badge| image:: https://readthedocs.org/projects/openedx-event-sink-clickhouse/badge/?version=latest
:target: https://openedx-event-sink-clickhouse.readthedocs.io/en/latest/
:alt: Documentation

.. |pyversions-badge| image:: https://img.shields.io/pypi/pyversions/openedx-event-sink-clickhouse.svg
:target: https://pypi.python.org/pypi/openedx-event-sink-clickhouse/
:alt: Supported Python versions

.. |license-badge| image:: https://img.shields.io/github/license/openedx/openedx-event-sink-clickhouse.svg
:target: https://github.com/openedx/openedx-event-sink-clickhouse/blob/main/LICENSE.txt
:alt: License

.. TODO: Choose one of the statuses below and remove the other status-badge lines.
.. |status-badge| image:: https://img.shields.io/badge/Status-Experimental-yellow
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Maintained-brightgreen
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Deprecated-orange
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Unsupported-red
@@ -97,7 +97,7 @@ def dump_target_courses_to_clickhouse(

class Command(BaseCommand):
    """
-    Dump course block and relationship data to a ClickHouse instance.
+    Dump course block data to a ClickHouse instance.
    """

    help = dedent(__doc__).strip()
63 changes: 2 additions & 61 deletions event_sink_clickhouse/sinks/course_published.py
@@ -3,7 +3,7 @@
Does the following:
- Pulls the course structure from modulestore
-- Serialize the xblocks and their parent/child relationships
+- Serialize the xblocks
- Sends them to ClickHouse in CSV format
Note that the serialization format does not include all fields as there may be things like
@@ -25,27 +25,6 @@
}


-class XBlockRelationshipSink(ModelBaseSink):
-    """
-    Sink for XBlock relationships
-    """
-
-    clickhouse_table_name = "course_relationships"
-    name = "XBlock Relationships"
-    timestamp_field = "time_last_dumped"
-    unique_key = "parent_location"
-
-    def dump_related(self, serialized_item, dump_id, time_last_dumped):
-        self.dump(
-            serialized_item,
-            many=True,
-            initial={"dump_id": dump_id, "time_last_dumped": time_last_dumped},
-        )
-
-    def serialize_item(self, item, many=False, initial=None):
-        return item

class XBlockSink(ModelBaseSink):
"""
Sink for XBlock model
@@ -112,45 +91,7 @@ def serialize_item(self, item, many=False, initial=None):
XBlockSink.strip_branch_and_version(block.location)
] = fields

-        nodes = list(location_to_node.values())
-
-        self.serialize_relationships(
-            items,
-            location_to_node,
-            course_key,
-            initial["dump_id"],
-            initial["time_last_dumped"],
-        )
-
-        return nodes
-
-    def serialize_relationships(
-        self, items, location_to_node, course_id, dump_id, dump_timestamp
-    ):
-        """Serialize the relationships between XBlocks"""
-        relationships = []
-        for item in items:
-            for index, child in enumerate(item.get_children()):
-                parent_node = location_to_node.get(
-                    XBlockSink.strip_branch_and_version(item.location)
-                )
-                child_node = location_to_node.get(
-                    XBlockSink.strip_branch_and_version(child.location)
-                )
-
-                if parent_node is not None and child_node is not None:  # pragma: no cover
-                    relationship = {
-                        "course_key": str(course_id),
-                        "parent_location": str(parent_node["location"]),
-                        "child_location": str(child_node["location"]),
-                        "order": index,
-                        "dump_id": dump_id,
-                        "time_last_dumped": dump_timestamp,
-                    }
-                    relationships.append(relationship)
-        XBlockRelationshipSink(self.connection_overrides, self.log).dump_related(
-            relationships, dump_id, dump_timestamp
-        )
+        return list(location_to_node.values())

def serialize_xblock(
self, item, index, detached_xblock_types, dump_id, time_last_dumped
52 changes: 1 addition & 51 deletions test_utils/helpers.py
@@ -14,8 +14,6 @@
from opaque_keys.edx.keys import CourseKey
from opaque_keys.edx.locator import BlockUsageLocator

-from event_sink_clickhouse.sinks.course_published import XBlockSink

ORIG_IMPORT = __import__
ORG = "testorg"
COURSE = "testcourse"
@@ -193,13 +191,8 @@ def get_clickhouse_http_params():
"input_format_allow_errors_ratio": 0.1,
"query": "INSERT INTO cool_data.course_blocks FORMAT CSV"
}
-    relationships_params = {
-        "input_format_allow_errors_num": 1,
-        "input_format_allow_errors_ratio": 0.1,
-        "query": "INSERT INTO cool_data.course_relationships FORMAT CSV"
-    }

-    return overview_params, blocks_params, relationships_params
+    return overview_params, blocks_params


def course_factory():
@@ -342,46 +335,3 @@ def match(request):
return False, f"Mismatch in row {i}: {e}"
return True, ""
return match


-def check_relationship_csv_matcher(course):
-    """
-    Match the relationship CSV against the test course.
-    This is a matcher for the "responses" library. It returns a function
-    that actually does the matching.
-    """
-    # Build our own copy of the test relationships first
-    relationships = []
-    for block in course:
-        course_key = str(block.location.course_key)
-        for _, child in enumerate(block.get_children()):
-            parent_node = str(XBlockSink.strip_branch_and_version(block.location))
-            child_node = str(XBlockSink.strip_branch_and_version(child.location))
-            relationships.append((course_key, parent_node, child_node))
-
-    def match(request):
-        body = request.body.decode("utf-8")
-        lines = body.split("\n")[:-1]
-
-        # The relationships CSV should have the same number of relationships as our test
-        if len(lines) != len(relationships):
-            return False, f"Body has {len(lines)} lines but there are {len(relationships)} relationships"
-
-        f = StringIO(body)
-        reader = csv.reader(f)
-
-        i = 0
-        try:
-            # The CSV should be in the same order as our relationships, make sure
-            # everything matches
-            for row in reader:
-                relation = relationships[i]
-                assert row[0] == relation[0]
-                assert row[1] == relation[1]
-                assert row[2] == relation[2]
-                i += 1
-        except AssertionError as e:
-            return False, f"Mismatch in row {i}: {e}"
-        return True, ""
-    return match
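For context on the `blocks_params` fixture above: ClickHouse's HTTP interface takes the SQL statement and per-request settings as query-string parameters, with the CSV rows in the POST body. A small sketch of how such a request URL is assembled (the host and table below are the tests' fake fixture values, not real endpoints):

```python
from urllib.parse import urlencode


def clickhouse_insert_url(base_url, table, allow_errors_num=1, allow_errors_ratio=0.1):
    """Build a ClickHouse HTTP-interface URL for a CSV insert; the CSV rows
    themselves travel in the POST body, not in the URL."""
    params = {
        "input_format_allow_errors_num": allow_errors_num,
        "input_format_allow_errors_ratio": allow_errors_ratio,
        "query": f"INSERT INTO {table} FORMAT CSV",
    }
    return f"{base_url}?{urlencode(params)}"


# Usage sketch (hypothetical):
# requests.post(clickhouse_insert_url("https://foo.bar/", "cool_data.course_blocks"),
#               data=csv_payload, timeout=10)
```

The `input_format_allow_errors_*` settings tell ClickHouse to tolerate a small number of malformed CSV rows rather than failing the whole insert.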
14 changes: 1 addition & 13 deletions tests/test_course_published.py
@@ -18,7 +18,6 @@
from test_utils.helpers import (
check_block_csv_matcher,
check_overview_csv_matcher,
-    check_relationship_csv_matcher,
course_factory,
course_str_factory,
fake_course_overview_factory,
@@ -54,7 +53,7 @@ def test_course_publish_success(mock_modulestore, mock_detached, mock_overview,
# Use the responses library to catch the POSTs to ClickHouse
# and match them against the expected values, including CSV
# content
-    course_overview_params, blocks_params, relationships_params = get_clickhouse_http_params()
+    course_overview_params, blocks_params = get_clickhouse_http_params()

responses.post(
"https://foo.bar/",
@@ -63,13 +62,6 @@
check_overview_csv_matcher(course_overview)
],
)
-    responses.post(
-        "https://foo.bar/",
-        match=[
-            matchers.query_param_matcher(relationships_params),
-            check_relationship_csv_matcher(course)
-        ],
-    )
responses.post(
"https://foo.bar/",
match=[
@@ -232,8 +224,6 @@ def test_xblock_tree_structure(mock_modulestore, mock_detached):
fake_serialized_course_overview = fake_serialize_fake_course_overview(course_overview)
sink = XBlockSink(connection_overrides={}, log=MagicMock())

-    # Remove the relationships sink, we're just checking the structure here.
-    sink.serialize_relationships = MagicMock()
initial_data = {"dump_id": "xyz", "time_last_dumped": "2023-09-05"}
results = sink.serialize_item(fake_serialized_course_overview, initial=initial_data)

@@ -289,8 +279,6 @@ def test_xblock_graded_completable_mode(mock_modulestore, mock_detached):
fake_serialized_course_overview = fake_serialize_fake_course_overview(course_overview)
sink = XBlockSink(connection_overrides={}, log=MagicMock())

-    # Remove the relationships sink, we're just checking the structure here.
-    sink.serialize_relationships = MagicMock()
initial_data = {"dump_id": "xyz", "time_last_dumped": "2023-09-05"}
results = sink.serialize_item(fake_serialized_course_overview, initial=initial_data)


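The deleted `check_relationship_csv_matcher` helper followed the responses library's custom-matcher contract: a factory closes over expected data and returns a `match(request)` callable yielding `(matched, reason)`. A dependency-free sketch of that contract (`FakeRequest` in the usage note stands in for the real request object):

```python
import csv
from io import StringIO


def make_csv_matcher(expected_rows):
    """Return a responses-style matcher that compares a request's CSV body
    against expected_rows, reporting the first mismatching row."""
    def match(request):
        body = request.body.decode("utf-8")
        rows = list(csv.reader(StringIO(body)))
        if len(rows) != len(expected_rows):
            return False, f"Body has {len(rows)} rows, expected {len(expected_rows)}"
        for i, (got, want) in enumerate(zip(rows, expected_rows)):
            if got != list(want):
                return False, f"Mismatch in row {i}: {got!r} != {want!r}"
        return True, ""
    return match
```

Returning a `(matched, reason)` pair rather than raising lets the responses library report why a mocked POST failed to match, which is what the tests above rely on.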