Replace materialized views with incremental roll ups by ramyaragupathy · Pull Request #45 · AmericanRedCross/osm-stats-workers

ramyaragupathy · 2019-07-03T13:18:09Z

Following AmericanRedCross/osm-stats#56, I started replacing the materialized views approach. Here's what has been done so far:

Use tables instead of views in the SQL scripts. I initially defined the table columns without populating any data, but later just used the original SELECT command to populate the table during the first housekeeping run.
Replaced the refresh queries with upsert queries in the JS file. There was a conflict when running the if condition for the JOSM editor count; for now, the josm value is filled in with dummy values.

On running the housekeeping script with this modification, I could see new updates in hashtag_stats.

Next Actions:

Talk through the approach w/ @mojodna
Identify and fix query-related issues in src/housekeeping.js

cc @dakotabenjamin @smit1678

mojodna · 2019-07-03T17:28:41Z

sql/changesets.sql

@@ -1,5 +1,4 @@
-- view with a schema that matches the legacy changesets table
-CREATE VIEW changesets AS
+CREATE TABLE changesets AS


Why this change? As-is, this will duplicate data in raw_changesets.

The historical reason for raw_changesets (table) vs changesets (view) was that I needed new table names for replacement data when I was updating v1. The API (etc.) assume the existence of relations (tables or views) without the raw_ prefix, so creating views was the easiest way to preserve compatibility.

raw_* should probably go away in favor of tables without the prefix.

Are you planning on updating the parts of this project that write raw changeset stats to write to these tables (retiring raw_changesets)? If so, the API project may also need to be updated to refer to the correct tables.

mojodna · 2019-07-03T17:31:37Z

src/housekeeping.js

+    JOIN raw_changesets c ON c.id = ch.changeset_id
+    JOIN raw_hashtags h ON h.id = ch.hashtag_id
+    GROUP BY hashtag
+    ON CONFLICT do nothing;`


I see where you're going with this... nice.

The key to the incremental updates is the ON CONFLICT clause as well as tracking which augmented diffs / changesets have already had their stats contributed.
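
To illustrate the tracking half of that idea, here is a minimal in-memory sketch in JavaScript. It is not code from this PR: `hashtagStats`, `processedChangesets`, and `applyChangeset` are hypothetical names, and a `Map` and a `Set` stand in for the roll-up table and for whatever ends up recording which changesets have already been counted.

```javascript
// Hypothetical stand-ins: a Map plays the role of the hashtag_stats table,
// a Set remembers which changeset ids have already contributed.
const hashtagStats = new Map(); // hashtag -> { changesets: number }
const processedChangesets = new Set();

// Adds one changeset's contribution exactly once; returns false on a repeat.
function applyChangeset(changeset) {
  if (processedChangesets.has(changeset.id)) {
    return false; // already counted, so skip it to avoid double counting
  }
  processedChangesets.add(changeset.id);

  for (const hashtag of changeset.hashtags) {
    const row = hashtagStats.get(hashtag) || { changesets: 0 };
    row.changesets += 1;
    hashtagStats.set(hashtag, row);
  }
  return true;
}

// Replaying the same changeset leaves the counts unchanged.
applyChangeset({ id: 1, hashtags: ["missingmaps"] });
applyChangeset({ id: 1, hashtags: ["missingmaps"] });
applyChangeset({ id: 2, hashtags: ["missingmaps", "redcross"] });
console.log(hashtagStats.get("missingmaps").changesets); // 2
console.log(hashtagStats.get("redcross").changesets); // 1
```

Without the `processedChangesets` check, every housekeeping run would add the same changesets again and the totals would drift upward.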

mojodna · 2019-07-03T17:33:58Z

sql/changesets.sql

-- view with a schema that matches the legacy changesets table
-CREATE VIEW changesets AS
+CREATE TABLE changesets AS
  SELECT


Rather than SELECTing (and subsequently truncating, if all you wanted was the structure), CREATE TABLE changesets (LIKE raw_changesets INCLUDING ALL) will replicate the structure and indices without any of the data.

(Though in this case, some of the column names changed, so yeah... ;-)

mojodna · 2019-08-07T19:49:01Z

sql/changesets.sql

@@ -1,5 +1,4 @@
-- view with a schema that matches the legacy changesets table
-CREATE VIEW changesets AS
+CREATE TABLE changesets AS


Are you planning on updating the parts of this project that write raw changeset stats to write to these tables (retiring raw_changesets)? If so, the API project may also need to be updated to refer to the correct tables.

mojodna · 2019-08-07T19:54:49Z

src/housekeeping.js

+    JOIN raw_changesets c ON c.id = ch.changeset_id
+    JOIN raw_hashtags h ON h.id = ch.hashtag_id
+    GROUP BY hashtag
+    ON CONFLICT do update;`


This looks incomplete.

The other aggregated tables also need to be updated incrementally.

More importantly, this doesn't actually update incrementally; it's effectively the same as what REFRESH MATERIALIZED VIEW does, rewriting each of the rows in the aggregated table (it'll take just as long on a fully-populated table). Instead, it should detect which changesets rows changed since the last run and INSERT new aggregated values / UPDATE existing aggregated values by adding to them / updating counts / updating max values.
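
A rough sketch of the incremental shape described here, written as plain JavaScript over in-memory data rather than as the SQL this PR would need. Everything named in it is an assumption for illustration: `rollUp`, the `createdAt` watermark, and the `roadKm` and `lastEdit` fields are hypothetical, and it assumes each changeset row is written once, so anything newer than the last run is new.

```javascript
// Folds only the changesets newer than `lastRun` into `stats` and returns
// the new watermark. `stats` maps hashtag -> aggregated row.
function rollUp(stats, changesets, lastRun) {
  let watermark = lastRun;
  for (const cs of changesets) {
    if (cs.createdAt <= lastRun) continue; // handled by an earlier run
    const row = stats.get(cs.hashtag);
    if (row === undefined) {
      // INSERT: first time this hashtag is seen
      stats.set(cs.hashtag, { changesets: 1, roadKm: cs.roadKm, lastEdit: cs.createdAt });
    } else {
      // UPDATE: add to sums and counts, keep the larger max
      row.changesets += 1;
      row.roadKm += cs.roadKm;
      row.lastEdit = Math.max(row.lastEdit, cs.createdAt);
    }
    watermark = Math.max(watermark, cs.createdAt);
  }
  return watermark;
}

const stats = new Map();
const firstBatch = [
  { hashtag: "missingmaps", roadKm: 2, createdAt: 10 },
  { hashtag: "missingmaps", roadKm: 3, createdAt: 20 },
];
let lastRun = rollUp(stats, firstBatch, 0);

// The second run sees old and new rows but only folds in the new one.
const secondBatch = firstBatch.concat([{ hashtag: "missingmaps", roadKm: 1, createdAt: 30 }]);
lastRun = rollUp(stats, secondBatch, lastRun);
console.log(stats.get("missingmaps")); // { changesets: 3, roadKm: 6, lastEdit: 30 }
```

In PostgreSQL, the UPDATE branch corresponds to `INSERT ... ON CONFLICT (...) DO UPDATE SET`, where the existing row and the proposed row (`EXCLUDED`) are combined rather than overwritten. The work per run then scales with the number of new rows instead of the size of the aggregated table.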

ramyaragupathy added 3 commits July 3, 2019 14:56

Replace materialized views with roll up tables

46b3da6

Refresh -> Insert

de02f69

Handle conflict

c63119f

mojodna reviewed Jul 3, 2019

On conflict action

c0a00b7

ramyaragupathy changed the title ~~[WIP] Replace materialized views with incremental roll ups~~ Replace materialized views with incremental roll ups Aug 6, 2019

mojodna suggested changes Aug 7, 2019

ramyaragupathy wants to merge 4 commits into AmericanRedCross:master from ramyaragupathy:fix/replace-mat-views
