DO NOT MERGE. Intro serializer lib with json and protobuf. #107

yb01 · 2022-07-27T03:35:14Z

This PR contains changes in two major commits for:

a new serializer lib, largely taken from the apimachinery lib in k8s
a refactor of the types folder to make the current protobuf auto-generation easier
other fixes for using protobuf, applied ONLY to the ResponseFromRRM object for now, for the 730 experiments
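For readers unfamiliar with the apimachinery approach, here is the general shape of a pluggable serializer in Go: one interface, with one implementation per wire format. This is a simplified, hypothetical illustration, not the PR's actual API. The names `Serializer`, `NewJSONSerializer` and `EncodeToBytes` are assumptions, and only a JSON implementation based on `encoding/json` is shown. A protobuf implementation would presumably satisfy the same interface by calling the generated `Marshal`/`Unmarshal` methods.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
)

// Serializer is a hypothetical, simplified interface in the spirit of
// k8s apimachinery's runtime.Serializer; it is not the PR's actual API.
type Serializer interface {
	// Encode writes the wire form of obj to w.
	Encode(obj interface{}, w io.Writer) error
	// Decode parses data into the value pointed to by into.
	Decode(data []byte, into interface{}) error
	// ContentType reports the media type, e.g. for HTTP headers.
	ContentType() string
}

// jsonSerializer implements Serializer with encoding/json.
type jsonSerializer struct{}

func (jsonSerializer) Encode(obj interface{}, w io.Writer) error {
	return json.NewEncoder(w).Encode(obj)
}

func (jsonSerializer) Decode(data []byte, into interface{}) error {
	return json.Unmarshal(data, into)
}

func (jsonSerializer) ContentType() string { return "application/json" }

// NewJSONSerializer returns the JSON implementation behind the interface.
func NewJSONSerializer() Serializer { return jsonSerializer{} }

// EncodeToBytes is a convenience helper that encodes obj into a byte slice.
func EncodeToBytes(s Serializer, obj interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := s.Encode(obj, &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

Callers then depend only on `Serializer`, so switching a path such as the region-manager pull from JSON to protobuf becomes a constructor change rather than an edit at every call site.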

Fix for issue #102.

use gogo protobuf

use bytes for the rv map in transit, since protobuf does not support a struct as a map key; separate type def
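Protobuf map fields only allow integral or string key types, so a map keyed by a struct cannot be sent as a protobuf map directly. One way to carry such a map as an opaque `bytes` field is to flatten it into sorted entries and marshal those. In this sketch, `rvLocation` (a region/partition pair) and the JSON encoding are assumptions for illustration; the PR's actual RV map type and byte encoding may differ.

```go
package main

import (
	"encoding/json"
	"sort"
)

// rvLocation is a hypothetical composite key (region, partition). Protobuf
// map keys must be scalar types, so a map keyed by this struct cannot be
// declared directly as a protobuf map field.
type rvLocation struct {
	Region    int
	Partition int
}

// rvEntry is one flattened map entry.
type rvEntry struct {
	Region    int    `json:"r"`
	Partition int    `json:"p"`
	RV        uint64 `json:"v"`
}

// encodeRVMap flattens the map into a sorted entry list and marshals it to
// bytes, which can then travel in a protobuf `bytes` field.
func encodeRVMap(m map[rvLocation]uint64) ([]byte, error) {
	entries := make([]rvEntry, 0, len(m))
	for k, v := range m {
		entries = append(entries, rvEntry{Region: k.Region, Partition: k.Partition, RV: v})
	}
	// Sort so the encoding is deterministic regardless of map iteration order.
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].Region != entries[j].Region {
			return entries[i].Region < entries[j].Region
		}
		return entries[i].Partition < entries[j].Partition
	})
	return json.Marshal(entries)
}

// decodeRVMap reverses encodeRVMap.
func decodeRVMap(data []byte) (map[rvLocation]uint64, error) {
	var entries []rvEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	m := make(map[rvLocation]uint64, len(entries))
	for _, e := range entries {
		m[rvLocation{Region: e.Region, Partition: e.Partition}] = e.RV
	}
	return m, nil
}
```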

…ling data from region manager; protobuf auto-gen files and hard-coded to protobuf serializer for pulling data from region manager; handle time in protobuf

only wait for empty pulls

resource-management/pkg/common-lib/types/typeDef.go

Sindica · 2022-07-27T16:13:52Z

resource-management/pkg/aggregrator/aggregator.go

@@ -118,6 +112,9 @@ func (a *Aggregator) Run() (err error) {
 					if eventProcess {
 						a.postCRV(c, crv)
 					}
+				} else {


Typically, the actual data size is compared with the batch size. If the data size equals the batch size, there is no need to wait before the next pull; if it is smaller than the batch size, wait a bit. Without any wait, even a single event can trigger an immediate subsequent pull, which seems a bit misaligned with the 100ms wait for empty pulls.
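The batch-size rule described above can be sketched as a small decision function. Only the 100ms empty-pull wait comes from this discussion; `nextPullDelay`, `partialPullWait` and its 20ms value are hypothetical.

```go
package main

import "time"

// Hypothetical wait values: the 100ms empty-pull wait comes from the
// discussion, the partial-batch wait is an assumption.
const (
	emptyPullWait   = 100 * time.Millisecond
	partialPullWait = 20 * time.Millisecond
)

// nextPullDelay decides how long the aggregator loop should sleep before the
// next pull, given how many events the last pull returned and the batch size
// that was requested.
func nextPullDelay(received, batchSize int) time.Duration {
	switch {
	case received == 0:
		// Nothing new: back off the longest.
		return emptyPullWait
	case received >= batchSize:
		// Full batch: more data is probably waiting, so pull again right away.
		return 0
	default:
		// Partial batch: wait a bit so a single event doesn't trigger an immediate pull.
		return partialPullWait
	}
}
```

In a pull loop this would be used as `time.Sleep(nextPullDelay(len(events), batchSize))`, which only helps once the pull actually honors a batch size.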

Good point, will fix.

Currently the batch size is not used, so comparing the expected size (the batch size) with the size actually received (the length) won't help here for now.

Since the goal is to avoid back-to-back pull() calls from the aggregator that keep the CPU busy spinning, we will check the durations of pull() and/or processEvent() and adjust the wait time here accordingly.

resource-management/pkg/common-lib/types/time.go

resource-management/pkg/common-lib/types/typeDef.go

Sindica · 2022-07-27T17:49:49Z

FYI, this change caused a 20% performance loss in distributor_concurrency_test with 2M events (200K events seem OK) when metrics are disabled.

q131172019 · 2022-07-28T05:03:00Z

resource-management/test/resourceRegionMgrSimulator/handlers/regionNodeEvents.go


 // NewRegionNodeEvents creates a Region Node Events handler with the given logger
 //
 func NewRegionNodeEventsHander() *RegionNodeEventHandler {
-	return &RegionNodeEventHandler{}
+	return &RegionNodeEventHandler{
+		//	serializer: localJson.NewSerializer("foo", false),


This commented line should be removed, right?

Yeah, I should have removed it. Will do.

resource-management/pkg/common-lib/types/logicalNode.go

resource-management/pkg/common-lib/types/generated.pb.go

q131172019

Suggest adding instructions/steps on how to use protobuf to automatically generate the files.

Reference: CentaurusInfra/arktos#1281.

Also, before merging into the main branch, can this PR be tested in a GCE environment, since it modifies so many files?

yb01 · 2022-07-31T01:41:24Z

Some performance comparisons for the event queue change in commit fb9b951:
-- client-side end-to-end time:

root@ip-172-31-10-115:/work/src/global-resource-service/resource-management# tail cent-main-c.log 
W0731 01:31:33.748972   31700 singleClientTest.go:225] Prolonged watch node from server: 459bc842-ee00-4f9d-aacc-97609d18b840 with time (2.970314753s)
W0731 01:31:33.749024   31700 singleClientTest.go:225] Prolonged watch node from server: 6c9e7ab9-6cfc-4be3-9770-54ec6e54fb71 with time (2.970374009s)
W0731 01:31:33.749048   31700 singleClientTest.go:225] Prolonged watch node from server: e6e457f6-593d-4e89-82d5-4fa34b663a7c with time (2.970332255s)
W0731 01:31:33.749085   31700 singleClientTest.go:225] Prolonged watch node from server: 71393aa6-ade8-44e6-bd51-8aa47a73e4f6 with time (2.970361383s)
I0731 01:34:22.024268   31700 streamwatcher.go:115] Unexpected EOF during watch stream event decoding: unexpected EOF
I0731 01:34:22.024318   31700 singleClientTest.go:184] End of results
I0731 01:34:22.024333   31700 stats.go:28] [Metrics][Register]RegisterClientDuration: 120.607243ms
I0731 01:34:22.024345   31700 stats.go:41] [Metrics][List]ListDuration: 3.480990221s. Number of nodes listed: 25001
I0731 01:34:22.024355   31700 stats.go:60] [Metrics][Watch]Watch session last: 6m50.706639813s. Number of nodes Added :0, Updated: 3362, Deleted: 0. watch prolonged than 1s: 3362
I0731 01:34:22.024608   31700 stats.go:65] [Metrics][Watch] perc50 2.818915196s, perc90 2.925316215s, perc99 2.969276625s. Total count 3362
root@ip-172-31-10-115:/work/src/global-resource-service/resource-management# tail cent-main-event-queue-c.log 
W0731 00:36:35.287190   31090 singleClientTest.go:225] Prolonged watch node from server: b5b58a57-30fc-48a2-9e4f-922e6f1bf927 with time (2.740699344s)
W0731 00:36:35.287232   31090 singleClientTest.go:225] Prolonged watch node from server: 417ff6e8-92a1-408b-88ec-125fb6746650 with time (2.740691147s)
W0731 00:36:35.287247   31090 singleClientTest.go:225] Prolonged watch node from server: 72594f20-670f-46d1-8466-e11c9e8e3b95 with time (2.740718864s)
W0731 00:36:35.287286   31090 singleClientTest.go:225] Prolonged watch node from server: be59e4c5-9e2a-4346-98c7-676dee694fb8 with time (2.740677759s)
I0731 00:39:04.908875   31090 streamwatcher.go:115] Unexpected EOF during watch stream event decoding: unexpected EOF
I0731 00:39:04.908912   31090 singleClientTest.go:184] End of results
I0731 00:39:04.908928   31090 stats.go:28] [Metrics][Register]RegisterClientDuration: 130.604209ms
I0731 00:39:04.908938   31090 stats.go:41] [Metrics][List]ListDuration: 2.888267345s. Number of nodes listed: 25108
I0731 00:39:04.908947   31090 stats.go:60] [Metrics][Watch]Watch session last: 6m40.960074136s. Number of nodes Added :0, Updated: 2513, Deleted: 0. watch prolonged than 1s: 2513
I0731 00:39:04.909115   31090 stats.go:65] [Metrics][Watch] perc50 2.64036751s, perc90 2.736392793s, perc99 2.740376538s. Total count 2513
root@ip-172-31-10-115:/work/src/global-resource-service/resource-management# tail pr107-c.log 
W0731 01:12:35.181080   31407 singleClientTest.go:224] Prolonged watch node from server: 4a7e1c70-887c-4375-8678-66aaa72fd267 with time (2.181077207s)
W0731 01:12:35.181096   31407 singleClientTest.go:224] Prolonged watch node from server: 430cb301-4ce5-4ba9-9f19-8a8928fb7a56 with time (2.18109399s)
W0731 01:12:35.181140   31407 singleClientTest.go:224] Prolonged watch node from server: e2257edb-7944-4f67-85e9-88ad59e2f2ca with time (2.18113716s)
W0731 01:12:35.181166   31407 singleClientTest.go:224] Prolonged watch node from server: 02390311-6041-4e8d-adeb-222b10f97ab2 with time (2.181163647s)
I0731 01:13:10.864430   31407 streamwatcher.go:114] Unexpected EOF during watch stream event decoding: unexpected EOF
I0731 01:13:10.864482   31407 singleClientTest.go:183] End of results
I0731 01:13:10.864498   31407 stats.go:28] [Metrics][Register]RegisterClientDuration: 122.791053ms
I0731 01:13:10.864511   31407 stats.go:41] [Metrics][List]ListDuration: 2.370043357s. Number of nodes listed: 25053
I0731 01:13:10.864521   31407 stats.go:60] [Metrics][Watch]Watch session last: 5m15.717064136s. Number of nodes Added :0, Updated: 2880, Deleted: 0. watch prolonged than 1s: 2880
I0731 01:13:10.864671   31407 stats.go:65] [Metrics][Watch] perc50 2.118337951s, perc90 2.17130846s, perc99 2.180023096s. Total count 2880
root@ip-172-31-10-115:/work/src/global-resource-service/resource-management#
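To put the client-side numbers in relative terms, the watch perc50 values from the three logs above can be compared against the cent-main baseline. This is plain arithmetic on the logged values; the runs saw different update counts (3362, 2513 and 2880) and session lengths, so it is only a rough comparison.

```go
package main

// Client-side watch latency perc50 values (seconds), copied from the logs above.
var clientPerc50 = map[string]float64{
	"cent-main":             2.818915196,
	"cent-main-event-queue": 2.64036751,
	"pr107":                 2.118337951,
}

// relChange returns the fractional change of v relative to base;
// a negative value means v is lower (faster) than base.
func relChange(base, v float64) float64 {
	return (v - base) / base
}
```

By this measure, the event-queue run lowers client perc50 by about 6% relative to cent-main, and the pr107 run by about 25%.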

--- server-side metrics:

root@ip-172-31-36-170:/work/src/global-resource-service/resource-management# tail cent-main-t.log 
I0731 01:27:31.342370    5204 installer.go:169] Serving watch for client: Client-6efc8aeb-4520-415a-8492-d7528fb54936
I0731 01:27:31.342427    5204 installer.go:189] Start watching distributor for client: Client-6efc8aeb-4520-415a-8492-d7528fb54936
I0731 01:27:31.342471    5204 installer.go:211] Start processing watch event for client: Client-6efc8aeb-4520-415a-8492-d7528fb54936
I0731 01:31:33.160481    5204 aggregator.go:97] Total (25000) region node events are pulled successfully in (10) RPs
I0731 01:31:54.682923    5204 event_metrics.go:105] [Metrics][AGG_RECEIVED] perc50 2.397479655s, perc90 2.410722505s, perc99 2.41372357s. Total count 3362
I0731 01:31:54.682959    5204 event_metrics.go:106] [Metrics][DIS_RECEIVED] perc50 2.40112165s, perc90 2.414844607s, perc99 2.41783481s. Total count 3362
I0731 01:31:54.682966    5204 event_metrics.go:107] [Metrics][DIS_SENDING] perc50 2.416017464s, perc90 2.419745487s, perc99 2.420972926s. Total count 3362
I0731 01:31:54.682971    5204 event_metrics.go:108] [Metrics][DIS_SENT] perc50 2.416026118s, perc90 2.419745713s, perc99 2.420973144s. Total count 3362
I0731 01:31:54.682976    5204 event_metrics.go:109] [Metrics][SER_ENCODED] perc50 2.416831977s, perc90 2.420030218s, perc99 2.421060159s. Total count 3362
I0731 01:31:54.682981    5204 event_metrics.go:110] [Metrics][SER_SENT] perc50 2.416832056s, perc90 2.42003035s, perc99 2.421060264s. Total count 3362           
root@ip-172-31-36-170:/work/src/global-resource-service/resource-management# tail cent-main-event-queue-t.log 
I0731 00:32:23.972636    3652 installer.go:169] Serving watch for client: Client-db1d707d-25cb-40fc-9381-2e51587f49e4
I0731 00:32:23.972687    3652 installer.go:189] Start watching distributor for client: Client-db1d707d-25cb-40fc-9381-2e51587f49e4
I0731 00:32:23.972737    3652 installer.go:211] Start processing watch event for client: Client-db1d707d-25cb-40fc-9381-2e51587f49e4
I0731 00:36:34.801489    3652 aggregator.go:97] Total (25000) region node events are pulled successfully in (10) RPs
I0731 00:36:53.778701    3652 event_metrics.go:105] [Metrics][AGG_RECEIVED] perc50 2.27149438s, perc90 2.284506426s, perc99 2.287312368s. Total count 2513
I0731 00:36:53.778760    3652 event_metrics.go:106] [Metrics][DIS_RECEIVED] perc50 2.274924693s, perc90 2.288500822s, perc99 2.291312045s. Total count 2513
I0731 00:36:53.778769    3652 event_metrics.go:107] [Metrics][DIS_SENDING] perc50 2.286279444s, perc90 2.293075514s, perc99 2.294217557s. Total count 2513
I0731 00:36:53.778776    3652 event_metrics.go:108] [Metrics][DIS_SENT] perc50 2.286287825s, perc90 2.293075714s, perc99 2.294217727s. Total count 2513
I0731 00:36:53.778784    3652 event_metrics.go:109] [Metrics][SER_ENCODED] perc50 2.287111212s, perc90 2.293252343s, perc99 2.294351256s. Total count 2513
I0731 00:36:53.778791    3652 event_metrics.go:110] [Metrics][SER_SENT] perc50 2.287111289s, perc90 2.293252401s, perc99 2.294351381s. Total count 2513
root@ip-172-31-36-170:/work/src/global-resource-service/resource-management# tail pr107-t.log 
I0731 01:07:55.175301    4600 installer.go:168] Serving watch for client: Client-f46a4a18-1155-42ef-b8cc-cbaf2e28a9f8
I0731 01:07:55.175370    4600 installer.go:188] Start watching distributor for client: Client-f46a4a18-1155-42ef-b8cc-cbaf2e28a9f8
I0731 01:07:55.175401    4600 installer.go:210] Start processing watch event for client: Client-f46a4a18-1155-42ef-b8cc-cbaf2e28a9f8
I0731 01:12:34.693864    4600 aggregator.go:92] Total (25000) region node events are pulled successfully in (10) RPs. pull duration 941.800415ms
I0731 01:12:39.487791    4600 event_metrics.go:105] [Metrics][AGG_RECEIVED] perc50 1.691580379s, perc90 1.693398308s, perc99 1.693814964s. Total count 2880
I0731 01:12:39.487834    4600 event_metrics.go:106] [Metrics][DIS_RECEIVED] perc50 1.695657601s, perc90 1.696835842s, perc99 1.697109142s. Total count 2880
I0731 01:12:39.487840    4600 event_metrics.go:107] [Metrics][DIS_SENDING] perc50 1.706647292s, perc90 1.715874898s, perc99 1.717099747s. Total count 2880
I0731 01:12:39.487846    4600 event_metrics.go:108] [Metrics][DIS_SENT] perc50 1.706647377s, perc90 1.715875037s, perc99 1.717127428s. Total count 2880
I0731 01:12:39.487851    4600 event_metrics.go:109] [Metrics][SER_ENCODED] perc50 1.707397986s, perc90 1.71634485s, perc99 1.717605537s. Total count 2880
I0731 01:12:39.487856    4600 event_metrics.go:110] [Metrics][SER_SENT] perc50 1.707398056s, perc90 1.716344969s, perc99 1.717605596s. Total count 2880
root@ip-172-31-36-170:/work/src/global-resource-service/resource-management# mv cent-main-envqueue-change-t.log cent-main-event-queue-t.log
root@ip-172-31-36-170:/work/src/global-resource-service/resource-management#
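The server-side percentiles can be read the same way. Assuming every stage metric measures elapsed time from the same event origin, the difference between the SER_SENT and AGG_RECEIVED perc50 values approximates the time spent inside the service after the aggregator has received the event. A difference of percentiles is only an approximation of the per-event difference.

```go
package main

// serverPerc50 holds the perc50 values (seconds) of AGG_RECEIVED and SER_SENT,
// in that order, copied from the server logs above.
var serverPerc50 = map[string][2]float64{
	"cent-main":             {2.397479655, 2.416832056},
	"cent-main-event-queue": {2.27149438, 2.287111289},
	"pr107":                 {1.691580379, 1.707398056},
}

// inServerGap approximates the time between aggregator receipt and serializer send.
func inServerGap(aggReceived, serSent float64) float64 {
	return serSent - aggReceived
}
```

The gap comes out at roughly 19ms, 16ms and 16ms for cent-main, the event-queue run and pr107, while AGG_RECEIVED alone accounts for over 99% of SER_SENT in all three runs. This suggests that most of the latency, and most of pr107's improvement, accrues before the aggregator receives the events, i.e. on the pull path from the region manager.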

yb01 · 2022-08-30T18:52:07Z

On hold; to be picked up again should extra performance be needed for the GRS.

Sindica · 2022-09-14T23:01:07Z

resource-management/pkg/distributor/cache/eventqueue.go

@@ -201,7 +202,7 @@ func (eq *NodeEventQueue) getAllEventsSinceResourceVersion(rvs types.InternalRes
 		}
 	}

-	nodeEvents := make([]*types.NodeEvent, 0)
+	nodeEvents := make([]*types.NodeEvent, 0, 1000)


In testing, this change does not improve performance.
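Besides wall-clock timing, one way to check a preallocation change like the one above is to count heap allocations with `testing.AllocsPerRun`. This sketch uses a hypothetical `nodeEvent` stand-in for `types.NodeEvent` and a hypothetical `collect` helper that mimics appending events into a slice created with a given initial capacity.

```go
package main

import "testing"

// nodeEvent is a hypothetical stand-in for types.NodeEvent.
type nodeEvent struct{ rv uint64 }

// sink is a package-level variable that forces the slices onto the heap,
// mirroring a result that is returned to the caller.
var sink []*nodeEvent

// collect appends n event pointers to a slice created with the given capacity.
func collect(n, capacity int, ev *nodeEvent) []*nodeEvent {
	events := make([]*nodeEvent, 0, capacity)
	for i := 0; i < n; i++ {
		events = append(events, ev)
	}
	return events
}

// allocsPerCollect reports the average number of heap allocations of one collect call.
func allocsPerCollect(n, capacity int) float64 {
	ev := &nodeEvent{}
	return testing.AllocsPerRun(100, func() {
		sink = collect(n, capacity, ev)
	})
}
```

Preallocation removes the slice-growth allocations when the result fits in the initial capacity, but if those allocations are not on the hot path the end-to-end effect can be negligible, which would be consistent with the result reported here. A fixed capacity of 1000 also reserves 1000 slots even when only a few events are returned.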

Sindica · 2022-09-14T23:01:32Z

yb01 · 2022-09-20T17:25:33Z

Put this PR on hold.

yb01 added 6 commits July 26, 2022 20:03

serializer lib

7358dbd

use gogo protobuf

refactor to flat type folder for easy protobuf code gen

c9606ae

use bytes for the rv map in transit, since protobuf does not support a struct as a map key; separate type def

protobuf auto-gen files and hard-coded to protobuf serializer for pul…

2546484

…ling data from region manager; protobuf auto-gen files and hard-coded to protobuf serializer for pulling data from region manager; handle time in protobuf

ensure correct length of the events set; fix panic in auto-generated pb.go

ada6187

only wait for empty pulls

fmt change and remove unused commented code for now

bbb7060

Merge branch 'cent-main' into seriliazer-works-2

17ed647

yb01 requested review from Sindica, q131172019 and sonyafenge and removed request for Sindica, q131172019 and sonyafenge July 27, 2022 03:35

yb01 added 3 commits July 26, 2022 23:05

remove unneeded rvmap field from ResponseFromRRM struct

8522950

change back to localhost

f1b4b3f

construct protobuf serializer in aggregator object level

6b00652

Sindica reviewed Jul 27, 2022

yb01 added 2 commits July 27, 2022 12:29

add missing headers of files

5c8b908

Merge branch 'cent-main' into seriliazer-works-2

7dbcb74

q131172019 reviewed Jul 28, 2022

resource-management/pkg/common-lib/types/logicalNode.go

q131172019 reviewed Jul 28, 2022

resource-management/pkg/common-lib/types/generated.pb.go

q131172019 reviewed Jul 28, 2022

yb01 added 5 commits July 28, 2022 10:32

use protobuf in redis serialization of logical nodes

11eb067

Merge branch 'cent-main' into seriliazer-works-2

8620ce8

fix merge conflict

6653e59

Merge branch 'cent-main' into seriliazer-works-2

2c61791

eventqueue optimization

fb9b951

yb01 mentioned this pull request Aug 30, 2022

use local serialization lib, part-1 #30

Closed

Sindica reviewed Sep 14, 2022

yb01 changed the title ~~Intro serializer lib with json and protobuf.~~ DO NOT MERGE. Intro serializer lib with json and protobuf. Sep 20, 2022
