
[opt](agg) Optimize the execution of GROUP BY count(*). #61260

Open
Mryange wants to merge 4 commits into apache:master from Mryange:opt-inline-agg-count

Conversation

@Mryange
Contributor

Mryange commented Mar 12, 2026

What problem does this PR solve?

For SQL like:

SELECT xxx, count(*) FROM table GROUP BY ...

we can apply the following optimization:

In the aggregation hash map, hashmap<key, value>, the value is a char* that would normally point to the per-group AggState. On a 64-bit build this pointer is 8 bytes wide, so we can treat the slot itself as a uint64 counter and increment it in place, which avoids creating an AggState for each group. The downside is that this adds extra if/else branches in many of the places that operate on the hash map; we plan to refactor this area in the future.
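To make the idea concrete, here is a minimal standalone sketch, not the code in this PR. It assumes a 64-bit platform where a pointer slot can hold any 64-bit pattern, and it uses std::unordered_map as a stand-in for the real aggregation hash map. The names AggDataPtr, load_count, store_count and group_count are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the mapped value of the aggregation hash map: a pointer that
// would normally address per-group aggregate state (hypothetical alias).
using AggDataPtr = char*;

static_assert(sizeof(AggDataPtr) == sizeof(std::uint64_t),
              "inline counting assumes a 64-bit pointer slot");

// Read the bits of the pointer slot as a counter. memcpy is used instead of
// a cast so the sketch does not depend on pointer/integer aliasing rules.
inline std::uint64_t load_count(const AggDataPtr& slot) {
    std::uint64_t n;
    std::memcpy(&n, &slot, sizeof(n));
    return n;
}

// Write a counter value into the bits of the pointer slot.
inline void store_count(AggDataPtr& slot, std::uint64_t n) {
    std::memcpy(&slot, &n, sizeof(n));
}

// GROUP BY key, count(*): the slot is the counter, so no per-group state
// object is ever allocated.
inline std::unordered_map<std::string, AggDataPtr> group_count(
        const std::vector<std::string>& keys) {
    std::unordered_map<std::string, AggDataPtr> table;
    for (const auto& key : keys) {
        auto [it, inserted] = table.try_emplace(key, nullptr);
        if (inserted) {
            store_count(it->second, 0);  // new group starts at zero
        }
        store_count(it->second, load_count(it->second) + 1);
    }
    return table;
}

// Usage example: three rows over two groups.
inline void example() {
    std::vector<std::string> keys = {"a", "b", "a"};
    auto table = group_count(keys);
    assert(load_count(table.at("a")) == 2);
    assert(load_count(table.at("b")) == 1);
}
```

Every reader of the map has to know whether a slot holds a pointer or a counter, which is where the extra if/else branches mentioned above come from.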

MySQL [hits]> SELECT ClientIP,sum(ClientIP) AS c
    -> FROM hits
    -> GROUP BY ClientIP
    -> ORDER BY c DESC
    -> LIMIT 10;

10 rows in set (0.374 sec)

MySQL [hits]> SELECT ClientIP, COUNT(*) AS c
    -> FROM hits
    -> GROUP BY ClientIP
    -> ORDER BY c DESC
    -> LIMIT 10;

10 rows in set (0.312 sec)

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Mar 12, 2026

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 14.06% (45/320) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.60% (19680/37414)
Line Coverage 36.17% (183664/507793)
Region Coverage 32.27% (141658/438958)
Branch Coverage 33.45% (61861/184919)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.52% (26220/36659)
Line Coverage 54.27% (275168/507045)
Region Coverage 51.21% (227209/443642)
Branch Coverage 52.80% (98083/185763)

@Mryange
Contributor Author

Mryange commented Mar 12, 2026

run performance

@doris-robot

TPC-H: Total hot run time: 27585 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8497d2aa28aad9c1db55c8b24e70401e448c02a3, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17635	4462	4318	4318
q2	q3	10634	819	529	529
q4	4675	362	247	247
q5	7611	1217	1044	1044
q6	188	180	151	151
q7	813	831	671	671
q8	10144	1464	1343	1343
q9	5620	4810	4742	4742
q10	6343	1926	1683	1683
q11	489	268	247	247
q12	737	565	473	473
q13	18037	2686	1955	1955
q14	231	226	222	222
q15	942	808	837	808
q16	737	720	688	688
q17	711	861	435	435
q18	6063	5525	5253	5253
q19	1390	998	589	589
q20	491	490	390	390
q21	4760	1942	1531	1531
q22	400	335	266	266
Total cold run time: 98651 ms
Total hot run time: 27585 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4803	4643	4583	4583
q2	q3	3895	4360	3866	3866
q4	862	1177	780	780
q5	4051	4390	4341	4341
q6	179	178	142	142
q7	1754	1664	1502	1502
q8	2512	2757	2582	2582
q9	7582	7353	7381	7353
q10	3763	4062	3776	3776
q11	527	436	412	412
q12	484	585	432	432
q13	2399	3056	2052	2052
q14	308	314	297	297
q15	847	811	792	792
q16	702	777	723	723
q17	1141	1494	1410	1410
q18	7069	6708	6689	6689
q19	875	851	842	842
q20	2113	2152	2002	2002
q21	3974	3428	3357	3357
q22	494	434	377	377
Total cold run time: 50334 ms
Total hot run time: 48310 ms

@doris-robot

TPC-DS: Total hot run time: 154734 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8497d2aa28aad9c1db55c8b24e70401e448c02a3, data reload: false

query5	4341	626	499	499
query6	332	227	215	215
query7	4204	465	269	269
query8	351	249	235	235
query9	8701	2771	2735	2735
query10	542	397	334	334
query11	7408	5888	5655	5655
query12	187	128	127	127
query13	1251	442	338	338
query14	6009	3814	3621	3621
query14_1	2792	2791	2797	2791
query15	205	194	176	176
query16	1009	469	457	457
query17	1078	698	598	598
query18	2432	437	337	337
query19	208	215	178	178
query20	142	133	131	131
query21	222	137	120	120
query22	4841	4904	4936	4904
query23	17129	16472	16743	16472
query23_1	16213	16424	16328	16328
query24	7275	1597	1249	1249
query24_1	1254	1196	1242	1196
query25	524	456	412	412
query26	1244	250	149	149
query27	2800	481	293	293
query28	4515	1864	1862	1862
query29	855	560	478	478
query30	329	247	201	201
query31	1360	1284	1235	1235
query32	88	68	71	68
query33	502	325	275	275
query34	935	887	575	575
query35	622	668	600	600
query36	1076	1152	958	958
query37	134	95	82	82
query38	2907	2943	2907	2907
query39	871	872	846	846
query39_1	841	830	822	822
query40	231	154	135	135
query41	65	62	58	58
query42	308	302	297	297
query43	239	254	215	215
query44	
query45	201	189	184	184
query46	871	976	602	602
query47	2139	2156	2046	2046
query48	308	307	229	229
query49	636	460	383	383
query50	676	282	209	209
query51	4121	4122	4094	4094
query52	280	302	287	287
query53	288	333	292	292
query54	300	271	277	271
query55	99	89	80	80
query56	316	323	319	319
query57	1386	1346	1304	1304
query58	294	279	282	279
query59	1313	1491	1271	1271
query60	333	340	334	334
query61	149	152	145	145
query62	641	577	552	552
query63	312	276	276	276
query64	4988	1326	1120	1120
query65	
query66	1483	485	373	373
query67	16228	16347	16224	16224
query68	
query69	405	320	296	296
query70	1009	954	986	954
query71	333	314	302	302
query72	2997	2840	2692	2692
query73	543	554	322	322
query74	9969	9967	9771	9771
query75	2822	2761	2438	2438
query76	2272	1029	659	659
query77	370	384	303	303
query78	11198	11318	10655	10655
query79	3075	807	606	606
query80	1729	611	532	532
query81	586	287	244	244
query82	977	150	119	119
query83	329	277	239	239
query84	258	117	103	103
query85	894	491	451	451
query86	497	313	303	303
query87	3124	3118	2993	2993
query88	3510	2664	2649	2649
query89	419	375	347	347
query90	2045	177	166	166
query91	164	155	136	136
query92	84	75	71	71
query93	1539	830	495	495
query94	637	353	306	306
query95	596	406	318	318
query96	640	513	230	230
query97	2489	2499	2472	2472
query98	242	228	222	222
query99	969	995	914	914
Total cold run time: 237783 ms
Total hot run time: 154734 ms

@Mryange force-pushed the opt-inline-agg-count branch from 8497d2a to ce542e0 on March 15, 2026 13:40
@Mryange
Contributor Author

Mryange commented Mar 15, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 26408 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ce542e06a38f506a34a230ada69fb06d88cca94e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17678	4526	4288	4288
q2	q3	10632	847	523	523
q4	4678	358	251	251
q5	7551	1203	1014	1014
q6	174	174	146	146
q7	775	846	670	670
q8	9305	1469	1308	1308
q9	4888	4807	4698	4698
q10	6255	1920	1660	1660
q11	459	259	242	242
q12	703	575	479	479
q13	18032	2738	1923	1923
q14	228	243	214	214
q15	q16	740	725	668	668
q17	708	851	466	466
q18	6041	5468	5183	5183
q19	1149	993	623	623
q20	538	486	379	379
q21	4379	1824	1387	1387
q22	333	286	319	286
Total cold run time: 95246 ms
Total hot run time: 26408 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4887	4633	4588	4588
q2	q3	3867	4339	3818	3818
q4	875	1184	774	774
q5	4090	4397	4321	4321
q6	176	173	140	140
q7	1772	1646	1521	1521
q8	2491	2722	2593	2593
q9	7632	7303	7509	7303
q10	3768	4149	3602	3602
q11	532	439	437	437
q12	498	575	452	452
q13	2461	2818	1970	1970
q14	285	288	278	278
q15	q16	719	840	701	701
q17	1158	1291	1594	1291
q18	7070	6817	6541	6541
q19	865	866	858	858
q20	2089	2129	1974	1974
q21	3998	3448	3427	3427
q22	441	431	395	395
Total cold run time: 49674 ms
Total hot run time: 46984 ms

@doris-robot

TPC-DS: Total hot run time: 169263 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ce542e06a38f506a34a230ada69fb06d88cca94e, data reload: false

query5	4341	644	505	505
query6	339	227	205	205
query7	4224	468	264	264
query8	331	247	224	224
query9	8690	2708	2704	2704
query10	494	395	349	349
query11	6982	5089	4860	4860
query12	185	137	126	126
query13	1262	448	338	338
query14	5762	3677	3449	3449
query14_1	2863	2781	2804	2781
query15	207	197	174	174
query16	981	477	449	449
query17	908	733	611	611
query18	2454	455	352	352
query19	214	209	182	182
query20	132	133	134	133
query21	216	144	113	113
query22	13236	13968	14803	13968
query23	16817	16205	16147	16147
query23_1	16081	16312	16020	16020
query24	7694	1665	1256	1256
query24_1	1316	1330	1256	1256
query25	563	492	437	437
query26	1292	273	163	163
query27	3562	507	334	334
query28	4608	1854	1848	1848
query29	849	545	459	459
query30	297	223	189	189
query31	1014	943	878	878
query32	80	67	73	67
query33	498	332	277	277
query34	885	860	507	507
query35	627	677	594	594
query36	1087	1165	1028	1028
query37	131	91	82	82
query38	2900	2903	2913	2903
query39	857	818	813	813
query39_1	789	777	800	777
query40	231	154	133	133
query41	61	58	57	57
query42	255	263	253	253
query43	250	247	217	217
query44	
query45	200	187	182	182
query46	866	981	609	609
query47	2120	2194	2066	2066
query48	307	303	233	233
query49	634	456	380	380
query50	684	267	211	211
query51	4080	4032	4095	4032
query52	267	262	256	256
query53	295	335	285	285
query54	298	284	258	258
query55	87	85	84	84
query56	301	310	314	310
query57	1949	1848	1734	1734
query58	284	269	271	269
query59	2810	2921	2733	2733
query60	343	347	329	329
query61	147	143	145	143
query62	634	593	527	527
query63	305	274	280	274
query64	4973	1273	1009	1009
query65	
query66	1461	451	369	369
query67	24252	24322	24185	24185
query68	
query69	410	297	278	278
query70	965	953	964	953
query71	339	305	307	305
query72	2769	2649	2407	2407
query73	531	536	319	319
query74	9652	9542	9390	9390
query75	2852	2749	2425	2425
query76	2263	1020	654	654
query77	369	371	308	308
query78	10860	11079	10497	10497
query79	1105	844	556	556
query80	947	603	540	540
query81	521	262	226	226
query82	1373	148	118	118
query83	367	262	240	240
query84	296	111	103	103
query85	872	509	492	492
query86	418	341	304	304
query87	3140	3133	3072	3072
query88	3548	2671	2664	2664
query89	428	378	349	349
query90	1851	181	176	176
query91	186	181	154	154
query92	81	75	75	75
query93	887	846	515	515
query94	513	332	302	302
query95	615	411	326	326
query96	646	514	229	229
query97	2453	2487	2388	2388
query98	254	225	220	220
query99	1024	989	916	916
Total cold run time: 250708 ms
Total hot run time: 169263 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.61% (19709/37465)
Line Coverage 36.20% (184230/508881)
Region Coverage 32.30% (141959/439482)
Branch Coverage 33.53% (62121/185273)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.55% (26246/36684)
Line Coverage 54.41% (276040/507324)
Region Coverage 51.61% (228940/443619)
Branch Coverage 53.07% (98621/185835)

@Mryange force-pushed the opt-inline-agg-count branch from ce542e0 to 5ce94ed on March 18, 2026 11:05
@Mryange
Contributor Author

Mryange commented Mar 18, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 27072 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7fa5b91bec7738488ef5d28594ffe843bb784abd, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17591	4499	4326	4326
q2	q3	10635	822	544	544
q4	4691	359	252	252
q5	7634	1231	1012	1012
q6	185	172	148	148
q7	812	849	694	694
q8	10471	1511	1450	1450
q9	5487	4898	4770	4770
q10	6300	1938	1684	1684
q11	464	263	265	263
q12	738	580	479	479
q13	18027	2711	1930	1930
q14	236	235	207	207
q15	q16	736	710	668	668
q17	746	896	442	442
q18	5981	5463	5247	5247
q19	1123	1008	666	666
q20	566	484	372	372
q21	4557	1999	1640	1640
q22	388	342	278	278
Total cold run time: 97368 ms
Total hot run time: 27072 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4747	4629	4655	4629
q2	q3	4001	4507	3848	3848
q4	1025	1231	779	779
q5	4076	4433	4384	4384
q6	189	177	141	141
q7	1772	1716	1509	1509
q8	2565	2789	2649	2649
q9	7758	7378	7518	7378
q10	3760	4027	3659	3659
q11	507	429	422	422
q12	510	601	461	461
q13	2480	3040	2059	2059
q14	293	315	294	294
q15	q16	779	775	735	735
q17	1204	1392	1505	1392
q18	7340	7040	6710	6710
q19	1006	975	986	975
q20	2143	2176	2010	2010
q21	4060	3540	3431	3431
q22	454	417	385	385
Total cold run time: 50669 ms
Total hot run time: 47850 ms

@doris-robot

TPC-DS: Total hot run time: 169621 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7fa5b91bec7738488ef5d28594ffe843bb784abd, data reload: false

query5	4317	634	519	519
query6	332	224	207	207
query7	4209	490	272	272
query8	354	256	232	232
query9	8713	2708	2700	2700
query10	531	432	348	348
query11	6947	5125	4917	4917
query12	188	124	127	124
query13	1287	469	347	347
query14	5741	3758	3452	3452
query14_1	2910	2851	2844	2844
query15	203	193	186	186
query16	984	478	466	466
query17	898	751	632	632
query18	2453	461	368	368
query19	214	213	188	188
query20	165	130	122	122
query21	211	140	104	104
query22	13311	13938	14541	13938
query23	16957	16385	16101	16101
query23_1	16142	16374	15979	15979
query24	7154	1608	1213	1213
query24_1	1231	1209	1238	1209
query25	536	465	420	420
query26	1245	256	151	151
query27	2786	482	291	291
query28	4485	1843	1829	1829
query29	816	564	477	477
query30	339	223	188	188
query31	1015	940	873	873
query32	83	69	74	69
query33	524	342	282	282
query34	881	866	517	517
query35	662	691	598	598
query36	1072	1123	907	907
query37	142	97	83	83
query38	2888	2972	2865	2865
query39	876	823	827	823
query39_1	804	787	808	787
query40	230	151	139	139
query41	63	58	61	58
query42	270	257	262	257
query43	243	253	225	225
query44	
query45	200	185	185	185
query46	874	978	625	625
query47	2104	2121	2046	2046
query48	297	308	240	240
query49	637	459	385	385
query50	681	276	212	212
query51	4128	4166	4035	4035
query52	263	271	256	256
query53	291	343	296	296
query54	310	275	266	266
query55	97	92	90	90
query56	312	333	318	318
query57	1902	1874	1616	1616
query58	282	277	271	271
query59	2814	2982	2791	2791
query60	344	332	331	331
query61	164	154	149	149
query62	635	592	543	543
query63	310	291	274	274
query64	5016	1290	970	970
query65	
query66	1455	469	363	363
query67	24340	24351	24252	24252
query68	
query69	409	304	297	297
query70	1004	974	942	942
query71	340	313	307	307
query72	2801	2680	2646	2646
query73	551	548	337	337
query74	9639	9598	9403	9403
query75	2897	2786	2481	2481
query76	2291	1046	693	693
query77	387	395	328	328
query78	11074	11162	10503	10503
query79	1120	766	578	578
query80	1315	651	591	591
query81	559	263	241	241
query82	1044	157	125	125
query83	351	273	253	253
query84	301	135	109	109
query85	966	563	450	450
query86	413	335	298	298
query87	3179	3145	2996	2996
query88	3583	2675	2669	2669
query89	427	366	360	360
query90	2014	193	181	181
query91	168	164	135	135
query92	82	77	72	72
query93	995	836	493	493
query94	646	305	305	305
query95	608	400	324	324
query96	648	510	228	228
query97	2446	2452	2399	2399
query98	256	229	221	221
query99	1003	992	922	922
Total cold run time: 249691 ms
Total hot run time: 169621 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.69% (19787/37551)
Line Coverage 36.22% (184874/510426)
Region Coverage 32.45% (142927/440518)
Branch Coverage 33.65% (62548/185880)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.72% (26370/36767)
Line Coverage 54.55% (277605/508868)
Region Coverage 51.85% (230548/444652)
Branch Coverage 53.26% (99306/186442)



3 participants