[fix](search) Fix slash character in search query_string terms #61599

Open
airborne12 wants to merge 1 commit into apache:master from airborne12:fix-search-slash-in-term
@airborne12
Member

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

The ANTLR lexer in the search() DSL parser excluded / from TERM_CHAR, causing terms like AC/DC to be incorrectly tokenized. The slash was silently skipped by ANTLR's default error recovery, splitting AC/DC into two separate terms AC and DC instead of treating it as a single term.

This caused inconsistent behavior compared to Elasticsearch's query_string parsing, where AC\/DC (escaped slash) is handled as a single analyzed term.

Fix: Add / to the TERM_CHAR fragment in SearchLexer.g4. This allows / to appear within terms (e.g., AC/DC -> single term) while regex patterns like /[a-z]+/ still work correctly since / remains excluded from TERM_START_CHAR.
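
The effect of the change can be illustrated with a small Python emulation of the two lexer fragments. This is only a sketch under stated assumptions: the character classes below are hypothetical simplifications (the real TERM_START_CHAR and TERM_CHAR definitions in SearchLexer.g4 are not shown in this description), the `tokenize` helper is invented for illustration, and ANTLR's longest-match rule and error recovery are approximated by regex alternation and by dropping any character that no rule accepts.

```python
import re

# Hypothetical, simplified character classes. The real fragments in
# SearchLexer.g4 are not reproduced in this PR description.
TERM_START = r"[A-Za-z0-9_]"           # '/' cannot start a term
TERM_CHAR_OLD = r"[A-Za-z0-9_\-]"      # before the fix: '/' excluded
TERM_CHAR_NEW = r"[A-Za-z0-9_\-/]"     # after the fix: '/' allowed inside a term
REGEX_LITERAL = r"/(?:[^/\\]|\\.)+/"   # /pattern/ must begin with '/'


def tokenize(query, term_char):
    """Split a query into (kind, text) tokens using the given TERM_CHAR class."""
    token_re = re.compile(
        rf"(?P<REGEXP>{REGEX_LITERAL})|(?P<TERM>{TERM_START}{term_char}*)|(?P<WS>\s+)"
    )
    tokens = []
    pos = 0
    while pos < len(query):
        m = token_re.match(query, pos)
        if m is None:
            # Approximate the error recovery described above:
            # silently drop the unrecognized character and continue.
            pos += 1
            continue
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("AC/DC", TERM_CHAR_OLD))     # [('TERM', 'AC'), ('TERM', 'DC')]
print(tokenize("AC/DC", TERM_CHAR_NEW))     # [('TERM', 'AC/DC')]
print(tokenize("/[a-z]+/", TERM_CHAR_NEW))  # [('REGEXP', '/[a-z]+/')]
```

Because a term can only begin with a TERM_START character, input that begins with / is still routed to the regex rule, which is why widening TERM_CHAR alone does not affect /pattern/ queries in this model.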

Release note

Fix search() function incorrectly handling slash (/) character within query terms (e.g., AC/DC). The slash is now treated as a regular character within terms instead of being silently dropped.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 22, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@airborne12
Member Author

run buildall

The ANTLR lexer excluded '/' from TERM_CHAR, causing terms like
'AC/DC' to be incorrectly tokenized — the slash was silently skipped,
splitting it into two separate terms. Add '/' to TERM_CHAR so it can
appear within terms while regex patterns (/pattern/) still work since
'/' remains excluded from TERM_START_CHAR.
airborne12 force-pushed the fix-search-slash-in-term branch from 1e7f9c4 to 67a7155 on March 22, 2026 14:08
@airborne12
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 26636 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 67a7155dcc6c4fa81cbe436d4fff683d45c9d9ae, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17632	4484	4279	4279
q2	
q3	10637	773	517	517
q4	4677	351	249	249
q5	7556	1197	1027	1027
q6	180	172	147	147
q7	773	853	664	664
q8	9299	1447	1308	1308
q9	4829	4766	4612	4612
q10	6320	1911	1657	1657
q11	480	252	245	245
q12	745	595	467	467
q13	18028	2910	2196	2196
q14	228	236	209	209
q15	
q16	742	759	671	671
q17	737	841	428	428
q18	5834	5419	5270	5270
q19	1130	981	620	620
q20	541	492	370	370
q21	4445	1801	1403	1403
q22	345	297	428	297
Total cold run time: 95158 ms
Total hot run time: 26636 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4812	4589	4550	4550
q2	
q3	3983	4403	3831	3831
q4	880	1247	808	808
q5	4112	4523	4420	4420
q6	186	171	143	143
q7	1729	1597	1523	1523
q8	2493	2719	2599	2599
q9	7511	7353	7259	7259
q10	3726	4035	3709	3709
q11	525	426	420	420
q12	490	590	453	453
q13	2919	3233	2340	2340
q14	303	304	281	281
q15	
q16	728	786	732	732
q17	1223	1319	1376	1319
q18	7207	6816	6615	6615
q19	872	906	977	906
q20	2055	2181	1993	1993
q21	3897	3480	3287	3287
q22	456	425	366	366
Total cold run time: 50107 ms
Total hot run time: 47554 ms

@doris-robot

TPC-DS: Total hot run time: 168909 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 67a7155dcc6c4fa81cbe436d4fff683d45c9d9ae, data reload: false

query5	4317	621	507	507
query6	348	229	214	214
query7	4207	460	262	262
query8	348	250	234	234
query9	8714	2693	2686	2686
query10	541	390	326	326
query11	7015	5106	4869	4869
query12	191	130	124	124
query13	1271	464	356	356
query14	5727	3696	3511	3511
query14_1	2878	2853	2834	2834
query15	204	192	175	175
query16	986	466	447	447
query17	1129	750	631	631
query18	2450	453	351	351
query19	216	215	191	191
query20	135	126	132	126
query21	217	135	111	111
query22	13251	14172	14473	14172
query23	16149	15844	15663	15663
query23_1	15872	15804	15798	15798
query24	7532	1618	1227	1227
query24_1	1227	1210	1210	1210
query25	545	460	440	440
query26	1228	261	147	147
query27	2757	480	291	291
query28	4470	1820	1833	1820
query29	810	551	480	480
query30	299	222	192	192
query31	999	958	869	869
query32	78	71	72	71
query33	510	341	304	304
query34	873	870	531	531
query35	659	677	600	600
query36	1123	1179	1028	1028
query37	144	105	86	86
query38	2942	2937	2823	2823
query39	849	826	805	805
query39_1	803	788	784	784
query40	232	151	138	138
query41	61	60	59	59
query42	260	255	263	255
query43	230	246	219	219
query44	
query45	198	184	189	184
query46	890	971	622	622
query47	3217	2125	2090	2090
query48	306	309	225	225
query49	625	459	390	390
query50	675	276	208	208
query51	4039	4014	4018	4014
query52	261	270	251	251
query53	292	336	277	277
query54	294	265	268	265
query55	94	88	86	86
query56	312	328	325	325
query57	1972	1814	1820	1814
query58	288	278	264	264
query59	2833	2952	2727	2727
query60	344	344	322	322
query61	156	157	169	157
query62	632	580	548	548
query63	312	278	273	273
query64	5058	1254	983	983
query65	
query66	1443	461	357	357
query67	24178	24479	24339	24339
query68	
query69	418	318	290	290
query70	962	1011	937	937
query71	331	313	308	308
query72	2777	2673	2423	2423
query73	543	537	321	321
query74	9585	9520	9383	9383
query75	2850	2748	2434	2434
query76	2293	1022	707	707
query77	362	363	301	301
query78	10887	11016	10444	10444
query79	2626	766	583	583
query80	1738	635	539	539
query81	550	256	228	228
query82	1011	154	116	116
query83	331	259	238	238
query84	255	114	98	98
query85	915	577	525	525
query86	424	338	296	296
query87	3126	3102	2984	2984
query88	3523	2654	2649	2649
query89	423	368	347	347
query90	2009	177	174	174
query91	167	160	141	141
query92	72	73	72	72
query93	1174	830	495	495
query94	647	329	292	292
query95	590	340	373	340
query96	657	511	232	232
query97	2419	2491	2368	2368
query98	251	222	223	222
query99	1013	998	920	920
Total cold run time: 252212 ms
Total hot run time: 168909 ms
