exasol · PeggySchmidtMittenzwei · Jun 30, 2025 · Jun 30, 2025 · Jun 30, 2025 · Jun 30, 2025
diff --git a/Database-Features/zone-map-usage-optimizer-question.md b/Database-Features/zone-map-usage-optimizer-question.md
@@ -0,0 +1,201 @@
+# Why did engine decide not to use the zone map?
+
+* even though zone map is forced and
+* the WHERE clause can be used to prune data?
+
+## Question
+
+Even if you force zone map usage and our WHERE clause is prunable, engine might still avoid using the zone map:
+
+We have given the following DDL:
+
+```sql
+-- data model preparation
+CREATE SCHEMA TEST;
+
+CREATE OR REPLACE TABLE TEST.T1  (time_zones timestamp, sid int);
+ALTER TABLE TEST.T1  DISTRIBUTE BY sid;
+```
+
+We insert data as follows:
+
+```sql
+INSERT INTO 
+    TEST.T1
+SELECT 
+      ADD_DAYS(TIMESTAMP '2025-06-30 00:00:00', - round(RANGE_VALUE mod 4*365,0)) as mydate
+    , RANGE_VALUE
+FROM
+    VALUES BETWEEN 1 AND 1000000*200*NPROC()+1;
+```
+
+We also include the following data to better explain the engine’s decisions:
+
+```sql
+INSERT INTO TEST.T1 VALUES
+('2024-01-01 00:00:00',105 ),                         
+('2024-01-01 00:00:00',20717),
+('2024-01-01 00:00:00',5),
+('2025-04-18 00:00:00',111),
+('2025-06-02 00:00:00',5),
+('2025-06-03 00:00:00',105),
+('2025-06-04 00:00:00',20717),
+('2025-06-05 00:00:00',5);
+```
+
+Next we add zone map enforcement:
+
+```sql
+ENFORCE ZONEMAP ON TEST.T1 (time_zones);
+```
+
+and add an index to the SID, which in real use cases can be created by join queries:
+
+```sql
+ENFORCE LOCAL INDEX ON  TEST.T1 (sid);
+```
+
+Let's run the following statement to see if the zonemap was used.
+
+```sql
+SELECT DISTINCT sid
+FROM TEST.T1
+WHERE time_zones between '2021-06-01 00:00:00' and '2025-06-06 00:00:00';
+```
+
+### Profiling of the execution plan without zone map usage
+
+|PART_ID|PART_NAME|PART_INFO|OBJECT_SCHEMA|OBJECT_NAME|REMARKS|DURATION|
+|---|---|---|---|---|---|---|
+|1|COMPILE / EXECUTE| - | - | - | - |0.203|
+|2|SCAN| - |TEST|T1| - |0.484|
+|3|GROUP BY|on TEMPORARY table| - |tmp_subselect0| - |$${\color{red}0.491}$$|
+
+Regarding [Exasol Zone maps documentation](https://docs.exasol.com/db/latest/performance/zonemaps.htm) in profiling, an operation that utilized the zone records will show WITH ZONEMAP in the PART_INFO field. Why we do not see zone map usage?
+
+## Explanation
+
+The engine computes the data based on the execution plan created by the compiler. During filter stages, the Engine decides on the use of zone maps.
+
+> [!IMPORTANT]
+> The engine may decide not to use the zone map because it estimates that another access path or index is more efficient.
+
+Let's break down why this can happen in Exasol:
+
+Let us do the following test, and lets remove the index:
+
+```sql
+DROP LOCAL INDEX ON  TEST.T1 (sid);
+```
+
+> [!CAUTION]
+> Only perform such manual indexing operations based on the advice and guidance of Exasol Support.
+
+We run the query again:
+
+```sql
+SELECT DISTINCT sid
+FROM TEST.T1
+WHERE time_zones between '2021-06-01 00:00:00' and '2025-06-06 00:00:00';
+```
+
+### Profiling of the execution plan with zone map usage
+
+|PART_ID|PART_NAME|PART_INFO|OBJECT_SCHEMA|OBJECT_NAME|REMARKS|DURATION|
+|---|---|---|---|---|---|---|
+|1|COMPILE / EXECUTE| - | - | - | - |0.667|
+|2|SCAN|$${\color{red}WITH \space ZONEMAP}$$|TEST|T1|T1(TIME_ZONES)|0.117|
+|3|GROUP BY|on TEMPORARY table| - |tmp_subselect0|T1(SID)|$${\color{red}5.271}$$|
+
+> [!NOTE]
+> From this example we see that this execution speeds up the SCAN from 0.484 sec to 0.117 sec but slows down the GROUP BY from 0.491 sec to 5.271 sec.
+
+In order to explain the problem sufficiently and simply, we have simplified the execution somewhat and use the additional data sets.
+
+### Explanation of the execution plan without zone map usage
+
+Let us go the first execution:
+
+The SCAN uses the index on SID to identify the best ROWID order for the GROUP BY. So the data block layout after
+the SCAN is:
+
+|ROWID|TIMES_SEGMENT_ID|TIMES_VALUE|SID_VALUE|
+|---|---|---|---|
+|46|1|'2020-01-01'|5|
+|65|6|'2025-06-05'|5|
+|7348|5|'2025-06-02'|5|
+|23|1|'2020-01-01'|105|
+|3|5|'2025-06-03'|105|
+|1234|5|'2025-04-18'|111|
+|12|1|'2020-01-01'|20717|
+|128|5|'2025-06-04'|20717|
+
+The input is ordered as above. This means, engine reads the first entries and sees TIMES_SEGMENT_ID 1, 6, 5, 1, ...
+Accessing the metadata of a segment has an overhead and also consumes memory.  So our engine only accesses the metadata if consecutive rows in the data block have the same SEGMENT_ID.
+
+> [!NOTE]
+> Due to that mixed up TIMES_SEGMENT_IDs, our database engine does not use zone maps for the SCAN although zone maps are available.
+
+Now let us continue with the example.
+After the PIPE FILTER filter stage down the data block layout is:
+
+|ROWID|TIMES_SEGMENT_ID|TIMES_VALUE|SID|
+|---|---|---|---|
+|65|6|'2025-06-05'|$${\color{green}5}$$|
+|7348|5|'2025-06-02'|$${\color{green}5}$$|
+|3|5|'2025-06-03'|105|
+|128|5|'2025-06-04'|20717|
+
+> [!IMPORTANT]
+> This is the perfect input for the GROUP BY and considerable speeds up the following GROUP BY computation.
+
+$\textsf{\color{green}{Reason: GROUP BY execution is faster if entries with the same key are executed together.}}$  
+
+### Explanation of the execution plan with zone map usage
+
+Here the SCAN scans the column TIMES and executes the filter. The scan part leads to the following data block layout:
+
+|ROWID|TIMES_SEGMENT_ID|TIMES_VALUE|SID_VALUE|
+|---|---|---|---|
+|23|1|'2020-01-01'|105|
+|12|1|'2020-01-01'|20717|
+|46|1|'2020-01-01'|5|
+|1234|5|'2025-04-18'|111|
+|7348|5|'2025-06-02'|5|
+|3|5|'2025-06-03'|105|
+|128|5|'2025-06-04'|20717|
+|65|6|'2025-06-05'|5|
+
+Let us assume the zone maps for the TIMES_SEGEMENT_IDs (1,2,3,4) have min = '2020-01-01' and max = '2020-01-01'.
+This means the filter part of the scan can remove the segments 1,2,3,4 with zone maps and only needs to filter the rest.
+Thus the data block layout for the next stage after the filter part of FILTERSCAN is
+
+|ROWID|TIMES_SEGMENT_ID|TIMES_VALUE|SID|
+|---|---|---|---|
+|7348|5|'2025-06-02'|$${\color{red}5}$$|
+|3|5|'2025-06-03'|105|
+|128|5|'2025-06-04'|20717|
+|65|6|'2025-06-05'|$${\color{red}5}$$|
+
+Now we do a GROUP BY POSTPROCESSING. Here the issue starts because GROUP BY is a time consuming
+operation because the data is not already correctly pre-sorted by the scan.
+
+> [!IMPORTANT]
+> This is not the perfect input for the GROUP BY.
+
+$\textsf{\color{red}{Reason: GROUP BY execution is slower if entries with the same key are not executed together.}}$  
+
+> [!NOTE]
+> In general, GROUP BY is more time consuming than a SCAN.
+
+## References
+
+* [Exasol Zone Maps documentation](https://docs.exasol.com/db/latest/performance/zonemaps.htm)
+* [Profiling](https://docs.exasol.com/db/latest/database_concepts/profiling.htm)
+
+## Authors
+
+* [Peggy Schmidt-Mittenzwei](https://github.com/PeggySchmidtMittenzwei)
+* [Georg Dotzler](https://github.com/narmion)
+
+*We appreciate your input! Share your knowledge by contributing to the Knowledge Base directly in [GitHub](https://github.com/exasol/public-knowledgebase).*