ORC-1671: update timestamp doc in specification #1867

Draft · wants to merge 1 commit into base: main
12 changes: 7 additions & 5 deletions site/_docs/types.md
@@ -69,7 +69,9 @@ create table Foobar (

ORC includes two different forms of timestamps from the SQL world:

* **Timestamp** is a date and time without a time zone, which does not change based on the time zone of the reader.
* **Timestamp** is a date and time without a time zone, where the timestamp value is stored in the writer timezone
encoded at the stripe level, if present. ORC readers will read this value back into the reader's timezone. Usually
both writer and reader timezones default to UTC, however older ORC files may contain non-UTC writer timezones
Member:

> Usually both writer and reader timezones default to UTC, however older ORC files may contain non-UTC writer timezones

The main purpose of this timestamp type is to restore the wall clock time regardless of the reader time zone. IIRC, the Java implementation uses the writer local time zone instead of UTC. Due to insufficient C++ time zone utility support in earlier days, the C++ code by default uses UTC as the writer time zone to avoid complexity.

Contributor Author:
I see. I was confused by the discussion/comments here: apache/arrow#34591, which suggested that the reader's timezone was important when deserializing the values.

Member:
IMHO, we'd better not expose the implementation detail here. The original summary and the table below explain well what users should expect.

Contributor Author:
I'm not sure I agree with that, as this timestamp type has been a great source of confusion for me when attempting to implement an ORC to Arrow reader in Rust.

For example, in the test file TestOrcFile.testDate2038.orc, according to PyArrow (which uses the ORC C++ reader underneath), the first value is:

>>> from pyarrow import orc
>>> orc.read_table("/home/jeffrey/Code/orc/examples/TestOrcFile.testDate2038.orc")[0][0]
<pyarrow.TimestampScalar: '2038-05-05T12:34:56.100000000'>

The underlying integer value for the DATA stream is 736_601_696 seconds (let's ignore nanoseconds). When adding this to 1 Jan 2015 00:00:00, regardless of timezone, we get 5 May 2038 11:34:56. Which seems... surprising.

This isn't at all what I would expect, hence my confusion (perhaps I just don't properly understand how DST factors into this). But diving into the internals and seeing reader & writer timezone offsets being considered when decoding the timestamp value just adds to the confusion, especially as that contradicts the specification.

I was hoping to open up some discussion on clarifying the exact definition for this timestamp type as part of this PR.
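As a quick sanity check of the arithmetic above (a sketch only; the DATA value is the one quoted in this comment, and the ORC epoch is taken as a plain 1 January 2015 00:00:00 with no time zone applied):

```python
from datetime import datetime, timedelta

data_seconds = 736_601_696        # DATA stream value quoted above
orc_epoch = datetime(2015, 1, 1)  # ORC epoch, no time zone applied

print(orc_epoch + timedelta(seconds=data_seconds))
# 2038-05-05 11:34:56 -- one hour earlier than the 12:34:56.1 that PyArrow
# reports, which appears to be the DST-related adjustment discussed below.
```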

Member:
Sorry for the late reply. If you find any inconsistent behavior between the C++ and Java sides, please treat the Java implementation as the reference. We need to fix any issue on the C++ side.

Contributor Author:
I'll try to look into this more to confirm my understanding 👍

Moving to draft in the meantime

* **Timestamp with local time zone** is a fixed instant in time, which does change based on the time zone of the reader.

Unless your application uses UTC consistently, **timestamp with
@@ -78,7 +80,7 @@
use cases. When users say an event is at 10:00, it is always in
reference to a certain timezone and means a point in time, rather than
10:00 in an arbitrary time zone.

| Type | Value in America/Los_Angeles | Value in America/New_York |
| ----------- | ---------------------------- | ------------------------- |
| **timestamp** | 2014-12-12 6:00:00 | 2014-12-12 6:00:00 |
| **timestamp with local time zone** | 2014-12-12 6:00:00 | 2014-12-12 9:00:00 |
| Type | Value in America/Los_Angeles | Value in America/New_York |
| ---------------------------------- | ---------------------------- | ------------------------- |
| **timestamp** | 2014-12-12 6:00:00 | 2014-12-12 6:00:00 |
| **timestamp with local time zone** | 2014-12-12 6:00:00           | 2014-12-12 9:00:00        |
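To make the distinction concrete, here is a small illustration using Python's zoneinfo (a sketch of the semantics described above, not of reading an actual ORC file):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, requires tz data to be installed

# "timestamp": a wall-clock value with no zone attached; every reader sees
# the same wall-clock time regardless of their own time zone.
wall_clock = datetime(2014, 12, 12, 6, 0, 0)
print(wall_clock)  # 2014-12-12 06:00:00 for readers in any zone

# "timestamp with local time zone": a fixed instant; here the instant is
# 6:00 in America/Los_Angeles, so a reader in America/New_York sees 9:00.
instant = datetime(2014, 12, 12, 6, 0, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
print(instant.astimezone(ZoneInfo("America/Los_Angeles")))  # 2014-12-12 06:00:00-08:00
print(instant.astimezone(ZoneInfo("America/New_York")))     # 2014-12-12 09:00:00-05:00
```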
32 changes: 32 additions & 0 deletions site/specification/ORCv1.md
@@ -1155,6 +1155,9 @@
records non-null values, a DATA stream that records the number of
seconds after 1 January 2015, and a SECONDARY stream that records the
number of nanoseconds.

* Note that if the writer timezone is set, 1 January 2015 is according to
that timezone and not according to UTC
Comment on lines +1158 to +1159
Contributor Author:
According to the implementation:

orc/c++/src/Timezone.hh

Lines 66 to 72 in 3b5b2a6

/**
 * Get the number of seconds between the ORC epoch in this timezone
 * and Unix epoch.
 * ORC epoch is 1 Jan 2015 00:00:00 local.
 * Unix epoch is 1 Jan 1970 00:00:00 UTC.
 */
virtual int64_t getEpoch() const = 0;

int64_t writerTime = secsBuffer[i] + epochOffset;

Also, I'm not certain whether the writer timezone is mandatory in the stripe when a TIMESTAMP column is present in the file. Might need some clarification on this?

Member:
IIUC, the writer timezone field is mandatory if there is any timestamp type:

if (writeTimeZone) {
  if (useUTCTimeZone) {
    builder.setWriterTimezone("UTC");
  } else {
    builder.setWriterTimezone(TimeZone.getDefault().getID());
  }
}

Contributor Author:
Thanks, I think I'll add this to the spec as well (maybe proto too, in the other repo?)


Because the number of nanoseconds often has a large number of trailing
zeros, the number has trailing decimal zero digits removed and the
last three bits are used to record how many zeros were removed. if the
@@ -1170,6 +1173,35 @@
DIRECT_V2 | PRESENT | Yes | Boolean RLE
| DATA | No | Signed Integer RLE v2
| SECONDARY | No | Unsigned Integer RLE v2

Due to ORC-763, values before the UNIX epoch which have nanoseconds greater
than 999,999 are adjusted to have 1 second less.
Comment on lines +1176 to +1177
Contributor Author:
According to

orc/c++/src/ColumnReader.cc

Lines 350 to 352 in 9b79de9

if (secsBuffer[i] < 0 && nanoBuffer[i] > 999999) {
  secsBuffer[i] -= 1;
}


For example, given a stripe with a TIMESTAMP column with a writer timezone
of US/Pacific, and a reader timezone of UTC, we have the decoded integer values
of -1,440,851,103 from the DATA stream and 199,900,000 from the SECONDARY stream.

First we must adjust the DATA value to be relative to the UNIX epoch. The ORC
epoch is 1 January 2015 00:00:00 US/Pacific, since we must take into account the writer
timezone. This translates to 1 January 2015 08:00:00 UTC, as US/Pacific is equivalent
to a -08:00 offset from UTC at that date (no daylight savings). The number of seconds
from 1 January 1970 00:00:00 UTC to 1 January 2015 08:00:00 UTC is 1,420,099,200. This is
added to the DATA value to produce a value of -20,751,903. As this is before the
UNIX epoch (since it is negative), and the SECONDARY value, 199,900,000, is
greater than 999,999, then this DATA value is adjusted to become -20,751,904
(1 second subtracted).

This value by itself represents 5 May 1969 19:34:56.1999, which now needs to be adjusted
from US/Pacific (the writer's timezone) to UTC (the reader's timezone). As the value is
within daylight savings for US/Pacific, 7 hours are subtracted to give the final value
of 5 May 1969 12:34:56.1999.
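The arithmetic above can be reproduced with a short Python sketch that mirrors these decoding steps (it assumes the system tz database carries the 1969 US/Pacific DST rules; variable names are illustrative, not from the ORC code):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("US/Pacific")  # the writer timezone in the example

# Decoded stream values from the example.
data_seconds = -1_440_851_103  # DATA: seconds relative to the ORC epoch
secondary_nanos = 199_900_000  # SECONDARY: nanoseconds, already decoded

# The ORC epoch is 1 Jan 2015 00:00:00 in the writer timezone (US/Pacific),
# i.e. 1 Jan 2015 08:00:00 UTC, 1,420,099,200 seconds after the Unix epoch.
epoch_offset = int(datetime(2015, 1, 1, tzinfo=pacific).timestamp())
assert epoch_offset == 1_420_099_200

seconds = data_seconds + epoch_offset          # -20,751,903
if seconds < 0 and secondary_nanos > 999_999:  # ORC-763 adjustment
    seconds -= 1                               # -20,751,904

# Combined value, still expressed relative to the writer's clock.
writer_time = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
    seconds=seconds, microseconds=secondary_nanos // 1000)
print(writer_time)  # 1969-05-05 19:34:56.199900+00:00

# Wall-clock adjustment from the writer zone (PDT, UTC-7 in May 1969) to the
# reader zone (UTC): add the writer offset, i.e. subtract 7 hours. The C++
# reader re-checks the offset when this shift crosses a DST boundary, which
# does not happen in this example.
reader_time = writer_time + writer_time.astimezone(pacific).utcoffset()
print(reader_time)  # 1969-05-05 12:34:56.199900+00:00
```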
Comment on lines +1183 to +1196
Contributor Author:
I'm kinda unclear on how exactly this works; I'm just going off the C++ implementation here:

orc/c++/src/ColumnReader.cc

Lines 335 to 348 in 9b79de9

int64_t writerTime = secsBuffer[i] + epochOffset;
if (!sameTimezone) {
  // adjust timestamp value to same wall clock time if writer and reader
  // time zones have different rules, which is required for Apache Orc.
  const auto& wv = writerTimezone->getVariant(writerTime);
  const auto& rv = readerTimezone->getVariant(writerTime);
  if (!wv.hasSameTzRule(rv)) {
    // If the timezone adjustment moves the millis across a DST boundary,
    // we need to reevaluate the offsets.
    int64_t adjustedTime = writerTime + wv.gmtOffset - rv.gmtOffset;
    const auto& adjustedReader = readerTimezone->getVariant(adjustedTime);
    writerTime = writerTime + wv.gmtOffset - adjustedReader.gmtOffset;
  }
}

Happy to be corrected if my understanding or wording is inaccurate anywhere 👍

Member:
Similarly, I'm not inclined to add these details to the specs if we are not 100% sure about them.


For a TIMESTAMP_INSTANT column, this process is much simpler. Given the same values
for DATA and SECONDARY stream, and given the offset from 1 January 1970 00:00:00 UTC
to 1 January 2015 00:00:00 UTC is 1,420,070,400 seconds, we first add this to
the DATA value -1,440,851,103 to produce -20,780,703 which is then adjusted 1 second
back to -20,780,704. Paired with the SECONDARY value of 199,900,000 nanoseconds, this
represents 5 May 1969 11:34:56.1999 UTC.
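A similar sketch checks the TIMESTAMP_INSTANT numbers (here the epoch is plain UTC and no writer/reader adjustment is applied; names are illustrative):

```python
from datetime import datetime, timedelta, timezone

data_seconds = -1_440_851_103
secondary_nanos = 199_900_000

# For TIMESTAMP_INSTANT the ORC epoch is 1 Jan 2015 00:00:00 UTC,
# 1,420,070,400 seconds after the Unix epoch.
epoch_offset = int(datetime(2015, 1, 1, tzinfo=timezone.utc).timestamp())
assert epoch_offset == 1_420_070_400

seconds = data_seconds + epoch_offset          # -20,780,703
if seconds < 0 and secondary_nanos > 999_999:  # ORC-763 adjustment
    seconds -= 1                               # -20,780,704

instant = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
    seconds=seconds, microseconds=secondary_nanos // 1000)
print(instant)  # 1969-05-05 11:34:56.199900+00:00
```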

## Struct Columns

Structs have no data themselves and delegate everything to their child