Harshg999

Problem Statement

When executing Hive queries from Hue, different query patterns showed inconsistent user attribution in Ranger audit logs:

  • Query 1 (CREATE EXTERNAL TABLE + INSERT): Executed as hive service account
  • Query 2 (CREATE TABLE AS SELECT): Executed as logged-in user (e.g., knoxui)

This inconsistency caused Ranger authorization failures when policies were configured for the logged-in user but not the service account.

Root Cause Analysis

The hive.server2.proxy.user configuration parameter was only set during session initialization in the open_session() method (line 711). However, HiveServer2 requires this parameter to be passed at the statement execution level via the confOverlay parameter to ensure consistent impersonation.

The _get_query_configuration() method only returned user-defined settings from the query object, without including the critical proxy user configuration needed for proper impersonation.

Solution

This PR ensures hive.server2.proxy.user is explicitly included in every statement execution by modifying three methods:

  1. _get_query_configuration(): Automatically adds proxy user to query configuration
  2. execute_statement(): Ensures proxy user is set for synchronous executions
  3. execute_async_statement(): Ensures proxy user is set for async executions

The fix checks that the proxy user isn't already present (to avoid overriding user-provided values) and only applies to Hive/Beeswax query servers.
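The check described above can be sketched as a small standalone helper. This is an illustration only, not the code from the PR: the function name `with_proxy_user`, the constant names, and the server identifiers `"beeswax"` and `"hive"` are assumptions made for the example.

```python
# Hypothetical sketch; names below are illustrative and not taken from Hue.
PROXY_USER_KEY = "hive.server2.proxy.user"
HIVE_SERVER_NAMES = ("beeswax", "hive")  # assumed identifiers for Hive query servers


def with_proxy_user(configuration, username, server_name):
    """Return a copy of `configuration` that carries the proxy user.

    The key is added only for Hive/Beeswax servers and only when the
    caller has not already supplied a value.
    """
    overlay = dict(configuration or {})
    if server_name in HIVE_SERVER_NAMES and PROXY_USER_KEY not in overlay:
        overlay[PROXY_USER_KEY] = username
    return overlay
```

Returning a copy rather than mutating the input keeps the user-defined settings object unchanged, so the same settings can be reused for later statements.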

Changes Made

File Modified: apps/beeswax/src/beeswax/server/hive_server2_lib.py

  • Lines 1073-1077: Added proxy user check in execute_statement()
  • Lines 1092-1096: Added proxy user check in execute_async_statement()
  • Lines 1280-1284: Added proxy user to _get_query_configuration() output

Testing

  • CREATE EXTERNAL TABLE + INSERT now executes as logged-in user
  • CREATE TABLE AS SELECT continues to execute as logged-in user
  • Ranger audit logs show consistent user attribution across all query types
  • No impact on Impala queries (Impala uses a different impersonation mechanism)
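The checks above could also be captured as an automated regression test. The following is a minimal sketch under stated assumptions: `RecordingClient` is a hypothetical stand-in that only records the configuration overlay built for each statement; it does not talk to HiveServer2 and does not exercise the Hue code itself.

```python
import unittest

PROXY_USER_KEY = "hive.server2.proxy.user"


class RecordingClient:
    """Hypothetical stand-in for a HiveServer2 client (not Hue's class)."""

    def __init__(self, username):
        self.username = username
        self.overlays = []  # one recorded overlay per executed statement

    def execute(self, statement, configuration=None):
        # Build the per-statement overlay, keeping any user-supplied value.
        overlay = dict(configuration or {})
        overlay.setdefault(PROXY_USER_KEY, self.username)
        self.overlays.append(overlay)
        return overlay


class ProxyUserOverlayTest(unittest.TestCase):
    def test_every_statement_carries_proxy_user(self):
        client = RecordingClient("knoxui")
        statements = (
            "CREATE EXTERNAL TABLE t (id INT)",
            "INSERT INTO t VALUES (1)",
            "CREATE TABLE u AS SELECT * FROM t",
        )
        for sql in statements:
            client.execute(sql)
        self.assertEqual(len(client.overlays), 3)
        for overlay in client.overlays:
            self.assertEqual(overlay[PROXY_USER_KEY], "knoxui")

    def test_user_supplied_value_is_kept(self):
        client = RecordingClient("knoxui")
        overlay = client.execute("SELECT 1", {PROXY_USER_KEY: "alice"})
        self.assertEqual(overlay[PROXY_USER_KEY], "alice")
```

A test like this can be run with `python -m unittest`. A real regression test would replace the stand-in with a mocked Thrift client around the actual statement execution methods.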

Impact

  • Scope: Affects all Hive/Beeswax queries executed through Hue
  • Backward Compatibility: Fully backward compatible - only ensures existing impersonation is consistently applied
  • Security: Improves security by ensuring all operations are properly attributed to the actual user
  • Authorization: Fixes Ranger policy enforcement for users with restricted access

Problem:
User impersonation via hive.server2.proxy.user was only set during
session initialization but not passed with individual statement
executions. This caused different query types to execute as different
users in Ranger audit logs - CREATE TABLE + INSERT statements ran as
the Hive service account while CREATE TABLE AS SELECT queries ran as
the logged-in user.

Root Cause:
The _get_query_configuration() method only returned user-defined
settings without including the proxy user configuration. Statement
execution methods (execute_statement and execute_async_statement)
relied solely on session-level configuration, which HiveServer2 does
not consistently propagate to all operations.

Solution:
Ensure hive.server2.proxy.user is explicitly included in the
confOverlay parameter for every statement execution by:
1. Adding proxy user to _get_query_configuration() output
2. Adding proxy user check in execute_statement()
3. Adding proxy user check in execute_async_statement()

This guarantees consistent user impersonation for all Hive query
types including DDL, DML, and CTAS operations.

Testing:
- Verified CREATE EXTERNAL TABLE + INSERT executes as logged-in user
- Verified CREATE TABLE AS SELECT executes as logged-in user
- Confirmed Ranger audit logs show consistent user attribution

Harshg999 self-assigned this on Oct 7, 2025

github-actions bot commented Oct 7, 2025

⚠️ No test files modified. Please ensure that changes are properly tested. ⚠️

github-actions bot commented Oct 7, 2025

UI Code Coverage Report

Lines: 33%
Statements: 39.88% (31298/78466)
Branches: 31.56% (14605/46272)
Functions: 25.8% (2347/9095)

github-actions bot commented Oct 7, 2025

Coverage

Backend Code Coverage Report

File: apps/beeswax/src/beeswax/server/hive_server2_lib.py
Stmts: 1005
Miss: 427
Cover: 57%
Missing: 70, 73–75, 87, 91, 97–99, 104, 111–113, 129–132, 156, 158, 183, 191–193, 205–212, 216–218, 230, 239–247, 249, 251, 261, 292, 301–304, 306, 331–344, 386, 396, 404, 407–408, 410, 428, 430–434, 437–438, 441, 449–452, 454–456, 458–459, 461–462, 464–469, 471–472, 474–482, 484–485, 487–489, 491, 496, 500–513, 518, 522, 526, 530, 534–546, 551–558, 563–564, 566–570, 630–631, 675, 682, 700–701, 707, 711–717, 720, 723, 726, 729–735, 741–743, 750–751, 753, 755, 780–783, 797–798, 802–803, 814, 840, 848–851, 864–873, 903, 961, 973–979, 981–982, 988–989, 991–992, 998, 1000–1007, 1009–1011, 1013, 1015–1020, 1022, 1030–1031, 1052, 1054–1056, 1058, 1060–1061, 1064–1065, 1067, 1081, 1090–1091, 1094, 1097–1098, 1100–1103, 1105–1106, 1108, 1110, 1124–1125, 1128–1130, 1142, 1148, 1153–1154, 1156–1157, 1159, 1161, 1164–1166, 1169–1176, 1178–1179, 1181, 1187, 1210, 1220, 1222–1228, 1231–1232, 1235–1236, 1239–1241, 1243–1245, 1261–1263, 1286–1287, 1300–1301, 1303, 1319, 1325–1326, 1328, 1330, 1336, 1361, 1370–1371, 1373, 1375, 1383, 1390–1393, 1395–1398, 1400–1402, 1406, 1419–1423, 1427, 1430, 1433, 1450, 1467, 1473, 1480, 1486, 1498, 1501–1504, 1507–1509, 1512–1514, 1517–1520, 1523–1526, 1528–1529, 1531, 1533, 1535, 1538–1540, 1543, 1546–1548, 1551, 1554, 1557–1558, 1560–1561, 1563–1564, 1566, 1568, 1571–1572, 1575, 1591–1593, 1596–1597, 1600, 1603, 1607, 1614, 1621, 1626, 1629

TOTAL
Stmts: 56874
Miss: 28915
Cover: 49%
