Description
Spark does not have unsigned types. Unsigned values in HPCC data that exceed what a long can represent should be converted to strings. Today, these values are misinterpreted when converted to Java types.
The nid field of this file is a good example: https://eclwatch-hpcc.us-prod400thor-prod.azure.lnrsg.io/esp/files/index.html#/files/data/thor_data400::base::emailage_ingest::20250818
I don't believe Spark can correctly represent any HPCC unsigned type larger than unsigned4.
Steps to Reproduce
- Read a dataset with an unsigned8
Expected Behavior
Unsigned8 values are read correctly
Actual Behavior
Unsigned8 values are read as Longs and appear incorrect (often negative) in Spark
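The misreading above can be sketched in plain Java: when the raw 64 bits of an unsigned8 value above 2^63 - 1 are stored directly in a signed long, the same bits reinterpret as a negative number (the class and value here are illustrative, not taken from the spark-hpcc code):

```java
public class Unsigned8Overflow {
    public static void main(String[] args) {
        // Hypothetical raw value: 2^64 - 1, the maximum unsigned8,
        // placed bit-for-bit into Java's signed long.
        long raw = 0xFFFFFFFFFFFFFFFFL;

        // Read as a signed long, the value appears negative.
        System.out.println(raw);                        // prints -1

        // The original unsigned value is still recoverable from the bits.
        System.out.println(Long.toUnsignedString(raw)); // prints 18446744073709551615
    }
}
```

Any unsigned8 value at or below 2^63 - 1 fits in a long unchanged; only values above that threshold wrap negative.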
HPCC Systems Version
9.10.x
HPCC4J Module
spark-hpcc - Spark connector for reading/writing HPCC datasets
HPCC4J Version
9.10.x
Java Version
Java 11
Environment/Configuration
Databricks
Code Sample
Additional Context
No response
Pre-submission Checklist
- I have checked the Common Issues and Solutions wiki
🤖 AI Validation Response
- Ready for re-review (Check this box when you've addressed the feedback and want the AI to validate again)
Thank you for the detailed bug report!
Your issue is well-documented and provides all the information we need to investigate the Unsigned8 handling in spark-hpcc. This is a known limitation due to Java's lack of unsigned 64-bit integer support, and your environment and steps are clear.
📋 Action Required
- Review the workaround in our documentation for handling Unsigned8 fields in Spark.
- If you need to preserve the full unsigned8 range, configure your pipeline to treat these fields as strings or BigIntegers.
- Let us know if the documented solution does not address your use case or if you have further questions.
ℹ️ How to Find This Information
Workaround for Unsigned8 Handling
- See the Common Issues and Solutions wiki, section: "Long type overflow error Message".
- In Spark, unsigned8 values exceeding 2^63 - 1 will overflow Java Longs and appear negative.
- To avoid this, use the connector option to treat unsigned8 fields as strings or BigIntegers.
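As a sketch of the string/BigInteger workaround described above (the helper method here is hypothetical, not a spark-hpcc API), the true unsigned value can be recovered from an already-overflowed long, since the underlying 64 bits are preserved:

```java
import java.math.BigInteger;

public class Unsigned8Workaround {
    // Reinterpret a sign-overflowed long as the unsigned value it
    // originally encoded, without losing any bits.
    static BigInteger toUnsigned(long raw) {
        return new BigInteger(Long.toUnsignedString(raw));
    }

    public static void main(String[] args) {
        long raw = -1L; // overflowed reading of 2^64 - 1
        System.out.println(toUnsigned(raw)); // prints 18446744073709551615
    }
}
```

In a Spark pipeline this kind of conversion would typically be applied as a column transformation (e.g. via a UDF) on the affected unsigned8 fields; carrying the value as a string or decimal avoids the signed-long overflow entirely.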
Checking Your Configuration
- Review your spark-hpcc connector options for unsigned8 handling.
- If you need help configuring this, let us know your current pipeline setup.
🔍 Related Resources
- Common Issues and Solutions wiki – Unsigned8 overflow workaround
- Related issue: [BUG]: Issue Reading Data from Dataland Thor via Databricks #51 – Also discusses HPCC data reading in Spark
⚠️ Important Notes
- This is a known limitation: Java does not natively support unsigned 64-bit integers.
- HPCC4J 9.10.x with HPCC Platform 9.10.x and Java 11 is a valid, supported combination.
- Never share sensitive data (like passwords) in public issues.
Tip: This is covered in our documentation. Please review the suggested section and let us know if you need further clarification.
Please update this issue if you have additional questions or if the workaround does not resolve your problem. We're here to help!