Sensitive data written to disk unencrypted in Spark
High severity
GitHub Reviewed
Published Aug 8, 2019 to the GitHub Advisory Database • Updated Oct 24, 2024
Description
Published by the National Vulnerability Database: Aug 7, 2019
Reviewed: Aug 8, 2019
In Spark versions before 2.3.3, Spark could in certain situations write user data to local disk unencrypted, even when spark.io.encryption.enabled=true. The affected code paths are: cached blocks that are fetched to disk (controlled by spark.maxRemoteBlockSizeFetchToMem); parallelize in SparkR; broadcast and parallelize in PySpark; and Python UDFs.
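As a rough illustration of the condition described above, the following Python sketch flags a deployment that requests I/O encryption while running a release older than 2.3.3. The helper names (parse_version, is_affected, encryption_requested, needs_upgrade) are hypothetical and not part of Spark; the only details taken from the advisory are the 2.3.3 version boundary and the spark.io.encryption.enabled setting.

```python
import re

# First release the advisory describes as no longer affected.
FIXED_VERSION = (2, 3, 3)


def parse_version(version: str) -> tuple:
    """Extract the leading numeric major.minor.patch from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version is older than the fixed release."""
    return parse_version(version) < FIXED_VERSION


def encryption_requested(conf: dict) -> bool:
    """Return True if the configuration asks for I/O encryption."""
    value = conf.get("spark.io.encryption.enabled", "false")
    return str(value).lower() == "true"


def needs_upgrade(version: str, conf: dict) -> bool:
    """Flag deployments that expect encryption but run an affected release."""
    return is_affected(version) and encryption_requested(conf)
```

In a PySpark session, the inputs could plausibly be taken from spark.version and dict(spark.sparkContext.getConf().getAll()). The check only compares version numbers and one setting; it does not inspect what is actually written to disk.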
References