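- Bilibili video course "Spark from Zero to Mastery (Complete Edition)": the foundational chapters to watch are Overview, Cluster Setup, Getting Started, RDDs in Depth, RDD Partitioning, RDD Caching, and Spark Internals (Spark SQL can be left for later)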
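- Summary of Spark key concepts
- Ma Dianjun's Spark source code analysis
- Reading the Spark source code
- Spark SQL Hive ThriftServer source code analysis: SparkSQLCLIService
- How Spark's CommitCoordinator guarantees data consistency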
- Data Source V2 API in Spark 3.0
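- Understanding Apache Spark DataSource V2 in one article: background and hands-on introduction
- Spark memory management explained
- Exploring the secrets of Spark Tungsten
- Spark performance tuning guide: basics
- Spark performance tuning guide: advanced
- Roundup of common Spark issues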
- The Internals Online Books
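- waitingforcode series
- Migrating Hive SQL to Spark SQL at Didi
- Spark SQL in practice at Youzan
- Spark SQL in practice on Youzan's big data platform (Part 2)
- Seamlessly upgrading Spark 2.3 to 3.0 at Vipshop
- Technical challenges and practice of fully replacing a traditional data warehouse with Apache Spark
- JD.com's in-house Spark Remote Shuffle Service in practice during major sales events
- Cutting costs and boosting efficiency: Spark Remote Shuffle Service best practices at Qutoutiao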
- Magnet: A scalable and performant shuffle architecture for Apache Spark
- Cosco: An Efficient Facebook-Scale Shuffle Service
- Uber’s Highly Scalable and Distributed Shuffle as a Service
- GitHub: Uber RSS (Remote Shuffle Service)
- GitHub: LinkedIn Magnet
- SPARK-31924: Create remote shuffle service reference implementation
- SPARK-25299: Use remote storage for persisting shuffle data
- SPARK-30602: Support push-based shuffle to improve shuffle efficiency