References

Official documentation and additional resources for learning Apache Spark.

Official Documentation

Apache Spark Official Site

Programming Guides

Operations Guides

Cluster Manager Guides

API Documentation

Java API

Scala API

Additional Learning Resources

Online Courses

Blogs and Documentation

Community

Data Sources

  • Kafka — Streaming data source
  • HDFS — Distributed file system
  • Parquet — Columnar format
  • Delta Lake — Storage with ACID transaction support

Cluster Environments

Cloud Services

Version Release Notes

Performance Benchmarks

Beginner

  • Learning Spark, 2nd Edition (O’Reilly) — Jules S. Damji et al.
  • Spark: The Definitive Guide (O’Reilly) — Bill Chambers, Matei Zaharia

Advanced

  • High Performance Spark (O’Reilly) — Holden Karau, Rachel Warren
  • Spark in Action, 2nd Edition (Manning) — Jean-Georges Perrin