Apache Spark Assembly

RDD

Resilient Distributed Dataset


Last updated 4 years ago
