Spark Memory Usage
Off-heap memory
Compressed buffers, etc.
Netty direct buffers
Off-heap execution
Off-heap storage
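As a rough sketch of how off-heap execution and storage memory can be switched on, the snippet below renders two Spark properties, `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size`, as `spark-submit` flags. The `2g` size and the `to_submit_flags` helper are illustrative assumptions, not recommended values or part of Spark.

```python
# Off-heap memory is disabled by default; both properties are set together.
# The size below is an illustrative assumption, not a recommendation.
off_heap_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "2g",
}


def to_submit_flags(conf):
    """Render a conf dict as spark-submit --conf arguments (hypothetical helper)."""
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


print(" ".join(to_submit_flags(off_heap_conf)))
```

Netty direct buffers and compressed buffers are not governed by these two properties, so they still need headroom outside the configured off-heap size.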
Executor memory model
Spark Memory (0.75, shared by Storage Memory and Execution Memory)
User Memory (0.25)
Reserved Memory (300MB)
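The split above can be worked through with a little arithmetic. The sketch below assumes the fractions apply to the heap that remains after the 300 MB reservation; the 4096 MB heap is an illustrative value, and `executor_regions` is a hypothetical helper. The 0.75 fraction is the one listed here; it corresponds to `spark.memory.fraction`, whose default is 0.6 in newer Spark releases, so it is kept as a parameter.

```python
RESERVED_MB = 300        # reserved memory, from the list above
SPARK_FRACTION = 0.75    # Spark memory share, from the list above


def executor_regions(heap_mb, spark_fraction=SPARK_FRACTION):
    """Split an executor heap (in MB) into reserved, Spark, and user memory."""
    usable = heap_mb - RESERVED_MB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")
    spark = usable * spark_fraction   # shared by storage and execution
    user = usable - spark             # user data structures and metadata
    return {"reserved": RESERVED_MB, "spark": spark, "user": user}


regions = executor_regions(4096)
print(regions)  # {'reserved': 300, 'spark': 2847.0, 'user': 949.0}
```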
Executor memory model - a bit of history
2014 (Dynamic Assignment, Static Memory Management)
2015 (Project Tungsten)
2016.01 (Off-heap Execution, Cooperative Spilling, Unified Memory Management)
2016.07 (Off-heap Storage)
Executor memory model - Execution vs Storage
Execution
Memory used for shuffles, joins, sorts, and aggregations
Storage
Memory used to cache data that will be reused later
How to arbitrate memory between execution and storage
Execution/storage
When execution memory is full, spill to disk
When storage memory is full, evict the least recently used (LRU) block to disk
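The LRU eviction rule for storage can be illustrated with a toy model. This is a minimal sketch, not Spark's block manager: the `LruStorage` class, the capacity units, and the block ids are all assumptions made for illustration, and "disk" is just a list of evicted ids.

```python
from collections import OrderedDict


class LruStorage:
    """Toy storage region that evicts the least recently used block when full.

    Assumes every block id is stored at most once.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.blocks = OrderedDict()   # block id -> size, least recently used first
        self.on_disk = []             # ids of blocks evicted to "disk"

    def get(self, block_id):
        # Reading a block marks it as most recently used.
        self.blocks.move_to_end(block_id)
        return self.blocks[block_id]

    def put(self, block_id, size):
        if size > self.capacity:
            raise ValueError("block larger than the storage region")
        # Evict LRU blocks until the new block fits.
        while self.used + size > self.capacity:
            victim, victim_size = self.blocks.popitem(last=False)
            self.used -= victim_size
            self.on_disk.append(victim)
        self.blocks[block_id] = size
        self.used += size


storage = LruStorage(capacity=100)
storage.put("rdd_0_0", 40)
storage.put("rdd_0_1", 40)
storage.get("rdd_0_0")        # rdd_0_1 is now the least recently used block
storage.put("rdd_0_2", 40)    # does not fit, so rdd_0_1 is evicted
print(storage.on_disk)        # ['rdd_0_1']
```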
Problems of static assignment
1. Execution can only use a fraction of the memory
2. Efficient use of memory requires user tuning
Unified memory management
Dynamic assignment between tasks
The share of each task depends on the number of actively running tasks
If another task comes along, the first task has to spill
Each task is now assigned 1/N of the memory, where N=4
Each task is now assigned 1/N of the memory, where N=2
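The 1/N rule is simple enough to work through directly. The sketch below assumes a 1 GiB execution pool (an illustrative figure) and a hypothetical `task_share` helper; it only shows the per-task share shrinking as N grows, which is why a task holding more than its new share has to spill.

```python
def task_share(pool_bytes, active_tasks):
    """Execution memory assigned to each task: 1/N of the pool."""
    if active_tasks < 1:
        raise ValueError("need at least one active task")
    return pool_bytes // active_tasks


pool = 1024 * 1024 * 1024  # 1 GiB execution pool, illustrative
for n in (1, 2, 4):
    print(n, task_share(pool, n))
```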
Cooperative spilling (Sort forces Aggregate to spill a page to free memory)
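Cooperative spilling can be sketched as a memory manager that, when no page is free, asks other memory consumers to spill before failing the request. The `Consumer` and `MemoryManager` classes below are hypothetical and heavily simplified (page counts instead of bytes, largest holder spills first); they are not Spark's actual classes.

```python
class Consumer:
    """Toy operator (e.g. sort or aggregate) that holds pages of memory."""

    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.pages = 0
        self.spilled_pages = 0
        manager.consumers.append(self)

    def acquire_page(self):
        self.manager.acquire(self)
        self.pages += 1

    def spill(self, pages):
        """Write up to `pages` pages to disk and return how many were freed."""
        freed = min(pages, self.pages)
        self.pages -= freed
        self.spilled_pages += freed
        return freed


class MemoryManager:
    def __init__(self, total_pages):
        self.free_pages = total_pages
        self.consumers = []

    def acquire(self, requester):
        if self.free_pages == 0:
            # Ask the other consumers to spill, largest holder first.
            others = [c for c in self.consumers if c is not requester]
            for other in sorted(others, key=lambda c: c.pages, reverse=True):
                self.free_pages += other.spill(1)
                if self.free_pages > 0:
                    break
        if self.free_pages == 0:
            raise MemoryError("no page could be freed")
        self.free_pages -= 1


manager = MemoryManager(total_pages=4)
aggregate = Consumer("aggregate", manager)
sort = Consumer("sort", manager)
for _ in range(4):
    aggregate.acquire_page()   # aggregate takes every page
sort.acquire_page()            # sort forces aggregate to spill one page
print(aggregate.pages, aggregate.spilled_pages, sort.pages)  # 3 1 1
```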
Execution memory model - Project Tungsten
Memory management and Binary Processing
In-memory binary data representation: row format and shuffle data
Cache-aware Computation
Faster sorting and hashing for aggregations, joins, and shuffles
Code Generation (not in this topic)
Faster expression evaluation and DataFrame/SQL operators
Project Tungsten: row binary format
Native: 4 bytes with UTF-8 encoding (for a 4-character ASCII string)
Java: 48 bytes (the same string as a JVM String object)
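To make the binary row idea concrete, here is a simplified sketch loosely modeled on the layout Tungsten uses: a null bitset, one 8-byte word per field, then variable-length data, with a string field's word holding its offset and length. The `encode_row` and `pad8` helpers, the little-endian byte order, and the sample values (including the string "abcd") are assumptions for illustration, not Spark's implementation.

```python
import struct


def pad8(data):
    """Pad bytes with zeros to a multiple of 8 (word alignment)."""
    return data + b"\x00" * (-len(data) % 8)


def encode_row(int_value, text):
    """Encode (int, string) as: null bitset | one 8-byte word per field | string bytes."""
    payload = text.encode("utf-8")
    null_bits = struct.pack("<q", 0)        # no null fields
    fixed_region_size = 8 + 2 * 8           # bitset plus two field words
    field0 = struct.pack("<q", int_value)   # fixed-length value stored inline
    # Variable-length field: the word holds (offset << 32) | length.
    field1 = struct.pack("<q", (fixed_region_size << 32) | len(payload))
    return null_bits + field0 + field1 + pad8(payload)


row = encode_row(123, "abcd")
print(len("abcd".encode("utf-8")))  # 4
print(len(row))                     # 32
```

The whole two-field row fits in 32 bytes of contiguous memory, with no per-object headers or pointers to chase.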
Document information
- Author: Jessica
- Link: https://jessica0530.github.io/2021/03/12/Spark-Memory-Model/
- Copyright: free to repost with attribution, non-commercial, no derivatives (Creative Commons 3.0 license)