
COMPONENTS
Catalog
包括 schema,Table,RelDataType,Statistic
TableStatistic :rowCount行数,和ColumnStat
ColumnStatistic : distinct values ndv,nullCount,avgLen,maxLen,maxValue,minValue

SQL parser
1.将有效的SQL查询解析为抽象语法树(AST)
2.用JavaCC编写的LL(k)解析器
3.Input queries被解析为抽象语法树(AST)
4.Tokens在Calcite中由SqlNode表示
5.SqlNode也可以通过unparse方法转换回SQL字符串
JavaCC
1.Java编译器
2.创建于1996年的Sun Microsystem
3.从特定域的语言生成Java代码
4.ANTLR是用于Hive和Drill等项目的替代品

SQLNode
SqlNode 表示抽象语法树中的元素

SQL validator
根据Catalog提供的Metadata信息验证抽象语法树
Query optimizer
将AST转换为logical plans,优化 logical plans,并将 logical expressions 转换为 physical plans
Query Plan
Query plans represent the steps necessary to execute a query

Query Optimization
Optimize logical plan Goal is typically to try to reduce the amount of data that must be processed early in the plan Convert logical plan into a physical plan Physical plan is engine specific and represents the physical execution stages
Prune unused fields Merge projections Convert subqueries to joins Reorder joins Push down projections Push down filters

SQL generator
Converts physical plans to SQL
关键词
Relational algebra (RelNode)
RelNode 表示 a relational expression 是处理 data的行为
比如 Aggregate,Sort,Join,Project,Filter 等
RelNode 根据上游的Input数量,抽象出 3类,SingleRel,BiRel,SetOp
| RelNode类型 | 具体RelNode |
|---|---|
| SingleRel (input只有一个) | Aggregate,Calc,Scan |
| BiRel (input 有2个 left,right) | Join |
| SetOp (input有多个) | Union, Minus(except) ,Intersect |
Logical algebra
比如LogicalAggregate
Physical algebra
Row expressions (RexNode)
RexNode 表示 row级别的 表达式 ,在 validate之后生成 所以有该row的type类型
{@link RexLiteral} (constant value),
* {@link RexVariable} (variable), {@link RexCall} (call to operator with
* operands). Expressions are generally created using a {@link RexBuilder}
* factory.</p>
RexNode通过 RexBuilder Factory生成
| RexInputRef | Input column Reference |
|---|---|
| RexLiteral | 常量 |
| RexFieldAccess | SELECT emp.empno FROM emp |
| RexCall | Function call ,binary, unary, functions, special syntactic constructs * like CASE … WHEN … END |
| RexVariable | variable |
Traits (RelTrait)
表示 RelNode的 Trait,并不会 改变执行,主要用于 验证 plan output,Flink Retract用到 此Trait 3个重要的 trait types: Convention RelCollation 排序 ASC DESC RelDistribution hash,range,random,boardcast
举例 FlinkRelDistributionTraitDef SINGLETON(“single”), HASH_DISTRIBUTED(“hash”), RANGE_DISTRIBUTED(“range”), RANDOM_DISTRIBUTED(“random”), ROUND_ROBIN_DISTRIBUTED(“rr”), BROADCAST_DISTRIBUTED(“broadcast”), ANY(“any”);
比如在优化算子里面要满足是按key 进行hash Distribution
private def isInputSatisfyRequiredDistribution(input: RelNode, keys: Array[Int]): Boolean = {
val requiredDistribution = createDistribution(keys)
val inputDistribution = input.getTraitSet.getTrait(FlinkRelDistributionTraitDef.INSTANCE)
inputDistribution.satisfies(requiredDistribution)
}
Conventions (Convention)
Convention is a type of RelTrait
A Convention is associated with a RelNode interface
SparkConvention, JdbcConvention, EnumerableConvention, etc
Conventions are used to represent a single data source
Inputs to a relational expression must be in the same convention
METADATA
| Selectivity | ||
| UniqueKeys | ||
| RowCount | ||
| DistinctRowCount | ||
| PercentageOriginalRows | ||
| ColumnUniqueness | ||
| ColumnOrigin | ||
| Predicates | ||
| Collation | ||
| Distribution | ||
| Size | ||
| Parallelism | ||
| Memory | ||
| AllPredicates | ||
| ExpressionLineage | ||
| TableReferences |
Rules (RelOptRule)
Rules 用于修改 query plans
一共有2种类型:converters and transformers Rules
Converter rules implement Converter and convert from one convention to another
Rules are matched to elements of a query plan using pattern matching
onMatch is called for matched rules
Converter rules applied via convert
Converter Rule

Pattern Match

Planners (RelOptPlanner)
Planners implement the RelOptPlanner interface
Two types of planners: HepPlanner VolcanoPlanner
Heuristic Optimization
HepPlanner is a heuristic optimizer similar to Spark’s optimizer Applies all matching rules until none can be applied Heuristic optimization is faster than cost- based optimization Risk of infinite recursion if rules make opposing changes to the plan
Cost-based Optimization
VolcanoPlanner is a cost-based optimizer Applies matching rules iteratively, selecting the plan with the cheapest cost on each iteration Costs are provided by relational expressions Not all possible plans can be computed Stops optimization when the cost does not significantly improve through a determinable number of iterations
Cost is provided by each RelNode Cost is represented by RelOptCost Cost typically includes row count, I/O, and CPU cost Cost estimates are relative Statistics are used to improve accuracy of cost estimations Calcite provides utilities for computing various resource-related statistics for use in cost estimations

Programs (Program)
文档信息
- 本文作者:Jessica
- 本文链接:https://jessica0530.github.io/2020/08/18/calcite-2/
- 版权声明:自由转载-非商用-非衍生-保持署名(创意共享3.0许可证)