Spark

| Feature | Items | Iceberg | Hudi |
|---|---|---|---|
| DDL | SQL create table | ☑️ | ☑️ |
| | SQL create table … as select | ☑️ | ☑️ |
| | SQL replace table … as select | ☑️ | ✖️ |
| | SQL drop table | ☑️ | ✖️ |
| | SQL alter table | ☑️ | ☑️ partial support |
| Write | SQL insert into | ☑️ | ☑️ |
| | SQL insert overwrite | ☑️ | ☑️ |
| Read | SQL select (latest snapshot) | ☑️ | ☑️ |
| | DataFrame time travel query (snapshot at a point in time) | ☑️ `as-of-timestamp` | ☑️ `as.of.instant` |
| | DataFrame version travel query (snapshot at a specific version) | ☑️ `snapshot-id` | ✖️ |
| Incremental | DataFrame incremental query (changes between two snapshots) | ☑️ `start-snapshot-id`, `end-snapshot-id` (optional; defaults to the current snapshot) | ☑️ `beginInstantTime`, `endInstantTime` |
| Streaming | DataFrame incremental query (changes after a point in time) | ☑️ `stream-from-timestamp` | ☑️ `beginInstantTime` |
| | DataFrame write | ☑️ | ? |
| Update | SQL update | ☑️ | ☑️ |
| | SQL merge into | ☑️ | ☑️ |
| Delete | SQL delete from | ☑️ | ☑️ |
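
The DataFrame read options named in the Spark table can be sketched as plain option maps passed to a Spark reader. This is a minimal illustration, not a definitive implementation: the helper names are hypothetical, and the full Hudi keys (`hoodie.datasource.query.type`, `hoodie.datasource.read.begin.instanttime`, `hoodie.datasource.read.end.instanttime`) are assumed to be what the table abbreviates as `beginInstantTime` and `endInstantTime`. The `spark` argument is assumed to be an existing `SparkSession` with the Iceberg or Hudi runtime on its classpath.

```python
from typing import Any, Dict, Optional


def iceberg_time_travel(timestamp_ms: int) -> Dict[str, str]:
    """Read the Iceberg snapshot that was current at a timestamp (ms since epoch)."""
    return {"as-of-timestamp": str(timestamp_ms)}


def iceberg_version_travel(snapshot_id: int) -> Dict[str, str]:
    """Read one specific Iceberg snapshot by its ID."""
    return {"snapshot-id": str(snapshot_id)}


def iceberg_incremental(start_id: int, end_id: Optional[int] = None) -> Dict[str, str]:
    """Read changes after start_id; end_id is optional, as noted in the table."""
    opts = {"start-snapshot-id": str(start_id)}
    if end_id is not None:
        opts["end-snapshot-id"] = str(end_id)
    return opts


def hudi_time_travel(instant: str) -> Dict[str, str]:
    """Read a Hudi table as of a commit instant, e.g. '20210728141108'."""
    return {"as.of.instant": instant}


def hudi_incremental(begin: str, end: Optional[str] = None) -> Dict[str, str]:
    """Read Hudi changes after `begin`; full option keys are an assumption (see lead-in)."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin,
    }
    if end is not None:
        opts["hoodie.datasource.read.end.instanttime"] = end
    return opts


def load(spark: Any, fmt: str, target: str, options: Dict[str, str]) -> Any:
    """Apply the options to a DataFrameReader; fmt is 'iceberg' or 'hudi'."""
    reader = spark.read.format(fmt)
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load(target)
```

A call such as `load(spark, "iceberg", "db.events", iceberg_version_travel(10963874102873))` would then return a DataFrame for that snapshot; the table name and snapshot ID here are placeholders.
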

Flink

| Feature | Items | Iceberg | Hudi |
|---|---|---|---|
| DDL | SQL create table | ☑️ | ☑️ |
| | SQL create table like | ☑️ | ✖️ |
| | SQL drop table | ☑️ | ✖️ |
| | SQL alter table | ☑️ partial support | ☑️ partial support |
| Write | SQL insert into | ☑️ | ☑️ |
| | SQL insert overwrite | ☑️ | ✖️ planned |
| Read | SQL select (latest snapshot) | ☑️ | ☑️ |
| | SQL select (snapshot at a point in time) | ✖️ | ✖️ |
| | SQL select (snapshot at a specific version) | ✖️ | ✖️ |
| Incremental | SQL select | ✖️ | ☑️ `read.start-commit`, `read.end-commit` |
| Streaming | SQL select | ☑️ `streaming`, `monitor-interval`, `start-snapshot-id` | ☑️ `read.streaming.enabled`, `read.streaming.check-interval`, `read.start-commit` |
| | SQL insert into | ☑️ | ? |
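
The Flink read options in the table above are typically supplied per query through an `OPTIONS` hint (or in the table's `WITH` clause). The sketch below only renders such SQL strings so the option names can be seen in context; `select_with_options` is a hypothetical helper, the table names and option values are placeholders, and depending on the Flink version the hint may need `table.dynamic-table-options.enabled` set to `true`.

```python
from typing import Dict


def select_with_options(table: str, options: Dict[str, str]) -> str:
    """Render a SELECT * statement carrying a Flink OPTIONS hint."""
    rendered = ", ".join(f"'{key}'='{value}'" for key, value in options.items())
    return f"SELECT * FROM {table} /*+ OPTIONS({rendered}) */"


# Iceberg streaming read: poll for new snapshots at a fixed interval.
iceberg_streaming = select_with_options(
    "iceberg_table",
    {"streaming": "true", "monitor-interval": "10s"},
)

# Hudi streaming read: start from a commit and keep checking for new ones.
hudi_streaming = select_with_options(
    "hudi_table",
    {
        "read.streaming.enabled": "true",
        "read.streaming.check-interval": "4",
        "read.start-commit": "20210728141108",
    },
)

# Hudi incremental (bounded) read between two commits.
hudi_incremental = select_with_options(
    "hudi_table",
    {"read.start-commit": "20210728141108", "read.end-commit": "20210729000000"},
)

for statement in (iceberg_streaming, hudi_streaming, hudi_incremental):
    print(statement)
```

Each rendered string could be passed to a Flink SQL client or a `TableEnvironment`; that step is omitted here because it requires a running Flink setup with the matching connector.
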
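
:::info Time Travel use cases
- Rollback: restore the table to an earlier version
- Debugging: inspect earlier versions of the data to see how it changed over time
- Audit history: follow the commit trail to review the record of data changes
:::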
