2016 年,在上周末优化班的时候,一个同学请求现场优化如下 SQL。
with temp as(select sgd.detail_id id,wmsys.wm_concat(distinct(sg.gp_name)) groupnames,wmsys.wm_concat(distinct(su.user_name)) usernamesfrom sgdleft join sgon sg.id = sgd.gp_idleft join sugon sg.id = sug.gp_idleft join suon sug.user_id = su.idgroup by sgd.detail_id)select zh.id,zh.id detailid,zh.name detailname,zh.p_level hospitallevel,zh.type hospitaltype,dza.name region,temp.groupnames,temp.usernames,(casewhen gd.gp_id is null then0else1end) isallotedfrom zhleft join dzaon zh.area_id = dza.idleft join tempon zh.id = temp.idleft join (select gp_id, detail_id from sys_gp_detail where gp_Id = :0) gdon zh.id = gd.detail_id order by length(id),zh.id asc;
该 SQL 返回 20 779 行数据,要执行 4 分 32 秒。该执行计划中全是 HASH JOIN,这里就不贴执行计划了。
首先这条 SQL 最终返回 20 779 行数据,该 SQL 语句最后部分没有 GROUP BY,表与表之间关联全是外连接,主表 zh 没有过滤条件,因此判断 zh 表最多 20 779 行数据,因为它是外连接的主表,不管关联有没有关联上,zh 会返回表中全部数据,如果 zh 与 dza 是 1:n 关系,那么 zh 表总行数还将少于 20 779 行数据。同时也判定 dza,TEMP 数据量都不大,因为所有表关联完只返回 20779 行数据。既然都是小表,为什么最终要执行 4 分 32 秒呢?遇到此类问题,我们需要将 SQL 拆开,分步执行,这样就能判断 SQL 中哪一步是性能瓶颈。9.10 节中案例也是采用分步执行方法找到问题根本原因。
SQL 语句中有个 with as 子句,对其单独执行,发现要执行两分钟左右。with as 子句中有两个列转行函数:wmsys.wm_concat,将其注释之后 with as 子句能秒出。现在我们定位到,SQL 性能问题是由 wmsys.wm_concat 导致。对于列转行,Oracle 还提供了 Listagg 分析函数,wmsys.wm_concat 从 Oracle11g 之后返回的是 Clob 类型,而 Listagg 返回的是 varchar2 类型。因此我们尝试对 with as 子句进行等价改写,利用分析函数 Listagg 代替 wmsys.wm_concat,以验证改写之后是否还会出现性能问题。with as 子句原始 SQL 如下。
select sgd.detail_id id,wmsys.wm_concat(distinct(sg.gp_name)) groupnames,wmsys.wm_concat(distinct(su.user_name)) usernamesfrom sgdleft join sg on sg.id = sgd.gp_idleft join sug on sg.id = sug.gp_idleft join su on sug.user_id = su.idgroup by sgd.detail_id;
因为 with as 子句中有两个 wmsys.wm_concat,而且 wmsys.wm_concat 中有 distinct,而 Listagg 不支持 distinct,所以我们只能一个一个去掉 wmsys.wm_concat。现在将 with as 子句中 wmsys.wm_concat(distinct(su.user_name))usernames 去掉,只保留 wmsys.wm_concat(distinct (sg.gp_name))groupnames。因为 usernames 关联了 su,sug,而现在只保留 groupnames,所以我们需要将 su,sug 去掉,去掉 usernames 的 SQL 如下。
select sgd.detail_id id, wmsys.wm_concat(distinct(sg.gp_name)) groupnamesfrom sys_gp_detail sgdleft join sys_gp sg on sg.id = sgd.gp_idgroup by sgd.detail_id;
其执行计划如下。
已用时间:00: 00: 58.04.执行计划----------------------------------------------------------Plan hash value: 3491823204-----------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|-----------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20584 | 824K| | 1308 (8)|| 1 | SORT GROUP BY | | 20584 | 824K| 15M| 1308 (8)||* 2 | HASH JOIN RIGHT OUTER| | 313K| 12M| | 449 (6)|| 3 | TABLE ACCESS FULL | SYS_GP | 3 | 69 | | 3 (0)|| 4 | TABLE ACCESS FULL | SYS_GP_DETAIL | 313K| 5518K| | 438 (5)|-----------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------2 - access(「SG」.「ID」(+)=「SGD」.「GP_ID」)统计信息----------------------------------------------------------1 recursive calls249348 db block gets44447 consistent gets0 physical reads0 redo size9993548 bytes sent via SQL*Net to client6067828 bytes received via SQL*Net from client83118 SQL*Net roundtrips to/from client1 sorts (memory)0 sorts (disk)
执行计划中的 db block gets 来自于 Clob。因为 Listagg 不支持 distinct,所以我们需要先去重,再采用 Listagg,Listagg 改写的 SQL 如下。
select detail_id, listagg(gp_name, ',') withingroup(order by null)from (select sgd.detail_id, sg.gp_namefrom sys_gp_detail sgdleft join sys_gp sg on sg.id = sgd.gp_idgroup by sgd.detail_id, sg.gp_name)group by detail_id;
改写后的执行计划如下。
已用时间:00: 00: 01.12执行计划----------------------------------------------------------Plan hash value: 147456425-------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes |TempSpc| Cost(%CPU)|-------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 20584 | 1547K| | 1467 (7)|| 1 | SORT GROUP BY | | 20584 | 1547K| | 1467 (7)|| 2 | VIEW | VM_NWVW_0 | 43666 | 3283K| | 1467 (7)|| 3 | HASH GROUP BY | | 43666 | 1748K| 15M| 1467 (7)||* 4 | HASH JOIN RIGHT OUTER| | 313K| 12M| | 449 (6)|| 5 | TABLE ACCESS FULL | SYS_GP | 3 | 69 | | 3 (0)|| 6 | TABLE ACCESS FULL | SYS_GP_DETAIL | 313K| 5518K| | 438 (5)|-------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------4 - access(「SG」.「ID」(+)=「SGD」.「GP_ID」)统计信息----------------------------------------------------------1 recursive calls0 db block gets2775 consistent gets0 physical reads0 redo size450516 bytes sent via SQL*Net to client15595 bytes received via SQL*Net from client1387 SQL*Net roundtrips to/from client1 sorts (memory)0 sorts (disk)20779 rows processed
使用 Listagg 改写之后,SQL 能在 1 秒执行完毕,而采用 wmsys.wm_concat 需要 58 秒,这说明采用 Listagg 代替 wmsys.wm_concat 能达到优化目的。
下面我们改写另外一个 wmsys.wm_concat,改写的思路一模一样,先去重,再使用 Listagg。
select detail_id, listagg(user_name, ',') withingroup(order by null)from (select sgd.detail_id id, su.user_namefrom sgdleft join sg on sg.id = sgd.gp_idleft join sug on sg.id = sug.gp_idleft join su on sug.user_id = su.idgroup by sgd.detail_id, su.user_name)group by detail_id;
最终的 with as 子句如下。
select a.detail_id id , a.groupnames, b.usernamesfrom (select detail_id, listagg(gp_name, ',') withingroup(order by null) groupnamesfrom (select sgd.detail_id, sg.gp_namefrom sys_gp_detail sgdleft join sys_gp sg on sg.id = sgd.gp_idgroup by sgd.detail_id, sg.gp_name)group by detail_id) a,(select detail_id, listagg(user_name, ',') withingroup(order by null) usernamesfrom (select sgd.detail_id, su.user_namefrom sgdleft join sg on sg.id = sgd.gp_idleft join sug on sg.id = sug.gp_idleft join su on sug.user_id = su.idgroup by sgd.detail_id, su.user_name)group by detail_id) bwhere a.. detail_id = b.detail_id;
用改写后的 with as 子句替换原始 SQL 中的 with as 子句,最终 SQL 能在两秒左右执行完毕。
在工作中尽量使用 Listagg 代替 wmsys.wm_concat。
