案例一
2011 年,一税务局的朋友请求优化下面 SQL。
select *from (select t.zxid,t.gh,t.xm,t.bm,t.fzjgdm,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZH) slzl,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '0') 无评价,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '1') 满意,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '2') 较满意,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '3') 一般,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '4') 较不满意,(select count(a.session_id)from test_v awhere to_char(t.zxid) = a.ZCRYZHand a.myd = '5') 不满意from CC_ZXJBXX tWHERE t.yxbz = 'Y')where slzl <> 0;
该 SQL 有 7 个标量子查询,在 5.5 节中讲到,标量子查询类似嵌套循环,如果主表返回数据很多并且主表连接列基数很高,会导致子查询被多次扫描。该 SQL 竟然有 7 个标量子查询,而且每个标量子查询除了过滤条件不一样,其他都一样,显然我们可以将标量子查询等价改写为外连接,从而优化 SQL,等价改写之后的写法如下。
SELECT T.ZXID,T.GH,T.XM,T.BM,T.FZJGDM,SUM(1) SLZL,SUM(DECODE(A.MYD, '0', 1, 0)) 无评价,SUM(DECODE(A.MYD, '1', 1, 0)) 满意,SUM(DECODE(A.MYD, '2', 1, 0)) 较满意,SUM(DECODE(A.MYD, '3', 1, 0)) 一般,SUM(DECODE(A.MYD, '4', 1, 0)) 较不满意,SUM(DECODE(A.MYD, '5', 1, 0)) 不满意FROM CC_ZXJBXX T, test_v Awhere A.ZCRYZH = T.ZXIDand T.YXBZ = 'Y'GROUP BY T.ZXID, T.GH, T.XM, T.BM, T.FZJGDM;
SQL 改写之后,因为两表只有关联条件,没有过滤条件,所以两表关联走 HASH 连接,test_v 也只需要被扫描一次,从而大大提升 SQL 性能。
注:decode(条件,值1,返回值1,值2,返回值2,…值n,返回值n,缺省值)这个是decode的表达式,具体的含义解释为:
IF 条件=值1 THENRETURN(翻译值1)ELSIF 条件=值2 THENRETURN(翻译值2)......ELSIF 条件=值n THENRETURN(翻译值n)ELSERETURN(缺省值)END IF
例如:
select t.id,t.name,t.age,decode(t.sex, '1', '男生', '2', '女生', '其他') as sexfrom STUDENT2 t
上述 SQL 其实是一个典型的报表开发初学者在刚开始工作的时候编写的,强烈建议大家要加强 SQL 编程技能。
案例二
本案例发生在 2017 年,是一个比较经典的标量子查询改写优化案例。SQL 和执行计划如下。
SELECT A.LXR_ID,A.SR,(SELECT C.JGID || '@' || C.DLS_BM || '@' || C.DLS_MCFROM KHGL_DLSJBXX C, KHGL_ZJKJ ZJKJWHERE C.JGID = ZJKJ.JGIDAND EXISTS (SELECT 1FROM LXR_YH YHWHERE YH.KH_ID = ZJKJ.KJ_IDAND YH.KHLX = '2'AND YH.LXR_ID = A.LXR_ID)) AS ZJJGXXFROM LXR_JBXX AWHERE A.STATUS = '1'AND A.GRDM = :v1AND EXISTS (SELECT 1FROM LXR_YH YH, KHGL_GRDLXX GRDLWHERE YH.FZGS_DM = :v2AND YH.KHLX = '2'AND YH.LXR_ID = A.LXR_IDAND GRDL.GRDL_ID = YH.KH_IDAND GRDL.STATUS = '1')AND ROWNUM < 21;
Execution Plan----------------------------------------------------------Plan hash value: 704492369---------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 41 | 12 (0)| 00:00:01 ||* 1 | FILTER | | | | | ||* 2 | HASH JOIN | | 28114 | 3761K| 156 (2)| 00:00:02 || 3 | TABLE ACCESS FULL | KHGL_DLSJBXX | 15342 | 1063K| 89 (2)| 00:00:02 || 4 | TABLE ACCESS FULL | KHGL_ZJKJ | 28114 | 1812K| 66 (0)| 00:00:01 ||* 5 | INDEX RANGE SCAN | LXR_YH_ID_LX | 1 | 68 | 4 (0)| 00:00:01 ||* 6 | COUNT STOPKEY | | | | | || 7 | NESTED LOOPS SEMI | | 1 | 41 | 12 (0)| 00:00:01 ||* 8 | TABLE ACCESS BY INDEX ROWID | LXR_JBXX | 1 | 39 | 5 (0)| 00:00:01 ||* 9 | INDEX RANGE SCAN | IDX_LXR_JBXX_GRDM | 1 | | 3 (0)| 00:00:01 || 10 | VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 7 (0)| 00:00:01 || 11 | NESTED LOOPS | | 1 | 110 | 7 (0)| 00:00:01 || 12 | TABLE ACCESS BY INDEX ROWID| LXR_YH | 1 | 75 | 5 (0)| 00:00:01 ||* 13 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_FZGSDM | 1 | | 4 (0)| 00:00:01 ||* 14 | INDEX RANGE SCAN | IDX_GRDLXX_XZQH_FZGS | 1 | 35 | 2 (0)| 00:00:01 |---------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - filter( EXISTS (SELECT 0 FROM "LXR_YH" "YH" WHERE "YH"."LXR_ID"=:B1 AND "YH"."KHLX"='2'AND "YH"."KH_ID"=:B2))2 - access("C"."JGID"="ZJKJ"."JGID")5 - access("YH"."KH_ID"=:B1 AND "YH"."KHLX"='2' AND "YH"."LXR_ID"=:B2)6 - filter(ROWNUM<21)8 - filter("A"."STATUS"='1')9 - access("A"."GRDM"=:V1)13 - access("YH"."FZGS_DM"=:V2 AND "YH"."LXR_ID"="A"."LXR_ID" AND "YH"."KHLX"='2')14 - access("GRDL"."GRDL_ID"="YH"."KH_ID" AND "GRDL"."STATUS"='1')filter("GRDL"."STATUS"='1')Statistics----------------------------------------------------------1 recursive calls2 db block gets103172 consistent gets21144 physical reads0 redo size533 bytes sent via SQL*Net to client472 bytes received via SQL*Net from client2 SQL*Net roundtrips to/from client0 sorts (memory)0 sorts (disk)1 rows processed
该 SQL 只返回 1 行数据,但是逻辑读为 103 172,显然 SQL 还能进一步优化。从执行计划中可以看到,Id=1 是 Filter,Filter 下面有两个儿子,这属于有害的 Filter。Id=3 和 Id=4 的两个表走的是全表扫描,并且这两个表 Id 前面没有*,也就是说这两个表没有过滤条件。SQL 的逻辑读绝大部分应该是由 Id=1 的 Filter,以及 Id=3 和 Id=4 这两个表贡献而来的。
Id=3 和 Id=4 这两个表来自于标量子查询。注意观察原始 SQL,在标量子查询中,Id=3 与 Id=4 这两个表与主表 LXR_JBXX 没有直接关联,主表是与标量子查询中的半连接进行关联的(YH.LXR_ID = A.LXR_ID)。
大家还记得标量子查询的原理吗?标量子查询类似嵌套循环,主表通过连接列传值给子查询。因为本案例 SQL 比较特殊,主表是与标量子查询中的半连接的表进行关联的,主表没有直接与标量子查询中 From 后面的表进行关联,这就导致了标量子查询中 From 后面的表没能通过连接列进行传值,从而导致 Id=3 和 Id=4 的表走了全表扫描,也导致了 SQL 使用了 Filter,进而使整个 SQL 运行缓慢。
为了消除 Filter,同时也为了能使 Id=3 和 Id=4 的两个表能走索引,需要对 SQL 进行等价改写,将标量子查询中的半连接改写为内连接就能使 Id=3 和 Id=4 的两个表使用索引了。在标量子查询章节中提到过,标量子查询可以等价改写为外连接。因为标量子查询中没有聚合函数,因此判断Id=3与Id=4两表关联之后应该是返回1的关系,因为如果两表关联后返回n的关系,SQL会报错。那么现在只需要考虑将标量子查询的半连接等价改写为内连接即可。因为原始的 SQL 写的是半连接,没有写成内连接,因此我们判断标量子查询中的半连接应该是属于n的关系,将半连接改写为内连接,如果半连接属于n的关系,要先将半连接变成 1 的关系。所以原始 SQL 可以等价改写为下面 SQL:
SELECT A.LXR_ID, A.SR, B.MSG AS ZJJGXXFROM LXR_JBXX A,(SELECT C.JGID || 『@』 || C.DLS_BM || 『@』 || C.DLS_MC AS MSG,YH.LXR_IDFROM KHGL_DLSJBXX C,KHGL_ZJKJ ZJKJ,(SELECT LXR_ID, KH_IDFROM LXR_YHWHERE KHLX = '2'GROUP BY LXR_ID, KH_ID) YH --对连接列分组将 n 的关系变为 1 的关系WHERE C.JGID = ZJKJ.JGIDAND YH.KH_ID = ZJKJ.KJ_ID) BWHERE A.LXR_ID = B.LXR_ID(+)AND A.STATUS = '1'AND A.GRDM = :v1AND EXISTS (SELECT 1FROM LXR_YH YH, KHGL_GRDLXX GRDLWHERE YH.FZGS_DM = :v2AND YH.KHLX = '2'AND YH.LXR_ID = A.LXR_IDAND GRDL.GRDL_ID = YH.KH_IDAND GRDL.STATUS = '1')AND ROWNUM < 21;
改写之后,SQL 的执行计划如下:
Elapsed: 00:00:00.01Execution Plan----------------------------------------------------------Plan hash value: 2638330795-------------------------------------------------------------------------------------------------------------| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |-------------------------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | 124 | 87 (3)| 00:00:02 ||* 1 | COUNT STOPKEY | | | | | || 2 | NESTED LOOPS OUTER | | 1 | 124 | 87 (3)| 00:00:02 || 3 | NESTED LOOPS SEMI | | 1 | 41 | 12 (0)| 00:00:01 ||* 4 | TABLE ACCESS BY INDEX ROWID | LXR_JBXX | 1 | 39 | 5 (0)| 00:00:01 ||* 5 | INDEX RANGE SCAN | IDX_LXR_JBXX_GRDM | 1 | | 3 (0)| 00:00:01 || 6 | VIEW PUSHED PREDICATE | VW_SQ_1 | 1 | 2 | 7 (0)| 00:00:01 || 7 | NESTED LOOPS | | 1 | 110 | 7 (0)| 00:00:01 || 8 | TABLE ACCESS BY INDEX ROWID | LXR_YH | 1 | 75 | 5 (0)| 00:00:01 ||* 9 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_FZGSDM | 1 | | 4 (0)| 00:00:01 ||* 10 | INDEX RANGE SCAN | IDX_GRDLXX_XZQH_FZGS | 1 | 35 | 2 (0)| 00:00:01 || 11 | VIEW PUSHED PREDICATE | | 1 | 83 | 75 (3)| 00:00:01 || 12 | NESTED LOOPS | | | | | || 13 | NESTED LOOPS | | 1 | 203 | 75 (3)| 00:00:01 ||* 14 | HASH JOIN | | 1 | 132 | 74 (3)| 00:00:01 || 15 | VIEW | | 1 | 66 | 7 (15)| 00:00:01 || 16 | SORT GROUP BY | | 1 | 68 | 7 (15)| 00:00:01 ||* 17 | TABLE ACCESS BY INDEX ROWID| LXR_YH | 1 | 68 | 6 (0)| 00:00:01 ||* 18 | INDEX RANGE SCAN | IDX_KHGL_LXRYH_LXRID | 1 | | 4 (0)| 00:00:01 || 19 | TABLE ACCESS FULL | KHGL_ZJKJ | 28114 | 1812K| 66 (0)| 00:00:01 ||* 20 | INDEX UNIQUE SCAN | KHGL_DLSJBXX_PK | 1 | | 0 (0)| 00:00:01 || 21 | TABLE ACCESS BY INDEX ROWID | KHGL_DLSJBXX | 1 | 71 | 1 (0)| 00:00:01 |-------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):---------------------------------------------------1 - filter(ROWNUM<21)4 - filter("A"."STATUS"='1')5 - access("A"."GRDM"=:V1)9 - access("YH"."FZGS_DM"=:V2 AND "YH"."LXR_ID"="A"."LXR_ID" AND "YH"."KHLX"='2')10 - access("GRDL"."GRDL_ID"="YH"."KH_ID" AND "GRDL"."STATUS"='1')filter("GRDL"."STATUS"='1')14 - access("YH"."KH_ID"="ZJKJ"."KJ_ID")17 - filter("KHLX"='2')18 - access("LXR_ID"="A"."LXR_ID")20 - access("C"."JGID"="ZJKJ"."JGID")Statistics----------------------------------------------------------0 recursive calls1 db block gets400 consistent gets0 physical reads0 redo size533 bytes sent via SQL*Net to client472 bytes received via SQL*Net from client2 SQL*Net roundtrips to/from client1 sorts (memory)0 sorts (disk)1 rows processed
对 SQL 进行等价改写之后,SQL 的逻辑读下降到 400,本次优化也就到此为止。
通过本案例,各位读者应该对 SQL 等价改写引起足够重视,同时也要掌握标量子查询等价改写为外连接,半连接等价改写为内连接,反连接改写为外连接等最基本的 SQL 改写技巧,另外,大家还要对表与表之间关系引起足够重视。
