This quest started with the usual question: why is this query so slow? To put it in the picture, it was a query loading one DWH table by reading one source table from a legacy system (already loaded to Oracle, so no heterogenous services were involved at this step), joining it several times to several tables.
(It's the usual badly-designed legacy system: if flag1 is I, join table T1 by C1, if flag1 is N, join table T1 by C2... 20 times.)
If I simplify the query, we are talking about something like:
We even know, that the query always return number of rows identical to number of rows in T2. However, ommiting the T1.h = 'I'/'G' conditions in join clause would duplicate the rows, so the conditions are necessary there. Ofcourse it's not possible to move the conditions to WHERE clause, as this would elimitate all rows from result query.
To make the test case query even shorter, we can use for the demonstration just:
(This query makes almost no business sense now, but the the lateral view issue I want to demonstrate is still there.)
The query plan looks like:
This is awful! Cost of 1804M just for joining two tables (T1: 188K rows, T2: 5M rows). And yes, the execution proves the plan is not good (I did not have the patience to wait many hours (days?) for the query to finish).
However, a colleague suggested modifying the query as follows:
This does not change the result set - the 'I' is always not null and thus the nvl is superfluos. However, we get a different execution plan!
The cost is now 5409K, the operation is a nice hash join, and the query really finishes in few minutes.
The question now is: WHY?
Well, this is a matter of query optimization and plan generation, so the first person to ask is directly the CBO. So, I enabled the 10053 event for the two queries and dived into the two trace files, mainly to see the differences.
Both queries had the main query block initially rewritten as:
(The second query with the added NVL in "T1"."H"=NVL('G',"T2"."C1") ).
So, for Oracle, it is a lateral (correlated) view. That's not nice, but at this stage of CBO processing, normal. CBO will try to get rid of it.
However only for the NVL case the CBO trace shows:
Followed by:
Thus, CBO was able to rewrite is as the old-fashioned (+) outer join; however, it was not able to do it for the non-NVL query. And these result are passed to the next stage, and as no constraints or predicate move-around changes the query, they are verbatim passed for actual plan generation. And understandably, a lateral (correlated) view is not considered for hash join.
Anyway, should you read the Inside the Oracle Optimizer blog, you would already know that this is the classical example of the lateral non-mergeable view. Still, why the second one worked as we wanted?
Well, the quirk is in the fact that there is no way how to write the non-NVL query using (+) syntax - there is just no place to put the (+) sign to the t1.y='I' predicate to change it from filter to join predicate. However, adding artifically a column from T2 makes it possible, and the CBO did it. The CBO internally uses the old Oracle syntax, and thus if you can't rewrite your query using that syntax, neither CBO can.
Just a note - the same applies for example for predicate length(t1.q)=10, you can save the day by using length(nvl(t1.q,t2.c1))=10.
Tested on: Windows 64-bit (EM64T), Oracle 10.2.0.4.
(It's the usual badly-designed legacy system: if flag1 is I, join table T1 by C1, if flag1 is N, join table T1 by C2... 20 times.)
If I simplify the query, we are talking about something like:
SELECT T1.m,
case
when T1.h = 'I' then T2_I.n
when T1.h = 'G' then T2_G.n
else null
end
FROM T1
LEFT OUTER JOIN T2 T2_I
ON (T1.h = 'I' and T1.y = T2_I.c1)
LEFT OUTER JOIN T2 T2_G
ON (T1.h = 'G' and T1.z = T2_G.c2)
We even know, that the query always return number of rows identical to number of rows in T2. However, ommiting the T1.h = 'I'/'G' conditions in join clause would duplicate the rows, so the conditions are necessary there. Ofcourse it's not possible to move the conditions to WHERE clause, as this would elimitate all rows from result query.
To make the test case query even shorter, we can use for the demonstration just:
SELECT count(*)
FROM T1
LEFT OUTER JOIN T2
ON (T1.h = 'I' and T1.y = T2.c1)
(This query makes almost no business sense now, but the the lateral view issue I want to demonstrate is still there.)
The query plan looks like:
---------------------------------------------------------------
|Id|Operation |Name|Rows |Bytes| Cost | Time |
---------------------------------------------------------------
|0 |SELECT STATEMENT | | 1 | 43 | 1804M|999:59:59 |
|1 | SORT AGGREGATE | | 1 | 43 | | |
|2 | NESTED LOOPS OUTER | | 9805M| 392G| 1804M|999:59:59 |
|3 | TABLE ACCESS FULL | T1 | 188K|7899K| 718 | 00:00:09 |
|4 | VIEW | |52124 | | 9593 | 00:01:56 |
|*5| FILTER | | | | | |
|*6| TABLE ACCESS FULL| T2 |52124 | 356K| 9593 | 00:01:56 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("T1"."H"='I')
6 - filter("T1"."Y"="T2"."C1")
This is awful! Cost of 1804M just for joining two tables (T1: 188K rows, T2: 5M rows). And yes, the execution proves the plan is not good (I did not have the patience to wait many hours (days?) for the query to finish).
However, a colleague suggested modifying the query as follows:
SELECT count(*)
FROM T1
LEFT OUTER JOIN T2
ON (T1.h = nvl('I',T2.c1) and T1.y = T2.c1)
This does not change the result set - the 'I' is always not null and thus the nvl is superfluos. However, we get a different execution plan!
-------------------------------------------------------------
|Id|Operation |Name|Rows |Bytes| Cost | Time |
-------------------------------------------------------------
|0 |SELECT STATEMENT | | 1 | 54 | 5409K| 18:01:45 |
|1 | SORT AGGREGATE | | 1 | 54 | | |
|*2| HASH JOIN OUTER | | 9805M| 493G| 5409K| 18:01:45 |
|3 | TABLE ACCESS FULL| T1 | 188K|7899K| 718 | 00:00:09 |
|4 | TABLE ACCESS FULL| T2 | 5212K| 54M| 9585 | 00:01:55 |
-------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."Y"="T2"."C1"(+) AND
"T1"."H"=NVL('I',"T2"."C1"(+)))
The cost is now 5409K, the operation is a nice hash join, and the query really finishes in few minutes.
The question now is: WHY?
Well, this is a matter of query optimization and plan generation, so the first person to ask is directly the CBO. So, I enabled the 10053 event for the two queries and dived into the two trace files, mainly to see the differences.
Both queries had the main query block initially rewritten as:
SQL:******* UNPARSED QUERY IS *******
SELECT "T1"."Y" "Y","T1"."H" "H",
"from$_subquery$_004"."C1_0" "C1"
FROM "SCOTT"."T1" "T1",
LATERAL( (SELECT "T2"."C1" "C1_0" FROM "SCOTT"."T2" "T2" WHERE "T1"."H"='G' AND "T1"."Y"="T2"."C1"))(+) "from$_subquery$_004"
(The second query with the added NVL in "T1"."H"=NVL('G',"T2"."C1") ).
So, for Oracle, it is a lateral (correlated) view. That's not nice, but at this stage of CBO processing, normal. CBO will try to get rid of it.
However only for the NVL case the CBO trace shows:
CVM: Merging SPJ view SEL$1 (#0) into SEL$2 (#0)
Followed by:
SQL:******* UNPARSED QUERY IS *******
SELECT COUNT(*) "COUNT(*)"
FROM "SCOTT"."T1" "T1","SCOTT"."T2" "T2"
WHERE "T1"."Y"="T2"."C1"(+)
AND "T1"."H"=NVL('G',"T2"."C1"(+))
Thus, CBO was able to rewrite is as the old-fashioned (+) outer join; however, it was not able to do it for the non-NVL query. And these result are passed to the next stage, and as no constraints or predicate move-around changes the query, they are verbatim passed for actual plan generation. And understandably, a lateral (correlated) view is not considered for hash join.
Anyway, should you read the Inside the Oracle Optimizer blog, you would already know that this is the classical example of the lateral non-mergeable view. Still, why the second one worked as we wanted?
Well, the quirk is in the fact that there is no way how to write the non-NVL query using (+) syntax - there is just no place to put the (+) sign to the t1.y='I' predicate to change it from filter to join predicate. However, adding artifically a column from T2 makes it possible, and the CBO did it. The CBO internally uses the old Oracle syntax, and thus if you can't rewrite your query using that syntax, neither CBO can.
Just a note - the same applies for example for predicate length(t1.q)=10, you can save the day by using length(nvl(t1.q,t2.c1))=10.
Tested on: Windows 64-bit (EM64T), Oracle 10.2.0.4.
Comments