Thursday, July 11, 2013

Semantic query optimization gone bad

Here's a small test case to illustrate an issue one of my customers ran into when upgrading their main database from 10G to 11.2.0.3. The issue is about a particular query that returns correct results in 10G, but returns no rows when run in 11G. The test case sets up two tables, a parent table and a child table. The parent table has a concatenated primary key that is referenced by the child table. This issue is caused by the different CHECK constraints that exist on the second column of the key in both the parent and the child table.
drop table child;
drop table parent;

create table parent
(c1 number not null
,c2 number not null
,constraint par_pk primary key(c1,c2)
,constraint chk01 check(c2 = 2))
/

create table child
(c0 number not null
,c1 number not null
,c2 number not null
,constraint chi_pk primary key(c0)
,constraint chi_par_fk foreign key (c1,c2) references parent(c1,c2)
,constraint chk02 check(c2 between 0 and 999))
/

create index chi_fk on child(c1,c2)
/

insert into parent values(1,2);
insert into child values(0,1,2);
commit;

The query in question is this one, a rather straightforward join between the two tables above:
select c.*
from parent p
    ,child c
where p.c1 = c.c1
  and p.c2 = c.c2
/

Note that this query should just return the single child row, given the data that we inserted in both tables, but in 11.2.0.3 it returns no rows.

This issue seems to be caused by what is called transitive constraint injection into the query's WHERE clause. Since our query states "p.c2 = c.c2", the optimizer in its wisdom decides to apply the text of constraint CHK01 to c.c2 and inject the resulting predicate into the query text. And vice versa, it applies the text of constraint CHK02 to p.c2 and injects that predicate too. Conceptually, the query the optimizer ends up with looks like the sketch below; the resulting plan follows.
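Written out explicitly, the injection amounts to something like this (a conceptual sketch, not actual optimizer output):

select c.*
from parent p
    ,child c
where p.c1 = c.c1
  and p.c2 = c.c2
  and c.c2 = 2                 -- CHK01 (on parent.c2) applied to c.c2
  and p.c2 between 0 and 999   -- CHK02 (on child.c2) applied to p.c2
/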
---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     1 |    65 |     1   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |        |     1 |    65 |     1   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| CHILD  |     1 |    39 |     1   (0)| 00:00:01 |
|*  3 |    INDEX SKIP SCAN           | CHI_FK |     1 |       |     1   (0)| 00:00:01 |
|*  4 |   INDEX UNIQUE SCAN          | PAR_PK |     1 |    26 |     0   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("C"."C2"=2)
       filter("C"."C2"=2)
   4 - access("P"."C1"="C"."C1" AND "P"."C2">=0 AND "P"."C2"<=999)

Note that at line 4, we seem to have lost the access predicate "p.c2 = c.c2". In itself that should not be an issue. If you were to look at this plan, and reverse-engineer its workings, the plan should just produce the single child row. But it doesn't.

Workarounds for this issue are:
- disable constraint CHK01, or disable constraint CHK02 (disabling either one suffices);
or,
- in the case of this plan, force a full scan on p by adding a hint such as full(p) (see the example below).
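For the second workaround, the hinted query would look like this (the full(p) hint forces a full table scan on parent, which sidesteps the problematic index unique scan):

select /*+ full(p) */ c.*
from parent p
    ,child c
where p.c1 = c.c1
  and p.c2 = c.c2
/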

Something seems to go wrong internally in the execution of the plan with regard to the Index Unique Scan at line 4 and its access predicates. If you run the query with gather_plan_statistics and then peek at A-Rows (actual rows), you'll see that the Index Unique Scan fails to produce the single parent row.
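For reference, that check can be done along these lines (standard use of gather_plan_statistics and dbms_xplan.display_cursor):

select /*+ gather_plan_statistics */ c.*
from parent p
    ,child c
where p.c1 = c.c1
  and p.c2 = c.c2
/

select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));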

I played around a little bit by varying the CHECK constraint on C2 of the child table. For instance, if you change CHK02 to CHECK(C2 > 0), line 4 of the execution plan above ends up with this access predicate:

access("P"."C1"="C"."C1" AND "P"."C2">0 AND "P"."C2"="C"."C2")

Not something I've ever seen before: a combination of an unbounded range and an equality predicate used as access predicates on the same indexed column. This one, by the way, has the same issue: the Index Unique Scan also produces zero rows.
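For reference, this variation can be reproduced by re-creating the child's check constraint, for example:

alter table child drop constraint chk02;
alter table child add constraint chk02 check(c2 > 0);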

It seems the code that is supposed to access the index is fed some bad input, causing it to fail to execute correctly.

To be continued.

13/7/16 Update: This is a known bug in the code that determines the access predicate to be used for an index scan. Bug number 12827166.



Saturday, May 25, 2013

Atomic refresh of MV, without the redo...


I tweeted about something that occupied me last Friday morning, and Galo Balda suggested turning it into a blogpost:


Which actually was my idea when I was done with it, but instead I decided to dedicate just a short tweet to it. Anyway, here is the blogpost.

What was the scenario?

In order to fix performance issues with certain internal OBIEE reports, it was decided (over a year ago) to introduce a couple of materialized views. These were complex materialized views that could only be refreshed completely: no fast refresh was possible. Users requested that these views be up to date every half hour. So, using dbms_job, a job was submitted that did a dbms_mview.refresh([mview],'C');

Refresh times of the mviews are in the order of tens of seconds. This implies that when a non-atomic refresh is used, users could see empty mviews. Remember:

  • Atomic refresh: performs a delete of all rows currently in the mview, followed by a re-execution of the underlying mview query and a re-insert of all rows retrieved, finally followed by a commit;
  • Non-atomic refresh: performs a truncate (which includes an implicit commit) of the underlying mview's table segment, followed by the re-insert of rows (in this case using the +APPEND hint, by the way), finally followed by a commit.
In non-atomic refresh mode, right after the truncate, users will see an empty mview for the aforementioned tens of seconds. This was not acceptable, so atomic refreshes are performed every 30 minutes. In atomic refresh mode, users continue to see the previous contents up to the commit, at which time they suddenly see the new contents.
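For completeness, the two variants boil down to calls like these (the mview name is just a placeholder):

begin
  -- atomic refresh: delete + re-insert, previous contents stay visible until the commit
  dbms_mview.refresh('MY_MVIEW', 'C', atomic_refresh => true);
end;
/

begin
  -- non-atomic refresh: truncate + direct-path (+APPEND) insert
  dbms_mview.refresh('MY_MVIEW', 'C', atomic_refresh => false);
end;
/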

Some of the mviews held a considerable number of (wide) rows: in the order of a few hundred thousand, with segment sizes in the tens of megabytes. In atomic refresh mode this generates considerably more redo than non-atomic mode does. The database at hand was running in archivelog mode, and the DBAs were planning to secure the database further by implementing a physical Data Guard setup for it.

The DBAs were concerned about the amount of redo generated during the day. After some investigation they found that almost all of that redo was attributable to the atomic refreshes of the set of mviews.
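One quick way to see how much redo a single refresh contributes is to measure the session's redo statistic around it, along these lines (a sketch):

-- redo generated by the current session so far
select n.name, m.value
from   v$mystat   m
join   v$statname n on n.statistic# = m.statistic#
where  n.name = 'redo size';

-- run one refresh here, then re-run the query above and take the difference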

So the challenge put forward by them was: could we please decrease that amount of redo generated?

How did we fix it?

I had recently attended Hotsos 2013. During one of Tom Kyte's presentations I was introduced to a novel way of using partition exchange to speed up full-table updates. It went like this:

  • Suppose you need to update a huge table, all rows. You can do that with an update statement. But that would: a) take a lot of time, b) generate a lot of undo and redo.
  • Instead, assume that the table to be updated has been defined as partitioned, in such a way that all rows will (always) go into a single partition. Right, so it is a partitioned table, but it has (and always will have) a single partition. You can do that with range partitioning on some not-null column, using MAXVALUE in the partition clause (code for this will follow later in this post).
    (I agree this is a *big* assumption, but let's assume we set up the table like this)
  • What we can do now is, instead of the full-table update, perform a CTAS, and in this CTAS ensure the update is reflected by employing the appropriate SELECT-clause expressions (see the sketch after this list).
  • Once this table has been created, we can then perform a partition exchange of this table segment with our (partitioned) table segment that was in need of the full-table update.
  • Presto. Done.
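A minimal sketch of the trick, with made-up table and column names for illustration:

-- the table to be "updated": a single-partition range-partitioned table
create table big_t
( id     number       not null
, status varchar2(10) not null )
partition by range (id)
( partition all_data values less than (maxvalue) );

-- instead of "update big_t set status = upper(status)", build the new
-- contents with a CTAS that already reflects the update...
create table big_t_new
as
select id
,      upper(status) as status
from   big_t;

-- make sure the structures match (NOT NULL included)
alter table big_t_new modify (status not null);

-- ...and swap the two segments
alter table big_t
  exchange partition all_data with table big_t_new
  without validation;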
The thought I had on Friday was: can I use this trick to fix the redo issue with the atomic refreshes of our mviews? So is it possible to:
  1. Create a new single partition table with the same structure of the mview.
  2. And then perform a non atomic-refresh on the mview.
  3. And then perform a partition exchange of the mview's table segment with the partitioned table's table segment.
  4. And have the users use the partitioned table, not the mview. They will see an instant update when the exchange is executed.
This turned out to work like a charm, and it reduced the redo considerably (due to the non-atomic refreshes now being used).

So here's a quick overview of the cookbook:

Current situation:
  • Mview1, currently hit by an atomic refresh every 30 minutes.

New situation:
  1. Mview1 re-created under a different name.
  2. A single-partition partitioned table created with the original mview's name. Use some not-null column to range partition on.
  3. Submit a dbms_job to:
    • non-atomically refresh the mview,
    • do a partition exchange.
Example code:

Ad 1) Say we re-created the mview under the name mview1_tmp.

Ad 2) Creating the table with the original mview's name, with a single partition:

create table mview1
tablespace ...
partition by range([some column]) 
      -- Some *not null* column of the original MVIEW mview1.
(partition all_data values less than (MAXVALUE))
as
select *
from mview1_tmp;

Ad 3) Submit the job:

declare
pl_jobno number;
begin
  --
  dbms_job.submit(pl_jobno
                 ,'begin dbms_mview.refresh(''MVIEW1_TMP''
                                           ,''C''
                                           ,atomic_refresh=>false);
                  execute immediate ''alter table MVIEW1
                          exchange partition all_data with table MVIEW1_TMP
                         including indexes without validation'';
                   end;'
                 ,INTERVAL => 'sysdate + (30/(24*60))');  -- every 30 minutes
  --
  commit;  -- dbms_job: the submitted job only becomes active after a commit
end;
/



Cheers,

Toon


PS1. When switching the refresh mode of an mview from atomic to non-atomic, one can also get further benefits (redo-size-wise) by specifying NOLOGGING. Which is what we did too.
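For instance (illustrative, applied to the mview that is now refreshed non-atomically):

alter materialized view mview1_tmp nologging;

With NOLOGGING in place, the direct-path (+APPEND) insert done by the non-atomic refresh generates only minimal redo for the table data.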

PS2. I was concerned that the partition exchange would hit an ORA-00054, or something similar, when a user still had an open cursor on the partitioned table. It turned out it didn't: a partition exchange will just execute successfully in this case.