Dr. Statslove or: How I Learned to Stop Guessing and Love the 10053 Trace

This is the long-overdue follow-up post about my recent problems with large partitioned tables and statistics. First and foremost, a huge "Thank You!" to Wolfgang Breitling for his help on this. Wolfgang took pity on me after a pathetic post on oracle-l and held my hand, guiding me into the 10053 trace file and looking over my shoulder as I gained a small, but valuable, bit of knowledge into interpreting how and why the CBO makes some of the decisions it does. And so, without further ado, let me layout the evidence in this case...

THE FACTS
We have a large, partitioned table in an instance running on Oracle 10gR2 (10.2.0.2). This table holds data for each of the last 4 months, in a "rolling window" scenario. A full month will contain 25-30 million rows. This table, which we'll cleverly call FOO, has two indexes on it: FOO_PK, which is a compound primary key index involving 11 fields, and FOO_IDX, which is a compound non-unique index involving 5 fields. Both indexes make use of the fields YEAR, MONTH, and LOCATION in their left-most fields. FOO_PK also indexes CUST_NO in the 4th slot. The other fields do not play a role in this matter, but I initially guessed that they did, as I'll tell you later.

As part of the "rolling window" scenario, at the end of a month, the oldest partition would be dropped, and a new partition for the next month would be created. This would usually occur on, say, the 28th of a given month. After various billing cycles throughout the month, data would be bulk loaded (via INSERT+APPEND from external tables) into the new partition. The first such load would likely occur around the 5th of the month. Each biling cycle would be for one of 10 or so LOCATIONs, and some LOCATIONs have multiple billing cycles (different end days to split up the customer base).

THE CRIME
After the first bulk load of the cycle, and during some later ones early on in the month (but not all), when the application would query based on YEAR, MONTH, LOCATION, and CUST_NO, the CBO was choosing FOO_IDX rather than FOO_PK. This was bad because FOO_IDX does not index CUST_NO, and so was resulting in basically a huge range scan.

I immediately knew the problem must be with out-of-date statistics. Our instance was using the default GATHER_STATS_JOB that collects stale database statistics every night and weekends. After this job or a manual call to gather stats on that table, things would work well again. However, gathering statistics on even just that partition took nearly an hour. That's a long time for the users to be frustrated when what should be a 5-second query is now a 3-minute query and the help desk gets flooded with mis-diagnosed "MY COMPUTER IS FROZEN" tickets. A lot of the users were actually hard-rebooting their workstations because of this.

BAAG
In an homage to the BAAG Party, here is the Rube Goldberg-esque conspiracy theory that I had laid out in my head to explain what was happening.

Knowing that this partition's statistics were now stale, the CBO decides to discard them entirely.
The CBO looks at what it knows about the two indexes. It sees that FOO_PK is an 11-field monstrosity, and that FOO_IDX is only 5-fields, a relative kitten of an index.
The CBO decides that, all-things being equal, FOO_PK would take a lot more I/O to load into memory, and so defaults to FOO_IDX.

Then I sit back and wait for the kudos to roll in, a la C. Montgomery Burns. However when I actually tested this theory, replacing FOO_PK with a smaller, non-unique index on just the "main four" fields, the CBO still chose FOO_IDX. I was crestfallen, to say the least.

Wolfgang to the Rescue
This is about the time that Wolfgang Breitling answered my email off-line and helped through some things. First, he demonstrated how to run a 10053 trace on just the "EXPLAIN PLAN" call for a statement, which avoids having to actually run the statement. From the 10053 trace file after a query against data that had just been loaded into a previously empty partition:

  Access Path: index (RangeScan)
 Index: FOO_IDX
 resc_io: 0.00  resc_cpu: 200
 ix_sel: 0.0000e+00  ix_sel_with_filters: 0.0000e+00
 Cost: 1.00  Resp: 1.00  Degree: 1
Access Path: index (RangeScan)
 Index: FOO_PK
 resc_io: 0.00  resc_cpu: 200
 ix_sel: 0.0000e+00  ix_sel_with_filters: 0.0000e+00
 Cost: 1.00  Resp: 1.00  Degree: 1
****** trying bitmap/domain indexes ******
Best:: AccessPath: IndexRange  Index: FOO_IDX
      Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.00  Bytes: 0

Here you can see that the CBO costed both indexes equally at 1.00. This is where I learned that, in the case of ties, the CBO chooses the first index that it saw, and it looks at them in alphabetical order. So the chances are that if my FOO_PK index had been named FOO_AWESOME, it would have always been chosen in this case. But then we wouldn't have had such fun solving the problem.

But what about those seemingly random cases where data was already present? It turns out to have been happening when the LOCATION being queried was outside of the range of LOCATION values in the statistics. For example, if I have locations 001 and 003 loaded with statistics, then I load location 005 and query for it, I'll get a trace like this:

  Access Path: index (RangeScan)
 Index: FOO_IDX
 resc_io: 5.00  resc_cpu: 36640
 ix_sel: 5.9706e-08  ix_sel_with_filters: 5.9706e-08
 Cost: 1.00  Resp: 1.00  Degree: 1
Using prorated density: 5.9706e-08 of col #1 as selectivity of out-of-range value pred
Using prorated density: 5.9706e-08 of col #1 as selectivity of out-of-range value pred
Access Path: index (RangeScan)
 Index: FOO_PK
 resc_io: 5.00  resc_cpu: 36616
 ix_sel: 6.6995e-12  ix_sel_with_filters: 3.9386e-15
 Cost: 1.00  Resp: 1.00  Degree: 1
****** trying bitmap/domain indexes ******
Best:: AccessPath: IndexRange  Index: FOO_IDX
      Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.00  Bytes: 0

You can see the messages about "prorated density" and "out-of-range" values. The CBO tries to make an educated guess about the density of the specified LOCATION (col #1) value, which it does very poorly. Due to some rounding on Oracle's part, the costs of the two indexes once again tie, and the tie goes to the alphabetical first: FOO_IDX again.

If, however, in that same scenario with locations 001 and 003, I load location 002 and query for it, you'll see much different results, since 002 is within the range of 001-to-003:

  Access Path: index (RangeScan)
 Index: FOO_IDX
 resc_io: 951956.00  resc_cpu: 10796254518
 ix_sel: 0.5  ix_sel_with_filters: 0.5
 Cost: 96360.10  Resp: 96360.10  Degree: 1
Access Path: index (RangeScan)
 Index: FOO_PK
 resc_io: 5.00  resc_cpu: 59016
 ix_sel: 3.5448e-05  ix_sel_with_filters: 2.3983e-08
 Cost: 1.01  Resp: 1.01  Degree: 1
****** trying bitmap/domain indexes ******
Best:: AccessPath: IndexRange  Index: FOO_PK
      Cost: 1.01  Degree: 1  Resp: 1.01  Card: 0.08  Bytes: 0

Here you can see that the cost associated with FOO_IDX now makes it an obviously poor choice, and FOO_PK is the winner.

POSSIBLE SOLUTIONS
So now that we've identified the problem, we drew up a list of possible solutions.

Gather statistics after loading: Again, not ideal due to time involved in the middle of the business day.
Stage table loading with partition exchange: This would be ideal. We would load the data into a stage table with the same structure as a single partition in the big table. We'd then gather statistics and build indexes on that stage table and then do a partition exchange to swap that table with an empty partition in the big table, taking with it the up-to-date indexes and statistics. However this would require sub-partitioning the table by market and billing cycle, and then making the necessary changes to the loading scripts. A longer term solution, to be sure.
"Fix" the statistics: This would involve setting the table, index and column level statistics to values based on previous full months, except for the YEAR and MONTH fields, and without histograms, per Wolfgang's suggestion. We also lock statistics on that table to prevent GATHER_STATS_JOB from overwriting our settings. This would all be done with the DBMS_STATS package, of course.

Option #3 made the most sense. Wolfgang graciously whipped up a script to set the column statistics to what we needed. As soon as a new partition is created, we set the statistics to our own values with the full range accounted for. We also then gather real statistics on the current partition, which isn't really necessary but I felt better having real statistics when it was possible.

So now all is well on that front. I've had to follow-up with Wolfgang a couple of times to make sure I understood what his script was doing, and he followed-up with some minor errata. Thanks again to Wolfgang, and I'll lobby my wife to name a future son after him.

Edit: FOO_IDX is a non-unique index. Sorry for the quasi-typo.