Bind Variables and Parallel Queries Do Not Mix

This post was promised long ago, and I apologize for the tardiness. Some of you may recall my whining about seemingly unexplainable instance hanging since migrating our database to 64-bit hardware in September. Well, after some back and forth and hand-offs from one rep to another, we finally were given a possible explanation: Bug 4367986. The summary of the bug is "bind peeked parallel cursors do not share." This basically means that parallel queries that use bind variables won't share cursors. Not only does this defeat the purpose of using the bind variables, but it creates a new cursor for each parallel process. After a while, your cursor count will go up, just as mine did:

select sql_id, count(*)
from v$sql_shared_cursor
where bind_peeked_pq_mismatch='Y'
group by sql_id;

SQL_ID COUNT(*)
------------- ----------
f3u64ru922snx 520
ckha07wkfaf8v 5
9g26upcqjh8kp 1
gdnga6d26vf4g 15

While I wasn't able to choke and hang the instance in development, I was able to drive the count up as we saw in production.

We probably didn't see this before our x86_64 migration because on our 32-bit instance, our parallel_max_servers was only set to 16. After migrating to the new hardware it was raised to 80 based on "the formula". Dropping it to 0 obviously prevented the problem from coming up as well.

The bug is reportedly fixed in 10.2.0.4, which wasn't released at the time we were finishing up the SR. There was only a one-off patch for 10.2.0.3, meaning we had to upgrade from 10.2.0.2. Well we did this, applied the patch, and haven't had a reoccurrence of the problem since. That query listed above now happily returns no rows.

Of course there was also the issue of using bind variables in queries against partitioned tables. Greg Rahn had this to say:

Using PQ with binds can have other adverse effects, specifically if the partition key is not provided as a literal. When the partition key is a bind, the resulting plan will be a KEY-KEY plan (for pstart/pstop) because w/o a literal value the optimizer can not tell if there is any partition elimination since the literal value is not provided at parse time. This often times results in a "worst case" assumption, thus is it possible to have different plans even when the bind and literal statements use the same values.

I would speculate that the overhead of parsing literals when using PQ is minimal compared to the side effects it is causing (due to the bug) and the potential of suboptimal plans. I personally would never mix the two.