Underscoring the Magical World of Oracle Database’s Hidden Parameters

I’ll just start out with the usual disclaimer about not setting Oracle’s hidden parameters (that start with an underscore) in a production environment without specific guidance from Oracle Support. Having said that, sometimes “you gotta do what you gotta do” before Oracle Development can get you a patch or a more specific recommendation.

Every major release of Oracle Database (now every year) has a number of new optimizer features along with the handful of bug fixes. The new optimizer features help you get the best execution plan in the shortest amount of time with the least amount of effort on your part — remember that the SQL language is declarative, not procedural (until you start putting your SQL statements into PL/SQL!). But as it happens with any large piece of software that has literally millions of lines of code, sometimes the new features don’t work quite as expected. In other words, new bugs! Many of these new features are controlled with hidden parameters, and I found one that wasn’t quite working as advertised.

Knowing how to at least identify what went wrong is the first step, if anything to help Oracle Support come up with a bug fix faster and potentially give you a way to mitigate the problem until the bug fix (or next Release Update) arrives.

Background

Keeping up with Oracle’s new release cycle, I dutifully started upgrading my development and QA databases from version 12.2.0.1 to 18c. At the time this bug first appeared, I was on 18.3 (RU July 2018), but this issue still happens with the latest RU at the time of this writing (April 2019). Things were going well right off the bat — many of the bugs we hit in 12.2.0.1 were fixed in the first release of 18c. We were getting better execution plans with few regressions, and that was exciting for everyone including our customers who were wondering why they needed to upgrade their Oracle version so often!

After a couple of months, however, I started getting reports from the R&D team about some strange behavior of analytic queries (everything from MIN/MAX to window functions). It was happening more when subqueries were involved and then rolled up with GROUP BY. Many times they just rewrote the queries and things were fine. But having been an Oracle DBA for over 25 years, I knew that things wouldn’t stay fine for long.

The first analytics-related bug (or so I thought was analytics-related) reared its ugly head like this:

Mysterious ORA-00932

The reason the ORA-00932 error was a mystery was because there were no columns in the query that could possibly have a string resembling anything even close to a DATE value. The VARCHAR2 columns referenced in the query all had values that would convert correctly to a number:

SELECT CAST(ROW_NUMBER() OVER
     (PARTITION BY INV_LOG.LOG_ID
     ORDER BY SUBQ.SUPPLY_ID, SUBQ.SUPPLY_INV_LOC_ID) AS
                   VARCHAR2(31)) ROWNUMBER,
     CAST(SUBQ.SUPPLY_INV_LOC_ID AS NUMERIC(18,0)) INVLOCID 
FROM . . .

Because this query had a window function, and I had just patched another bug with the LEAD function, I assumed that this was another variation of an analytics function bug, but I was wrong…

Debug Step 1: Oracle Support

Of course, the first thing to do when seeing what’s obviously a bug is to check Oracle Support for related symptoms, then create a service request. The query would not even parse, so I couldn’t even get a SQL_ID, so the Oracle Support engineer said after much back and forth, consternation, and hand-wringing: how about change OPTIMIZER_FEATURES_ENABLE to “12.1.0.2” to see if the query would parse, and sure enough, the query parsed and ran fine! At that point I ran SQLHC on the working version of the query and Oracle Support is looking at those results. But what do I do in the mean time? The release deadline was approaching, and the developers were getting restless.

At that point, the light bulb went off in my head: what are the new optimizer features introduced in Oracle Database 18c? Perhaps I can disable one of them without reverting the entire optimizer environment back two releases?

Debug Step 2: The X$ Detective

Setting OPTIMIZER_FEATURES_ENABLE to 12.1.0.2 at the system level was out of the question, and even having my developers enable that at the session level or use the OPT_PARAM hint was not the best solution for several reasons. First, why throw out all of the cool new optimizer features because of one misbehaving one? Second, who knows how many queries would have to be changed. And even if the queries were changed, what happens when the bug is fixed? There would inevitably be places where the hint would not get removed.

The first step was to figure out what hidden parameters controlled which new optimizer features. To find all of the hidden parameters that have the string “optimizer” in it, you use this query:

select
  ksppinm hparam,
  ksppstvl hparamval
from
  x$ksppi 
     join x$ksppsv
        using(indx)
where substr(ksppinm,1,1) = '_'
  and ksppinm like '%optimizer%';

HPARAM                                   HPARAMVAL           
---------------------------------------- --------------------
_optimizer_key_vector_pruning_enabled    TRUE                
_optimizer_vector_base_dim_fact_factor   200                 
_optimizer_adaptive_plans_continuous     FALSE               
_optimizer_adaptive_plans_iterative      FALSE               
_optimizer_generate_ptf_implied_preds    TRUE                
_optimizer_inmemory_capture_stored_stats TRUE                
_optimizer_inmemory_use_stored_stats     AUTO                
_optimizer_control_shard_qry_processing  65528               
_optimizer_interleave_or_expansion       TRUE                
_optimizer_use_feedback_for_join         FALSE               
_optimizer_ads_for_pq                    FALSE               
. . .

Wow, there are 191 of those in version 18c (18.6). Leaving some details as an exercise to the reader — my next step was to do a LEFT OUTER JOIN to the same query on the 12.1.0.2 side, then keeping the parameters that were only on the 18c side or had different default values.

Debug Step 3: The Process of Elimination

Knowing what parameters were new or had changed was half the battle. From Step 2 I built a list of ALTER SESSION statements that I could set to the opposite value and see if I would strike optimizer gold.

alter session set "_optimizer_ads_use_partial_results"=false;
alter session set "_optimizer_ads_use_spd_cache"=false;
alter session set "_optimizer_aggr_groupby_elim"=false;
alter session set "_optimizer_allow_all_access_paths"=false; -- BINGO
alter session set "_optimizer_band_join_aware"=false;
alter session set "_optimizer_cbqt_or_expansion"='OFF';
alter session set "_optimizer_eliminate_subquery"=false;
alter session set "_optimizer_enable_plsql_stats"=false;
alter session set "_optimizer_enhanced_join_elimination"=false;
alter session set "_optimizer_gather_feedback"=false;
alter session set "_optimizer_generate_ptf_implied_preds"=false;
alter session set "_optimizer_interleave_or_expansion"=false;
alter session set "_optimizer_key_vector_payload"=false;
alter session set "_optimizer_key_vector_pruning_enabled"=false;
alter session set "_optimizer_multicol_join_elimination"=false;
alter session set "_optimizer_nlj_hj_adaptive_join"=false;
alter session set "_optimizer_reduce_groupby_key"=false;
alter session set "_optimizer_strans_adaptive_pruning"=false;
alter session set "_optimizer_ads_use_partial_results"=true;
alter session set "_optimizer_ads_use_spd_cache"=true;
alter session set "_optimizer_aggr_groupby_elim"=true;
alter session set "_optimizer_allow_all_access_paths"=true;
alter session set "_optimizer_band_join_aware"=true;
alter session set "_optimizer_cbqt_or_expansion"='ON';
alter session set "_optimizer_eliminate_subquery"=true;
alter session set "_optimizer_enable_plsql_stats"=true;
alter session set "_optimizer_enhanced_join_elimination"=true;
alter session set "_optimizer_gather_feedback"=true;
alter session set "_optimizer_generate_ptf_implied_preds"=true;
alter session set "_optimizer_interleave_or_expansion"=true;
alter session set "_optimizer_key_vector_payload"=true;
alter session set "_optimizer_key_vector_pruning_enabled"=true;
alter session set "_optimizer_multicol_join_elimination"=true;
alter session set "_optimizer_nlj_hj_adaptive_join"=true;
alter session set "_optimizer_reduce_groupby_key"=true;
alter session set "_optimizer_strans_adaptive_pruning"=true;

It didn’t take long — I would flip the first half, then the second half, then half again (“The Price is Right” method, a.k.a. a binary search) until I found which parameter, if any, changed the query from being un-parseable to running correctly. One, and only one of those parameters, “_optimizer_allow_all_access_paths“, if set to FALSE (defaults to TRUE), did the trick. Apparently one of the attempted access paths caused the optimizer to flip column names around — I can’t wait to see the description of the bug! All of my 18c development and QA databases have this parameter setting in place and everything has been working fine for over two weeks.

Conclusion

Repeating the big caveat: don’t set Oracle hidden parameters in a production environment without a specific directive from Oracle Support. But taking calculated risks and understanding how the Oracle optimizer works can give you an edge when trying to move along code development and doing what you can to assist Oracle in assisting you to help squash a new optimizer bug in 18c.

I apologize for sounding like Mr. Obvious, but there will always be bugs with new Oracle Database features, and sometimes old features will break. Make sure you have a good QA regression testing process, most of which will be automated. Your developers, when writing new code, will probably find the rest of the new bugs!

Engage with Oracle Support early and often — you may not get the turnaround you’d like to make sure that bugs are fixed before your next release date, so be sure to have Plan B or Plan C ready. In this case, I still will not release the new code with the hidden parameter “_optimizer_allow_all_access_paths” set to FALSE until I get the official confirmation from Oracle Support (and another bug number with my name on it!) or a bug fix is released.

But setting this parameter will not hold up the rest of the development cycle and cause missed deadlines. As of this writing Oracle Support is making sure this bug isn’t already documented (it’s not, believe me, I checked!), and at that point they’ll give me a bug number and I hope some mitigation strategy (set the same parameter I did?). Wish me luck (skill), I know you’re all counting on me.

3 comments

  1. Have you tried optimising with 10053 enabled to generate the optimizer trace file. You may find from the trace file the point where the optimizer does its switch that introduces the possibility of error, and that might tell you what (transformation) feature is causing the problem.

    Does the code use “ANSI” style joins, or traditional Oracle ?

  2. They’re all ANSI joins. I believe I did run a 10053 trace at the time, didn’t see quite where it went wrong — however — Oracle Support did finally acknowledge that it’s a bug, but it’s still unpublished. I badger them on the status every week!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s