Wednesday, September 22, 2010

Use Oracle Hints or Not

There has been many a heated discussions about this subject among developers and DBA's and at times taking pretty ugly turn with one section saying "Yes" and others saying "No" as if to emphasize that we only live in Yes & No world. Some of our tuning experts who were brought onsite at a client to resolve performance issues brought their own arsenal of repertoire of "Do's" and "Don'ts" and their first option of tweaking the code was with hints in the queries. This I hilariously call "Injecting of Hints".

Is this a right approach? Did we understand what was root cause before put in a hint. Will this hint support the same level of execution speed should data in the underlying Join tables change? Will this support if the client moves on to different version of the Database? What happens if the hint is deprecated? These are some of the valid questions developers, DBA's should take into account before including one in the queries.

Oracle CBO optimizer is built to provide the best execution plan for a Query based on the statistics that it has at the time of execution. Should the characteristics of the underlying data change drastically and statistics on those changed objects is not captured then it is worth putting a hint around the query to execute it faster. But this type injecting a hint in the query should be temporary fix and not a permanent one.

Does this mean we should not use hints in the first place? No the questions begs for more details on Approaches, Goals, Complexity, Database Type (OLTP, OLAP, MOLAP, ROLAP), Query nature (Adhoc or Tuned), Data Maintenance (One time upload),  any known issues with database, special query requirements that otherwise wouldn't work etc.

Oracle has broadly built different types of hints to address the above and many other scenarios and these hints as a Swiss knife like tool for any issues at Database, Session or for individual query requirements. (Remember Swiss knife is not equivalent to butcher knife if that is what you need in the first place)
  • Hints for Optimization Approaches and Goals (All Rows, First Rows etc)
  • Hints for Access Paths (Cluster, Full, Index, Hash, RowID etc)
  • Hints for Join Orders (Leading, Ordered etc)
  • Hints for Join Operations (NL, Merge, Hash etc)
  • Hints for Parallel Execution (Parallel, No Parallel etc)
  • Additional Hints (Anti Join, No Cache, Materialize etc)
 As I had indicated earlier, Oracle Optimizer relies on Statistics and collection of the same becomes one of the key constraint for CBO to work effectively. This is a Metadata for the Optimizer holding information about size of the table, distribution of data within columns of these tables, cardinality of data among combined columns, function based data columns population etc. Having Stats (Short for Statistics) makes the Optimizer to pick the right execution plan rather than having to pick one based on guess work.

This is one such area, Oracle has been improving upon constantly and in 11g has come up methods of collecting extended statistics. Collection of Stats using the Oracle DBMS_STATS package is results in  manifold performance improvement over init.ora changes even by Oracle Gurus. This package has become the core heart for collection of stats and surprisingly many developers have just skimmed the surface of this package. Besides, using this package one could transfer Statistics from Production server to say Build server to compare the execution plan with varying degrees of data in the build environment.

Oracle recommends using the Automatic Optimizer Statistics method for Stats collection in a appropriate maintainence window for ALL objects in the database for which statistics are absent or stale and this avoids any need for manual intervention. Timing and setting up of the maintenance window is critical so it is run when processing is idle and data for the current or previous period are already loaded to avoid manual re-run for stats.

Oracle provides several flexible features for collection of Stats and user needs to understand these and take several important considerations while using this package.

  • Statistics Gathering Using Sampling (Estimate Percent, Auto Sample Size etc)
  • Parallel Statistics Gathering (Degree of Parallelism)
  • Statistics on Partitioned Objects (Local/Global, Granularity, Incremental etc)
  • Column Statistics and Histograms (Understand skewness of data at column level)
  • Extended Statistics (Column grouping for accurate selectivity within a group, Expression Stats)
  • Determining Stale Statistics (Monitor Staleness on Inserts/Updates/Deletes, Truncation etc)
  • User-Defined Statistics (Stats on User defined Indexes, Functions)
  • Set Preferences on Stats collection package DBMS_STATS (Cascade, Degree, Granularity etc)
I prefer to take the route of rewriting the queries first rather than 'Injecting Hints'. Majority of the time I try to spot issues with queries in the first or second glance and simplify to make sure my cost and execution plan appears to be good. I try to eliminate multiple round trips to the data from same table, improvise the filter conditions to pick the right amount of data with appropriate filter clauses, avoid predicates which inhibit optimizer from picking indexes, eliminate the generous usage of features like 'distinct', 'group by' 'union', rewrite where possible with Oracle's Analytical functions and so forth. To be honest, writing queries is an Art and mastering it takes time like any other Art.

Secondly I prefer to keep on top of the subject on software upgrades,  improvements and implement these features as I go along on the development side rather than keep singing and humming that "old is gold" or "old was gold" rhymes.

No comments: