Join Strategy Mastery: Nested Loops vs. Hash Joins

In the realm of Oracle tuning, joins are the symphony that binds tables into meaningful insights. Yet, selecting the optimal join method—Nested Loops or Hash Join—can dramatically sway performance, especially as data volumes swell and query patterns evolve. This blog post guides you through comparing execution plans under varying data sizes, forcing a Hash Join, measuring CPU and I/O impact, and sculpting optimizer behavior via statistics and cardinality hints.


1. Understanding the Two Join Methods

  • Nested Loops Join
    Iterates row-by-row from the driving table, probing the inner table via index lookups. Ideal when the driving set is small and the inner table has a selective index.
  • Hash Join
    Builds an in-memory hash table on the smaller input, then probes it with rows from the larger input. Best suited for large, unsorted sets and full table scans.

2. Example Schema and Test Data

We’ll use two tables:

CREATE TABLE customers (
  customer_id   NUMBER PRIMARY KEY,
  region_id     NUMBER,
  customer_name VARCHAR2(100)
);

CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER,
  order_total NUMBER,
  order_date  DATE
);

/* Populate small volumes first */
BEGIN
  FOR i IN 1..1000 LOOP
    INSERT INTO customers VALUES (i, MOD(i, 10) + 1, 'Cust ' || i);
    -- Spread orders evenly across all 1,000 customers (one order each)
    INSERT INTO orders VALUES (i, MOD(i, 1000) + 1, DBMS_RANDOM.VALUE(100, 1000), SYSDATE - MOD(i, 30));
  END LOOP;
  COMMIT;
END;
/

Later, scale both tables up to millions of rows by increasing the loop bound (and the MOD divisors, so the distributions stay proportional).
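
The sample plans in the following sections assume supporting indexes on the filter and join columns. Here is a minimal setup sketch (the index names are illustrative and chosen to match the plans below), plus a statistics refresh that should be repeated after every reload:

CREATE INDEX idx_region ON customers (region_id);
CREATE INDEX idx_orders_cust ON orders (customer_id);

-- Refresh optimizer statistics whenever the data volume changes
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'CUSTOMERS');
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ORDERS');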


3. Comparing Execution Plans at Small Scale

3.1 Nested Loops by Default

With 1,000 rows in each table, Oracle may choose Nested Loops:

EXPLAIN PLAN FOR
SELECT c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Sample Plan:

---------------------------------------------------------------------------------------
| Id | Operation                     | Name            | Rows | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT              |                 |  100 |   104  (0)| 00:00:01 |
|  1 |  NESTED LOOPS                 |                 |  100 |   104  (0)| 00:00:01 |
|  2 |   TABLE ACCESS BY INDEX ROWID | CUSTOMERS       |  100 |     4  (0)| 00:00:01 |
|  3 |    INDEX RANGE SCAN           | IDX_REGION      |  100 |     1  (0)| 00:00:01 |
|  4 |   TABLE ACCESS BY INDEX ROWID | ORDERS          |    1 |     1  (0)| 00:00:01 |
|  5 |    INDEX RANGE SCAN           | IDX_ORDERS_CUST |    1 |     1  (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Here, Nested Loops with index scans is efficient for small result sets.


4. Scaling Up: Forcing Hash Join

After scaling both tables to one million rows, a Nested Loops join can become very costly: the region filter now returns roughly 100,000 driving rows, and each one triggers its own index probe into orders. Let's force a Hash Join using a hint:

EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(o) */ c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

Sample Plan:

------------------------------------------------------------------------
| Id | Operation           | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |           |  100K |  5200  (5)| 00:00:30 |
|  1 |  HASH JOIN          |           |  100K |  5200  (5)| 00:00:30 |
|  2 |   TABLE ACCESS FULL | CUSTOMERS |  100K |   900  (5)| 00:00:05 |
|  3 |   TABLE ACCESS FULL | ORDERS    | 1000K |  3900  (5)| 00:00:23 |
------------------------------------------------------------------------

Oracle now builds an in-memory hash table on the filtered customers (the smaller input) and probes it with the rows from orders, replacing roughly 100,000 nested index probes, and the millions of buffer gets they generate, with two full scans.
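
For a fair comparison at the same volume, you can also pin the Nested Loops plan with the opposite hints; a sketch (LEADING fixes customers as the driving table, so USE_NL applies to the probe into orders):

SELECT /*+ LEADING(c) USE_NL(o) */ c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

Explaining both variants side by side makes the cost gap explicit before you measure actual resource usage in the next section.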


5. Measuring CPU and I/O Impact

Snapshot statistics before and after each run and compare the deltas. V$SYSSTAT is cumulative and instance-wide, so use a quiet test instance (or query V$MYSTAT for session-scoped values):

-- Take a baseline snapshot of reads and CPU
SELECT name, value FROM v$sysstat
WHERE name IN ('physical reads', 'consistent gets', 'CPU used by this session');

-- Run the query without a hint
SELECT ...;  -- your original query

-- Re-query V$SYSSTAT and record the deltas

-- Run the hinted query
SELECT /*+ USE_HASH(o) */ ...;

-- Re-query V$SYSSTAT again to observe the deltas
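
For per-operation actuals (starts, rows, buffer gets, physical reads), an alternative is the GATHER_PLAN_STATISTICS hint combined with DBMS_XPLAN.DISPLAY_CURSOR; a minimal sketch:

SELECT /*+ GATHER_PLAN_STATISTICS USE_HASH(o) */ c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

-- Shows the last statement executed in this session with actual rows and I/O per step
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));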

You’ll observe:

  • Nested Loops: low CPU per probe, but a very large number of buffer gets (and physical reads on a cold cache) from the repeated index lookups.
  • Hash Join: higher CPU while building and probing the hash table, but far fewer total reads and, at this volume, lower elapsed time.

6. Steering the Optimizer with Statistics

When data skew exists—say certain region_id values dominate—you can refine optimizer estimates:

6.1 Creating Histograms

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'CUSTOMERS',
    method_opt => 'FOR COLUMNS SIZE 254 REGION_ID'
  );
END;
/

Because region_id has only about ten distinct values, far fewer than the 254 buckets requested, Oracle builds a frequency histogram here (height-balanced or hybrid histograms are used only when the distinct values exceed the bucket count). Either way, the histogram tells the optimizer whether region_id = 5 is rare or common, adjusting row estimates and, with them, the join costs.
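
You can confirm what was actually created once the gather completes; a quick check:

SELECT column_name, histogram, num_buckets
FROM user_tab_col_statistics
WHERE table_name = 'CUSTOMERS'
AND column_name = 'REGION_ID';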

6.2 Cardinality Overrides

As a last resort, embed a cardinality hint:

SELECT /*+ CARDINALITY(c 500) */ c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

This tells Oracle to assume 500 rows from customers, influencing its join method choice.
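
To confirm the override took effect (and, if the change is large enough, flipped the join method), explain the hinted statement and compare its Rows column against the unhinted plan from earlier:

EXPLAIN PLAN FOR
SELECT /*+ CARDINALITY(c 500) */ c.customer_name, o.order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region_id = 5;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);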


7. Best Practices and Takeaways

  • Test Under Realistic Volumes: A plan optimal at 1,000 rows may falter at 1 million.
  • Use Hints Judiciously: Leverage hints to validate performance, but aim for lasting solutions through statistics.
  • Monitor Continuously: Employ AWR and ASH to detect plan regressions and I/O anomalies.
  • Iterate and Automate: Integrate statistics gathering and plan baseline scripts into your CI/CD pipeline for sustained efficiency (see the sketch below).
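
To make that last point concrete, one option is to capture a validated plan into a SQL plan baseline straight from the cursor cache. A minimal sketch, assuming you have already looked up the statement's sql_id in V$SQL (the &sql_id substitution variable is a placeholder):

DECLARE
  l_loaded PLS_INTEGER;
BEGIN
  -- &sql_id is a placeholder; substitute the real sql_id from V$SQL
  l_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(sql_id => '&sql_id');
  DBMS_OUTPUT.PUT_LINE(l_loaded || ' plan(s) loaded');
END;
/

Once loaded and accepted, the baseline protects the chosen join method from plan regressions after statistics changes or upgrades.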

Mastering join strategies is akin to fine-tuning an orchestra—each instrument (table) and conductor (optimizer) must align in harmony. By comparing execution plans, forcing join methods for validation, measuring resource impact, and sculpting optimizer behavior with statistics and hints, you ensure that every join resonates with peak performance, regardless of data scale. Encourage your fellow DBAs to embrace this methodical approach, and watch your queries sing with efficiency.

Oracle DBA

Experienced OCM-certified Oracle Database Administrator with over 18 years of experience designing, implementing, and managing complex database solutions, specializing in performance optimization, security, and high-stakes implementations.
