System Slowness in Oracle? Here’s How to Troubleshoot Like a PRO
When an application team approaches you with a complaint of performance degradation on an Oracle Database, a structured and methodical troubleshooting approach is paramount. Below is a strategic, end-to-end diagnostic framework to identify and remediate the root cause:
Step 1: Preliminary Assessment – Framing the Problem
Before diving into deep diagnostics, it’s imperative to clearly frame the scope and context of the performance issue. A well-executed preliminary assessment not only accelerates root cause identification but also prevents unnecessary deep-dives into irrelevant areas.
1.1 Understand the Business Impact
Start by engaging the application or business stakeholder to assess:
- What exactly is perceived as “slow”?
- Is it a specific transaction, report, dashboard, or background process?
- What is the expected vs. actual response time?
- What is the criticality?
- Is this impacting end-users, SLAs, or financial operations?
- Is the issue isolated to a few users or global across the platform?
1.2 Temporal and Environmental Context
Clarify the timeframe and scope:
- When did the issue start? (exact date/time)
- Is it consistent, intermittent, or escalating?
- Has anything changed recently?
- Application deployments
- Database patching
- Statistics gathering jobs
- Infrastructure (network, storage, OS patches)
1.3 Identify the Affected Components
Correlate the slowness with specific components:
- Is it tied to a specific schema, module, or data set?
- Is the latency observed during a particular time window (e.g., peak hours)?
- Is the performance degradation occurring across all nodes (in case of RAC)?
1.4 Initial Triage – Quick System Health Check
Run a high-level health check to eliminate basic availability or load issues:
SELECT status FROM v$instance;
SELECT * FROM v$session WHERE status = 'ACTIVE' AND username IS NOT NULL;
- Check for any system-wide locks, long-running queries, or wait events.
- Use tools like OEM, Toad, or SQL Developer Performance Hub if available, for real-time observation.
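For a quick pulse of system-wide waits before any deep dive, active foreground sessions can be summarized by wait event. A minimal sketch against V$SESSION, excluding idle waits:
SELECT wait_class, event, COUNT(*) AS sessions_waiting
FROM v$session
WHERE username IS NOT NULL
  AND status = 'ACTIVE'
  AND wait_class <> 'Idle'
GROUP BY wait_class, event
ORDER BY sessions_waiting DESC;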
1.5 Communication and Expectations Management
Communicate to the application team that you are initiating a structured investigation. Set a clear expectation on timelines and follow-ups. Collect logs, user reports, and timestamps if available.
Step 2: Session-Level Diagnostics – Identifying the Frontline Symptoms
Once the problem has been scoped, the next logical step is to observe what is happening inside the Oracle instance at the session level. This layer provides immediate insights into what the database is actually doing (or waiting for), and is crucial for identifying the “first responders” – sessions that are directly experiencing or causing performance degradation.
2.1 Identify Active Sessions and Their Wait Events
Start by querying V$SESSION to isolate active, non-background sessions:
SELECT s.sid, s.serial#, s.username, s.status,
s.program, s.module, s.event, s.sql_id,
s.wait_class, s.seconds_in_wait,
s.blocking_session
FROM v$session s
WHERE s.username IS NOT NULL
AND s.status = 'ACTIVE'
ORDER BY s.seconds_in_wait DESC;
What this reveals:
- Which sessions are active and waiting
- What events they are waiting on (e.g., db file sequential read, log file sync)
- Blocking chains and contention scenarios
- SQL currently being executed (SQL_ID)
2.2 Analyze Top Wait Events
Understanding the nature of the wait event is essential:
- I/O-related waits: db file scattered read, db file sequential read – indicate full table scans or index lookups.
- Contention waits: enq: TX - row lock contention, buffer busy waits – point to application design or concurrency issues.
- Commit/redo-related waits: log file sync, log file switch – indicate issues with redo generation or log configuration.
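To see which wait events dominate at the instance level (cumulative since startup), a simple ranking against V$SYSTEM_EVENT can serve as a first pass; the FETCH FIRST clause assumes 12c or later:
SELECT event, wait_class, total_waits,
       ROUND(time_waited_micro/1000000) AS time_waited_sec
FROM v$system_event
WHERE wait_class <> 'Idle'
ORDER BY time_waited_micro DESC
FETCH FIRST 10 ROWS ONLY;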
2.3 Correlate SQL_ID with Execution Plan and Statistics
Once the problematic SQL_IDs are identified, gather execution statistics from V$SQL:
SELECT sql_id, elapsed_time/1000000 elapsed_sec,
cpu_time/1000000 cpu_sec, executions,
buffer_gets, disk_reads, sql_text
FROM v$sql
WHERE sql_id = '<sql_id>';
This provides a high-level performance profile:
- Total and per-execution time
- Logical vs. physical I/O
- SQL text for review and tuning
2.4 Examine Blocking and Long-Running Sessions
Use the following to identify blocking sessions:
SELECT blocking_session, sid, serial#, wait_class, seconds_in_wait
FROM v$session
WHERE blocking_session IS NOT NULL;
Also, monitor for runaway or long-duration sessions:
SELECT sid, username, sql_id, status,
round(last_call_et/60) minutes_running
FROM v$session
WHERE status = 'ACTIVE'
AND username IS NOT NULL
ORDER BY last_call_et DESC;
2.5 Optional: Enable Real-Time Monitoring for Critical Sessions
In critical scenarios, consider enabling:
- SQL Trace (10046): For deep-dive analysis
- Real-Time SQL Monitoring: For long-running queries (Enterprise Edition with Tuning Pack)
Example:
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => <sid>, serial_num => <serial#>, waits => TRUE, binds => TRUE);
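Once the workflow has been captured, disable the trace and locate the trace directory (a sketch; placeholders follow the same convention as above):
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => <sid>, serial_num => <serial#>);
SELECT value FROM v$diag_info WHERE name = 'Diag Trace';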
Key Takeaway
Session-level diagnostics act as a real-time telemetry system for your Oracle instance. They help pinpoint:
- What the database is waiting on
- Which SQL is consuming the most resources
- Where the immediate bottlenecks reside
By capturing this telemetry early, DBAs can isolate hotspots, minimize guesswork, and fast-track root cause analysis—laying the groundwork for surgical remediation.
Step 3: AWR or Statspack Analysis – Building the Performance Narrative
After establishing real-time symptoms through session-level diagnostics, the next strategic move is to perform historical and trend-based analysis. This is where AWR (Automatic Workload Repository) or Statspack comes into play—offering a macro-level view of system performance over time.
This step is about connecting tactical observations with systemic behavior, enabling data-driven decisions rather than reactive assumptions.
3.1 Choosing Between AWR and Statspack
- AWR (licensed feature): available with Oracle Enterprise Edition plus the Diagnostic Pack. Highly granular, with integrated OEM reports, SQL monitoring, and wait event breakdowns.
- Statspack (free): suitable for Standard Edition or environments without Diagnostic Pack licensing. Offers baseline performance statistics, with some manual correlation required.
3.2 Generating AWR Reports
Use the built-in script:
$ORACLE_HOME/rdbms/admin/awrrpt.sql
Select the database ID, instance number, and the snapshot range that aligns with the problem window. Choose HTML format for better readability.
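If the relevant snapshot IDs are not known up front, they can be listed from DBA_HIST_SNAPSHOT (Diagnostic Pack required) before running the report; for example, the last 24 hours:
SELECT snap_id, begin_interval_time, end_interval_time
FROM dba_hist_snapshot
WHERE begin_interval_time > SYSDATE - 1
ORDER BY snap_id;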
Key Sections to Review:
- Top 5 Timed Foreground Events – helps identify major wait bottlenecks (I/O, CPU, locking, commit latency)
- Load Profile – transactions per second, redo size, logical/physical reads, parse counts
- Instance Efficiency Metrics – % non-parse CPU usage, buffer cache hit ratio, soft parse ratio
- Top SQL by Elapsed Time / Buffer Gets – identifies the most expensive statements at the system level
- Wait Class Breakdown – helps determine whether the system is bottlenecked on user I/O, concurrency, or commit contention
- I/O Statistics by Tablespace / File – reveals storage-level imbalances or hot segments
3.3 Interpreting Findings in a Business Context
- High db file sequential read? → Likely many index lookups—check indexing strategy and stats freshness.
- High log file sync? → Too many commits—batching DML may help.
- High CPU usage with low throughput? → Could indicate inefficient SQL or excessive context switching.
- Poor buffer cache hit ratio? → Possible undersized SGA or suboptimal data access paths.
Always correlate the Top SQL section with inputs from Step 2. This helps confirm whether the problematic SQL statements seen in V$SQL and V$SESSION are the actual systemic culprits or red herrings.
3.4 Using ADDM (If Licensed)
ADDM (Automatic Database Diagnostic Monitor) provides actionable recommendations:
$ORACLE_HOME/rdbms/admin/addmrpt.sql
Review:
- SQL tuning suggestions
- Resource bottlenecks
- Configuration advisories
ADDM is especially useful in presenting findings to non-technical stakeholders or auditors.
3.5 Statspack Workflow (For Unlicensed Environments)
- Install Statspack (once): @?/rdbms/admin/spcreate.sql
- Take snapshots manually or via a scheduled job (see the sketch below).
- Generate reports: @?/rdbms/admin/spreport.sql
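For the snapshot step above, snapshots are taken as the PERFSTAT user. A minimal sketch of a manual snapshot plus an hourly schedule (the job name and interval are assumptions; adjust to your retention policy):
EXEC statspack.snap;
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'STATSPACK_SNAP_JOB',          -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN statspack.snap; END;',  -- take a Statspack snapshot
    repeat_interval => 'FREQ=HOURLY',
    enabled         => TRUE);
END;
/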
Statspack output is more text-heavy but still provides:
- Wait event trends
- SQL statistics
- Instance activity
- I/O breakdown
Key Takeaway
AWR/Statspack bridges the gap between momentary observations and historical patterns. It is the DBA’s strategic tool to:
- Validate if an issue is systemic or isolated
- Track workload spikes
- Correlate performance degradation with environmental or workload changes
Armed with this historical lens, senior DBAs can elevate diagnostics from symptomatic firefighting to root-cause engineering—enabling stable, scalable performance tuning initiatives.
Step 4: Execution Plan Review – Decoding the Optimizer’s Strategy
With the problematic SQL statements identified through session analysis and AWR/Statspack reports, the next critical step is to inspect how Oracle’s Cost-Based Optimizer (CBO) is executing those queries. The execution plan is essentially a blueprint of the optimizer’s strategy, and analyzing it reveals whether the chosen access paths and join methods are optimal—or inadvertently leading to performance degradation.
4.1 Retrieve the Execution Plan for Target SQL_ID
Use DBMS_XPLAN.DISPLAY_CURSOR to get the most recent execution plan for a given SQL_ID:
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('<sql_id>', NULL, 'ALLSTATS LAST'));
This displays:
- Estimated vs. actual row counts (E-Rows vs. A-Rows)
- Execution operations (full scans, index access, joins)
- Buffer gets, I/O cost, and execution time
- Predicate filters and access predicates
Note: The ALLSTATS LAST format is invaluable—it overlays actual runtime statistics with the plan, enabling accurate assessment of plan efficiency. Actual row counts are captured only when the statement runs with the GATHER_PLAN_STATISTICS hint or with STATISTICS_LEVEL set to ALL.
4.2 Key Red Flags to Watch For
When reviewing the execution plan, focus on these common anti-patterns:
- Full Table Scans on Large Tables – acceptable for small tables or when fetching large result sets; otherwise, it may signal missing indexes or bad filtering logic.
- Nested Loop Joins on Large Data Sets – effective for small row sets but disastrous at scale; prefer hash joins or merge joins for bulk processing.
- Cartesian Joins – usually a mistake; check for missing join conditions.
- High Cost and Cardinality Mismatches – if estimated rows (E-Rows) differ drastically from actual rows (A-Rows), optimizer statistics are likely stale or inaccurate.
- Excessive Buffer Gets or I/O Reads – points to inefficient access paths, lack of partition pruning, or improper join order.
4.3 Use Real-Time SQL Monitoring (If Licensed)
For long-running SQL (over 5 seconds by default), use Real-Time SQL Monitoring via:
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '<sql_id>', type => 'TEXT', report_level => 'ALL')
FROM dual;
Or, view interactively in Oracle Enterprise Manager (OEM).
Benefits:
- Visual progress of execution
- Execution timeline per step
- Wait events and row sources in real time
This is especially useful for batch processes or complex analytical queries where execution plans are dynamic.
4.4 Plan Stability and Regression Handling
Check if the execution plan has regressed due to plan drift:
- Use DBA_HIST_SQL_PLAN to review historical plan changes.
- Use SQL Plan Baselines (SPB) to freeze known-good plans.
- Use SQL Profiles or SQL Patches to guide the optimizer if needed.
SELECT plan_hash_value, sql_id, timestamp
FROM dba_hist_sql_plan
WHERE sql_id = '<sql_id>';
If plan instability is confirmed:
- Re-gather statistics with appropriate granularity
- Evaluate OPTIMIZER_FEATURES_ENABLE and dynamic sampling
- Consider hinting or SPB as a last resort
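As an illustrative sketch of pinning a known-good plan with a SQL Plan Baseline (an Enterprise Edition feature; the SQL_ID and plan hash value are placeholders):
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  -- Load the known-good plan from the cursor cache into a SQL Plan Baseline
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id          => '<sql_id>',
               plan_hash_value => <plan_hash_value>);
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || l_plans);
END;
/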
4.5 Statistics Health Check
Execution plans rely on optimizer statistics. If statistics are stale or missing:
- Use DBMS_STATS.REPORT_STATS_OPERATIONS to verify recent statistics activity
- Re-gather stats using:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCHEMA', tabname => 'TABLE_NAME', cascade => TRUE);
Also verify:
- Column histograms
- Table and index block counts
- Partition-level stats (if applicable)
Key Takeaway
Execution plan analysis is where diagnosis meets prescription. It enables DBAs to:
- Validate optimizer decisions
- Detect and resolve access path inefficiencies
- Restore predictable performance through targeted tuning
By mastering execution plan diagnostics, a senior DBA transitions from reactive tuning to predictive optimization—ensuring not just performance recovery but sustained scalability.
Step 5: System-Level Metrics – Evaluating the Infrastructure Footprint
After isolating issues at the session and SQL level, it’s crucial to broaden the lens and assess system-level metrics. Oracle performance problems are often symptoms of deeper systemic issues—CPU starvation, memory pressure, I/O latency, or network congestion. This step helps ensure that the root cause isn’t hiding in the underlying infrastructure stack.
5.1 CPU Utilization and Load Averages
High CPU usage can lead to process queuing, slower parse times, and session stalls.
On Linux/Unix:
top
vmstat 5 5
mpstat -P ALL 5
What to look for:
- CPU %us (user) vs %sy (system) vs %wa (I/O wait)
- Load average relative to number of CPU cores
- High context switches or run queue saturation
Within Oracle:
SELECT *
FROM v$osstat
WHERE stat_name LIKE '%CPU%';
Correlate this with AWR’s Host CPU Usage and Instance CPU Usage sections.
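To attribute CPU consumption inside the database, sessions can be ranked by the 'CPU used by this session' statistic. A sketch (values are in centiseconds and cumulative since logon; FETCH FIRST assumes 12c or later):
SELECT s.sid, s.username, s.sql_id, st.value AS cpu_centiseconds
FROM v$sesstat st
JOIN v$statname sn ON sn.statistic# = st.statistic#
JOIN v$session s ON s.sid = st.sid
WHERE sn.name = 'CPU used by this session'
  AND s.username IS NOT NULL
ORDER BY st.value DESC
FETCH FIRST 10 ROWS ONLY;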
5.2 Memory and Swap Pressure
Memory pressure leads to excessive PGA/SGA swapping, degrading query performance.
On Linux/Unix:
free -g
vmstat 5 5
Indicators of concern:
- High swap usage with low free memory
- Frequent page-ins/page-outs
- Memory fragmentation or lack of HugePages
Within Oracle:
Check PGA/SGA metrics:
SELECT name, value
FROM v$pgastat;
SELECT component, current_size, user_specified_size
FROM v$sga_dynamic_components;
Examine AWR’s PGA Memory Advisory and Memory Statistics to determine if memory targets are undersized or misallocated.
5.3 I/O Latency and Throughput
Sluggish storage can paralyze even the most optimized queries.
OS Tools:
iostat -x 5
Oracle Views:
SELECT file#, phyrds, phywrts, readtim, writetim
FROM v$filestat;
SELECT df.file#, df.name, fs.phyrds, fs.phywrts, fs.readtim, fs.writetim
FROM v$datafile df JOIN v$filestat fs ON df.file# = fs.file#;
Check AWR’s Tablespace I/O Stats and File I/O Stats for high latency or uneven distribution.
Tip: Look for files or tablespaces with disproportionately high read/write times relative to activity levels—these are usually hot spots or misaligned I/O paths.
5.4 Network Latency (For Distributed Applications)
When application and database tiers are distributed:
- Validate interconnect and listener performance
- Use tnsping, ping, and traceroute to check latency
- Review application logs for timeout exceptions or retries
Monitor listener logs and connection-related wait events such as SQL*Net more data from client and SQL*Net more data to client.
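Cumulative network-related waits can be checked from V$SYSTEM_EVENT as a first pass (note that SQL*Net message from client is largely idle time and can usually be ignored):
SELECT event, total_waits, ROUND(time_waited_micro/1000000) AS time_waited_sec
FROM v$system_event
WHERE event LIKE 'SQL*Net%'
ORDER BY time_waited_micro DESC;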
5.5 Concurrent Load and Resource Contention
If multiple applications or background jobs run concurrently:
- Review v$resource_limit for exhausted limits
- Evaluate system usage using AWR's Instance Activity Stats and Resource Limit sections
- Cross-check with crontab or job scheduler timelines
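A simple check of the limits referenced above:
SELECT resource_name, current_utilization, max_utilization, limit_value
FROM v$resource_limit
WHERE resource_name IN ('processes', 'sessions', 'transactions');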
Key Takeaway
System-level diagnostics anchor your performance triage in real resource constraints. Even well-tuned SQL cannot overcome:
- CPU overcommitment
- Swap thrashing
- I/O saturation
- Memory starvation
By correlating Oracle metrics with OS-level indicators, a senior DBA elevates root cause analysis from component-level fixes to full-stack awareness—ensuring that the database operates within a healthy ecosystem.
Step 6: Application and Code-Level Review – Aligning Database Strategy with Application Design
After a detailed investigation at the database and infrastructure layers, one must not overlook a critical dimension: the application layer. Even with a healthy database and optimal hardware, suboptimal application design, ORM misuse, or improper database access patterns can undermine performance and scalability.
This step bridges the gap between database operations and application architecture, encouraging collaboration rather than siloed troubleshooting.
6.1 Understand the Application Context
Begin by engaging the application owner or development team to answer key contextual questions:
- Is the slowness isolated to specific pages, APIs, or user journeys?
- Is the slowdown constant or intermittent (e.g., peak hours)?
- Has there been a recent code release, config change, or deployment?
Understanding the functional context often uncovers logical inefficiencies invisible from a purely technical lens.
6.2 Examine Bind Variable Usage
Poor use of bind variables can cause hard parsing, cursor cache fragmentation, and plan instability.
Run:
SELECT sql_id, force_matching_signature, parsing_schema_name,
executions, parse_calls, module
FROM v$sql
WHERE parsing_schema_name = '<app_schema>'
AND executions > 100
ORDER BY parse_calls DESC;
Indicators of bind variable misuse:
- Parse calls ≈ Executions (should be much less)
- High number of child cursors (v$sql_shared_cursor)
- SQL text with literals instead of placeholders
Solution:
- Enforce bind variable use in application frameworks
- Use CURSOR_SHARING=FORCE cautiously as a short-term mitigation
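To quantify how much of the workload consists of literal-only statements that would share a cursor if binds were used, group by FORCE_MATCHING_SIGNATURE (a sketch; the threshold of 20 is arbitrary):
SELECT force_matching_signature, COUNT(*) AS similar_statements
FROM v$sql
WHERE force_matching_signature <> 0
GROUP BY force_matching_signature
HAVING COUNT(*) > 20
ORDER BY similar_statements DESC;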
6.3 Evaluate Connection Pooling Strategy
Improper connection pooling leads to:
- Excessive connect/disconnect overhead
- Idle session saturation
- Inconsistent session-level parameters
Best practices:
- Use a mid-tier connection pool (e.g., UCP, HikariCP, WebLogic)
- Ensure idle timeouts are configured (INACTIVE_TIMEOUT, IDLE_TIME)
- Validate session recycling and health checks
Verify:
SELECT username, program, status, count(*)
FROM v$session
GROUP BY username, program, status;
Excessive INACTIVE sessions often point to pooling misconfigurations.
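Where sessions are abandoned rather than recycled, an idle limit can also be enforced database-side through a profile. A sketch; the profile name, user, and 30-minute limit are assumptions, and RESOURCE_LIMIT must be enabled for it to take effect:
CREATE PROFILE app_pool_profile LIMIT IDLE_TIME 30;   -- minutes; hypothetical profile name
ALTER USER app_user PROFILE app_pool_profile;         -- hypothetical application user
ALTER SYSTEM SET resource_limit = TRUE;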
6.4 Trace Specific User Workflows
If the application team identifies a reproducible issue (e.g., submitting an order takes 15 seconds), enable detailed tracing at the session level:
EXEC DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(client_id => '<client_id>');
-- Or
EXEC DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(service_name => '<service>', module_name => '<module>', action_name => '<action>');
Analyze the resulting trace files with tkprof (a typical invocation is sketched after the list below) or with Oracle Trace Analyzer to identify:
- Repeated lookups or unnecessary queries
- Poor join strategies
- Delayed commits, or over-frequent commits
- Redundant network round-trips
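A typical tkprof invocation, sorting statements by elapsed execution and fetch time (file names are placeholders):
tkprof <tracefile>.trc <output>.txt sort=exeela,fchela sys=no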
6.5 Collaborate on Code Refactoring and Query Optimization
Sometimes, it’s the application logic itself that needs re-engineering:
- Avoid “SELECT N+1” query patterns
- Reduce round-trips by batching DML operations
- Replace procedural row-by-row processing with set-based SQL
- Leverage PL/SQL bulk operations (FORALL, BULK COLLECT) – see the sketch below
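A minimal sketch of replacing row-by-row processing with bulk operations; the table and column names are hypothetical:
DECLARE
  TYPE t_ids IS TABLE OF orders.order_id%TYPE;   -- hypothetical table and column
  l_ids t_ids;
BEGIN
  -- Fetch all candidate keys in one round-trip
  SELECT order_id BULK COLLECT INTO l_ids
  FROM orders
  WHERE status = 'PENDING';

  -- Apply the change as a single bulk-bound DML statement
  FORALL i IN 1 .. l_ids.COUNT
    UPDATE orders
    SET status = 'PROCESSED'
    WHERE order_id = l_ids(i);

  COMMIT;
END;
/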
A collaborative review between DBAs and developers often yields the highest ROI—especially when aligned with query tuning recommendations from previous steps.
6.6 Address ORM Abstractions Cautiously
Object-Relational Mappers (e.g., Hibernate, Django ORM, Entity Framework) abstract SQL generation, but often lead to:
- Suboptimal query structures
- Lazy loading pitfalls
- Inefficient pagination strategies
Mitigation:
- Enable SQL logging to audit queries
- Use ORM-native query tuning options (e.g., JOIN FETCH, query hints)
- Consider hand-optimized queries for high-impact paths
Key Takeaway
Application-layer inefficiencies often manifest as database slowness. Step 6 ensures that database tuning efforts are not undone by flawed access patterns, poor connection hygiene, or misbehaving code.
This step exemplifies the shift from “firefighting” to cross-functional performance engineering, positioning the DBA as a strategic partner in application reliability and scalability.
Step 7: Remediation Recommendations
Based on findings:
- Tune expensive SQL.
- Add or adjust indexes.
- Update stale optimizer stats.
- Resolve locking/contention issues.
- Allocate additional resources or spread load.
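Where the Tuning Pack is licensed, the SQL Tuning Advisor can generate concrete recommendations for an expensive statement. A sketch; the task name is arbitrary:
DECLARE
  l_task VARCHAR2(64);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(sql_id => '<sql_id>', task_name => 'TUNE_SQL_TASK');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => l_task);
END;
/
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('TUNE_SQL_TASK') FROM dual;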
Step 8: Postmortem and Baseline
Objective: Institutionalize learning.
- Document the issue and solution.
- Establish baselines using AWR snapshots or Oracle Enterprise Manager.
- Consider proactive monitoring (alerts on CPU, I/O, wait classes).
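A known-good window can be preserved as an AWR baseline for future comparison (Diagnostic Pack; snapshot IDs are placeholders):
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id => <start_snap>, end_snap_id => <end_snap>, baseline_name => 'GOOD_WORKLOAD_BASELINE');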
Conclusion: A Strategic, End-to-End Framework for Oracle Performance Triage
In the dynamic landscape of enterprise systems, database performance issues rarely stem from a single root cause. The framework outlined in this article is designed to equip Oracle professionals with a methodical, multi-layered approach—enabling faster diagnostics, deeper insights, and more sustainable resolutions. Its six diagnostic steps build on one another:
- Preliminary Assessment establishes the context and initial scope by leveraging user feedback and system-wide symptoms.
- Session-Level Diagnosis pinpoints the real-time behavior of application sessions and isolates problematic workloads.
- AWR/Statspack Analysis focuses efforts on the highest-impact statements and workload trends, often responsible for the majority of resource consumption.
- Execution Plan Review unveils the optimizer’s decision-making, highlighting tuning opportunities and inefficiencies.
- System-Level Metrics provide foundational validation of the infrastructure’s ability to support current workloads effectively.
- Application and Code-Level Review closes the loop, addressing the origin of database interaction patterns and collaborating with development teams for holistic resolution.
By leveraging AWR/ASH insights, execution plan intelligence, and OS-level diagnostics—coupled with a collaborative posture toward developers and infrastructure teams—a senior Oracle DBA transforms from a reactive troubleshooter into a strategic performance architect.
In essence, database performance tuning is not a one-off intervention; it is a continuous discipline. By embracing a structured, diagnostic-first mindset across both technical and organizational silos, one ensures not only timely remediation—but also architectural resilience and long-term scalability.