Daily Data Guard Monitoring Checklist – Bash automation script
Below is a copy-paste-ready Bash automation script that runs the Daily Data Guard Monitoring Checklist (OCI CLI + SQL*Plus), aggregates the results, and emails them to a distribution list. The script is written to be robust, modular, and suitable for scheduling via cron. Explanations and deployment notes follow the script so you can ship it to your team.
Script: copy the following into a file (for example /opt/dbadmin/dg_daily_check.sh), edit the variables, make it executable, and schedule it.
#!/bin/bash
#
# dg_daily_check.sh
# Daily Data Guard Monitoring Automation
# Requires: OCI CLI configured, sqlplus available, mailx/sendmail configured
#
# Usage: /opt/dbadmin/dg_daily_check.sh
# Cron example: 0 7 * * * /opt/dbadmin/dg_daily_check.sh >/dev/null 2>&1
set -o pipefail
# -------------------------
# Configuration - EDIT THESE
# -------------------------
COMPARTMENT_OCID="ocid1.compartment.oc1..aaaaaaaaexample"
PRIMARY_DB_OCID="ocid1.database.oc1.ap-mumbai-1.abcdexample"
DATA_GUARD_ASSOC_ID="ocid1.dataguardassociation.oc1.ap-mumbai-1.abcdefexample"
# SQL*Plus connection string - prefer wallet / secure method
# Examples:
# 1) Using EZCONNECT: "dbadmin/password@//dbhost.odba.in:1521/PRODDB"
# 2) Using TNS alias: "dbadmin/password@PRODDB_TNS"
# 3) Using wallet + OS authentication - modify as appropriate
SQLPLUS_CONN_PRIMARY="dbadmin/YourSecurePassword@//primary-host:1521/PRODDB"
SQLPLUS_CONN_STANDBY="dbadmin/YourSecurePassword@//standby-host:1521/STANDBYDB"
# Email settings
MAIL_TO="dba-team@odba.in"
MAIL_FROM="oci-monitor@odba.in"
MAIL_SUBJECT="Daily Data Guard Check - $(date -u '+%Y-%m-%d %H:%M UTC')"
# Working directory for temporary files
WORKDIR="/tmp/dg_check"
mkdir -p "${WORKDIR}"
TIMESTAMP="$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
REPORT="${WORKDIR}/dg_report_$(date -u '+%Y%m%d').txt"
> "${REPORT}"
# Timeout for OCI CLI calls (seconds)
OCI_TIMEOUT=30
# -------------------------
# Helper functions
# -------------------------
log() {
# Use the current time for each entry so steps are individually timestamped
local msg="$1"
echo "$(date -u '+%Y-%m-%d %H:%M:%S UTC') : ${msg}" | tee -a "${REPORT}"
}
run_oci() {
# wrapper that calls oci and traps failures but continues
local cmd="$1"
echo "=> OCI CLI: ${cmd}" | tee -a "${REPORT}"
# Use timeout to avoid blocking forever (requires coreutils timeout)
timeout "${OCI_TIMEOUT}" bash -c "${cmd}" 2>>"${WORKDIR}/oci_errors.log" | tee -a "${REPORT}" || echo "OCI command failed or timed out (see oci_errors.log)" | tee -a "${REPORT}"
echo >> "${REPORT}"
}
run_sqlplus() {
# Runs a SQL block via sqlplus and appends the output to the report.
# The SQL is read from the function's stdin, so callers can pass a heredoc:
# usage: run_sqlplus "$SQL_CONNECT" <<'SQLBLOCK' ... SQLBLOCK
# (Reading stdin is what makes the heredoc call style work; "$*" would be empty.)
local conn="$1"
local sql
sql="$(cat)"
echo "=> SQL*Plus (${conn%%/*}@...):" | tee -a "${REPORT}"
sqlplus -s "${conn}" >>"${REPORT}" 2>>"${WORKDIR}/sql_errors.log" <<SQL
WHENEVER SQLERROR CONTINUE
SET LINESIZE 200
SET PAGESIZE 100
SET TRIMOUT ON
SET TRIMSPOOL ON
${sql}
EXIT
SQL
echo >> "${REPORT}"
}
# -------------------------
# Begin checks
# -------------------------
echo "Daily Data Guard Monitoring Report" > "${REPORT}"
echo "Generated: ${TIMESTAMP}" >> "${REPORT}"
echo >> "${REPORT}"
log "Step 1: Check Data Guard association status (OCI CLI)"
run_oci "oci db data-guard-association list --database-id ${PRIMARY_DB_OCID} --compartment-id ${COMPARTMENT_OCID} --output json"
log "Step 2: Check archive destination status on primary (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
COL dest_id FORMAT 999
COL status FORMAT A15
COL protection_mode FORMAT A22
COL synchronization_status FORMAT A15
SELECT dest_id,
status,
protection_mode,
synchronization_status,
error
FROM v$archive_dest_status
WHERE dest_id > 1;
SQLBLOCK
log "Step 3: Verify managed recovery process on standby (SQL)"
run_sqlplus "${SQLPLUS_CONN_STANDBY}" <<'SQLBLOCK'
COL process FORMAT A10
COL status FORMAT A12
SELECT process, status, thread#, sequence#
FROM v$managed_standby
WHERE process LIKE 'MRP%';
SQLBLOCK
log "Step 4: Fetch Data Guard transport and apply lag from the standby (SQL)"
# v$dataguard_stats is populated on the standby, so query there
run_sqlplus "${SQLPLUS_CONN_STANDBY}" <<'SQLBLOCK'
COL name FORMAT A20
COL value FORMAT A30
COL unit FORMAT A30
SELECT name, value, unit
FROM v$dataguard_stats
WHERE name IN ('transport lag', 'apply lag');
SQLBLOCK
log "Step 5: Check Data Guard errors in alert log (last 24 hours) (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
COL originating_timestamp FORMAT A22
COL message_text FORMAT A200
SELECT originating_timestamp, message_text
FROM v$diag_alert_ext
WHERE message_text LIKE '%ORA-%'
AND originating_timestamp > SYSTIMESTAMP - INTERVAL '1' DAY
ORDER BY originating_timestamp DESC;
SQLBLOCK
log "Step 6: Check database roles and open modes on primary and standby (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
SELECT 'PRIMARY' role_indicator, database_role, open_mode, protection_mode
FROM v$database;
SQLBLOCK
run_sqlplus "${SQLPLUS_CONN_STANDBY}" <<'SQLBLOCK'
SELECT 'STANDBY' role_indicator, database_role, open_mode, protection_mode
FROM v$database;
SQLBLOCK
log "Step 7: Fetch Data Guard metrics from OCI Monitoring (transport/apply lag)"
# NOTE: verify the namespace and metric names for your database service in the
# OCI Monitoring console before relying on this query.
run_oci "oci monitoring metric-data summarize-metrics-data \
--compartment-id ${COMPARTMENT_OCID} \
--namespace oci_database \
--query-text \"DataGuardTransportLag[1m].mean(), DataGuardApplyLag[1m].mean()\" \
--start-time $(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--output json"
log "Step 8: Summary counts - how many destinations not VALID (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
SELECT COUNT(*) AS non_valid_destinations
FROM v$archive_dest_status
WHERE status <> 'VALID'
AND dest_id > 1;
SQLBLOCK
# -------------------------
# Finalize report and email
# -------------------------
EMAIL_BODY_FILE="${WORKDIR}/dg_email_body_$(date -u +%Y%m%d).txt"
{
echo "Daily Data Guard Check"
echo "Timestamp: ${TIMESTAMP}"
echo
cat "${REPORT}"
echo
echo "OCI CLI errors (if any):"
if [ -s "${WORKDIR}/oci_errors.log" ]; then
tail -n 200 "${WORKDIR}/oci_errors.log"
else
echo "None"
fi
echo
echo "SQL*Plus errors (if any):"
if [ -s "${WORKDIR}/sql_errors.log" ]; then
tail -n 200 "${WORKDIR}/sql_errors.log"
else
echo "None"
fi
} > "${EMAIL_BODY_FILE}"
log "Step 9: Send email to ${MAIL_TO}"
# Try mailx first, then fallback to sendmail
if command -v mailx >/dev/null 2>&1; then
mailx -s "${MAIL_SUBJECT}" -r "${MAIL_FROM}" "${MAIL_TO}" < "${EMAIL_BODY_FILE}"
log "Email sent using mailx"
elif command -v sendmail >/dev/null 2>&1; then
# Minimal sendmail usage
{
echo "From: ${MAIL_FROM}"
echo "To: ${MAIL_TO}"
echo "Subject: ${MAIL_SUBJECT}"
echo
cat "${EMAIL_BODY_FILE}"
} | sendmail -t
log "Email sent using sendmail"
else
log "No mailx or sendmail found. Please configure email client. Report saved at ${EMAIL_BODY_FILE}"
fi
log "Daily Data Guard Monitoring Completed"
echo "Report: ${REPORT}"
Explanation for blog post
Purpose and scope
This script performs the Daily Data Guard Monitoring Checklist automatically. It runs OCI CLI checks and SQL queries on the primary and standby databases, aggregates outputs and errors, and emails a consolidated report to your DBA distribution. It is intended to run as a daily cron job or via automation/orchestration pipeline.
Security considerations
- Credentials: do not hard-code production passwords in scripts. Use secure methods:
  • Oracle Wallets (preferred for SQL*Net authentication).
  • A secrets manager (HashiCorp Vault, AWS Secrets Manager, OCI Vault).
  • sqlplus /@TNS_ALIAS with wallet-based authentication or environment-based credentials.
- OCI CLI: ensure the OCI CLI profile used has the minimum privileges required for monitoring and Data Guard read operations.
- File permissions: store the script and temporary files with restricted permissions (chmod 700 for the script, chmod 600 for credential files).
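As a concrete sketch of the credential guidance above, the connection strings can be sourced from a restricted, mode-600 environment file rather than being hard-coded. The file path and variable name below are illustrative, not part of the script as shipped:

```shell
#!/bin/bash
# Hypothetical helper: source connection strings from a restricted env file.
# The path /etc/dbadmin/dg_check.env and the variables it sets (for example
# SQLPLUS_CONN_PRIMARY="/@PRODDB_TNS") are illustrative.
load_credentials() {
    local cred_file="$1"
    if [ -r "${cred_file}" ]; then
        # shellcheck disable=SC1090
        . "${cred_file}"
    else
        echo "Credential file missing or unreadable: ${cred_file}" >&2
        return 1
    fi
}

# Example: load_credentials /etc/dbadmin/dg_check.env || exit 1
```

Failing fast when the file is absent keeps a misconfigured cron run from silently querying nothing.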
Deployment steps
- Create the script file and edit the variables:
  • COMPARTMENT_OCID, PRIMARY_DB_OCID, DATA_GUARD_ASSOC_ID.
  • SQLPLUS_CONN_PRIMARY and SQLPLUS_CONN_STANDBY: replace with wallet or secure connection strings.
  • MAIL_TO, MAIL_FROM.
- Make it executable:
  • chmod 700 /opt/dbadmin/dg_daily_check.sh
- Test manually:
  • sudo -u dbadmin /opt/dbadmin/dg_daily_check.sh
  • Inspect ${WORKDIR} and the report file.
- Schedule via cron:
  • crontab -e and add:
    0 7 * * * /opt/dbadmin/dg_daily_check.sh >/dev/null 2>&1
  • This runs the script daily at 07:00 server local time (adjust to your timezone).
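One deployment detail the steps above leave implicit: the daily report and error-log files accumulate under WORKDIR. A small housekeeping sketch, assuming a 14-day retention (tune this to your audit policy):

```shell
#!/bin/bash
# Housekeeping sketch: delete old dg_* report and log files from the working
# directory. The 14-day retention in the example call is an assumption.
prune_reports() {
    # usage: prune_reports <directory> <age_in_days>
    find "$1" -type f -name 'dg_*' -mtime "+$2" -delete
}

# Example: run after the daily check, or as its own cron entry
# prune_reports /tmp/dg_check 14
```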
Customizations and enhancements
- Alert thresholds: parse metric outputs and trigger high-priority alerts if transport lag or apply lag exceeds a defined threshold.
- Persistence: push outputs into a central logging system (Elasticsearch, OCI Logging, Splunk) for historical trending.
- Notification channels: integrate with Slack, MS Teams, or PagerDuty instead of email.
- Parallelization: run SQL checks concurrently for faster completion on large estates.
- Retry and backoff: add retries for intermittent OCI CLI failures.
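The alert-threshold idea can be sketched as a small parser for the interval values v$dataguard_stats returns. The 300-second threshold and the sample value are illustrative:

```shell
#!/bin/bash
# Sketch: convert a v$dataguard_stats interval such as "+00 00:05:30"
# (days, then hh:mi:ss) into total seconds and compare to a threshold.
# The 300-second threshold is an assumption; tune it to your SLA.
lag_to_seconds() {
    local val="${1#+}"        # strip leading '+'
    local days="${val%% *}"   # days component, e.g. "00"
    local hms="${val#* }"     # "hh:mi:ss"
    local h m s
    IFS=: read -r h m s <<< "${hms}"
    echo $(( 10#${days} * 86400 + 10#${h} * 3600 + 10#${m} * 60 + 10#${s} ))
}

APPLY_LAG_THRESHOLD=300
sample_lag="+00 00:06:12"     # illustrative value, as reported by the view
if [ "$(lag_to_seconds "${sample_lag}")" -gt "${APPLY_LAG_THRESHOLD}" ]; then
    echo "ALERT: apply lag ${sample_lag} exceeds ${APPLY_LAG_THRESHOLD}s"
fi
```

The `10#` prefix forces base-10 arithmetic so zero-padded fields like "08" are not misread as octal.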
Sample outputs (illustrative)
Below are small excerpts you will see in the report (simulated):
OCI CLI Data Guard association excerpt (JSON):
{
  "data": [
    {
      "id": "ocid1.dataguardassociation.oc1.ap-mumbai-1.abcdefexample",
      "lifecycle-state": "AVAILABLE",
      "role": "PRIMARY",
      "peer-role": "STANDBY",
      "protection-mode": "MAXIMUM_PERFORMANCE",
      "transport-type": "ASYNC"
    }
  ]
}
v$archive_dest_status sample:
DEST_ID STATUS PROTECTION_MODE SYNCHRONIZATION_STATUS ERROR
2 VALID MAXIMUM PERFORMANCE SYNCHRONIZED
v$managed_standby sample:
PROCESS STATUS THREAD# SEQUENCE#
MRP0 APPLYING 1 14523
v$dataguard_stats sample:
NAME          VALUE          UNIT
transport lag +00 00:00:04   day(2) to second(0) interval
apply lag     +00 00:00:07   day(2) to second(0) interval
Troubleshooting tips
- If the OCI CLI times out, increase OCI_TIMEOUT or check network connectivity from the host running the script.
- If SQL*Plus fails to connect, verify TNS names, the wallet, or that the database listener is reachable from the monitoring host.
- If email fails, ensure mailx or sendmail is installed and relaying is configured for ${MAIL_FROM}.
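For the intermittent OCI CLI failures mentioned above, a retry wrapper with exponential backoff is a natural extension of run_oci. The attempt count and initial delay are illustrative defaults, not recommendations:

```shell
#!/bin/bash
# Sketch: retry a command with exponential backoff between attempts.
# Attempt count and delay values are illustrative.
retry() {
    # usage: retry <max_attempts> <initial_delay_seconds> <command...>
    local max="$1" delay="$2" attempt=1
    shift 2
    while true; do
        "$@" && return 0
        if [ "${attempt}" -ge "${max}" ]; then
            echo "Command failed after ${max} attempts: $*" >&2
            return 1
        fi
        sleep "${delay}"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example: retry 3 5 oci db data-guard-association list ...
```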
Operational playbook snippet (example response)
If the script reports apply lag greater than 300 seconds:
• Investigate long-running apply on the standby: check v$managed_standby and review the standby alert log.
• Check transport errors: the error column of v$archive_dest_status.
• If necessary, escalate to the network/storage team to validate connectivity and I/O performance.
Closing notes
This script is a practical foundation for daily Data Guard monitoring in Oracle Cloud. It standardizes checks, improves mean time to detect (MTTD), and creates a single report consumable by DBAs and ops teams. Extend it with alerting, secure credential management, and integration into your existing observability platform to build a production-grade monitoring pipeline.