Daily Data Guard Monitoring Checklist – Bash automation script

Below is a copy-paste ready Bash automation script that runs the Daily Data Guard Monitoring Checklist (OCI CLI + SQL*Plus), aggregates the results, and emails them to a distribution list. The script is written to be robust, modular, and suitable for scheduling via cron. Explanations and deployment notes follow the script so you can paste this into a blog post and ship it to your team.

Use only headings and spacing for separation, avoid horizontal rules, emojis, and extra styling as requested.

Script: copy-paste into a file (for example /opt/dbadmin/dg_daily_check.sh), edit the variables, make executable, and schedule.

#!/bin/bash
#
# dg_daily_check.sh
# Daily Data Guard Monitoring Automation
# Requires: OCI CLI configured, sqlplus available, mailx/sendmail configured
#
# Usage: /opt/dbadmin/dg_daily_check.sh
# Cron example: 0 7 * * * /opt/dbadmin/dg_daily_check.sh >/dev/null 2>&1

set -o pipefail

# -------------------------
# Configuration - EDIT THESE
# -------------------------
COMPARTMENT_OCID="ocid1.compartment.oc1..aaaaaaaaexample"
PRIMARY_DB_OCID="ocid1.database.oc1.ap-mumbai-1.abcdexample"
DATA_GUARD_ASSOC_ID="ocid1.dataguardassociation.oc1.ap-mumbai-1.abcdefexample"

# SQL*Plus connection string - prefer wallet / secure method
# Examples:
# 1) Using EZCONNECT: "dbadmin/password@//dbhost.odba.in:1521/PRODDB"
# 2) Using TNS alias: "dbadmin/password@PRODDB_TNS"
# 3) Using wallet + OS authentication - modify as appropriate
SQLPLUS_CONN_PRIMARY="dbadmin/YourSecurePassword@//primary-host:1521/PRODDB"
SQLPLUS_CONN_STANDBY="dbadmin/YourSecurePassword@//standby-host:1521/STANDBYDB"

# Email settings
MAIL_TO="dba-team@odba.in"
MAIL_FROM="oci-monitor@odba.in"
MAIL_SUBJECT="Daily Data Guard Check - $(date -u '+%Y-%m-%d %H:%M UTC')"

# Working directory for temporary files
WORKDIR="/tmp/dg_check"
mkdir -p "${WORKDIR}"
TIMESTAMP="$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
REPORT="${WORKDIR}/dg_report_$(date -u '+%Y%m%d').txt"
> "${REPORT}"

# Timeout for OCI CLI calls (seconds)
OCI_TIMEOUT=30

# -------------------------
# Helper functions
# -------------------------
log() {
  local msg="$1"
  echo "${TIMESTAMP} : ${msg}" | tee -a "${REPORT}"
}

run_oci() {
  # wrapper that calls oci and traps failures but continues
  local cmd="$1"
  echo "=> OCI CLI: ${cmd}" | tee -a "${REPORT}"
  # Use timeout to avoid blocking forever (requires coreutils timeout)
  timeout ${OCI_TIMEOUT} bash -c "${cmd}" 2>>"${WORKDIR}/oci_errors.log" | tee -a "${REPORT}" || echo "OCI command failed or timed out (see oci_errors.log)" | tee -a "${REPORT}"
  echo >> "${REPORT}"
}

run_sqlplus() {
  # runs a SQL block via sqlplus and appends to the report
  # usage: run_sqlplus "$SQL_CONNECT" <<'SQL' ... SQL
  local conn="$1"
  shift
  echo "=> SQL*Plus (${conn%%/*}@...):" | tee -a "${REPORT}"
  sqlplus -s "${conn}" <<SQL >>"${REPORT}" 2>>"${WORKDIR}/sql_errors.log"
WHENEVER SQLERROR CONTINUE
SET LINESIZE 200
SET PAGESIZE 100
SET TRIMOUT ON
SET TRIMSPOOL ON
$*
EXIT
SQL
  echo >> "${REPORT}"
}

# -------------------------
# Begin checks
# -------------------------
echo "Daily Data Guard Monitoring Report" > "${REPORT}"
echo "Generated: ${TIMESTAMP}" >> "${REPORT}"
echo >> "${REPORT}"

log "Step 1: Check Data Guard association status (OCI CLI)"
run_oci "oci db data-guard-association list --database-id ${PRIMARY_DB_OCID} --compartment-id ${COMPARTMENT_OCID} --output json"

log "Step 2: Check archive destination status on primary (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
COL dest_id FORMAT 999
COL status FORMAT A15
COL protection_mode FORMAT A22
COL synchronization_status FORMAT A15
SELECT dest_id,
       status,
       protection_mode,
       synchronization_status,
       error
FROM   v$archive_dest_status
WHERE  dest_id > 1;
SQLBLOCK

log "Step 3: Verify managed recovery process on standby (SQL)"
run_sqlplus "${SQLPLUS_CONN_STANDBY}" <<'SQLBLOCK'
COL process FORMAT A10
COL status FORMAT A12
SELECT process, status, thread#, sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%';
SQLBLOCK

log "Step 4: Fetch Data Guard transport and apply lag (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
COL name FORMAT A20
COL value FORMAT A10
COL unit FORMAT A15
SELECT name, value, unit
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
SQLBLOCK

log "Step 5: Check Data Guard errors in alert log (last 24 hours) (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
COL originating_timestamp FORMAT A22
COL message_text FORMAT A200
SELECT originating_timestamp, message_text
FROM   v$diag_alert_ext
WHERE  message_text LIKE '%ORA-%'
AND    originating_timestamp > SYSDATE - 1
ORDER BY originating_timestamp DESC;
SQLBLOCK

log "Step 6: Check database roles and open modes on primary and standby (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
SELECT 'PRIMARY' role_indicator, database_role, open_mode, protection_mode
FROM   v$database;
SQLBLOCK

run_sqlplus "${SQLPLUS_CONN_STANDBY}" <<'SQLBLOCK'
SELECT 'STANDBY' role_indicator, database_role, open_mode, protection_mode
FROM   v$database;
SQLBLOCK

log "Step 7: Fetch Data Guard metrics from OCI Monitoring (transport/apply lag)"
run_oci "oci monitoring metric-data summarize-metrics-data \
--compartment-id ${COMPARTMENT_OCID} \
--namespace oci_database \
--query-text \"DataGuardTransportLag[1m].mean(), DataGuardApplyLag[1m].mean()\" \
--start-time $(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--output json"

log "Step 8: Summary counts - how many destinations not VALID (SQL)"
run_sqlplus "${SQLPLUS_CONN_PRIMARY}" <<'SQLBLOCK'
SELECT COUNT(*) AS non_valid_destinations
FROM   v$archive_dest_status
WHERE  status <> 'VALID'
AND    dest_id > 1;
SQLBLOCK

# -------------------------
# Finalize report and email
# -------------------------
EMAIL_BODY_FILE="${WORKDIR}/dg_email_body_$(date -u +%Y%m%d).txt"
{
  echo "Daily Data Guard Check"
  echo "Timestamp: ${TIMESTAMP}"
  echo
  cat "${REPORT}"
  echo
  echo "OCI CLI errors (if any):"
  if [ -s "${WORKDIR}/oci_errors.log" ]; then
    tail -n 200 "${WORKDIR}/oci_errors.log"
  else
    echo "None"
  fi
  echo
  echo "SQL*Plus errors (if any):"
  if [ -s "${WORKDIR}/sql_errors.log" ]; then
    tail -n 200 "${WORKDIR}/sql_errors.log"
  else
    echo "None"
  fi
} > "${EMAIL_BODY_FILE}"

log "Step 9: Send email to ${MAIL_TO}"
# Try mailx first, then fallback to sendmail
if command -v mailx >/dev/null 2>&1; then
  mailx -s "${MAIL_SUBJECT}" -r "${MAIL_FROM}" "${MAIL_TO}" < "${EMAIL_BODY_FILE}"
  log "Email sent using mailx"
elif command -v sendmail >/dev/null 2>&1; then
  # Minimal sendmail usage
  {
    echo "From: ${MAIL_FROM}"
    echo "To: ${MAIL_TO}"
    echo "Subject: ${MAIL_SUBJECT}"
    echo
    cat "${EMAIL_BODY_FILE}"
  } | sendmail -t
  log "Email sent using sendmail"
else
  log "No mailx or sendmail found. Please configure email client. Report saved at ${EMAIL_BODY_FILE}"
fi

log "Daily Data Guard Monitoring Completed"
echo "Report: ${REPORT}"

Explanation for blog post

Purpose and scope

This script performs the Daily Data Guard Monitoring Checklist automatically. It runs OCI CLI checks and SQL queries on the primary and standby databases, aggregates outputs and errors, and emails a consolidated report to your DBA distribution. It is intended to run as a daily cron job or via automation/orchestration pipeline.

Security considerations

  1. Credentials: Do not hard-code production passwords in scripts. Use secure methods:
    • Oracle Wallets (preferred for SQL*Net authentication).
    • OS-level credential vaults (HashiCorp Vault, AWS Secrets Manager, OCI Vault).
    • Use sqlplus /@TNS_ALIAS with wallet-based authentication or environment-based credentials.
  2. OCI CLI: Ensure the OCI CLI profile used has the minimum required privileges for monitoring and Data Guard read operations.
  3. File permissions: Store the script and temporary files with restricted permissions (chmod 700 for script, chmod 600 for credentials).

Deployment steps

  1. Create script file, edit variables:
    COMPARTMENT_OCID, PRIMARY_DB_OCID, DATA_GUARD_ASSOC_ID.
    SQLPLUS_CONN_PRIMARY and SQLPLUS_CONN_STANDBY — replace with wallet or secure connection strings.
    MAIL_TO, MAIL_FROM.
  2. Make executable:
    chmod 700 /opt/dbadmin/dg_daily_check.sh
  3. Test manually:
    sudo -u dbadmin /opt/dbadmin/dg_daily_check.sh
    • Inspect ${WORKDIR} and report file.
  4. Schedule via cron:
    crontab -e and add:
    0 07 * * * /opt/dbadmin/dg_daily_check.sh >/dev/null 2>&1
    • This runs the script daily at 07:00 UTC (adjust to your timezone).

Customizations and enhancements

  1. Alert thresholds: parse metric outputs and trigger high-priority alerts if transport lag or apply lag exceed defined thresholds.
  2. Persistence: push outputs into a central logging system (ElasticSearch, OCI Logging, Splunk) for historical trending.
  3. Notification channels: integrate with Slack, MS Teams, PagerDuty instead of email.
  4. Parallelization: run SQL checks concurrently for faster completion on large estates.
  5. Retry & backoff: add retries for intermittent OCI CLI failures.

Sample outputs (illustrative)

Below are small excerpts you will see in the report (simulated):

OCI CLI Data Guard association excerpt (JSON):
{
“data”: [
{
“id”: “ocid1.dataguardassociation.oc1.ap-mumbai-1.abcdefexample”,
“lifecycle-state”: “AVAILABLE”,
“role”: “PRIMARY”,
“peer-role”: “STANDBY”,
“protection-mode”: “MAXIMUM_PERFORMANCE”,
“transport-type”: “ASYNC”
}
]
}

v$archive_dest_status sample:
DEST_ID STATUS PROTECTION_MODE SYNCHRONIZATION_STATUS ERROR
2 VALID MAXIMUM PERFORMANCE SYNCHRONIZED

v$managed_standby sample:
PROCESS STATUS THREAD# SEQUENCE#
MRP0 APPLYING 1 14523

v$dataguard_stats sample:
NAME VALUE UNIT
transport lag +00 00:00:04
apply lag +00 00:00:07

Troubleshooting tips

  1. If the OCI CLI times out, increase OCI_TIMEOUT or check network connectivity from the host running the script.
  2. If SQL*Plus fails to connect, verify TNS names, wallet, or that the database listener is reachable from the monitoring host.
  3. If email fails, ensure mailx or sendmail is installed and relaying is configured for ${MAIL_FROM}.

Operational playbook snippet (example response)

If the script detects:
apply lag > 300 seconds:
• Investigate long running apply on standby: v$managed_standby, review alert logs on standby.
• Check transport errors: v$archive_dest_status error column.
• If necessary, escalate to network/storage team to validate connectivity/IO performance.

Closing notes

This script is a practical foundation for daily Data Guard monitoring in Oracle Cloud. It standardizes checks, improves mean time to detect (MTTD), and creates a single report consumable by DBAs and ops teams. Extend it with alerting, secure credential management, and integration into your existing observability platform to build a production-grade monitoring pipeline.

Oracle DBA

Experienced OCM-certified Oracle Database Administrator with over 18 years of expertise in designing, implementing, and managing complex database solutions. My expertise spans performance optimization, security, and high-stakes solution implementation. Adept at managing complex environments with precision.

No Comments

    Leave a Message

    Your email address will not be published. All fields are mandatory. **