Remember your dreams


Friday, November 22, 2019

Alert log checker

Script to monitor the Alert Log of every instance on a server for errors and send an email if any are found.
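The heart of the check is nothing more than counting lines in the alert log that match a pattern. Here is a minimal sketch of that idea on its own. The helper name count_matches is mine, not part of the script further down, and it assumes a case-insensitive match is acceptable for every pattern.

```shell
#!/bin/sh
# count_matches is a hypothetical helper: it prints how many lines of a file
# match a pattern (case-insensitive), or 0 if the file does not exist.
count_matches() {
  pattern=$1
  file=$2
  if [ -f "$file" ]; then
    # grep -c prints 0 and returns non-zero when nothing matches,
    # so the exit status is deliberately ignored here.
    grep -ci -- "$pattern" "$file" || true
  else
    echo 0
  fi
}
```

A threshold test then becomes something like `[ "$(count_matches "Fatal NI connect error" "$alert_log")" -gt 25 ]`, where alert_log stands for whatever path you monitor.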


Even with the arrival (now years ago) of ADRCI (Automatic Diagnostic Repository Command Interpreter), I insist on monitoring the Oracle Alert Log. I will until it goes away. 

I have been running some form of this script since the days of 8i and still find it useful. Customize it to suit your specific needs. Better to know there is an issue before someone tells you. 

Add this to your crontab.
# Check Alert log for errors and email DBA
5,20,35,50 7-16 * * 0-7 /oracle/scripts/monitor/check_alert.sh
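If you would rather not edit the crontab by hand, something along these lines should work. It is only a sketch: merge_cron_entry is a made-up helper name, and it assumes any existing line mentioning check_alert.sh can safely be replaced.

```shell
#!/bin/sh
# The schedule line from this post: four times an hour, 07:00 to 16:59, every day.
CRON_LINE='5,20,35,50 7-16 * * 0-7 /oracle/scripts/monitor/check_alert.sh'

# merge_cron_entry (hypothetical helper) reads an existing crontab on stdin and
# writes it to stdout with exactly one check_alert.sh entry at the end.
merge_cron_entry() {
  # Drop any earlier check_alert.sh line and keep everything else as-is.
  # grep returns non-zero when it prints nothing, which is fine here.
  grep -v 'check_alert\.sh' || true
  printf '%s\n' "$CRON_LINE"
}

# Usage (not run here):
#   crontab -l 2>/dev/null | merge_cron_entry | crontab -
```

Because the old entry is filtered out before the new one is appended, running it twice does not leave you with duplicate jobs.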

The check_alert.sh script


#!/bin/sh
# File-name: check_alert.sh
#-----------------------------------------------------
# Checks Oracle alert log files for all databases
# 1. Gets database name from oratab.
# 2. Checks ORA errors in the alert log file
# 3. Checks fail errors in the alert log file
# 4. Checks Fatal NI connect errors
# If error(s) found, then
# 5. Makes a copy of the alert file
# 6. Cleans the alert log file
# 7. Sends an e-mail with results to DBA
#-----------------------------------------------------

ORACLE_BASE=/oracle/app/oracle
export ORACLE_BASE
TMPDIR=/tmp
export TMPDIR
ORATAB=/etc/oratab
export ORATAB

#-----------------------------------------------
# Get the Oracle instances from the oratab file
# Comment lines and blank lines are skipped
#-----------------------------------------------
cat ${ORATAB} | while read LINE
do
  case $LINE in
  ""|\#*)         ;;      #blank or comment-line in oratab
  *)
    ORACLE_SID=`echo $LINE | awk -F: '{print $1}' -`

    if [ "$ORACLE_SID" = '*' ] ; then
      ORACLE_SID=""
    fi

    export ORACLE_SID;
    ORACLE_HOME=`echo $LINE | awk -F: '{print $2}' -`; export ORACLE_HOME
    SHLIB_PATH=$ORACLE_HOME/lib:/usr/lib; export SHLIB_PATH
    LD_LIBRARY_PATH=$ORACLE_HOME/lib; export LD_LIBRARY_PATH

#------------------------------------------------------------------------
# Initialization
#------------------------------------------------------------------------
    l_err=0
    l_found=0
    l_date=`date '+%c'`
    l_filedate=`date '+%m%d%H%M'`

    l_log=${ORACLE_BASE}/diag/rdbms/${ORACLE_SID}/${ORACLE_SID}/trace/check_alert_${ORACLE_SID}.log

    l_alertfile=${ORACLE_BASE}/diag/rdbms/${ORACLE_SID}/${ORACLE_SID}/trace/alert_${ORACLE_SID}.log

    echo $l_date "*** log BEGIN ***" > $l_log
    echo "---------------------------------------------------------------------" >> $l_log
    echo "Script : "${0} >> $l_log
    echo "Database : "$ORACLE_SID >> $l_log
    echo "Server : "`uname -n` >> $l_log
    echo "Alert Log : "$l_alertfile >> $l_log
    echo "Copy To : "${l_alertfile}.${l_filedate} >> $l_log
    echo "---------------------------------------------------------------------" >> $l_log

#------------------------------------------------------------------------
# Verify the existence of the Oracle environment variables
#------------------------------------------------------------------------
    if test `env | grep ORACLE_SID | wc -l` -ne 1 ; then
      l_err=1
      echo "ORACLE_SID is not set \n" >> $l_log
    fi

    if test `env | grep ORACLE_HOME | wc -l` -ne 1 ; then
      l_err=1
      echo "ORACLE_HOME is not set \n" >> $l_log
    fi
#------------------------------------------------------------------------
# Check the alert log file for any errors and clean it
#------------------------------------------------------------------------
    if test -f ${l_alertfile} ; then

      if test `grep "ORA-" ${l_alertfile} | wc -l` -ne 0 ; then
        l_err=1
        l_found=1
        echo "There is an ORA- error in the Oracle alert log file!" >> $l_log
        grep "ORA-" ${l_alertfile} >> $l_log
      fi

      if test `grep -i "fail" ${l_alertfile} | wc -l` -ne 0 ; then
        l_err=1
        l_found=1
        echo "--------------------------------------------------------------" >> $l_log
        echo "There is a fail error in the Oracle alert log file!" >> $l_log
        grep -i "fail" ${l_alertfile} >> $l_log
      fi

      # Report Fatal NI connect errors only when there are more than 25 of them
      l_ni_count=`grep -i "Fatal NI connect error" ${l_alertfile} | wc -l`
      if test $l_ni_count -gt 25 ; then
        l_err=1
        l_found=1
        echo "--------------------------------------------------------------" >> $l_log
        echo "There are more than 25 Fatal NI connect errors in the Oracle alert log file!" >> $l_log
        grep -i "Fatal NI connect error" ${l_alertfile} >> $l_log
        echo " " >> $l_log
        echo "Clients" >> $l_log
        grep -i "Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=" ${l_alertfile} >> $l_log
        grep -i "Time: " ${l_alertfile} >> $l_log
      fi

      # Reported whenever none of the checks above found anything
      if test $l_err -eq 0 ; then
        echo "There are NO errors in the alert log file" >> $l_log
      fi

#------------------------------------------------------------------------------
# Make a copy of the alert log file only if it's not empty and there are errors
#------------------------------------------------------------------------------
      if test $l_err -eq 1 ; then
        if test `cat ${l_alertfile} | wc -l` -ne 0 ; then
          cat ${l_alertfile} >> ${l_alertfile}.${l_filedate}
#          rm ${l_alertfile}
#          touch ${l_alertfile}
        fi
      fi
      echo "--------------------------------------------------------------" >> $l_log
      echo ${l_date} "*** log END ***" >> $l_log
    fi # Check the alert log
#------------------------------------------------------------------------
# Send errors to DBA
#------------------------------------------------------------------------
    if test $l_err -eq 1 ; then
      mail -s "${ORACLE_SID} on `uname -n` : ERRORS in alert_$ORACLE_SID.log" "your_email@company.com" > /dev/null < $l_log
    fi

   esac
done

#-----------------------------------------------------
# End of script
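The loop at the top of the script depends on oratab lines having the form SID:ORACLE_HOME:flag. If you want to try that parsing step by itself before wiring it into cron, here is a small sketch. The function name list_instances is my own invention, and unlike the script it simply skips the '*' entry instead of blanking the SID.

```shell
#!/bin/sh
# list_instances (hypothetical helper) prints "SID HOME" for every usable
# entry in an oratab-style file. Defaults to /etc/oratab when no file is given.
list_instances() {
  oratab_file=${1:-/etc/oratab}
  # Split each line on ':' into the SID, the home, and whatever follows.
  while IFS=: read -r sid home _rest
  do
    case $sid in
      ''|\#*|\*) continue ;;   # skip blank lines, comments and the '*' entry
    esac
    printf '%s %s\n' "$sid" "$home"
  done < "$oratab_file"
}
```

Reading the file with a redirect rather than a pipe keeps the loop in the current shell, which matters if you later want to set variables inside it.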





Friday, May 18, 2012

Clear alerts in Oracle Enterprise Manager OEM

I am using 11g, but this works in 10g as well.
I had an annoying alert sitting under the alert tab in OEM for some time. It was for a failed job; yes, I knew it had failed, and I fixed it that day. I waited for the alert to disappear, but alas, it did not. I acknowledged it and thought that was the end of that. Well, it has been hanging around for some time, and when I click on the alert it takes me to a jobs page with no information on how to clear it.

I started hunting around for information on how to clear alerts that no longer serve a purpose in OEM, and I found a lot of posts that said to log in as sysman, query mgmt_current_severity, and then use em_severity.delete_current_severity to remove the desired alert.

Okay, that sounds good, but then I found a post by Said Ahmed. He posted the following query, which I found very useful, so here it is.

First log in as SYSMAN or your repository owner.


select t.target_name
, t.target_type
, collection_timestamp
, message
, 'exec em_severity.delete_current_severity(''' ||
t.target_guid || ''',''' ||
metric_guid || ''',''' ||
key_value || ''')' em_severity
from sysman.mgmt_targets t
inner join
sysman.mgmt_current_severity s
on
t.target_guid = s.target_guid;


This will give you information about all the alerts in OEM. For the alert you want removed, copy the value from its em_severity column and run it exactly as it appears. Be sure you are logged in as sysman.

For example:
SQL> exec em_severity.delete_current_severity('stuff','morestuff','SCHEMA')
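If you already have the three values in hand (target GUID, metric GUID, key value), the same statement can be assembled from the shell. This is only a sketch that mirrors the string concatenation in the query. The names sql_quote and build_delete_stmt are hypothetical, and doubling embedded single quotes is my addition; the query itself does not do that.

```shell
#!/bin/sh
# sql_quote (hypothetical helper): double any embedded single quote,
# then wrap the value in single quotes so it is a valid SQL string literal.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# build_delete_stmt TARGET_GUID METRIC_GUID KEY_VALUE
# Prints the exec line to paste into SQL*Plus while logged in as sysman.
build_delete_stmt() {
  printf 'exec em_severity.delete_current_severity(%s,%s,%s)\n' \
    "$(sql_quote "$1")" "$(sql_quote "$2")" "$(sql_quote "$3")"
}
```

Nothing here talks to the database; it only prints the statement so you can look it over before running it.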

(Screenshots: the alert list before deletion and after deletion.)




Friday, January 15, 2010

Oracle archive log directory fills up.

ORA-00257: archiver error. Connect internal only, until freed.
There are a number of reasons you might fill up your archive log directory. If this happens, your database will hang until space is freed up. You quickly log into the server, delete a number of archive logs and voilà, problem resolved. Ah ah... not so fast. If you use RMAN to back up your database, you will receive errors when your backup runs because the catalog will be out of sync with the archive logs that are actually on disk.
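Better still is to hear about it before the directory is full. A rough sketch of a usage check follows; usage_pct and archive_dir_full are hypothetical helper names, and the threshold you pass in is whatever suits your site.

```shell
#!/bin/sh
# usage_pct DIR (hypothetical helper): prints the used-space percentage,
# digits only, of the filesystem that holds DIR. Relies on POSIX "df -P",
# where the fifth column of the second line is the capacity.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# archive_dir_full DIR LIMIT: succeeds when usage is at or above LIMIT percent.
archive_dir_full() {
  [ "$(usage_pct "$1")" -ge "$2" ]
}

# Usage (not run here; the path and address are placeholders):
#   archive_dir_full /path/to/archive 85 && \
#     echo "archive destination almost full" | mail -s "Archive space" your_email@company.com
```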

These are the steps I take:
  1. Delete the archive logs from one or more directories.
  2. Run the crosscheck RMAN command.
  3. Back up your database.
This is the RMAN command to run after manually deleting your archive logs.

$ rman target / catalog rman/rman@rman
RMAN> change archivelog all crosscheck;
.
.
.
Crosschecked 598 objects
RMAN> exit

Run a hot backup immediately after this.
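If you find yourself doing this often, the crosscheck can be put into a command file so it runs the same way every time. A minimal sketch, assuming rman is on your PATH and the same catalog connect string as above; write_crosscheck_cmdfile is a hypothetical helper name.

```shell
#!/bin/sh
# write_crosscheck_cmdfile [FILE] (hypothetical helper): writes the crosscheck
# command from this post into an RMAN command file and prints the file's path.
write_crosscheck_cmdfile() {
  cmdfile=${1:-/tmp/crosscheck_arch.rman}
  cat > "$cmdfile" <<'EOF'
change archivelog all crosscheck;
EOF
  printf '%s\n' "$cmdfile"
}

# Usage (not run here):
#   rman target / catalog rman/rman@rman cmdfile="$(write_crosscheck_cmdfile)"
```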
