
High Availability and Oracle environments - part 3

05 June 2011

In the second part of our High Availability and Oracle environments series, we looked at the importance of availability and the cost of production downtime. This third part examines the causes of downtime.

Planned and unplanned downtime

One of the challenges in designing a high-availability solution is to identify and address every possible cause of production downtime, both planned and unplanned. Planned downtime can be just as disruptive as unplanned downtime, particularly for international companies whose users are spread all over the world.
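As a rough illustration of why planned downtime matters as much as unplanned downtime, the sketch below computes yearly availability from both kinds of outage. The function name and the hour figures are hypothetical, chosen only to make the arithmetic visible; they are not measurements from any real system.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def availability(planned_hours: float, unplanned_hours: float) -> float:
    """Fraction of the year the service was up, counting every outage."""
    downtime = planned_hours + unplanned_hours
    return (HOURS_PER_YEAR - downtime) / HOURS_PER_YEAR


# Hypothetical figures, for illustration only.
unplanned_only = availability(planned_hours=0, unplanned_hours=4)
with_planned = availability(planned_hours=20, unplanned_hours=4)

print(f"counting unplanned downtime only: {unplanned_only:.4%}")
print(f"counting planned and unplanned:   {with_planned:.4%}")
```

From the user's point of view only the second figure is meaningful: an hour of maintenance during someone's working day is an hour without the service, whatever its cause.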

Causes of unplanned downtime

  1. Site failure: likely to affect all processing in a data center, or a subset of the applications supported in that data center...
    • Site-wide power failure
    • Natural disaster rendering the IT site inoperative
    • Terrorist or malicious attack on applications or site
  2. Cluster failure: the entire cluster hosting an Oracle RAC database is unavailable or down.
    • The last surviving node in an Oracle RAC cluster shuts down and cannot be restarted.
    • Both redundant INTERCONNECT connections are unusable, or the entire cluster is unusable.
    • Database corruption so severe that service cannot continue on the current Oracle server
    • Disk access error
  3. Computer failure: when the system running the database becomes unavailable because it is down or inaccessible.
    • Database server hardware failure
    • Operating system failure
    • Oracle instance failure
    • Network interface failure
  4. Data storage failure: when the storage elements of all or part of the database are no longer accessible.
    • Disk failure
    • Disk controller failure
    • SAN array failure
  5. Data corruption: a corrupted block is one that has been altered in such a way that it is different from what Oracle expects to find.
    There are logical and physical corruptions. There is also intra-block and inter-block corruption.
    A failure due to data corruption occurs when a hardware, software or network component corrupts data as it is read or written. The impact on service levels varies, from a slight impact when one or a few blocks in the database are corrupted, to the database becoming unusable when the corruption is more extensive.
    Here are just a few of the factors that can lead to corruption:
    • Faulty operating system or disk driver
    • Faulty bus adapter
    • Disk controller fault
    • Disk volume manager error causing a disk read or write error
    • Software fault
  6. Human error: a user has unintentionally modified or deleted data in a database, or someone has made malicious modifications. The consequences are more or less serious depending on the type of error.
    • Deletion of data files belonging to a database
    • Deletion of database objects (tables, etc.)
    • Unintentional data modification
    • Fraudulent data changes
  7. Lost writes: a lost write is another form of data corruption, but it is much more difficult to detect and repair quickly. A lost or stray write occurs in the following cases:
    • Lost write: the I/O subsystem acknowledges the write of a block even though the block was never written to disk; the next read of this block therefore returns a stale version, causing a cascade of errors in processing and in the database.
    • Stray write: the block is written, but to the wrong location; the next read of the block at its intended location therefore returns a stale version, with the same cascading effects.
    • In an Oracle RAC database, a read on one node returns stale data just after another node has written the block to disk (lost write). This can happen when NFS is used without the “noac” option.

  8. Hang or slowdown: a hang or slowdown occurs when the database or application cannot process transactions because of resource or lock contention. A perceived hang may also be caused by a lack of system resources.
    • Application or database deadlocks
    • Runaway processes consuming system resources
    • Connection storms or bursts of system errors
    • Application peak load combined with a lack of system or database resources
    • Lack of space in the archived redo log destination or in the FRA (Flash Recovery Area)
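
The difference between physical corruption (item 5) and a lost or stray write (item 7) can be sketched in a few lines. This is a simplified model and not Oracle's actual mechanism: the `Block` structure, the `version` field standing in for a change number, and the use of CRC32 are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    version: int    # hypothetical change number stamped on each write
    payload: bytes
    checksum: int   # CRC32 of the payload, computed at write time


def make_block(version: int, payload: bytes) -> Block:
    """Build a block with a checksum matching its payload."""
    return Block(version, payload, zlib.crc32(payload))


def check_block(block: Block, expected_version: int) -> str:
    """Classify a block read back from storage."""
    if zlib.crc32(block.payload) != block.checksum:
        # Content no longer matches what was written.
        return "physical corruption"
    if block.version < expected_version:
        # Internally valid, but older than the write that was acknowledged.
        return "lost write"
    return "ok"


old = make_block(1, b"balance=100")
new = make_block(2, b"balance=250")

# One byte altered after the checksum was computed.
damaged = Block(new.version, b"balance=251", new.checksum)

print(check_block(new, expected_version=2))      # block written correctly
print(check_block(damaged, expected_version=2))  # checksum mismatch
print(check_block(old, expected_version=2))      # stale block returned
```

The stale block passes its own checksum, which is why a lost write is so much harder to detect than ordinary corruption: detecting it requires an independent record of which version should be on disk.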

Causes of planned downtime

  1. Updating system or database software: a planned shutdown is either periodic (for maintenance tasks) or occasional (for system or database software or infrastructure upgrade tasks). The duration of downtime depends on a number of factors. Here are a few examples:
    • Add or remove a processor on an SMP server
    • Add or remove nodes from a cluster
    • Add or remove disks or SAN arrays
    • Modify configuration parameters
    • Update or patch server or operating system
    • Update or patch Oracle software
    • Update or patch application software
    • Migrate hardware platform
    • Move database
    • Switch from 32-bit to 64-bit
    • Switch to cluster architecture
    • Migrate to new storage

  2. Changes to data: changes made to the logical structure or physical organization of Oracle database objects, often to improve performance or manageability. Here are a few examples:
    • Changes to table definitions
    • Implementation of table partitioning
    • Index creation or rebuild
  3. Changes to applications: Changes to applications can include changes to data and database schema, as well as changes to programs.

Oracle offers a range of solutions to avoid both planned and unplanned downtime and to cope with the various possible failures. These solutions will be covered in future articles.
