High Availability and Oracle environments - part 3

05 June 2011

In the second part of our High Availability and Oracle environments dossier, we looked at the importance of availability and the costs of production downtime. In this third part, we look at the causes of downtime.

Planned and unplanned downtime

One of the challenges in designing a high-availability solution is to examine and address all possible causes of production downtime, both planned and unplanned. Planned downtime can be just as disruptive as unplanned downtime, particularly for international companies whose users are spread across time zones and leave no quiet window for maintenance.

Causes of unplanned downtime

Site failure: likely to affect all processing in a data center, or a subset of the applications supported in that data center...
- Site-wide power failure
- Natural disaster rendering the IT site inoperative
- Terrorist or malicious attack on applications or site
Cluster failure: the entire cluster hosting an Oracle RAC database is unavailable or down.
- The last surviving node in an Oracle RAC cluster shuts down and cannot be restarted.
- Both redundant interconnect links are unusable, or the entire cluster is unusable.
- Database corruption so severe that service cannot continue on the current Oracle server
- Disk access error
Computer failure: when the system running the database becomes unavailable because it is down or inaccessible.
- Database server hardware failure
- Operating system failure
- Oracle instance failure
- Network interface failure
Data storage failure: when the storage elements of all or part of the database are no longer accessible.
- Disk failure
- Disk controller failure
- SAN array failure
Data corruption: a corrupted block is one that has been altered in such a way that it is different from what Oracle expects to find.
Corruption may be physical or logical, and it may be confined to a single block (intra-block) or span several blocks (inter-block).
A failure due to data corruption occurs when a hardware, software or network component corrupts data as it is read or written. The impact on service levels varies, from minor when only one or a few blocks are corrupted, to a complete loss of database service when the corruption is more extensive.
Here are just a few of the factors that can lead to corruption:
- Faulty operating system or disk driver
- Faulty bus adapter
- Disk controller fault
- Disk volume manager error causing a disk read or write error
- Software fault
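To illustrate how a physically corrupted block can differ from "what Oracle expects to find", here is a minimal Python sketch of a per-block checksum. It is a generic illustration, not Oracle's actual block format or checksum algorithm: the block size, the 4-byte CRC32 trailer and the function names are all assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 8192   # assumed block size, for illustration only
TRAILER_SIZE = 4    # assumed 4-byte CRC32 trailer at the end of each block


def build_block(payload: bytes) -> bytes:
    """Pad the payload and append a CRC32 checksum computed over the body."""
    if len(payload) > BLOCK_SIZE - TRAILER_SIZE:
        raise ValueError("payload does not fit in one block")
    body = payload.ljust(BLOCK_SIZE - TRAILER_SIZE, b"\x00")
    return body + zlib.crc32(body).to_bytes(TRAILER_SIZE, "big")


def is_block_intact(block: bytes) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    if len(block) != BLOCK_SIZE:
        return False
    body, stored = block[:-TRAILER_SIZE], block[-TRAILER_SIZE:]
    return zlib.crc32(body).to_bytes(TRAILER_SIZE, "big") == stored


good_block = build_block(b"row data")

# Simulate a faulty driver or controller flipping bits in one byte.
damaged = bytearray(good_block)
damaged[10] ^= 0xFF
damaged_block = bytes(damaged)

print(is_block_intact(good_block))     # True
print(is_block_intact(damaged_block))  # False
```

A check of this kind only catches physical damage inside a block. A logically wrong but well-formed block, or a stale block returned in place of a newer one, still carries a valid checksum and passes.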
Human error: a user has unintentionally modified or deleted data in a database, or someone has made malicious changes. The consequences vary in severity with the type of error.
- Deletion of data files belonging to a database
- Deletion of database objects (tables, etc.)
- Unintentional data modification
- Fraudulent data changes
Lost writes: a lost write is another form of data corruption, but it is much more difficult to detect and repair quickly. Lost and stray writes take the following forms:
- Lost write: the I/O subsystem acknowledges the write of a block even though the block was never written to disk; consequently, the next read of this block returns a stale version of it, which can cause a cascade of errors in processing and in the database.
- Stray write: the block has been written, but to the wrong location; consequently, the next read of this block at its intended location returns a stale version, with the same cascading effects.
- Oracle RAC: reading a block on one node returns stale data just after another node has written this block to disk (lost write). This can happen when NFS is used without the “noac” option.
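The lost write case can be made concrete with a small Python simulation. This is a toy model under stated assumptions: the `FlakyStorage` class, the per-block version counter and the `detect_lost_write` helper are hypothetical names invented for the sketch, and do not describe how Oracle or any real storage stack works.

```python
class FlakyStorage:
    """Toy storage that can acknowledge a write without persisting it."""

    def __init__(self):
        self.blocks = {}              # block_id -> (version, data)
        self.drop_next_write = False  # set to True to simulate a lost write

    def write(self, block_id, version, data):
        if self.drop_next_write:
            self.drop_next_write = False
            return True  # acknowledged, but nothing reached "disk"
        self.blocks[block_id] = (version, data)
        return True

    def read(self, block_id):
        return self.blocks[block_id]


def detect_lost_write(storage, block_id, expected_version):
    """Flag a block whose stored version is older than the caller expects."""
    version, _ = storage.read(block_id)
    return version < expected_version


storage = FlakyStorage()
storage.write(7, 1, "balance=100")

storage.drop_next_write = True
ack = storage.write(7, 2, "balance=250")  # the caller sees a successful write

stale = storage.read(7)
lost = detect_lost_write(storage, 7, expected_version=2)

print(ack)    # True
print(stale)  # (1, 'balance=100')
print(lost)   # True
```

The stale block is perfectly well-formed, which is why this failure is hard to spot: detection needs an independent record of which version should be on disk, represented here by `expected_version`.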
Hang or slowdown: a hang or slowdown occurs when the database or application is unable to process transactions because of resource or lock contention. A perceived hang may also be caused by a lack of system resources.
- Application or database deadlocks
- Out-of-control processes consuming system resources
- Massive “storm” of connections or system errors
- Application peak load situation with lack of system or database resources.
- Lack of space in the archived log destination or in the FRA (Flash Recovery Area)
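A true deadlock differs from an ordinary slowdown in that the sessions involved wait on each other in a cycle, so none of them can ever proceed. The following Python sketch shows the idea as a cycle search in a "wait-for" graph. The session names and the `waits_for` mapping are hypothetical, and the code illustrates the general technique rather than Oracle's own deadlock detection.

```python
def find_deadlock(waits_for):
    """Return the sessions forming a wait cycle, or None if there is none.

    waits_for maps each session to the sessions holding locks it needs.
    """
    in_progress, finished = set(), set()

    def visit(session, path):
        if session in finished:
            return None
        if session in in_progress:
            # We came back to a session already on the current path: a cycle.
            return path[path.index(session):]
        in_progress.add(session)
        path.append(session)
        for blocker in waits_for.get(session, ()):
            cycle = visit(blocker, path)
            if cycle:
                return cycle
        path.pop()
        in_progress.discard(session)
        finished.add(session)
        return None

    for session in waits_for:
        cycle = visit(session, [])
        if cycle:
            return cycle
    return None


# A waits for B, B waits for C, C waits for A: a deadlock.
deadlocked = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
# A simple queue behind one lock holder: slow, but it will clear.
queued = {"A": ["B"], "B": []}

print(find_deadlock(deadlocked))  # ['A', 'B', 'C']
print(find_deadlock(queued))      # None
```

In the second case the sessions are only queued behind one lock holder, which is contention rather than a deadlock; it resolves once the holder finishes.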

Causes of planned downtime

Updating system or database software: a planned shutdown is either periodic (for maintenance tasks) or occasional (for system or database software or infrastructure upgrade tasks). The duration of downtime depends on a number of factors. Here are a few examples:
- Add or remove a processor on an SMP server
- Add or remove nodes from a cluster
- Add or remove disks or SAN arrays
- Modify configuration parameters
- Update or patch server or operating system
- Update or patch Oracle software
- Update or patch application software
- Migrate hardware platform
- Move database
- Switch from 32-bit to 64-bit
- Switch to cluster architecture
- Migrate to new storage
Changes to data: This is the case when changes are made to the logical structure or physical organization of Oracle database objects. These changes are often intended to improve performance or manageability. Here are a few examples:
- Changes to table definitions
- Implementation of table partitioning
- Index creation or rebuild
Changes to applications: Changes to applications can include changes to data and database schema, as well as changes to programs.

Oracle offers a range of solutions to avoid both planned and unplanned downtime, and to cope with the various possible failures. These solutions will be developed in future articles.

Written by

Paul Felix

Product Marketer
