High Availability and Oracle Environments: Optimizing Performance and Resilience

04 June 2011

In the second part of our series on High Availability and Oracle environments, we looked at the importance of availability and the costs of production downtime.

This third part looks at the causes of downtime.

Previous article: High Availability and Oracle environments - part 2: Importance of Availability and the Costs of Downtime.

Planned and unplanned downtime

One of the challenges in designing a high-availability solution is to examine and address all possible causes of production downtime, both planned and unplanned. Planned downtime can be just as disruptive as unplanned downtime, particularly for international companies whose users are spread all over the world: there may be no hour at which every site is outside business hours.

Causes of unplanned downtime

Site failure: this is likely to affect all processing in a data center, or a subset of the applications supported in that data center.
- Site-wide power failure
- Natural disaster rendering the IT site inoperable
- Terrorist or malicious attack on applications or site
Cluster failure: the entire cluster hosting an Oracle RAC database is unavailable or down
- The last surviving node in an Oracle RAC cluster shuts down and cannot be restarted
- Both redundant interconnect links are unusable, or the entire cluster is unusable
- Database corruption so severe that continuity is not possible on the current Oracle server
- Disk access error
Computer failure: the system running the database becomes unavailable because it is down or inaccessible.
- Database server hardware failure
- Operating system failure
- Oracle instance failure
- Network interface failure
Data storage failure: the storage holding all or part of the database is no longer accessible.
- Disk failure
- Disk controller failure
- SAN array failure
Data corruption: a corrupted block is one that has been altered in such a way that it is different from what Oracle expects to find.

Corruption can be logical or physical, and it can be intra-block or inter-block.

A failure due to data corruption occurs when a hardware, software or network component corrupts data as it is read or written. The impact on service levels varies, from a slight impact when one or a few blocks in the database are corrupted, to the database becoming unusable when the corruption is more extensive.

Here are just a few of the factors that can lead to corruption:
- Operating system or disk driver fault
- Faulty bus adapter
- Disk controller fault
- Disk volume manager error causing disk read or write error
- Software fault
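As an illustration of how physical corruption of a block can be detected on read, the sketch below stores a checksum with each block and recomputes it later. It is a toy model and does not reproduce Oracle's block format or checking mechanism; the 4-byte CRC32 header and the function names are assumptions made for the example.

```python
import zlib


def write_block(payload: bytes) -> bytes:
    """Return a stored block: a 4-byte CRC32 header followed by the payload."""
    checksum = zlib.crc32(payload)
    return checksum.to_bytes(4, "big") + payload


def is_corrupt(stored: bytes) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    expected = int.from_bytes(stored[:4], "big")
    return zlib.crc32(stored[4:]) != expected


block = write_block(b"row data" * 16)

# Simulate a faulty adapter or controller flipping the bits of one byte.
damaged = bytearray(block)
damaged[10] ^= 0xFF

print(is_corrupt(block))           # False
print(is_corrupt(bytes(damaged)))  # True
```

A checksum of this kind only catches physical damage. A block that is internally consistent but holds wrong values (logical corruption) passes the check.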
Human error: a user has unintentionally modified or deleted data in a database, or someone has made fraudulent changes; depending on the type of error, the consequences can be more or less serious.
- Deletion of data files belonging to a database
- Deletion of objects in a database (tables, etc.)
- Unintentional modification of data
- Fraudulent data changes
Lost writes: lost writes are another form of data corruption, but they are much more difficult to detect and repair quickly. A lost or stray write occurs in the following cases:
- In the case of a lost write, the I/O subsystem has acknowledged the write of a block even though it has not been written to disk; consequently, the next read of this block returns an old version of it, causing a cascade of errors in processing and in the database.
- In the case of a stray write, the write is carried out, but at an incorrect location; as a result, the next read of the intended block returns an older version, causing a cascade of errors in processing and in the database.
- In the case of an Oracle RAC database, reading a block on one node returns out-of-date data when another node has just written this block to disk (lost write). This can happen when NFS is used without the “noac” option.
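The lost write case can be sketched as follows. This is a toy simulation, not Oracle's detection mechanism: the `FlakyStorage` class, the `lose` flag and the integer version counter (standing in for whatever change number the writer tracks) are all hypothetical.

```python
class FlakyStorage:
    """Toy I/O subsystem that can acknowledge a write without persisting it."""

    def __init__(self):
        self.blocks = {}  # block_id -> (version, data)

    def write(self, block_id, version, data, lose=False):
        if not lose:
            self.blocks[block_id] = (version, data)
        return True  # the write is acknowledged in both cases

    def read(self, block_id):
        return self.blocks[block_id]


def is_lost_write(storage, block_id, expected_version):
    """A block older than the last acknowledged version indicates a lost write."""
    version, _ = storage.read(block_id)
    return version < expected_version


storage = FlakyStorage()
acknowledged = {}  # the writer's own record of the last acknowledged version

storage.write(1, version=1, data="balance=100")
acknowledged[1] = 1
storage.write(1, version=2, data="balance=80", lose=True)  # acknowledged, never persisted
acknowledged[1] = 2

stale_version, stale_data = storage.read(1)
print(stale_data)                                     # balance=100
print(is_lost_write(storage, 1, acknowledged[1]))     # True
```

The stale block is perfectly well-formed, so a checksum would not flag it. Detection needs an independent record of what should have been written, which is part of what makes lost writes hard to find.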
Hang or slowdown: a hang or slowdown occurs when the database or application is unable to process transactions because of resource or lock contention. A perceived hang may also be caused by a lack of system resources.
- Application or database deadlocks
- “Out-of-control” processes consuming system resources
- Massive “storm” of connections or system errors
- Application peak load situation with lack of system or database resources
- Lack of space in the archived redo log destination or in the FRA (Flash Recovery Area)
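The last cause in this list is usually avoidable with simple monitoring. The sketch below classifies how full a destination is; the function name and the 85% and 95% thresholds are illustrative assumptions, and the byte counts would in practice come from the database or the file system rather than being hard-coded.

```python
def space_status(used_bytes: int, limit_bytes: int,
                 warn_ratio: float = 0.85, critical_ratio: float = 0.95) -> str:
    """Classify how full a destination is; the thresholds are illustrative."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    ratio = used_bytes / limit_bytes
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


GIB = 1024 ** 3
print(space_status(40 * GIB, 100 * GIB))  # ok
print(space_status(90 * GIB, 100 * GIB))  # warning
print(space_status(99 * GIB, 100 * GIB))  # critical
```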

Causes of planned downtime

Updating system or database software: a planned shutdown is either periodic (for maintenance tasks) or occasional (for system, database software or infrastructure upgrades). The duration of downtime depends on many factors. Here are a few examples:
- Add or remove a processor from an SMP server
- Add or remove nodes from a cluster
- Add or remove disks or SANs
- Modify configuration settings
- Update or patch the server or operating system
- Update or patch Oracle software
- Update or patch application software
- Migrate the hardware platform used
- Move the database
- Switch from 32-bit to 64-bit
- Switch to a cluster architecture
- Migrate to new storage
Data modification: This is the case when changes are made to the logical structure or physical organization of Oracle database objects. These changes are often intended to improve performance or manageability. Here are just a few examples:
- Modification to table definitions
- Implementation of table partitioning
- Creation or rebuilding of indexes
Application changes: Application changes can include changes to data and database schema, as well as changes to programs.

Oracle offers various solutions to avoid both planned and unplanned downtime, and to cope with the various possible failures. These solutions will be covered in future articles.
