Archive for April, 2014

SQL Server AlwaysOn – no automatic failover when database files are lost

Kategorien: Allgemein @en, knowledge base @en, News @en
Comments Off on SQL Server AlwaysOn – no automatic failover when database files are lost

With SQL Server Release 2012, Microsoft has introduced a new high-availability technology named “AlwaysOn”. It comes integrated with the database software and extends Microsoft’s high-availability portfolio. AlwaysOn is intended to eventually replace Database Mirroring entirely (refer to High Availability Solutions – MS TechNet).

In this post, we would like to familiarize you with the AlwaysOn technology, but also investigate a constellation of problems that can arise from its use.

What is AlwaysOn?

The SQL Server AlwaysOn (AO) technology has been extensively researched and documented online. We would like to point you to the FAQ on Microsoft’s TechNet website (Microsoft TechNet – AlwaysOn FAQ) and this insightful blog post by Juergen Thomas: SQL Server 2012 AlwaysOn – What is it?.

To put it simply:

AlwaysOn is an alternative high-availability solution for SQL Server databases that is available alongside Database Mirroring (DBM) and SQL Server Clustering. Its objective is to combine the advantages of these two HA solutions, while eliminating all of their disadvantages.

AO mitigates the complexity of a clustered SQL Server instance by means of locally installed SQL Server instances with locally attached storage. As with DBM, the database exists locally on each server.

Other than with DBM, however, a database secured with AO can be addressed using a unique DNS name / IP address. The application that uses the database is not aware of a potential failover and does not have to address another database server at runtime.

Today, a wide variety of business processes is mapped in IT systems and their execution depends on the availability and proper functioning of the systems they are based on. Particularly when it comes to systems that drive business-critical processes, downtime can result in significant financial losses. Users and administrators need to know that the databases that are protected by high-availability technologies will be available at any time even in case of error (hardware failures and the like). But does AlwaysOn function reliably and as expected for every conceivable error?

How does SQL Server / the database protected by AlwaysOn respond if the primary connection is lost (failure of the hard disks or SSDs)?

To answer this question, we set up the following test environment:

  • Operating system platform Windows Server 2012 R2
  • SQL Server 2012 SP1 CU2 (the exact patch level is irrelevant, the following applies to all SQL Server 2012 versions starting with RTM)
  • iSCSI-connected (internet Small Computer System Interface) LUNs for the test database protected by AO
  • Setup of Windows Server Failover Clustering
  • Provision of a test database on the iSCSI LUNs
  • Setup of an SQL Server AO Availability Group

     

Figure 1: AlwaysOn test setup

 

As illustrated in figure 2, the databases on both AlwaysOn test servers are currently synchronized.

Figure 2: Synchronous databases in AO

 

Now we will try to simulate the failure of the primary disk connection by setting the hard disk on which the data files of the primary database reside to offline in the Disk Manager.

One would expect SQL Server 2012 to detect the failure and respond accordingly by triggering a failover.

Figure 3: Simulated hard disk failure

Upon switching to the AO dashboard and refreshing the AlwaysOn status, one will quickly see that despite the “failure” of the hard disk it still considers the databases to be synchronous. Even the SQL Server error log of the primary database instance does not indicate any problems seconds after the “failure” of the hard disk on which the data files of the database reside.

Critical status:

But as soon as one tries to access any database object in this database (in this case using SQL Server Management Studio), the following error message appears:

Figure 4: Message: DB cannot be accessed anymore, DB check is recommended

 

Based on this error message, an administrator would assume that the hard disk on which the data files of the primary database reside is defective and counteract by triggering a failover with AlwaysOn in order to open the database on the secondary side, which is assumed to be synchronous. It is fairly likely, however, that in this state SQL Server will no longer be able to fail over to the up-to-then synchronous second database! The SQL Server error log of the secondary database instance displays the following error message:

 

AlwaysOn: The local replica of availability group ‘AOGroupSLM’ is preparing to transition to the primary role in response to a request from the Windows Server Failover Clustering (WSFC) cluster. This is an informational message only. No user action is required.

The availability replica for availability group ‘AOGroupSLM’ on this instance of SQL Server cannot become the primary replica. One or more databases are not synchronized or have not joined the availability group, or the WSFC cluster was started in Force Quorum mode. If the cluster was started in Force Quorum mode or the availability replica uses the asynchronous-commit mode, consider performing a forced manual failover (with possible data loss). Otherwise, once all local secondary databases are joined and synchronized, you can perform a planned manual failover to this secondary replica (without data loss). For more information, see SQL Server Books Online.

 

 

Conclusion:

The behavior shown by AlwaysOn in such cases is, at first, unexpected:

  • No failover is triggered if the status of a database protected by AO changes. Why is this the case?
    Because AlwaysOn does not perform an availability check on the database level. Other than Database Mirroring, AO just like SQL clustering allows for the logical grouping of multiple databases that can be failed over together if necessary. This must be considered when looking at its behavior. With AlwaysOn, availability is not checked and assessed on the level of individual database, but rather on the level of the SQL Server instance and the AO Availability Group. This is what sets it apart from the Database Mirroring behavior, which checks the availability of the individually protected databases and triggers a failover as soon as the availability of the database on the primary server is affected.

    The following information can be found in SAP Note 1772688 (login required):
    “The latter point also forced a change in the granularity of health checks or indications which would trigger a failover. In DBM the close of a database as result of e.g. a disk failure will trigger a failover (if automatic failover is configured). Whereas in AlwaysOn the health checks are mainly around the Availability Group and the SQL Server instance running the Availability Group as primary. But health detection is not checking the health of the individual databases which are contained in an Availability Group. As a result a database within an Availability
    Group can be closed, triggered e.g. by a disk failure without the AG moving into an unhealthy state or triggering a failover (if automatic failover got configured). This is a drastic change in behavior and coverage area compared to DBM. Especially in the case of disk failures which as a secondary effect cause affected SQL Server databases to close, DBM triggered a failover, whereas AlwaysOn in its default settings won’t trigger a failover.”

  • Following a storage defect on the primary side, the databases are assumed to be out of sync, making it necessary to open the secondary database, which is actually synchronous, as if it were an asynchronously replicated database (‚with allow_data_loss’).
    The actual status of the secondary AO database, however, is identical to that of the secondary DBM database for a similar database error. The databases are equally synchronous or asynchronous with both technologies, but AlwaysOn is more conservative in its assessment than DBM and changes the status of the secondary database to out-of-sync (which is always is case an asynchronously mirrored databases or asynchronous AO).

 

The behavior when log or data files are lost is the same for both AlwaysOn and Database Mirroring until the protected database is closed. Here, the difference between the two HA solutions lies in their level of automation. AlwaysOn will only provide high availability for as long as the integrity of the database files of the current primary database is maintained. In cases, however, where the primary database is lost due to problems such as storage/SSD failure or iSCSI connection loss, the database, depending on its runtime status, is marked as “out of sync” and requires manual intervention. Database Mirroring on the other hand will, depending on its configuration, automatically fail over to the mirror when data or log files are lost.

From a data consistency perspective, there is no reason not to use the SQL Server AlwaysOn high availability solution, if SLAs are continued to be met after manual intervention and starting the mirror.


ITSM Integration powered by easyCloud

Kategorien: knowledge base @en
Comments Off on ITSM Integration powered by easyCloud

Introduction

What has been standard within the classical manufacturing and other producing industries for several decades becomes more important to IT organizations nowadays: tailoring standard processes from best practices and implementing these processes with the help of integrated software.

Many IT organization – independently if internal or external – currently concentrate their strategical activities on adopting and implementing IT processes like those found in the ITIL framework. However, not only the processes itself are introduced, further more tools and software suites are being implemented to support these processes.

These described circumstances lead to several challenges, which can be divided into organizational and technical challenges. On the one hand, changing the processes – and therefore the way people used to act for a certain time – is not always easy. On the other hand, as soon as these processes are agreed upon, the chosen tools or tool suites need to communicate and integrate into the existing landscape. Commonly, this results in integrating a multitude of different platforms, applications and technology components into one leading ITSM tool (suite).



SAP applications – like SAP ERP, SAP BW, SAP CRM etc. – often act as important part of the offered service portfolio, supporting the established business processes and business services. The integration of ITSM processes and SAP ALM processes is therefore an important topic to be discussed. In addition, the concrete integration of ITSM software with SAP software needs to be discussed.

This blog post is not about how to choose ITSM software or what software should be used. Further on, this blog will not give an overview of what ITIL or ITSM and how to implement ITSM processes within organizations. This blog is rather about the integration of SAP solutions into an existing or newly implemented Non-SAP-ITSM software (e.g. like Microsoft System Center Service Manager, ServiceNow, BMC, etc.). It tries to answer questions about what might be suitable for integration and how to integrate it.

Possible Scenarios

First of all, let us have a closer look on possible scenarios, where an integration makes sense. During some customer projects, four main areas emerge as the key points of interest and integration: Change Management, Incident Management, Event/Capacity Management and CMDB design. Within the next few chapters, these scenarios will be detailed and evaluated. Possible solutions will be recommended in the trailing chapters of this blog post.

Change Management

SAP has its own processes established for a – more or less – technical change management: SAP Transport Management System. The STMS defines transportation routes between SAP systems and “packages” every modified or newly created object into a transport request, which can be transported within the landscape. Further on, SAP delivers Change and Request Management (ChaRM) as part of the SAP Solution Manager. ChaRM implements change request management processes with a tight integration into the underlying SAP technology (e.g. STMS).

So, if SAP offers its own its own implementation of Change and Request Management, why should one think about integration? Wouldn’t it be easier to use SAP’s ChaRM for SAP changes? Indeed – but then there would be at least two entry points: one request entry for SAP changes, one entry point for Non-SAP changes. Moreover, what needs to be done, if the change is for SAP and Non-SAP?


Therefore, the first reason to integrate SAP and Non-SAP ITSM is a central entry point for Change Management. This way, key users and change managers can create and track Change Requests during their way towards becoming a change and being implemented. In addition, this should be possible on a higher level, independent from the application and infrastructure underneath.

The second reason, from the perspective of the application owner, is traceability and auditability of changes made to the offered IT services, especially changes made to critical SAP applications. Even if the application owners use their own software for technical or non-technical change management, redundant maintenance of information should be avoided – as far as possible. An integration between a central ITSM tool and the SAP Change Management would solve this issue by either integrating the process layer from the ITSM tool with SAP ChaRM or directly with the SAP TMS.



Event Management

The next possible integration scenario is based on the requirements for a central event management. Common organizations have an SAP Basis team, which monitors their SAP systems on a daily basis. By introducing a central management and monitoring system of IT Services and IT Service Management, monitoring of available IT Services and the underlying applications and infrastructure should be done within these central applications. Critical events should be forwarded to the central software for monitoring and event management. For further analysis, it is – as with any other application – required to work with the SAP system itself. Nevertheless, alerting can be done centrally, which also makes it easier to aggregate and report these information. Furthermore, these information can be used as foundation for SLA reporting.


CMDB

Another example for integrating SAP and a third party ITSM tool is the enrichment of the central CMDB with SAP Applications, SAP Application Components and versions. This way one has a CMDB with all these SAP related information automatically maintained within the central CMDB, which makes the process integration, e.g. within Incident Management, much easier and much more comfortable.



Incident Management

The integration between the Incident Management process and SAP is also a common requirement. Adding additional information for incident tickets is always helpful for solving the incident. Either through adding information directly while creating the incident, or by enriching the incident ticket while working on it, the incident resolution can be accelerated.

As example, an incident ticket could be enriched by general information like the SAP System, the installed application components and versions and further on, additional information about the user could be added. This may include the locked-status of the user or recent authorization issues, which are helpful for processing and solving the incident, but also for using the incident database as Knowledgebase.