This review sets out the findings of our work on technology change management across the financial services sector. We looked at how financial firms manage technology change, the impact of change failures, and the practices used within the industry to reduce the impact of incidents resulting from change.
Why we conducted this review
A number of significant IT failures in the last 10 years have led to greater scrutiny of the effectiveness of technology change management in the financial services (FS) sector. The evolution of business models, economic and regulatory change, and the increasing pace of technological advancement have emphasised the need for firms to change with confidence. Technology is integral to the delivery of FS, and while technology change gives firms the opportunity to innovate, lower costs and improve the quality of service, it also introduces operational risk.
Most firms in our 2018/19 Cyber and Technology Resilience questionnaire exercise reported that they have mature change management capabilities. However, our analysis of the incident data firms report to us shows that change-related incidents are consistently one of the top causes of failure and operational disruption. Nearly 1,000 material incidents were reported to the FCA in 2019, 17% of which were attributed to change activity.
By reviewing how financial firms implement technology change, and the effect that outages have on consumers and the financial system, we aimed to understand how firms currently approach managing technology change and the causes of the problems they encounter.
What we did
We used a data-driven approach to analyse over 1m production changes implemented in 2019 by a sample of FS firms with different business models and of varying scale. We supplemented this data with a qualitative questionnaire, a confidential board questionnaire and industry workshops to understand firms’ release and deployment methodologies, the effectiveness of their governance arrangements, and the role that infrastructure plays in deploying change effectively. For more information on the data sources used, see Annex A.
We set out our analysis and key findings across the following areas:
- practices contributing to change success and change failure
- the impact of incidents caused by technology change
- how firms govern and manage technology change
- how firms build and deploy technology change
- how infrastructure impacts technology change
Who this applies to
This review is relevant for all FS organisations and may also be of interest to third parties providing technology services to the industry.
Next steps
The aim of operational resilience remains to prevent, respond to, recover from and learn from operational disruptions. This review will contribute to the discussion on how firms can implement technology change in ways that reduce the potential for operational disruption. As firms increasingly use remote and flexible working, it is critically important that they understand the services they provide and how change activity can affect those services, and that they invest in their resilience to protect themselves, consumers and the market.
Key findings
One of the goals of this review was to highlight practices used within the industry to reduce the number and impact of incidents resulting from technology change. To achieve this, we calculated each firm’s change performance using a range of metrics: the average number of incidents per change, the severity of the resulting change-related incidents, whether those incidents were customer-facing, and their average duration (collectively referred to as the change success rate). Firms were grouped by performance, and cluster analysis was used to identify characteristics common to high-performing firms. This quantitative analysis was supported by industry workshops and other qualitative data submitted by participating firms.
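The review describes these measures but does not publish how they were computed or combined. As a minimal sketch, assuming a hypothetical record layout (the `Incident` and `FirmChangeData` classes and the `"high"` severity label are illustrative, not the review’s data model), the per-firm measures could be derived from change and incident records like this:

```python
from dataclasses import dataclass, field
from statistics import mean


# Hypothetical record layout for illustration only.
@dataclass
class Incident:
    severity: str          # hypothetical labels, e.g. "high" or "low"
    customer_facing: bool
    duration_hours: float


@dataclass
class FirmChangeData:
    name: str
    change_count: int      # production changes deployed in the period
    incidents: list[Incident] = field(default_factory=list)  # change-related incidents


def change_metrics(firm: FirmChangeData) -> dict[str, float]:
    """Compute per-firm change performance measures of the kind the review describes."""
    n = len(firm.incidents)
    if n == 0:
        # No change-related incidents: every incident-based measure is zero.
        return {"incidents_per_change": 0.0, "high_severity_share": 0.0,
                "customer_facing_share": 0.0, "mean_duration_hours": 0.0}
    return {
        "incidents_per_change": n / firm.change_count,
        "high_severity_share": sum(i.severity == "high" for i in firm.incidents) / n,
        "customer_facing_share": sum(i.customer_facing for i in firm.incidents) / n,
        "mean_duration_hours": mean(i.duration_hours for i in firm.incidents),
    }
```

Computing these measures for every firm on a consistent basis is the starting point for grouping firms by performance; the cluster analysis itself is not reproduced here.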
Practices identified as contributing to change success
While there is no single approach, process, or control that improves change success rates, our analysis and discussions with firms found that stronger governance, day-to-day risk management, increased automation and more robust testing and planning can contribute to successful change activity and less disruption.
Based on the data analysed in this review, we found that firms that had higher change success rates had these common characteristics:
- Firms with well-established governance arrangements have a higher change success rate:
There was a positive correlation between the longevity of governance arrangements and higher change success rates in the sampled firms. Our data showed that robust governance can help reduce the number and impact of operational incidents resulting from change.
- Relying on high levels of legacy technology is linked to more failed and emergency changes:
Over 90% of surveyed FS firms rely on legacy technology in some form to deliver their services. We found that firms with a lower proportion of legacy infrastructure and applications had a higher change success rate. This supports the view that technology change is more difficult to implement without disruption when dealing with legacy infrastructure. Firms with a lower proportion of legacy technology also had a lower proportion of changes being deployed as emergencies and had a higher chance of those emergency changes being successfully deployed.
- Firms that allocated a higher proportion of their technology budget to change experienced fewer change related incidents:
Firms with a high change success rate dedicated a large proportion of their IT budget to change activities. Firms with the lowest proportion of changes resulting in an incident dedicated between 50% and 75% of their IT budget to technology change activities.
- Frequent releases and agile delivery can help firms to reduce the likelihood and impact of change related incidents:
Overall, we found that firms that deployed smaller, more frequent releases had higher change success rates than those with longer release cycles. Firms that made effective use of agile delivery methodologies were also less likely to experience a change incident.
- Effective risk management is an important component of effective change management capabilities:
Firms that experienced fewer incidents due to failed changes mitigated the risk of technology change by drawing on a wide range of technical and business knowledge to ensure that potential impacts were well understood. We also found that firms that continuously managed risks as part of day-to-day project management were more likely to have higher change success rates than firms that took an ad-hoc or periodic approach to risk management.
Practices identified as contributing to change failure
Based on the data analysed in this review, we also identified some areas which could lead to increased operational disruption when carrying out change activity:
- Most firms do not have complete visibility of third-party changes:
Third parties are not immune to the inherent risks presented by technology change, yet most firms in this review did not effectively track third-party changes. According to firms’ incident reporting, over 20% of incidents at third parties in 2019 were caused by change. Workshop attendees suggested that third-party contracts could be better used to provide greater clarity on how changes are communicated and on their potential impact on a client firm’s IT estate.
- Firms’ change management processes are heavily reliant on manual review and actions:
We found that firms are still heavily reliant on manual testing and processes to deliver technology change. Repeatability and consistency throughout the lifecycle of a change and its deployment could help reduce the burden of assurance activity and could also allow for a higher degree of confidence when implementing technology change.
- Legacy technology impacts firms’ ability to implement new technologies and innovative approaches:
Firms that classified a higher proportion of their technology estate as legacy also had lower adoption rates for DevOps, microservice architecture and public cloud, which could limit their ability to benefit from innovative approaches. However, firms also highlighted the risks involved in migrating away from legacy technology.
- Major changes were twice as likely to result in an incident when compared with standard changes:
Changes deemed by firms to be ‘major’ were over twice as likely to result in an incident when compared with other change types. Workshop attendees attributed this to their complexity, and the inability to break them down into smaller components. One of the key assurance controls firms used when implementing major changes was the Change Advisory Board (CAB). However, we found that CABs approved over 90% of the major changes they reviewed, and in some firms the CAB had not rejected a single change during 2019. This raises questions over the effectiveness of CABs as an assurance mechanism.
The impact of change incidents
Change management practices and capabilities can help ensure that changes to the IT estate are handled effectively, minimising the number and impact of any change related incidents. The concept of change management is not new to firms, and most firms in our sample had rigorous governance arrangements in place. However, several high-profile incidents in the recent past caused by failed changes have highlighted the substantial consumer detriment that can result from change-related incidents.
Our analysis showed that, in general, the industry managed change effectively, with 1.6% of technology changes resulting in an incident. However, because of the volume of change implemented by the sample firms, this still caused significant disruption, amounting to 13,767 incidents in 2019, 14% of which had a customer-facing impact. Certain types of change, such as major changes, were more likely to result in incidents. We also identified significant variance in change success rates between firms, highlighting the need for robust controls and oversight.
Customer-facing and high severity incidents resulting from change
Consumers and market participants are increasingly reliant on digital channels to access FS. Even short-lived disruptions can be quickly identified by end users, and can result in significant impact and media attention. On average, each sample firm experienced 80 customer-facing incidents resulting from failed change during 2019.
Our data showed that of all customer-facing incidents in 2019, 7.4% had a root cause relating to change activity. However, of all incidents classified as high severity by firms in 2019, 24% had a root cause relating to change activity, highlighting that failed technology changes can have a higher impact than incidents resulting from other root causes.
When we asked how firms can better manage the customer impact of disruptive change activity, workshop attendees emphasised the importance of having comprehensive, well-tested rollback plans in place. These plans often include internal and external communications (including with third parties) for use during an incident, for example to highlight which alternative channels are available. Firms’ board members stated that firms should also capture lessons learned from failed changes to continuously improve their capabilities.
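One way to make a rollback plan repeatable and testable is to script it. The sketch below assumes a hypothetical setup in which deployment, rollback, post-deployment health checks and notifications are supplied as plain Python callables; it illustrates the general pattern and does not describe any firm’s tooling.

```python
from typing import Callable


# Hypothetical deploy-verify-rollback pattern for illustration only.
def deploy_with_rollback(
    deploy: Callable[[], None],
    rollback: Callable[[], None],
    health_checks: list[Callable[[], bool]],
    notify: Callable[[str], None],
) -> bool:
    """Deploy a change, verify it, and roll back automatically if verification fails.

    Returns True if the change is kept and False if it was rolled back.
    """
    deploy()
    # Run every check (no short-circuit) so the message names all failures.
    failed = [check.__name__ for check in health_checks if not check()]
    if failed:
        rollback()
        notify("Change rolled back; failed checks: " + ", ".join(failed))
        return False
    notify("Change deployed and verified")
    return True
```

In practice, `notify` would feed the internal and external communications the plan already defines, so customers hear about alternative channels as soon as a rollback starts.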
Incidents resulting from major changes
In total, the sampled firms deployed nearly 68,000 major changes over 2019, which resulted in 2,600 incidents. This is an average failure rate of 3.8%, compared to 1.6% from all change types. We understand that major changes are generally subject to a higher level of scrutiny and governance due to the potential for disruption to services. However, despite the increased oversight, we found that changes classified as 'major' are twice as likely to result in an incident when compared with other change types. Workshop attendees attributed this to their complexity, and the inability to break them down into smaller components.
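The headline rates above can be reproduced from the published figures. Because 68,000 is described as “nearly”, the results are approximate:

```python
# Figures from the review; 68,000 is "nearly", so results are approximate.
major_changes = 68_000
major_change_incidents = 2_600
all_change_incident_rate = 0.016   # 1.6% across all change types

major_change_incident_rate = major_change_incidents / major_changes
relative_likelihood = major_change_incident_rate / all_change_incident_rate

print(f"Major change failure rate: {major_change_incident_rate:.1%}")  # Major change failure rate: 3.8%
print(f"Relative to all changes: {relative_likelihood:.1f}x")          # Relative to all changes: 2.4x
```

A ratio of roughly 2.4 is consistent with the finding that major changes were over twice as likely to cause an incident.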
The Change Advisory Board (CAB) was one of the key controls firms used when implementing major changes. However, we found that CABs approved over 90% of the major changes they reviewed, and in some firms the CAB had not rejected a single change during 2019. This may indicate that many CABs perform more of a ‘flight control’ role than an assurance function. To mitigate this risk, workshop attendees emphasised that CAB members should be carefully selected, and that a range of Subject Matter Experts (SMEs) should review requested changes so that they are thoroughly checked from both a technical and a business perspective.
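A simple way to monitor this is to track CAB outcomes per board. A minimal sketch, assuming CAB decisions are recorded with hypothetical `"approved"` and `"rejected"` labels:

```python
from collections import Counter


def cab_summary(decisions: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Summarise CAB outcomes for each board.

    `decisions` maps a board (or firm) name to the outcomes of the major changes
    it reviewed, each recorded as "approved" or "rejected" (hypothetical labels).
    """
    summary = {}
    for board, outcomes in decisions.items():
        counts = Counter(outcomes)
        total = len(outcomes)
        summary[board] = {
            "approval_rate": counts["approved"] / total if total else 0.0,
            "rejections": counts["rejected"],
        }
    return summary


def boards_that_never_rejected(summary: dict[str, dict[str, float]]) -> list[str]:
    """Boards with no rejections at all, which may warrant a closer look."""
    return [board for board, stats in summary.items() if stats["rejections"] == 0]
```

A high approval rate is not proof of weak assurance on its own, since changes may be reworked before they reach the CAB, but a board that never rejects anything is worth examining.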
Incidents resulting from emergency changes
Emergency changes are technology changes that must be implemented as quickly as possible, often in response to an operational incident. Due to the nature of these changes, some of the assurance and governance steps used for normal changes are expedited which can increase the inherent level of risk.
During 2019, firms deployed over 33,000 emergency changes into production, accounting for between 3% and 5% of total change volume. However, the proportion of emergency changes varied dramatically across firms, with some categorising almost 20% of all changes deployed as emergency changes. We observed a link between emergency changes and legacy infrastructure, which may indicate that firms with higher proportions of legacy infrastructure were more likely both to use emergency changes and to have a higher proportion of those changes result in an incident.
We found that, on average, 1.5% of emergency changes resulted in a new incident; a slightly lower proportion than other types of changes. This could be due to an incident already being open for the issue being addressed by the change, but may also indicate strong risk awareness when it comes to implementing emergency changes.
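Both measures discussed above, the share of changes deployed as emergencies and the proportion of those that cause a new incident, can be tracked per firm. A sketch with hypothetical field names; the 10% threshold for flagging heavy use of emergency changes is an assumption, not a figure from the review:

```python
from dataclasses import dataclass


# Hypothetical per-firm summary for illustration only.
@dataclass
class EmergencyChangeStats:
    firm: str
    total_changes: int
    emergency_changes: int
    emergency_incidents: int   # new incidents caused by emergency changes

    @property
    def emergency_share(self) -> float:
        """Proportion of all changes that were deployed as emergencies."""
        return self.emergency_changes / self.total_changes

    @property
    def emergency_failure_rate(self) -> float:
        """Proportion of emergency changes that caused a new incident."""
        if self.emergency_changes == 0:
            return 0.0
        return self.emergency_incidents / self.emergency_changes


def heavy_emergency_users(stats: list[EmergencyChangeStats], threshold: float = 0.10) -> list[str]:
    """Firms whose emergency share exceeds `threshold` (an assumed cut-off)."""
    return [s.firm for s in stats if s.emergency_share > threshold]
```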
Due to the speed and nature of emergency changes there is little room for error, and these changes can amplify existing weaknesses in a firm’s change capabilities. The workshop participants generally agreed that having the right culture around emergency changes is important to govern their use. As with major changes, participants also highlighted the importance of SME reviews and having stringent governance around the emergency change process.
Incidents resulting from third party changes
Firms rely heavily on third parties to deliver business services. Over 30% of the development activity in the sample firms was delivered by third-party teams. This increases the complexity of business models and can heighten operational risk. Recent third-party incidents and security breaches have highlighted the importance of third-party risk management, as well as the need to understand all the critical parties supporting business services. According to data from all incidents reported to the FCA by regulated entities in 2019, third-party incidents were one of the top root causes, accounting for 18% of all incidents. Of those, 22% were due to third-party change activity.
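Combining the two percentages gives the share of all reported incidents that stemmed from third-party change, roughly 1 in 25:

```python
third_party_share = 0.18             # third-party incidents as a share of all incidents reported in 2019
change_share_of_third_party = 0.22   # share of those third-party incidents caused by change

third_party_change_share = third_party_share * change_share_of_third_party
print(f"Third-party change: {third_party_change_share:.1%} of all reported incidents")
# Third-party change: 4.0% of all reported incidents
```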
Third parties are not immune to the inherent risks presented by technology change, but most firms participating in this review did not track third-party changes. While some firms did not allow third parties to deploy changes directly in their environments, all firms and their clients have a degree of dependency on software products provided by third parties. Firms stated that third parties often do not communicate changes to their customers, making those changes difficult to track. Attendees suggested that some of the risk associated with third-party changes could be mitigated by contractual agreements with strong governance against service levels.
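As an illustration of the contractual approach attendees described, a firm could record third-party change notifications and check them against agreed notice periods. The `ThirdPartyChange` record, supplier names and notice periods below are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical record of a supplier change, for illustration only.
@dataclass
class ThirdPartyChange:
    supplier: str
    description: str
    deployed_on: date
    notified_on: Optional[date]   # None if the supplier never told us about the change


def notice_breaches(changes: list[ThirdPartyChange],
                    required_notice_days: dict[str, int]) -> list[ThirdPartyChange]:
    """Changes that were never notified, or notified with less than the agreed notice."""
    breaches = []
    for change in changes:
        required = required_notice_days.get(change.supplier, 0)
        if change.notified_on is None:
            breaches.append(change)
        elif (change.deployed_on - change.notified_on).days < required:
            breaches.append(change)
    return breaches
```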
Governance and management
Governance arrangements
Effective governance at senior levels can help foster an operationally effective environment throughout an organisation. Most of the respondents to our Board questionnaire were confident in their ability to deliver technology change without incident and believed that their organisations had the ability to deliver benefit through those changes. They were also generally confident in their ability to assess risks and approve major technology changes.
We found that the majority of firms had their current technology change governance arrangements in place for between 6 months and a year. We found a positive correlation between the longevity of governance arrangements and higher change success rates in the firms. Firms that had governance arrangements in place for more than a year experienced a lower proportion of incidents resulting from change when compared to peers with newer arrangements.
Most firms stated that their change management governance arrangements were reviewed periodically to ensure that processes and controls remained fit for purpose, and also on an ad-hoc basis following lessons learned exercises or a major change. Firms also stated that effective board-level governance can be complemented by SMEs and Non-Executive Directors providing challenge from a technical and business perspective.
Before the coronavirus pandemic, we asked firms how their governance arrangements might change over the next 12 months. Some firms stated that they were looking to pilot or increase their use of automation and of methodologies that reduce manual overhead. The majority of firms used a consolidated tool to track, categorise and record changes made across their estate.
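A consolidated change record needs, at minimum, enough fields to categorise each change and link it to any resulting incident. A minimal sketch of such a record, assuming a hypothetical schema; the change types mirror those discussed in this review:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


# Hypothetical schema for illustration only.
class ChangeType(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    MAJOR = "major"
    EMERGENCY = "emergency"


@dataclass(frozen=True)
class ChangeRecord:
    change_id: str
    change_type: ChangeType
    third_party: bool        # True if the change was made or supplied by a third party
    caused_incident: bool    # True if an incident was attributed to this change


def incident_rate_by_type(records: list[ChangeRecord]) -> dict[ChangeType, float]:
    """Share of changes of each type that resulted in an incident."""
    totals = Counter(r.change_type for r in records)
    failures = Counter(r.change_type for r in records if r.caused_incident)
    return {t: failures[t] / totals[t] for t in totals}


def third_party_share(records: list[ChangeRecord]) -> float:
    """Proportion of recorded changes that came from third parties."""
    return sum(r.third_party for r in records) / len(records) if records else 0.0
```

Recording third-party changes in the same tool, with a flag like `third_party`, would also address the visibility gap described earlier.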
The drivers of technology change
35,000: average number of changes implemented per firm over 2019
The financial industry has needed to change quickly, and in some cases dramatically, as customers demand real-time services, seamless experiences and increased customer journey integration. Regulators have also required substantial change from the industry. For example, many firms have had to implement changes to their technology estates due to new regulatory requirements around MiFID/MiFIR and/or Ring Fencing. Firms have also had to respond to developments outside of the sector, like the coronavirus pandemic, the General Data Protection Regulation and the UK’s withdrawal from the EU.
The sample firms collectively deployed over 1m changes over 2019 with each firm deploying 35,000 production changes on average. This represents significant activity and highlights the complexity of translating business or regulatory initiatives into technology change.
To better understand the drivers of change, firms were asked to allocate their change budgets across 6 broad buckets. We found that firms dedicated the highest proportion of their change resources to ‘maintenance and upkeep’ and ‘satisfying regulatory and legal requirements’.
The need for continued maintenance presents a challenge to firms going through transformation. While significant effort will always be dedicated to this area, some firms told us that automating routine maintenance tasks and consolidating their operating environments has allowed them to re-allocate resources to other change activities.
Project management methodologies
Effective project management is a critical component of change management governance. It helps ensure that what is being delivered meets the strategic objectives of the organisation, and plays a key role in risk management and quality control. A number of project management methodologies exist, but for the purpose of this review they are broken down into 2 broad types: ‘Agile’ and ‘Waterfall’.
We found that firms that used agile project management methodologies for a higher proportion of their work had fewer incidents resulting from change. However, firms stated that they often used a hybrid approach, applying agile ‘tools and techniques’ such as daily stand-ups and Kanban boards to waterfall-managed projects. Agile and Waterfall methodologies each have their own advantages and disadvantages, and the best choice depends on the project type and circumstances. However, workshop attendees stated that managing changes in smaller packets increased the likelihood of success.
High risk projects and programmes
We found that a number of consistent risk factors are prevalent in high risk change projects of all sizes. Firms across all sectors agreed that the most consistent risk factor is when ‘a project is dependent on other projects delivering their objectives’. These projects require the coordination of many moving parts, detailed awareness of the interconnectedness of systems and services, and changes needing to be completed in tandem to fulfil structured project plans.
Projects that implement technologies not previously used within the organisation were rated high risk, as were those that change legacy technology. Contributing factors to these ratings could be incompatible systems and data, as well as the high degree of uncertainty involved with these projects. Reliance on third parties could also raise unforeseen issues during delivery and firms should ensure that they select and govern their change partners appropriately.
The classification of a project as ‘major’ seems to be less of a risk factor when considering the characteristics that may affect project delivery. This may be because major change projects receive greater sponsorship, resources and, ultimately, scrutiny. Firms also appear to have more confidence in project delivery when they rely more heavily on intragroup resources.
Multiple firms highlighted the importance of implementation planning and careful risk management when delivering projects with one or more of these high-risk characteristics. We found that most firms managed risks as part of day-to-day project management, actively included team members in the risk process and maintained a risk register for each project. Firms that managed risk in this way were more likely to have higher change success rates than firms that took an ad-hoc or periodic approach to risk management.
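A per-project risk register can be kept as simple structured data. The sketch below assumes a common scoring approach, likelihood multiplied by impact on 1 to 5 scales (an assumption; the review does not specify a scoring method), and flags projects that have any of the high-risk characteristics identified above. Field and factor names are hypothetical:

```python
from dataclasses import dataclass, field

# High-risk characteristics named in this review (labels are hypothetical).
HIGH_RISK_FACTORS = {
    "depends_on_other_projects",
    "new_technology",
    "changes_legacy_technology",
    "relies_on_third_parties",
}


@dataclass
class Risk:
    description: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (minor) to 5 (severe)
    owner: str

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


@dataclass
class Project:
    name: str
    factors: set[str]
    register: list[Risk] = field(default_factory=list)

    def is_high_risk(self) -> bool:
        """True if the project has any of the high-risk characteristics."""
        return bool(self.factors & HIGH_RISK_FACTORS)

    def top_risks(self, n: int = 3) -> list[Risk]:
        """Highest-scoring risks first, for review in day-to-day project meetings."""
        return sorted(self.register, key=lambda r: r.score, reverse=True)[:n]
```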
Testing approaches
Testing aims to minimise the issues that can occur before, during and after implementation of a change. A variety of testing methods can be used to more closely replicate the change that is intended to take place and reduce potential risks.
Firms use a wide range of testing approaches depending on the scope and scale of change being deployed. All firms made use of Regression Testing, Integration Testing and Unit Testing.
However, we found that firms are still heavily reliant on manual testing and peer review, both of which are prone to human error. Repeatability and consistency of processes throughout the lifecycle of a change and its deployment could reduce the testing burden, and allow for increased coverage of assurance activity.
Firms highlighted that end-to-end test automation was the ‘gold standard’ but not always possible. Most firms used some automated testing, but commented that it brought its own benefits and challenges, particularly the need to maintain control and to use deployment-specific tests. They also emphasised that automated testing does not completely eliminate the need for manual oversight and testing.
Workshop attendees that employed a standardised approach to testing said that using a repeatable process gave them confidence. Firms also stated that communication with business stakeholders and a strong understanding of the technology estate, including upstream and downstream systems, were common drivers of test plan design before go-live.
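Automated regression tests pin down existing behaviour so that a change which breaks it fails before go-live, the same way every time. A minimal sketch using Python’s built-in `unittest` module; `apply_monthly_fee` is a hypothetical business rule standing in for real application code:

```python
import unittest


def apply_monthly_fee(balance_pence: int, fee_pence: int) -> int:
    """Hypothetical business rule: deduct a fee, but never take the balance below zero."""
    return max(balance_pence - fee_pence, 0)


class ApplyMonthlyFeeRegressionTests(unittest.TestCase):
    # Each test pins behaviour that must not change between releases.
    def test_fee_deducted(self):
        self.assertEqual(apply_monthly_fee(1_000, 250), 750)

    def test_balance_never_negative(self):
        self.assertEqual(apply_monthly_fee(100, 250), 0)

    def test_zero_fee_leaves_balance_unchanged(self):
        self.assertEqual(apply_monthly_fee(1_000, 0), 1_000)
```

Assuming the file is saved as `fee_tests.py`, the suite can be run with `python -m unittest fee_tests` as part of every build, so the same checks are repeated for each change.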
Build and deployment
To react to changing business environments and provide competitive advantage, businesses are requiring their IT teams to deploy changes faster, and more frequently, than ever. New competitors in the FS industry can use state-of-the-art infrastructure, processes and systems. IT functions are increasingly under pressure to support capabilities such as data analytics, information security, automated processing and integration with third-party systems.
Production deployments are often not routine, requiring weekend working and overtime when incidents occur. This results in a resource cost, slower time to market and can create conflict within firms’ IT teams. We looked to understand the extent to which innovative build and deployment methodologies were being used, as well as their benefits and risks.
DevOps methodologies
Under a DevOps model, development and operations teams are no longer 'siloed', resulting in increased ownership and less friction between teams. Sometimes these 2 teams are merged into a single team. Engineers work across the entire application lifecycle, from development and test to deployment and operations, using a range of skills not limited to a single function. In some DevOps models, quality assurance and security teams may also be more tightly integrated with development and operations throughout the application lifecycle.
Firms in our sample stated a preference for an agile DevOps approach. There was wide support for carrying out smaller changes gradually, but some larger firms outlined clear challenges in doing so. For example, they highlighted that major changes, specifically regulatory changes, are difficult to break into small changes, due to varying specifications and hard deadlines.
84% of firms use DevOps methodologies in some form, but only 13% use DevOps processes for all software delivery activities. While most firms are looking to take advantage of innovative approaches, workshop attendees stated that DevOps methodologies are often not applied consistently across the organisation, which may reflect the implementation challenges involved. Our data analysis showed that firms that rely heavily on legacy systems had lower adoption rates. However, a large proportion of the firms that do not currently use DevOps widely plan to deploy more workloads using this methodology over the next 12 months.
DevOps methodologies can increase the frequency and pace of releases. However, increased release frequency and automated assurance can present significant operational and regulatory challenges, and firms need to understand that this is not a one-size-fits-all solution. The incident and change data provided by firms suggested that firms that described a higher proportion of their software development activities as ‘DevOps’ experienced a slightly higher change failure rate than other firms. However, given the sample size, no firm conclusion could be drawn about the impact of a DevOps methodology on change success.
To successfully leverage DevOps methodologies, firms said that they first need to have a strong awareness of their operating environments, as well as a solid understanding of risk, and rigorous governance. This requires significant time and upfront financial investment, and can be particularly challenging when multiple methodologies are used at the same time.
The speed of technology change
The speed at which firms implement change provides insight into the size of changes, tooling and methodologies they use. Our review showed that most firms deployed changes to their core systems between once a month and once a quarter. Automation can decrease the time it takes to plan, build, test and deploy change while reducing the possibility of human error.
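Deployment cadence can be measured directly from deployment dates. A sketch, assuming dates are available per system; the band boundaries approximate the frequency ranges used in this review and are assumptions:

```python
from datetime import date
from statistics import mean


def classify_deployment_frequency(deploy_dates: list[date]) -> str:
    """Bucket a system's deployment cadence by the mean gap between deployments.

    Band boundaries (1, 7, 31 and 92 days) are assumptions for illustration.
    """
    if len(deploy_dates) < 2:
        return "insufficient data"
    ordered = sorted(deploy_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    avg_gap = mean(gaps)
    if avg_gap <= 1:
        return "once a day or more"
    if avg_gap <= 7:
        return "between once a week and once a day"
    if avg_gap <= 31:
        return "between once a month and once a week"
    if avg_gap <= 92:
        return "between once a quarter and once a month"
    return "less than once a quarter"
```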
Deploying change regularly and safely usually requires firms to use automated tools and processes and to leverage modern infrastructure. Firms that deployed changes between once a week and once a day/multiple times a day made use of automated deployment tools, with 78% utilising a deployment process that was mostly or fully automated. Most of these firms (89%) also leveraged microservice architecture in some capacity. Firms that deploy frequent change were also more likely to use public cloud technology and have limited legacy technology within their IT environments. Overall, we found that firms who used microservices, automation and deployed changes more frequently had higher change success rates. However, it is important to note that approaches were often not consistently applied within firms and that change incidents still occurred.
To see how quickly firms could launch new products, board members were asked ‘if you had an innovative idea that required IT change as a key component, how long would it take to approve the idea, build it, and deploy it to users (on average)?’ While timelines vary depending on the product, most respondents told us that new ideas can be implemented within 6-12 months.
The impact of infrastructure
A firm’s technology infrastructure is the foundation that supports its systems and services. It fundamentally defines the way that IT services are delivered to end users and how technology change is performed, and it can either contribute to or help mitigate incidents. Given the current pace of technology change and the competitive nature of business, IT leaders are working to ensure that their infrastructure is designed so that changes can be made quickly with minimal disruption to service.
Impact of legacy technology
Many FS firms run on complex legacy infrastructure that has been patched over many years and operates alongside new technologies. This can result in difficulty in understanding overall IT estates and the potential impact of a change. Firms participating in this review said that it is important to invest in estate discovery to understand the data and the application portfolio when designing a change, which can be difficult when dealing with legacy infrastructure.
Shifting away from legacy infrastructure presents firms with a substantial challenge. Several of the highest impact technology incidents experienced over the last 10 years have been a result of failed migration of legacy infrastructure.
According to our review, over 90% of reviewed firms are still reliant on legacy infrastructure and applications to deliver production services. This proportion is highest in the insurance sector, where over 70% of respondents classified most of their infrastructure and applications as legacy, with 100% of insurance sector respondents relying on legacy technology in some form. We also found that firms with higher amounts of legacy infrastructure and applications had a higher change failure rate. This supports the view that technology change is more difficult to implement without disruption when dealing with legacy infrastructure.
The impact of legacy infrastructure on security and technology resilience is well documented, and it also significantly affects firms’ change management capabilities. Having a significant proportion of legacy infrastructure within an IT estate limits the flexibility of processes and prevents firms from taking advantage of new development, release and deployment methodologies, which we observed in our review as contributing to change success. The high level of uncertainty and risk involved in changing legacy infrastructure requires substantial testing and management oversight.
Public cloud-based infrastructure
Public cloud computing is changing the way infrastructure is designed and implemented. While companies have traditionally relied on physical data centres or colocation facilities, they are increasingly taking advantage of the scale and flexibility of public cloud offerings. Migrating to and managing public cloud environments presents several challenges, including a lack of oversight and direct control. However, public cloud environments can also provide numerous benefits.
Public cloud computing is not a new concept and has been a critical part of firms’ IT estates for a number of years. A large number of high-profile firms have announced their move towards public cloud computing over the last few years. However, our research found that the majority (78%) of production applications are still hosted in on-premise environments. While adoption rates varied across participants, workshop attendees expected their firms’ use of public cloud computing to increase.
One of the main benefits of managing change in the cloud is that it enables a high degree of automation. This can reduce the risks of manual change and make incident response more agile. While automation doesn’t guarantee that a change won’t have an adverse impact, it can reduce the risks involved. Automation drives repeatability and consistency through the change lifecycle: it reduces the risk of human error, enables the creation of identical environments for predictable and testable outcomes, and allows aspects of recovery from change incidents to be automated. Public cloud offerings also allow IT environments to scale automatically, removing the need to manually provision additional resources to meet business demand.
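A common way cloud tooling achieves this repeatability is to describe each environment declaratively and let automation compute the difference between the desired and the actual state. The sketch below shows that idea in miniature; the `Spec` format, the resource names and the `plan_changes` function are hypothetical and do not correspond to any particular cloud provider's API.

```python
from dataclasses import dataclass

# Hypothetical environment spec: resource name -> configuration.
# Building every environment (test, pre-production, production) from the
# same versioned spec is what makes them identical.
Spec = dict[str, dict[str, str]]


@dataclass
class Plan:
    create: list[str]
    update: list[str]
    delete: list[str]


def plan_changes(desired: Spec, actual: Spec) -> Plan:
    """Work out the actions needed to make `actual` match `desired`."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return Plan(create=create, update=update, delete=delete)


desired: Spec = {"web": {"image": "app:2.0"}, "db": {"engine": "postgres"}}
actual: Spec = {"web": {"image": "app:1.9"}, "cache": {"engine": "redis"}}
print(plan_changes(desired, actual))
print(plan_changes(desired, desired))  # already in the desired state
```

Because the plan is derived from the spec, running it against an environment that already matches yields an empty plan, and recovery from a failed change can be approached by re-applying the last known-good spec.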
Annex A
Methodology
We used a data-driven approach for this review, based on the IT service management information held by firms. Our analysis covers over 1m changes implemented into production environments during 2019. We also reviewed metrics from the nearly 20,000 incidents resulting from change over the same period, as well as qualitative data on how firms deploy and assure change management activities. A confidential questionnaire was also sent to firm board members to better understand the drivers for change and their level of confidence in their organisation’s ability to deploy change effectively.
The information request sent to firms was broken down into 4 sections:
- IT Service Management quantitative questionnaire
- Change Management qualitative questionnaire
- Full change and incident ticket analysis
- Confidential Board questionnaire
1. IT Service Management Questionnaire
To ascertain the level of disruption from failed changes and determine which types of change carry the highest level of risk, we requested 18 key operational metrics from 23 firms. These included the total number of IT changes implemented by change type in 2019, the number of incidents resulting from change by type, the proportion of incidents that had an impact on customers and the average change implementation duration in 2019.
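The sketch below illustrates how aggregate metrics of this kind can be combined to compare change types. The change types follow the glossary in Annex C, but the `ChangeTypeStats` fields, the helper functions and the counts are hypothetical and do not reproduce the 18 metrics actually requested.

```python
from dataclasses import dataclass


@dataclass
class ChangeTypeStats:
    changes: int             # changes implemented during the year
    incidents: int           # incidents attributed to those changes
    customer_incidents: int  # of which had a customer impact


def summarise(stats: dict[str, ChangeTypeStats]) -> dict[str, dict[str, float]]:
    """Incidents per 100 changes and customer-facing share, per change type."""
    summary: dict[str, dict[str, float]] = {}
    for change_type, s in stats.items():
        summary[change_type] = {
            "incidents_per_100_changes": 100 * s.incidents / s.changes if s.changes else 0.0,
            "customer_facing_share": s.customer_incidents / s.incidents if s.incidents else 0.0,
        }
    return summary


def riskiest(summary: dict[str, dict[str, float]]) -> str:
    """Change type with the most incidents per 100 changes."""
    return max(summary, key=lambda t: summary[t]["incidents_per_100_changes"])


# Hypothetical annual counts for one firm.
SAMPLE = {
    "standard": ChangeTypeStats(changes=8000, incidents=40, customer_incidents=10),
    "normal": ChangeTypeStats(changes=1500, incidents=45, customer_incidents=15),
    "emergency": ChangeTypeStats(changes=500, incidents=30, customer_incidents=12),
}

summary = summarise(SAMPLE)
print(riskiest(summary))  # emergency
```

Normalising by volume matters: in this invented example, standard changes cause more incidents in absolute terms than emergency changes (40 against 30) but have by far the lowest rate per change.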
2. Change Management Qualitative Questionnaire
Our Change Management Qualitative Questionnaire sought information on 4 key areas that affect change management processes:
- technology change: the key controls in place to manage the risk of change, how firms track, categorise and record changes across their estate, the length of time governance arrangements have been in place, firms’ change budgets and the key drivers of change within the industry
- programme and project management: the methodologies used by firms to deliver change, the common characteristics of 'risky' changes, benefit realisation and programme risk management
- infrastructure: the underlying network, hardware and software components that support IT services
- build, test and deployment: the methodologies, techniques and technologies used to assure technology releases and deployments
3. Full Change and Incident Ticket Analysis
4 additional firms were asked to provide their full change and incident logs for in-depth analysis. These logs contained all the change and incident tickets the organisations created over one year. This allowed the project team to analyse in detail how incidents and changes were categorised and to understand incident trends.
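With full ticket logs, it is possible to check both how often incidents are linked to change and whether that link is recorded consistently. The sketch below assumes each incident record carries an opening date, a root cause category and an optional linked change ticket; these field names, the helper functions and the sample records are hypothetical, since firms' ticketing systems use their own schemas.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records; real ticket exports will use their own fields.
INCIDENTS = [
    {"id": "INC1", "opened": date(2019, 1, 14), "root_cause": "change management", "related_change": "CHG10"},
    {"id": "INC2", "opened": date(2019, 1, 20), "root_cause": "hardware", "related_change": None},
    {"id": "INC3", "opened": date(2019, 2, 3), "root_cause": "software", "related_change": "CHG11"},
    {"id": "INC4", "opened": date(2019, 2, 17), "root_cause": "human error", "related_change": None},
    {"id": "INC5", "opened": date(2019, 2, 25), "root_cause": "change management", "related_change": None},
]


def is_change_related(record: dict) -> bool:
    # Treat an incident as change-related if it is tagged with a change
    # root cause OR linked to a change ticket.
    return record["root_cause"] == "change management" or record["related_change"] is not None


def monthly_change_share(records: list[dict]) -> dict[str, float]:
    """Share of each month's incidents that are change-related."""
    totals = Counter()
    from_change = Counter()
    for r in records:
        month = r["opened"].strftime("%Y-%m")
        totals[month] += 1
        if is_change_related(r):
            from_change[month] += 1
    return {m: from_change[m] / totals[m] for m in sorted(totals)}


def inconsistently_tagged(records: list[dict]) -> list[str]:
    """Incidents linked to a change ticket but given a different root cause."""
    return [
        r["id"] for r in records
        if r["related_change"] is not None and r["root_cause"] != "change management"
    ]


print(monthly_change_share(INCIDENTS))
print(inconsistently_tagged(INCIDENTS))
```

The second check reflects why categorisation was a focus of the ticket analysis: if linked incidents are tagged with another root cause, change-related incidents are under-counted in summary statistics.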
4. Confidential Board Questions
A short, confidential questionnaire was sent to participating firms’ board members to gauge their confidence in their own ability, and their firm’s ability, to perform technology change. The questionnaire was sent via the FCA’s survey tool and no attempt was made to identify individual respondents. It asked respondents to rate their confidence in their ability to approve technology change, as well as their confidence in their organisation’s ability to deploy change without incident.
Target Population and Sampling Methodology:
A sample of 23 firms of varying sizes, business models and customer bases was selected from across the FS industry. Firms were grouped into broad sectors based on their group regulatory model. The main review included 8 Wholesale, 8 Retail and 7 Insurance firms, alongside a number of additional firms invited to our change workshops.
Statistical Analysis Methods:
- Cluster Analysis: We derived IT change performance profiles with a data-driven approach, using hierarchical cluster analysis. In this approach, the responses of firms within one group are similar to each other and dissimilar from those of firms in other groups. To calculate change performance, we used a range of metrics, including the average number of incidents per change, the severity of the resulting change-related incidents, whether the incidents were customer-facing and the average duration of incidents. Firms were grouped by their change performance scores and common traits were identified.
- Regression Analysis: Regression analysis was used to estimate the relationship between a dependent variable and one or more independent variables.
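The review does not state which distance measure or linkage rule was used for the cluster analysis. The sketch below shows one common choice, average-linkage agglomerative clustering with Euclidean distance on standardised metrics, applied to hypothetical per-firm values for the 4 change performance metrics listed above. The function names and data are illustrative assumptions.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each metric to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    # Guard against a constant column (standard deviation 0).
    scales = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, scales)] for row in rows]


def average_linkage(points: list[list[float]], a: set[int], b: set[int]) -> float:
    """Mean Euclidean distance between every member of a and every member of b."""
    return mean(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points: list[list[float]], k: int) -> list[set[int]]:
    """Repeatedly merge the closest pair of clusters until k remain."""
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves i in place
        del clusters[j]
    return clusters


# Hypothetical per-firm metrics: incidents per change, average severity score,
# share of incidents that were customer-facing, average incident duration (hours).
FIRMS = [
    [0.010, 3.0, 0.10, 2.0],
    [0.012, 3.1, 0.12, 2.5],
    [0.011, 2.9, 0.09, 1.8],
    [0.030, 1.5, 0.40, 8.0],
    [0.028, 1.7, 0.35, 7.5],
    [0.032, 1.6, 0.45, 9.0],
]

groups = agglomerate(standardise(FIRMS), k=2)
print(groups)
```

Standardising first stops metrics measured in large units (such as hours) from dominating the distances. For real datasets, SciPy's `scipy.cluster.hierarchy.linkage` (with `method='average'`) followed by `fcluster` (with `criterion='maxclust'`) would normally replace this simple loop.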
Annex B
Proof of Concept – Summary
The FCA’s PoC requested 14 metrics from 5 firms covering calendar year 2018. 1 firm was subsequently removed from scope due to the quality of the data it submitted. Unlike the 2020 information request, the PoC requested full change ticket and incident logs for all systems and applications. The scope covered all changes and incidents affecting UK business across regulated entities and intra-group arrangements, as well as third-party incidents. This amounted to nearly 8.5 million data points.
Our PoC informed the methodology of the information request, and its findings were in line with the overall findings of this report. Our key findings from the 4 firms that responded were:
- Firms implement a substantial number of changes each year: these 4 firms alone implemented over 100,000 changes.
- The change success rate was 98.9%.
- The average emergency change rate was 7.5%.
- 16.4% of total incidents had a root cause relating to change activity.
- Change Advisory Boards (CABs) were not being used as effectively as they could be to mitigate the risks associated with change.
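To put the PoC percentages in absolute terms, they can be applied to the 100,000 changes. This is only an approximation: the firms implemented more than 100,000 changes, the emergency change rate is an average across firms rather than a pooled figure, and the calculation assumes the success rate is the share of changes that did not fail.

$$
\begin{aligned}
N_{\text{failed}} &\approx 100{,}000 \times (1 - 0.989) = 1{,}100 \\
N_{\text{emergency}} &\approx 100{,}000 \times 0.075 = 7{,}500
\end{aligned}
$$

Even a success rate close to 99% therefore leaves on the order of 1,000 failed changes a year across just 4 firms.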
Annex C
Glossary
| Term | Definition | 
|---|---|
| A/B Testing | Similar to canary deployments, A/B testing involves splitting traffic between version A and version B of software. However, the subset of users routed to the new functionality is chosen based on specific conditions (eg location, customer type, query parameters etc). | 
| Agile Project Management | A project management approach based on delivering requirements iteratively and incrementally throughout the project life cycle. | 
| Application | Computer program or set of programs that performs the processing of records for a specific function | 
| Automated Deployments | Where any code commit that passes the testing phase and obtains the necessary approvals is released into production with no, or extremely limited (eg a small number of clicks), manual intervention. | 
| Availability | Property of being accessible and usable on demand by an authorised entity. | 
| Business Continuity | Business continuity refers to the core conceptual resources that address future threats to a business and help business leaders handle the impacts of these threats. | 
| Benefit Realisation | The process of planning, structuring, measuring and delivering the benefits of a business change or improvement project. | 
| Business impact testing | Business impact testing is a component of business continuity testing that helps to identify critical and non-critical systems. | 
| Blue/Green Deployments | Version B (green) is deployed alongside version A (blue), which continues to serve live traffic. After testing that the new version meets its intended objectives, traffic is switched from version A to B. | 
| Canary Deployments | Similar to blue/green deployments, canary deployments involve running version B and version A at the same time. Traffic is split between them, and the share sent to version B is gradually increased as confidence in the new release is established. For example, when version B is first released, traffic could be split 10%/90% with version A. This ratio is gradually adjusted until version B receives 100% of traffic. | 
| Change | The addition, modification or removal of anything that could have an effect on IT services. The scope should include changes to architecture, processes, tools, metrics and documentation, as well as changes to IT services and other configuration items. | 
| Change Advisory Board | A governance body that supports the assessment, prioritisation, authorisation and scheduling of changes. | 
| Change Management | The process responsible for controlling the lifecycle of all changes, enabling beneficial changes to be made with minimum disruption to IT services. | 
| Change Ticket | A request containing a formal proposal for an alteration to a product or system. | 
| Cloud | Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centres available to many users over the Internet. e.g. AWS (Amazon Web Services), Microsoft Azure, Google Cloud etc. These can be rapidly provisioned and released with minimal management effort or service provider interaction. | 
| Core Applications | An application that has been deemed important or critical to the on-going operational success of the business. | 
| Customer-facing | An incident that is noticeable to end consumers either through service unavailability or degradation. | 
| Deployment | All the processes involved in getting new software or hardware up and running properly in its production environment, including installing, configuring, running, testing and making necessary changes. | 
| DevOps | A set of practices that combines software development and technology operations with the aim of shortening the software development lifecycle and aligning the objectives of operations and development teams. | 
| Duration | The elapsed time (in minutes and hours) between when the incident first started and when it was resolved. Please note that the incident start date can differ from 'Incident Date' if it remained undetected for some time. | 
| Effective or Effectiveness | High level of assurance that the proposed change(s) or action that will be implemented or has been undertaken will bring or has brought about the desired or intended result. | 
| Emergency Change | A change that must be implemented as soon as possible and follows an expedited process, for example a change required to resolve a major incident or implement a security patch. | 
| Environments | A segregated system in which a computer or application operates. This could include the operating system, hardware and database. | 
| Hybrid Cloud | A computing environment that uses any combination of on-premise, private cloud or public cloud services with linkage between 2 or more platforms. | 
| IT (ICT) | IT - Information and communications technology (ICT) is an extensional term for information technology (IT) that stresses the role of unified communications and the integration of telecommunications (telephone lines and wireless signals) and computers, as well as necessary enterprise software, middleware, storage, and audio-visual systems, that enable users to access, store, transmit, and manipulate information | 
| Incident | An unplanned interruption to an IT service, a reduction in the quality of an IT service, or a failure of a configuration item that has not yet impacted an IT service. | 
| Incident Management | The process responsible for managing the lifecycle of all incidents. Incident management ensures that normal service operation is restored as quickly as possible and the business impact is minimised. | 
| Integration Testing | Where individual components are combined and tested as a group to expose faults in the integration between dependent units. | 
| Interface Testing | Software testing that verifies whether the communication between 2 different software systems is operating as expected. | 
| Intragroup Teams | The use of another entity within the firm’s group to provide services to a UK regulated entity. | 
| Kanban Board | An agile project management tool designed to help visualise work. Kanban boards visually depict work at various stages of a process, often using cards to represent work items. | 
| Legacy Technology | An outdated application, technology or programming language that is still in use instead of available upgraded versions. | 
| Major Change | A change that is deemed to be high risk or high impact. These changes usually require management and CAB approval. | 
| Microservices | An architectural style that structures an application as a collection of loosely coupled and independently deployable services. | 
| Normal Change | A change that must follow the complete change management process. | 
| Off-Shore | Where specific activity is conducted in a separate country to the majority of work or where your legal entity is based. | 
| Operational Resilience | The ability of a firm to: (i) maintain essential operational capabilities (resumption) under adverse conditions or stress, even if in a degraded or debilitated state; and (ii) recover to effective operational capability in a time frame consistent with the provision of [sic] services (recovery). (IOSCO definition: http://bit.ly/1YDIyie). | 
| Private Cloud | A single tenant cloud environment dedicated to the end user where resources are hosted on scalable, virtualised servers and accessed across a network. Private clouds can be operated on on-premise or vendor-owned infrastructure. Private clouds leverage a cloud management platform and orchestration to allow for rapid elasticity, on demand self-service and measured services. | 
| Programme | Programmes co-ordinate, direct and oversee the implementation of a set of interrelated projects to deliver outcomes and benefits which are aligned to an organisation's strategic objectives. | 
| Project | A planned piece of work or activity that is finished over a period of time and intended to achieve a particular purpose | 
| Public Cloud | A platform run by third parties that makes standardised computing services available to the public over the internet. (e.g. Amazon Web Services, Azure, Google Cloud Platform) | 
| Quality Assurance | Quality Assurance (QA) is a way of preventing mistakes and defects in manufactured products and avoiding problems when delivering products or services to customers. | 
| Regression Testing | A suite of tests designed to confirm that a recent change has not adversely affected existing features. | 
| Resilience | To identify, document, analyse and manage the resilience needs placed upon business services, and to ensure the delivery of these services meets those needs | 
| Reviewed | An evaluation of a change, problem, process, project to ensure that all deliverables have been provided, and to identify opportunities for improvement | 
| Rollback | Reverting to an older version of the system or data due to an inadvertent change or mistake. | 
| Rolling-Update Deployments | The rolling-update strategy consists of slowly rolling out a version of an application by replacing instances one after the other until all instances have been updated with the new software version. | 
| Root Cause Category | A root cause tag that gives an indication of the underlying or original cause of an incident e.g. 3rd party failure, hardware, software, human error, change management, cyber-attack. | 
| Sanity Testing | A quick, broad and shallow functional test used to determine whether it is possible to proceed with further testing e.g. determine whether the system is accessible and the application logic is responsive. | 
| Service Level Agreement (SLA) | A service level agreement (SLA) is a contract between a service provider and its internal or external customers that documents what services the provider will furnish and defines the service standards the provider is obligated to meet. | 
| Severity | An incident categorisation that indicates the impact or urgency of the event for the business/organisation often based on impact and priority. | 
| Software | All or part of the programs, procedures, rules, and associated documentation of an information processing system | 
| Standard Change | A pre-authorised change that is low risk, relatively common and follows a specified procedure or work instruction. | 
| System Testing | Where the complete and fully integrated software is tested in the wider computing environment to evaluate the system's compliance with specified requirements. | 
| Systems | Servers, applications, network devices | 
| Testing | A process to evaluate the functionality of code with an intent to find whether the developed software meets the specified requirements and to identify any defects. | 
| Third-Party | A supplier of goods or support for a product or service, who is neither the primary vendor nor the purchaser | 
| Third-Party Management | Any business relationship or contract between a firm and another organisation, including a company in its group, to provide a product or service. There may be third-party arrangements that are not outsourced relationships, ie functions that the organisation may not typically be able to perform itself. | 
| Third-Party Provider(s) | A person, organisation or other entity that is not part of the service provider’s own organisation and is not a customer – for example, a software supplier or a hardware maintenance company | 
| Types of Change | The categories used to group changes with common characteristics e.g. standard, emergency, pre-approved etc. | 
| Unit Testing | Where individual components of software are tested to validate that each unit performs as designed. | 
| User Acceptance Testing | A type of testing performed by the end user or client to verify and accept the software before moving the application to production. | 
| Waterfall Project Management | A project management approach that utilises distinct stages and moves step by step toward release to customers. | 

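The canary deployment entry in the glossary can be sketched as a staged traffic split with a health gate at each stage. In the sketch below, the stage percentages, the 1% error-rate threshold and the `route` and `run_canary` functions are hypothetical; the `error_rate_at` callable stands in for whatever monitoring a firm actually uses.

```python
import random
from typing import Callable

# Hypothetical rollout stages: percentage of traffic sent to version B.
STAGES = [10, 25, 50, 100]


def route(canary_percent: int, rng: random.Random) -> str:
    """Send one request to version B with probability canary_percent / 100."""
    return "B" if rng.random() * 100 < canary_percent else "A"


def run_canary(
    error_rate_at: Callable[[int], float], max_error_rate: float = 0.01
) -> tuple[str, int]:
    """Step through STAGES, rolling back if version B's error rate is too high.

    `error_rate_at(stage)` returns version B's observed error rate while it
    receives `stage` per cent of traffic.
    """
    for stage in STAGES:
        if error_rate_at(stage) > max_error_rate:
            return ("rolled back", stage)  # route all traffic back to version A
    return ("promoted", 100)


print(run_canary(lambda stage: 0.002))  # ('promoted', 100)
print(run_canary(lambda stage: 0.002 if stage < 50 else 0.05))  # ('rolled back', 50)
```

A blue/green deployment corresponds to jumping straight from 0% to 100% once version B has passed its tests, whereas a rolling update replaces instances one at a time rather than shifting a weighted share of traffic.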