
How to reassure Service Managers: TestOps

Santiago Martínez, Head of QA & Testing

The acceleration of software delivery has obviously placed more pressure on QA teams, forcing them to resort to improved Shift Left Testing (test automation, Continuous Testing, etc.) to anticipate problems before they appear directly in production.


Traditionally, testing in production environments has been seen more as a problem than as a solution: by probing for errors directly in live environments, we could easily destabilise them, and we also ran the risk of corrupting business data.

So the response was always the same: strictly forbidden! A technological anathema!

Necessity is powerful:

In recent years, however, the stronger push for agility as a software development approach and the emergence of DevOps capabilities have greatly accelerated the delivery of software to production environments, in order to meet business needs and growing user demands.

But this is not enough…

To properly address all the challenges that a flood of – if not continuous, then certainly very frequent – deliveries and deployments pose to production, we need to put even more energy into the QA process.

In this scenario, we are more likely to have unstable code in production and problematic releases that need to be rolled back to restore stability to the system. Typically, problems in production revolve around configuration issues, performance degradation and poor UX.

This, of course, makes Operations Service Managers and business teams very nervous…

How can we resolve this?

In my experience, the answer is QAOps, TestOps, Shift Right Testing or whatever we want to call it… but we have to act.

We need to try to implement new QA techniques that will help us to mitigate those risks, while also avoiding the instability or data distortion issues that could be caused by the tests.

Some techniques would be:

1. Continuous quality monitoring: The aim is to evaluate the quality of the system at any stage of the software life cycle, using the various tools and techniques available on the market:

Code instrumentation: This is a long-established technique that consists of embedding monitoring, debugging and logging libraries in the source code of our applications so that they give us information when a crash occurs during execution. We then have logs with traces that shed light on the errors that are occurring, allowing us to track them down.
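Purely as an illustration, here is a minimal instrumentation sketch in Python using the standard logging module; the payments logger name and the process_payment function are hypothetical, not taken from any specific product:

import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s %(message)s")
log = logging.getLogger("payments")  # hypothetical module name

def instrumented(func):
    """Log entry, duration and any exception of the wrapped function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        log.info("%s called", func.__name__)
        try:
            result = func(*args, **kwargs)
            log.info("%s finished in %.3fs", func.__name__, time.monotonic() - start)
            return result
        except Exception:
            log.exception("%s crashed", func.__name__)  # full traceback goes to the logs
            raise
    return wrapper

@instrumented
def process_payment(amount: float) -> str:
    if amount <= 0:
        raise ValueError("amount must be positive")
    return f"charged {amount:.2f} EUR"

process_payment(10.0)  # logged with its duration; a failing call would log the traceback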

Real user monitoring: We use real users (Friends & Family, for example, or crowdsourced testing services) to get feedback on the application regarding usability, compatibility and user experience. This information is then used to fine-tune the application in future releases.

Synthetic monitoring: We build predefined automated tests that regularly probe the availability and performance of the business flows considered critical to the application.
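For example, a very simple synthetic probe might call the endpoints behind a critical flow on a schedule and raise an alert when availability or latency degrades. This is a hedged sketch (the URLs, flow names and thresholds are invented), using the third-party requests library:

import time
import requests  # third-party: pip install requests

CRITICAL_FLOWS = {
    "login": "https://example.com/login",        # hypothetical endpoints
    "checkout": "https://example.com/checkout",
}

def probe(name: str, url: str, timeout_s: float = 5.0, max_latency_s: float = 2.0) -> bool:
    """Return True if the flow responds correctly and fast enough."""
    start = time.monotonic()
    try:
        response = requests.get(url, timeout=timeout_s)
        latency = time.monotonic() - start
        ok = response.status_code == 200 and latency <= max_latency_s
    except requests.RequestException:
        ok = False
    print(f"{name}: {'OK' if ok else 'ALERT'}")
    return ok

if __name__ == "__main__":
    for flow, url in CRITICAL_FLOWS.items():
        probe(flow, url)  # in real life, run from a scheduler and feed an alerting tool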

2. Chaos testing

There will always be errors that we are unable to find in earlier stages, in environments that are not production or exact mirrors of it. Based on Chaos Engineering, popularised by Netflix, the aim is to intentionally introduce failures or attacks into the infrastructure on which our system runs, in order to check the resilience of the system in a controlled way and to prepare for when these failures actually occur. We can introduce, for example, storage stress, machine crashes, network delays, CPU overloads, or several of these failures in a cascade, to see how resilient our platform is and what the emergency responses would be if the situation arose.
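Real chaos tooling (Netflix's Chaos Monkey, for example) works at the infrastructure level; purely to illustrate the idea, here is an application-level sketch in Python that wraps a dependency call and randomly injects latency or failures so we can watch how the caller copes. All the names are hypothetical:

import random
import time

class ChaosProxy:
    """Wraps a callable and randomly injects latency or failures.
    (An application-level stand-in for infrastructure-level chaos tooling.)"""

    def __init__(self, target, failure_rate=0.2, max_delay_s=3.0):
        self.target = target              # the real dependency call
        self.failure_rate = failure_rate  # share of calls that suffer a fault
        self.max_delay_s = max_delay_s    # worst-case injected delay

    def __call__(self, *args, **kwargs):
        if random.random() < self.failure_rate:
            if random.random() < 0.5:
                time.sleep(random.uniform(0.5, self.max_delay_s))  # simulated network delay
            else:
                raise ConnectionError("chaos: injected dependency failure")
        return self.target(*args, **kwargs)

# Hypothetical usage: wrap a call to an inventory service and check
# that our fallback logic keeps the system responsive.
def get_stock(item_id):
    return {"item": item_id, "stock": 42}  # stand-in for a real remote call

chaotic_get_stock = ChaosProxy(get_stock)
for attempt in range(10):
    try:
        print(chaotic_get_stock("SKU-001"))
    except ConnectionError as err:
        print(f"fallback triggered: {err}")  # the resilience path under test

The point is not the proxy itself but the questions it forces: does the caller time out sensibly, retry, degrade gracefully, or fall over?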

3. A/B Testing

This technique is used to compare a new version with the previous one and check which works better in terms of functionality, performance, usability, etc., so that the winner becomes the candidate promoted to production. The comparison is carried out using a combination of real users and monitoring tools. The emergence of cloud, Docker, infrastructure as code and Kubernetes has made A/B testing significantly easier to implement.
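A key building block is assigning each user to a variant in a sticky, deterministic way. As a sketch (the experiment name and the 10% split are invented), hashing the user id avoids storing any assignment state:

import hashlib

def assign_variant(user_id: str, experiment: str = "checkout_v2", b_share: float = 0.10) -> str:
    """Deterministically bucket a user into variant A or B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "B" if bucket < b_share else "A"

print(assign_variant("user-1234"))  # always the same answer for this user

Because the hash depends only on the experiment name and the user id, a given user always sees the same variant across sessions, and monitoring tools can compare the two populations fairly.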

4. Canary Testing

Also known as “incremental release testing”, this is a technique for distributing new functionality to just a small group of users, so that if there are errors in the new release, the impact is minimal and they are detected in a controlled environment. Once again, Friends & Family (people close to the organisation) can be used to ensure the error is reported quickly without affecting the brand image. And again, with the help of DevOps, this technique is now much easier to perform.
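In practice the traffic split is normally handled by the platform (a load balancer, a Kubernetes ingress or a service mesh); the sketch below just illustrates the decision logic in Python, with invented numbers: route a small share of requests to the canary and withdraw it automatically if its error rate crosses a threshold:

import random

class CanaryRouter:
    """Sends a small share of traffic to the canary release and rolls it
    back automatically when its error rate exceeds a threshold.
    (Illustrative only: real rollouts are driven by the deployment platform.)"""

    def __init__(self, canary_share=0.05, error_threshold=0.02, min_samples=200):
        self.canary_share = canary_share
        self.error_threshold = error_threshold
        self.min_samples = min_samples
        self.requests = 0
        self.errors = 0
        self.rolled_back = False

    def choose_version(self) -> str:
        if self.rolled_back:
            return "stable"
        return "canary" if random.random() < self.canary_share else "stable"

    def record(self, version: str, ok: bool) -> None:
        if version != "canary":
            return
        self.requests += 1
        self.errors += not ok
        if (self.requests >= self.min_samples
                and self.errors / self.requests > self.error_threshold):
            self.rolled_back = True  # stop sending anyone to the canary
            print("canary rolled back: error rate too high")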

By the way, as some of you may know, the canary idea comes from the use of these birds in mines as “testers” to detect toxic gases. In this case, there is no need for the software, the tester or the business user to keel over like a canary when toxic bugs appear in the production environment, because the aim is precisely to avoid that situation.

In summary:

By applying these techniques, teams adopting TestOps should be able to give Service Managers and Business Managers more confidence in every move to production and avoid “heart attacks” caused by serious and unexpected failures when a new release is deployed.