Improving Testing for Operability

Accenture (Vivaldi)

Reliability, handling failure gracefully and recovering quickly are becoming increasingly important as the software development world adopts DevOps culture and practices. Outages and security failures are big news and many companies are investing heavily to avoid these challenges. Operable systems are easy to deploy and test, provide actionable information about their state and behave more predictably in adverse conditions.

Testers on development teams are often used to testing changes to an application's functionality, but less used to testing how operable a system is. In my recent experience, testers have been charged with improving the operability of systems through better logging, monitoring and system control measures (such as feature flags), so that those systems emit better information. This information on system stability and state is critical to testing, and we can influence its creation profoundly.

Why it is important for testers

  • As the operability of systems becomes a greater focus, testers need to be equipped with models to think about how to add value in this context.
  • As testers, we strive to add value, and testing for reliability lets us use our risk analysis skills to explore failures and how to recover from them.
  • If we get involved with helping our system to emit better information from an operability standpoint, testability through observability and control will likely be enhanced.
  • Rather than having shallow status checks, testers can contribute to meaningful monitoring of customer journeys and how reliability and recovery are measured.
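
To make the last point concrete, here is a minimal Python sketch contrasting a shallow status check with a check that walks a customer journey. Everything in it is an assumption made for illustration: the function names `shallow_status` and `journey_check`, the three journey steps, and the simulated failure are hypothetical and do not come from the app under test.

```python
import time
from typing import Callable, Dict, List, Tuple


def shallow_status() -> Dict[str, str]:
    """Reports 'ok' whenever the process is up; says nothing about customers."""
    return {"status": "ok"}


def journey_check(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, object]:
    """Run each step of a journey in order and stop at the first failure."""
    started = time.monotonic()
    completed: List[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # report the failure instead of crashing the check
            return {
                "status": "failed",
                "failed_step": name,
                "error": str(exc),
                "completed_steps": completed,
                "duration_seconds": time.monotonic() - started,
            }
        completed.append(name)
    return {
        "status": "ok",
        "completed_steps": completed,
        "duration_seconds": time.monotonic() - started,
    }


# Hypothetical journey steps; a real check would call the running system.
def search_conferences() -> None:
    pass


def view_conference() -> None:
    pass


def register_attendee() -> None:
    # Simulated outage so the sketch shows a failing journey.
    raise RuntimeError("registration service unavailable")


JOURNEY = [
    ("search_conferences", search_conferences),
    ("view_conference", view_conference),
    ("register_attendee", register_attendee),
]

if __name__ == "__main__":
    print(shallow_status())
    print(journey_check(JOURNEY))
```

In this sketch the shallow check still reports "ok" while the journey check names the failing step, which is the kind of actionable information a tester can ask a team to monitor and alert on.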

Takeaways

  • Recognize the key terminology pertaining to logging, monitoring and system control measures and their role in operability.
  • Understand how to test systems for the quality of operational information that they emit and how this can help improve information gained through testing.
  • Apply the understanding of operational insights to testing deployment pipelines and operational hooks to enhance overall operability.

 

Prerequisites

App install instructions
Prometheus and Alert Manager Install Instructions

Schedule

09:00 – 10:30  Part 1
10:30 – 11:00  Coffee Break
11:00 – 12:30  Part 2
12:30 – 13:30  Lunch
13:30 – 15:30  Part 3
15:30 – 16:00  Coffee Break
16:00 – 17:30  Part 4

 

Post Workshop materials

Thanks for your attention and participation on Thursday. It was great
to meet you all.

All the slides and other materials can be found here:
https://github.com/northern-tester/moiling-operable
The app under test is here:
https://github.com/northern-tester/conferencesApp

There are 4 branches:
* master - the base version of the app
* structured logging - the bulk of the exercises we did
* instrumented-api - the version of the app that emits data to Prometheus
* operability-hooks - app with action and info operability hooks added
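
For anyone who wants a reminder of what structured logging looks like outside the app, the following is a generic Python sketch using only the standard library. It is not the code from the branches above: the logger name `conferences.sketch`, the `context` convention for extra fields, and the sample field names are all assumptions made for illustration.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}}; this convention is an
        # assumption of the sketch, not a feature of the logging module.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("conferences.sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)
log.info(
    "conference created",
    extra={"context": {"conference_id": 42, "request_id": "abc-123"}},
)

# Each line parses back into fields that can be searched or asserted on.
entry = json.loads(stream.getvalue().splitlines()[0])
print(entry["level"], entry["message"], entry["context"])
```

Swapping `logging.StreamHandler(stream)` for `logging.FileHandler(path)` would write the same JSON lines to a file. Because every entry is machine-readable, a test can assert on specific fields rather than matching free text.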

Some specifics asked about during the session:
* Writing logs to file for ingestion - code is here [1]
* Prometheus and Alert Manager - notes and sample queries here [2]

To get your teams started, I would facilitate (or get someone else to
facilitate) a draft run book creation session:
https://blog.softwareoperability.com/2013/10/16/operability-can-improve-if-developers-write-a-draft-run-book/

Good luck with improving your testing through better operability!