SpiderOak's highly creative team means we often have excellent new concepts for initiatives, but little bandwidth to experiment with and implement these ideas. In 2018, and continuing into 2019, our QA team has sought to create time in our schedule so that we can explore new testing methods, dive deeper into functionality we feel would benefit from more rigorous review, or simply experiment with our software to create increasingly stable releases that can support more off-label uses.
At the same time, we've recognized that just as an over-the-road professional driver can experience highway hypnosis from consistently performing the same activities, QA Analysts running the same series of tests in tight succession can slip into a state in which everything starts to look like a bug, or nothing does. When QA Analysts experience this automaticity, the quality of their testing slips. Constantly performing rote testing also reduces the positive impact QA has on the development cycle.
We wondered if changing rote testing could reduce testing fatigue and buy us the time we wanted for more creative testing. But we didn't want to ditch it completely. Rote, prescribed testing is an incredibly powerful tool for QA: it allows for precise identification of regression bugs, clearly defines testing scope, and provides consistent metrics over time. For these reasons, rote testing with pre-defined steps and outcomes will continue to be the backbone of a good QA process. However, we've recognized that rote testing has limitations that can stymie more nuanced QA efforts:
- Rote testing can trigger automaticity (highway hypnosis, QA-style).
- Rote testing takes a significant amount of time as steps must be followed precisely. This has opportunity costs.
- Rote testing often heavily tests stable functionality, which reduces the time QA can spend testing new features, or features with prior stability issues.
- Rote testing may need to be restarted, from scratch, depending on fixes required for bugs found during testing.
- Rote testing does not allow for creativity in testing.
- Even the best rote tests fall short of simulating realistic user activity.
In short, rote testing has a fairly even effort-to-value ratio. We felt that our biggest challenge was adding value to rote testing.
Automating tests is a popular choice when seeking to simplify rote testing efforts, but automation has disadvantages of its own: high up-front costs, false positives, false negatives, and so on. Automated testing can take as much time as rote testing, and be far more frustrating. When a product is rapidly changing, automated tests can lag behind development. For a small business, these disadvantages may outweigh the possible benefits, especially during rapid prototyping sprints or when rolling out a series of refreshed products. We felt that automation was not the right step for SpiderOak.
The best-fit solution for us was streamlining our current suite of rote tests. Streamlining seeks to reduce the number of tests run while maximizing the impact of the testing that remains.
When we began streamlining our testing cycles, we first reviewed the rote tests and sorted them into three categories:
Core Functionality: functionality that is likely to be accessed in a typical week of using the application, or is required to use the application at all, such as signing up. This category includes common error paths.
Peripheral Functionality: functionality that the average user will use on occasion, or functionality that is role-specific. This category may include unusual errors, or uncommon/secondary ways of triggering more common functionality within the application. This category may also include functionality that only affects some platforms, but not others.
Uncommon Functionality: functionality that is rarely used, or rarely encountered by any user. This category should include highly unusual errors, or functionality that exists but will rarely be seen by a user. An example would be the default settings for a screen that the user would only visit after a chain of events brought their attention to it; that is, the user would rarely, if ever, see the default version of the screen. This category may also include functionality that only affects some platforms, but not others.
While not all tests fit the following guidelines, most do (a rough sketch of how these rules might look in code follows the list). After sorting, we determined that:
- Tests covering core functionality should always be run, and if an application is deployed across several platforms, they should be run on each of them.
- Tests covering peripheral functionality should be run on only a few platforms during typical testing. For example, if a product is destined for Ubuntu, Debian, Fedora, and CentOS, this functionality might be tested on Ubuntu or Debian, and on Fedora or CentOS, but not on all four. Note that both package formats still receive coverage, because tests are run against both .deb and .rpm builds.
- Tests covering uncommon functionality should only be run on one platform during typical testing. This category is especially useful for tests which trigger server-side functionality using the client, or reviewing links within the client that take users to web locations.
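To make the categorization concrete, here is a minimal sketch of how these platform-selection rules might be expressed in code. The platform list, the category names, and the `platforms_for` helper are hypothetical illustrations, not part of our actual tooling.

```python
# Hypothetical sketch: which platforms a test runs on, based on its category.
PLATFORMS = ["Ubuntu", "Debian", "Fedora", "CentOS"]

def platforms_for(category: str, cycle_number: int) -> list:
    """Return the platforms a test should run on in a given testing cycle."""
    if category == "core":
        # Core functionality is tested everywhere, every cycle.
        return PLATFORMS
    if category == "peripheral":
        # One .deb-based and one .rpm-based platform, alternating between
        # cycles so every platform is eventually covered.
        deb = ["Ubuntu", "Debian"][cycle_number % 2]
        rpm = ["Fedora", "CentOS"][cycle_number % 2]
        return [deb, rpm]
    if category == "uncommon":
        # A single platform, rotating each cycle.
        return [PLATFORMS[cycle_number % len(PLATFORMS)]]
    raise ValueError(f"unknown category: {category}")

# Example: what the second testing cycle would look like.
for cat in ["core", "peripheral", "uncommon"]:
    print(cat, "->", platforms_for(cat, cycle_number=1))
```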
Keep in mind that even though new rote tests or suites may be created for newly launched functionality, we assumed that new functionality would be tested more thoroughly at launch via playtesting and free-form testing outside of the rote testing cycle. It's a good idea to make exceptions upwards (testing more often) for the first few rounds, and also when uncommon functionality is known to be a weak point in the application, or when a significant subset of users is known to use the application in a unique fashion that brings them into contact with that functionality more often than intended. The sane approach, from both a business angle and a QA angle, is to keep such tests in the core category for an extended period rather than demote them early and have to pull them back later. That said, exceptions downwards (testing less often) can also be made, and these make sense when the functionality being tested is a variation on functionality that is already covered. For example, if one test sends another user a picture and a second sends a very large photo file, it may make sense to test large or unusual file types more rarely than smaller, more common files.
However, the magic ingredient, which vastly reduces QA fatigue and brings creativity to the rote testing of core functionality, is the "Sandwich" test. We nicknamed these tests "Sandwich" tests because the way they are written resembles a "Build Your Own Sandwich" order sheet. Sandwich tests address the reality that most core functionality can be triggered via several paths. Testing each path on each platform is exhausting, and often not revealing. Sandwich test cases are semi-structured: they present a selection of possible actions the tester may take, or simply list examples of how to trigger an in-line error on a field that can be tripped by a multitude of input mistakes. The tester selects from the presented options, with the caveat that different pathways or selections are expected to be used the next time the test is performed. (At this time, we trust our analysts to keep track of which paths they have tested, but spreadsheets could be used in the future.) When testing this way, all pathways are eventually covered during the total testing effort, just not on the same platform. "Sandwich" tests can be applied to tests that aren't part of the core functionality, but care will need to be taken to determine whether all pathways will be covered within a reasonable time frame. In our experience, it may meet business needs for some functionality to be tested quarterly or yearly, while other functionality may require more frequent review.
Sign-in functionality is a good example of something that can be broken down into a "Sandwich"-style test. In many applications, signing in can take several forms.
The user could:
- Sign in using an existing account on the system.
- Create a new account.
- Use an account they have elsewhere, such as Facebook or Google, to sign into the service.
All of these options could have a happy path (successful sign in) or a number of error paths such as:
- Signing in with partially correct, partially incorrect credentials.
- Signing in with wholly incorrect credentials.
- Providing invalid input into sign-up fields. This in particular may take many forms, such as:
  - Non-matching passwords
  - Invalid email format
  - Failing a captcha
  - Missing fields
  - "Junk" input, such as providing &^% bLOB street ste 40000000000000000000 as an address
In a "Sandwich" style test, a user may choose from one of the sign in options (for example, signing in using Google) and choosing two errors from the list, and then verifying that they can sign in with correct credentials. When testing the application on the next platform, they may choose to create a new account, select two errors related to signing up, and then test the happy path. After testing is complete on all platforms, the functionality has been fully tested.
By categorizing tests and running them in this fashion, we found time savings that amounted to almost an entire full-time employee. As we grow larger and implement new iterations of this idea, we hope to realize even more significant time savings. As it stands, these savings have allowed QA to test the application in more creative and realistic ways. These testing efforts may not have a broad scope, but they allow Analysts to dig more deeply into what they are testing. We believe that the streamlined approach gives us the flexibility to tailor our testing effort to match the character of the release being tested: more intense testing for releases that feature exciting changes, and lower-intensity testing for minor upgrades, while retaining high confidence in our results. We believe our customers ultimately receive a higher-quality product when our QA team has the time to be creative during test cycles.
If streamlining sounds interesting, we'd suggest cribbing a few tips from our team. Though we know we haven't perfected this method yet, we feel that it's best to:
- Start with a simpler product that's been stable for several releases.
- Document decisions and discussions regarding how to evaluate tests, and how to categorize them. This is one step we wish we'd taken; documenting it after the fact is far less effective.
- Take advantage of the review to discuss awkward functionality and design with the appropriate teams.
- Review the product with your customer relations team – ours was quick to point out functionality that was at the core of many customers' concerns. We made sure that tests covering those portions of the apps were run every cycle, on every platform.
- Review the results. After our first cycle of testing with streamlined tests, we made changes.