At Traffic Parrot we have seen several companies embark on the API-first journey, a few of which have also approached their API strategy consumer-first and implemented Consumer-Driven Contracts.
In this short article, we summarise two categories of engagements we have seen teams face:
Implementing an API-first approach with teams that have not done API-first before and are starting their journey with microservices (up to 5-20 microservices in production)
Implementing Consumer-Driven Contract Testing with teams that are already proficient in the API-first development approach and face issues with large-scale contract testing of microservices (more than 50-100 microservices in production)
Useful terms to know
API-first development: the move to microservices typically also drives an API-first development approach, where teams define upfront business contracts between each other using APIs. Sample API specification languages: OpenAPI or Protobuf.
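For illustration, an API-first contract defined upfront in OpenAPI might look like the following minimal fragment (the endpoint and field names are invented for this sketch):

```yaml
openapi: 3.0.3
info:
  title: Product API (illustrative example)
  version: 1.0.0
paths:
  /products/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string }
      responses:
        "200":
          description: A single product
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:   { type: string }
                  name: { type: string }
```

A consumer team can build and test against this specification before the producer has written a single line of implementation.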
Consumer-Driven Contracts: when you design an API, the API producer team designs its syntax and semantics by working closely with the teams that will consume the API, rather than making assumptions about how the API should look based on the data model. The API consumer teams drive the shape of the APIs.
Consumer-Driven Contract Testing: a type of contract testing where automated tests ensure that the contracts designed in a consumer-driven fashion are met. This provides confidence that a new release of a microservice will not result in breaking API changes in production. A sample tool: Pact-JVM.
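Tools like Pact-JVM provide this machinery out of the box; conceptually, though, the core check boils down to something like the self-contained sketch below (the record names and fields are invented for illustration, not Pact-JVM's API). The consumer publishes the response shape it relies on, and the provider's build fails if it cannot honour it:

```java
import java.util.Map;

// Illustrative sketch only: a consumer-driven contract pairs an expected
// request with the response shape the consumer relies on. Extra provider
// fields are fine; missing or mismatched ones break the contract.
public class ContractCheck {

    // The contract the consumer team published (names are invented).
    record Contract(String method, String path, int expectedStatus,
                    Map<String, String> expectedFields) {}

    // What the provider actually returned for that request.
    record ProviderResponse(int status, Map<String, String> body) {}

    // Verify only what the consumer declared it needs.
    static boolean honours(Contract contract, ProviderResponse response) {
        if (response.status() != contract.expectedStatus()) return false;
        return contract.expectedFields().entrySet().stream()
                .allMatch(e -> e.getValue().equals(response.body().get(e.getKey())));
    }
}
```

Note that the check is deliberately asymmetric: the provider may return more than the contract specifies, which lets producers evolve their APIs without breaking consumers that only depend on a subset of fields.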
High-level summary
It has been our clients’ experience that:
The API-first approach is an effective way to parallelise work between teams working in fast-paced environments and microservice-based architectures.
Allowing teams to work closely with each other and design APIs in a consumer-driven fashion helps drive quality and reduce time to market no matter the team size and deployment scale.
Although consumer-driven testing is a well-grounded way of testing contracts, it can be counterproductive to introduce it to teams that are not experienced in the API-first approach or run fewer than 5-20 microservices in production.
Case study: a large media company using an API-first approach and consumer-driven contract testing
A global media company had an existing stack of 100+ microservices that was primarily tested with automated BDD E2E tests. The releases were done every 2-4 weeks.
These tests were costly to maintain: writing and debugging the suite took significant developer time, the suite took 2 hours to run, and investigating issues before every release took up to a week.
Developers were often frustrated because the tests were flaky due to the complexity of the system under test, which created many non-deterministic failure points. The multi-hour run time also prevented them from releasing new features on demand.
With this experience, the company decided to avoid E2E testing for the new product they were working on.
The main way they decided to grow confidence in the contracts between new microservices and the behaviour of the product as a whole was to design contracts in a consumer-driven way. The company chose consumer-driven contract testing with Pact-JVM to test those contracts. Most of the teams were entirely new to consumer-driven contracts.
Equipped with a solid API-first design background across all teams, several years of experience with automated BDD-style acceptance and E2E testing, and TDD at the unit level, they felt confident in learning the consumer-driven contract testing approach. Unfortunately, that proved not to be true after a few months of working on the problem. In the teams' experience, about a fifth of the developers picked up the new workflow rapidly, but the majority struggled to understand it even after a few months, and about a fifth had still not fully onboarded to the new process after 12 months.
There was a change to infrastructure happening at the same time, moving away from bare metal to Kubernetes and Docker deployments.
The J-Curve Effect of this transformation was noticeable but manageable. Teams adopted just a few new tools and practices as part of this transformation (Pact-JVM, Docker, Kubernetes), and that was enough to keep developers and testers busy changing their daily habits. In retrospect, the investment in consumer-driven contract testing was justified, but the ROI was not significant, close to breaking even.
Case study: a global e-commerce giant using an API-first approach
The company decided to move away from a monolithic architecture to more autonomous teams and microservices. As part of that transition, they decided to recommend good practices rather than force the use of specific technologies and solutions onto teams, trusting the teams to make the right judgements on which tools and techniques will make the most positive impact on their daily activities.
The developers wrote integration tests and used an API mocking tool, Traffic Parrot, to mock dependent components. They also wrote Cucumber/Gherkin BDD acceptance API tests to capture the business requirements (they called these "contract tests"), which run a Docker image of the microservice together with a Docker image of the Traffic Parrot API mocks. These BDD tests verify both the microservice's API requests and responses and all of its communication with dependent components, the latter by verifying the interactions recorded on the API mocks.
The company decided to create the API mocks in two ways.
First, if the API that a developer wants to consume already exists, they create the API mocks by recording requests and responses. A developer starts by creating a new test on their computer. They then run the test and create API mocks by recording them. They commit the tests and mocks to the microservice project in Git. In a QA pipeline (a pipeline run per commit to check the quality of the product), they start a Docker container that runs the API mocking tool and mounts the mock definitions from the microservice project.
Second, if the API the microservice will consume does not exist yet, a developer will create the API mocks from OpenAPI specifications for HTTP REST APIs or create the API mocks from protocol buffer specification files for gRPC APIs.
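Traffic Parrot generates these mocks from the specification files; to show what an HTTP API mock amounts to at its core, here is a hand-rolled, stdlib-only sketch that serves a canned JSON response (the class and method names are our own invention, not any tool's API):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: an API mock is, at its simplest, a stub server that
// returns the example response taken from the API specification.
public class ApiMock {

    // Start a mock that answers every request on 'path' with 'cannedJson'.
    // Port 0 asks the OS for any free port.
    public static HttpServer start(int port, String path, String cannedJson) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext(path, exchange -> {
            byte[] body = cannedJson.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A consumer's integration test can point its HTTP client at the mock's port instead of the real dependency, which is exactly the substitution the QA pipeline's Docker container performs.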
They also develop and run automated E2E smoke tests. This is one technique for testing contracts between microservices, and it makes sure groups of microservices work well together. The E2E test suite is justified because it exercises not only the producer side of the contracts, which the BDD tests already cover, but also the consumer side, and so provides additional confidence. The architects monitor the number of E2E tests and keep the complexity of the suite at a level that does not cripple the release process or daily development activities.
In the eyes of the architects and individual teams, a move to consumer-driven contract testing was not justified at enterprise scale: the J-Curve Effect would be too significant and the ROI not justifiable in a reasonable timeframe. Out of the hundreds of developers working for the company, only a handful decided to use consumer-driven contract testing, as they were already very familiar with the API-first approach and had several years of experience in BDD.
Case study: an InsurTech startup using an API-first approach
The company had a handful of teams working on a dozen microservices. The microservices were replacing part of a deprecated monolith.
The producer teams designed the APIs for the other teams to consume. To manage contracts between teams and allow them to work in parallel, they decided to use API mocks that the API producers created and shared with the API consumers. They created the gRPC API service mocks using an API mocking tool, Traffic Parrot.
They also ran a handful of manual E2E tests in a pre-production environment to make sure the microservices would work together in the absence of sufficient automated testing. They filled the gaps in automated testing incrementally, which took 12 months of part-time effort.
To make sure the mocks did not get out of date, and to hit their aggressive deadlines, the company decided to test the API mocks by firing the same request at both a mock and the real microservice. They compared both responses to a contract definition of expected request/response pairs, defined in a company-specific custom format. This way, both the API mocks and the real service were shown to be up to date with the latest definition of the expected behaviour in the contract file. It proved to be a good enough solution to remove the bottleneck of testing contracts and let the teams focus on delivering features to customers.
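Since the company's contract format is custom and not public, the sketch below only illustrates the general idea under our own invented names: send one request, collect the mock's response and the real service's response, and compare each against the expected response from the contract file. The two functions stand in for actual HTTP calls:

```java
import java.util.Objects;
import java.util.function.Function;

// Illustrative sketch of drift detection between an API mock and the real
// service. Both must match the expected response recorded in the contract;
// if either diverges, the contract (or the mock) needs updating.
public class MockDriftCheck {

    record Verdict(boolean mockMatches, boolean serviceMatches) {
        boolean contractHolds() { return mockMatches && serviceMatches; }
    }

    // 'mock' and 'realService' stand in for HTTP calls to each endpoint.
    static Verdict check(String request, String expectedResponse,
                         Function<String, String> mock,
                         Function<String, String> realService) {
        return new Verdict(
                Objects.equals(expectedResponse, mock.apply(request)),
                Objects.equals(expectedResponse, realService.apply(request)));
    }
}
```

The useful property of this design is that a failure pinpoints which side drifted: a stale mock fails on one flag, a changed real service on the other.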
The team contemplated using consumer-driven contract testing, but at first glance it seemed like shooting a fly with a cannon. They decided to stick to their original good-enough testing plan and revisit the issue in 12-24 months, while keeping an eye on the complexity of the existing suite of tests so that maintenance costs do not become significant.