
· 8 min read
Thomas Rooney

Reflow v4.14.0

If your end-to-end tests are slow, you will either avoid running them, or waste time and delay your release cadence. A fast end-to-end test suite is a valuable asset to your team's productivity.

We released Reflow v4.14.0 today, which reduces our average end-to-end test sequence time by ~70%. This article explains how. Reflow is a low-code tool that helps your team develop and maintain resilient end-to-end tests. We use playwright under the hood, so everything we describe can be applied regardless of whether you use Reflow or not.

Embarrassingly Parallel Testing

Your end-to-end test suite must run the steps within a user flow sequentially, but should run multiple independent flows in parallel.

In reflow, we embrace this with our distributed architecture. Test Composition with Pipelines allows for N servers to be created to handle N user flows: you just need to design your test suites to be able to run in parallel.

Despite this ethos, synchronous code can easily creep in.

For example, the following code handles progressive test updates that provide realtime user feedback: it takes an array of updated steps and executes the mutations sequentially.

for (const action of toUpdate) {
  await queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
    mutation: shallowUpdateActionInContext,
    variables: {
      input: action,
    },
  });
}

This can trivially be re-written to run in parallel.

await Promise.all(
  toUpdate.map((action) =>
    queryAll<ActionInContextModel, MutationUpdateActionInContextArgs>(client, {
      mutation: shallowUpdateActionInContext,
      variables: {
        input: action,
      },
    })
  )
);

Reflow uses AppSync Resolvers with Serverless DynamoDB to power our APIs. This scales up and down as needed, so we see no negative impact from doing more in parallel. We fixed this in v4.12.1, and in v4.14.0 we're going further by introducing an additional caching layer to reduce S3 object pushes for screenshot images.
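The caching layer itself is specific to our pipeline, but the idea is simple: hash each screenshot and skip the S3 upload when identical bytes have already been pushed. The sketch below is illustrative only (the function name, in-memory set and bucket/key handling are assumptions for the example), not Reflow's actual implementation:

import crypto from 'crypto';
import AWS from 'aws-sdk';

const s3 = new AWS.S3();
const uploadedHashes = new Set<string>();

// Skip the putObject call when an identical screenshot has already been pushed
// during this run, so repeated identical pages don't hit S3 again.
export async function putScreenshotCached(bucket: string, key: string, image: Buffer): Promise<void> {
  const hash = crypto.createHash('sha256').update(image).digest('hex');
  if (uploadedHashes.has(hash)) {
    return; // already stored; nothing to do
  }
  await s3.putObject({ Bucket: bucket, Key: key, Body: image }).promise();
  uploadedHashes.add(hash);
}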

Conditional Stability

Reflow learns about your application, and uses that knowledge to keep recorded tests stable. However, keeping recorded tests stable takes time; from v4.14.0, Reflow dynamically reduces its use of stability methods, with additional logic to determine whether each one is necessary.

Prior to v4.14.0, this section of code would always wait for three events after every action: a load event, a networkidle event, and a screenshotstable event.

private async pageStable(baselineAction): Promise<void> {
  try {
    await this.page.waitForLoadState('load', { timeout: 30000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('load')", e);
  }
  try {
    await this.page.waitForLoadState('networkidle', { timeout: 5000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('networkidle')", e);
  }

  await this.screenshotStable(baselineAction?.preStepScreenshot?.image);
}
  1. load - wait for the load event to be fired. This event fires when all markup, stylesheet, javascript and all static assets like images and audio have been loaded.
  2. networkidle - wait until there are no network connections for at least 500 ms.
  3. screenshotStable - waits until the screenshot either matches a given baseline screenshot, or the page stops animating (i.e. two subsequent screenshots look the same)

Unfortunately, if the page is already stable but has background network requests, this introduces a 5-second delay while the networkidle wait times out. To avoid this, we now only wait for networkidle when the action historically triggered a navigation, or when the run is executed with an optional flag that enforces all stability checks.
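In playwright terms, the change looks roughly like the sketch below; hadNavigation stands in for whatever history Reflow records about an action, and enforceAllChecks for the opt-in flag, so treat this as an illustration rather than our exact code:

private async pageStable(baselineAction, hadNavigation: boolean, enforceAllChecks: boolean): Promise<void> {
  try {
    await this.page.waitForLoadState('load', { timeout: 30000 });
  } catch (e) {
    logger.verbose(this.test.id, "waitForLoadState('load')", e);
  }
  // Only pay for the networkidle wait when the action's history suggests it matters.
  if (hadNavigation || enforceAllChecks) {
    try {
      await this.page.waitForLoadState('networkidle', { timeout: 5000 });
    } catch (e) {
      logger.verbose(this.test.id, "waitForLoadState('networkidle')", e);
    }
  }

  await this.screenshotStable(baselineAction?.preStepScreenshot?.image);
}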

The lesson here is to only use stability methods when they're truly necessary. Don't add them arbitrarily to your code; apply them only where a test action genuinely needs the extra stability.

Reflow will dynamically learn about your application by hooking into DOM and Network events. If you want to spend less effort maintaining your end-to-ends, give it a try.

No more explicit waits

In many companies, when there's an issue with end-to-end stability, adding an await wait(1000) or equivalent to simply delay test action execution is a common strategy.

This is a far greater sin than arbitrarily using stability methods. Stability methods will exit early once their stability event fires. If you're feeling lazy, at the very least try a waitForLoadState rather than a wait first.

If you're up for writing a bit more code, try writing a waitUntil clause that explicitly waits for some page state to be set once the page is stable. Whilst reflow supports a wait action, we advise our clients to always use a visual assertion instead. This waits (up to a configurable maximum) until a page element looks like a recorded baseline; if it doesn't, reflow can be configured to either continue or fail the test.
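In playwright, a waitUntil clause can be expressed with page.waitForFunction. The window.appReady flag below is a hypothetical piece of state your application would expose for testability:

// Wait until the application reports that its initial data has loaded,
// instead of sleeping for a fixed amount of time.
// `window.appReady` is a hypothetical flag set by the application under test.
await page.waitForFunction(() => (window as any).appReady === true, null, { timeout: 15000 });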

A few wait(X) statements had crept into the reflow codebase over time. It's tempting to add them to new features as a lazy way of getting just enough stability to ship. In v4.14.0, we've ruthlessly culled all explicit wait invocations from hot code paths, replacing them with dynamic wait times tuned to the application under test.

Move compute outside hot pathways

Reflow uses a visual comparison algorithm (SSIM weighted Pixel Diff) to compute visual changes. It captures full-height web pages, then compares these to baseline images to compute stability and inform the user of page changes.

This is done entirely on the CPU, and is therefore relatively slow: for a page that's 1080px wide and 10,000px tall, we find it can block the main thread for 2-3 seconds. This has been a major performance bottleneck for large applications.

Some of these visual comparisons are needed to compute stability, but those that only provide visual feedback don't need to block the hot path; instead we now return a promise that is resolved lazily, when the result is actually needed.

private async screenshotStable(baselineScreenshot: S3ObjectInput | undefined): Promise<{
  diff: Promise<ComparisonModel | undefined>;
  current?: { image: S3ObjectInput };
}> {
  /* ... */
}

Move compute into worker threads

By default, all compute in Node.js is single-threaded. It's usually not worth the effort to build multi-threaded applications: most Node.js workloads can be handled by distributing work across processes, rather than dealing with threads.

In Reflow, because of the compute-heavy nature of image comparison, we've moved it into a worker thread. This ensures the application never halts during a visual comparison: all other asynchronous processes (such as realtime uploads of test progress) can be handled in parallel with image comparison.

We did this via the threads npm package and esbuild. We first moved all of our compute code into a new file with minimal imports, called imageCompare.worker.js. We then added a pre-compilation step with esbuild to compile this file into a bundle. Finally, we spawn the worker using this generated file as a blob, and interact with it via the threads promise interface.

import fs from 'fs';
import { expose } from 'threads/worker';
import { isMainThread } from 'worker_threads';

/* ... */

const workerExports = {
  configureWorker,
  compareFiles,
  compareScreenshots,
};

if (!isMainThread) {
  expose(workerExports);
}

export type ImageCompareWorkerExports = typeof workerExports;

On the main thread, we then spawn the worker from the esbuild-generated blob and proxy calls through it:

import { spawn, BlobWorker } from 'threads';

import type { ImageCompareWorkerExports } from './imageCompare.worker';
import { configureWorker, comparePNG as workerComparePng, cropImage as workerCropImage } from './imageCompare.worker';
import { source as workerBlob } from '../../generated/imageCompare.workerSource';
import logger, { getLevel } from '../logger';

let worker: ImageCompareWorkerExports;

export async function bootImageCompareWorker() {
  try {
    worker = await spawn<ImageCompareWorkerExports>(BlobWorker.fromText(workerBlob));
    configureWorker(getLevel(process?.env?.LOG_LEVEL));
    return worker.configureWorker(getLevel(process?.env?.LOG_LEVEL));
  } catch (e) {
    logger.fatal('Error starting worker', e);
  }
}

export async function compareFiles(imageA: string, imageB: string, outFile: string): Promise<void> {
  return worker.compareFiles(imageA, imageB, outFile);
}

export async function compareScreenshots(preData: Buffer, postData: Buffer, options): Promise<ScreenshotCompareOutput> {
  return worker.compareScreenshots(preData, postData, options);
}
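The pre-compilation step that produces imageCompare.workerSource can be a small build script. The sketch below uses esbuild's JavaScript API; the paths and the exported source constant mirror the import above, but they are assumptions rather than our exact build setup:

// scripts/buildWorker.ts: bundle the worker and emit it as an importable string.
import { buildSync } from 'esbuild';
import { mkdirSync, writeFileSync } from 'fs';

const result = buildSync({
  entryPoints: ['src/image/imageCompare.worker.ts'], // illustrative path
  bundle: true,
  platform: 'node',
  write: false, // keep the bundle in memory so we can wrap it below
});

const bundled = result.outputFiles[0].text;
mkdirSync('src/generated', { recursive: true });
writeFileSync(
  'src/generated/imageCompare.workerSource.ts',
  `export const source = ${JSON.stringify(bundled)};\n`
);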

Track your releases, run end-to-end tests once per release

Your end-to-end tests should be focused on catching regressions, not heisenbugs: if they pass once on a release you shouldn't execute them again.

In v4.14.0 we made our first step towards helping QA teams do this: source-map-powered release tracking. Reflow now downloads and hashes all source maps associated with a given release (optionally filtered to application code with a regular expression) to create a version identifier. This means it can track when your application is released, and hence lets you determine whether an end-to-end sequence has already been successfully executed on that release.
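Conceptually, the version identifier boils down to hashing the filtered sourcemaps. The sketch below assumes a Node 18+ runtime with global fetch; the sourcemap URL discovery and the filter regex are stand-ins for whatever your deployment exposes:

import crypto from 'crypto';

// Download each sourcemap referenced by the deployed page, keep only application
// code, and fold the contents into a single release identifier.
async function releaseIdentifier(sourceMapUrls: string[], appCodeFilter = /\/static\/js\//): Promise<string> {
  const hash = crypto.createHash('sha256');
  for (const url of sourceMapUrls.filter((u) => appCodeFilter.test(u)).sort()) {
    const body = await (await fetch(url)).text();
    hash.update(url).update(body);
  }
  return hash.digest('hex');
}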

This is currently opt-in, as we're still working out how to make downloading sourcemaps not affect the performance of a test execution when large numbers of them are exposed. We've built a sourcemap explorer and a simple UI to help guide our customers on what's currently executing within an environment.

TL;DR

  1. Cull all sleep statements: always wait for specific events. Only introduce stability methods when necessary.
  2. Execute your tests in parallel
  3. Intelligently cache test results; don't bother executing tests more than once per release

If you're feeling adventurous, try reflow: a low-code record/replay test automation SaaS that will try to do all this for you, fast.

Minor caveat: self-host reflow for maximum speed. We use AWS Fargate to spin up per-user ephemeral browser instances, which have ~1 minute cold startup times on the first use of a test recorder.

· 12 min read
Thomas Rooney

We want to build bug-free, high-quality software that satisfies a customer need. Unfortunately, it's not easy!

Software QA is a set of processes and practices to help ensure that the software we produce is of high quality, meets our requirements and is free of defects.

There are no silver bullets: no tool or strategy achieves good QA in every organization. However, there are often very significant QA inefficiencies and, with founders observing this, significant development effort has gone into building new tools to help streamline QA. These tools can drastically drop the cost of high-quality software development, and hence are very worthwhile to evaluate.

In this article I'll summarize 3 QA patterns we've seen inside organizations, how these can be made to work, and how they often fall short. I'll discuss how to evaluate tooling, and give a brief evaluation of the tool I've been building at reflow.io. I'll finally give a list of other emerging tooling in this space, and some ways these tools differentiate from each other.

Popular Strategies for Software QA

Strategy 1: Only developers do QA

The development team is the gatekeeper of code quality. They are responsible for all code changes, and will naturally build automated tests as part of development. Developers build something, hopefully feel responsible for what they build, and that sense of ownership can hopefully convert into product quality when they manage QA exclusively.

When it works

Organizations that can pull this off have a few things in common. They have:

  • A culture that values high-quality work
  • Hiring practices that embed automated-testing skills into development teams
  • Procurement of tooling that allows the easy creation of automated tests

When it doesn't work

This is a great idea in theory, but it's rarely a practical reality.

  1. Developer prioritization is often tightly controlled, and QA is (often rightly) prioritized behind feature development. It's hard to get automated testing to keep up with the speed of development.
  2. Developers will often have a different view of what is high quality than the end user of the software. They often have a natural bias towards their concept of the happy path, and lack the perspective of a potential customer.
  3. Developer compensation and career growth is often tightly linked to feature development, rarely linked to QA output. Exclusively developer-managed QA leads to a natural tendency to want to "ship the feature" and move on to the next thing.

Strategy 2: A siloed QA team does QA

A team of QA specialists is brought in to test the software, often working in a silo, isolated from the development team. They have full control of QA priorities, and are charged with creating automated tests, as well as manual testing as required.

When It Works

  • Should the QA team establish themselves as a trusted advisor to the development team, this can work well.
  • QA teams that can both build that relationship and deeply engage with the product from a customer-centric perspective can offer value not just in building product quality, but in aligning the product with the customer's needs.

When it doesn't work

  1. When the QA team lacks the experience to engage with the product, it can be hard for them to do valuable QA activities.
  2. When the QA team lacks the tools to automate testing cost-effectively, it can quickly become overwhelmed and grow into a liability instead of an asset.
  3. There is a natural tendency for QA to become a bottleneck to release, and a scapegoat if things go wrong.

Strategy 3: Dev team embeds QA personnel within it.

In this model, the development team embeds QA personnel within it. The QA team is responsible for QA, but they are part of the development team, and are integrated into the development team's process.

This is a good compromise between the developer-only QA model, and the siloed QA model. When QA personnel are part of the development team, they are naturally more invested in the quality of the product, and are naturally integrated into the development process, allowing them to provide input into the process at an earlier stage.

When it doesn't work

  1. When QA teams are embedded in development teams, they are often pulled into the development team's process, and it can be hard for them to maintain their QA focus.
  2. QA personnel integrated with the development team naturally become subject to the same pressures as the developers, and inherit the development team's biases, especially if they also work on features.
  3. When QA personnel aren't deeply experienced in automated testing, they can be a drag on development team productivity.
  4. Since QA personnel are often individually allocated to a project, knowledge can easily be lost when they leave. This strategy can introduce significant churn vulnerability.
  5. As the Dev team scales, most teams specialize and silo. This pattern often naturally falls apart as priorities are shifted and teams are split.

Tooling

Whichever strategy is taken, tooling choices are key to successful software QA. Without tooling, QA is manual, and manual QA does not scale anywhere near as well as software.

All QA tooling has a fundamental aim: "Reducing the severity of bugs", and/or "Reducing the likelihood of bugs". Both commercial and non-commercial tool options should be evaluated against these goals in a specific context.

Fast deployment / monitoring: Reduce severity of bugs

If the product can be deployed fast, is heavily monitored, and has a resilient data architecture, then QA is fundamentally less important. E.g. if a bug is fixed minutes after it's found, and its severity is such that there's no long-lasting impact, then resources spent on pre-release testing become less well-spent.

Hence, especially for early-stage products, investments in Monitoring and Continuous Deployment tooling are of higher value than investments in test automation.

This will usually change over time as a product becomes more complex and developers churn. When this happens, investments in Test Automation tooling to enable a team to release faster can be incredibly cost-effective.

Efficiency of Test Automation: Reduce likelihood of bugs

Test Automation is the only foolproof method we have of ensuring that bugs are found before a release, and the only one that scales to arbitrary product complexity. As such, for any product where the severity of bugs is high and which undergoes continuous development, it is the only way of achieving a high-quality product in the long term.

There are a large (and growing) number of tools that try to improve product teams' effectiveness at test automation, each focusing on different pain points and user stories. There is a natural efficiency gain in using a commercial tool: by embedding machine-learned application configuration in a database, a commercial tool can provide features that are very difficult to achieve with an open-source tool alone.

Given the vast number of different ways that product teams develop software, be wary of picking the wrong tool, and evaluate carefully based on your unique needs. All SaaS Test Automation tools cause some degree of lock-in. When your product team churns, or your QA needs outgrow the tool, you might find that your testing effort stagnates.

About Reflow

Reflow is a tool that allows non-technical QA personnel to implement end-to-end tests using a no-code recording UI, supported by engineers importing snippets of their end-to-end test suite.

It is designed to be highly flexible, with developers able to add their own browser interactions into the no-code UI using playwright. It is also designed so that record/replay is available both in a web UI at app.reflow.io and on a local machine via a CLI tool, including in a Continuous Integration server.

  • "Reducing the severity of bugs": Reflow can be used as a low-code tool for visual monitoring of a software product, to quickly recognize when a site has regressed. Target your production application to use Reflow for Synthetic Monitoring.
  • "Reducing the likelihood of bugs": Reflow can be used to enhance the efficiency of automated testing processes, by enabling non-technical QA personnel to design, develop, and implement test automation strategies; allowing their work to be run automatically by a CI server or on a developer's machine locally.

When It Works

Reflow is best used by teams that have a mix of technical and non-technical staff testing an application. Its workflows focus on enabling them to work together better, by enabling the development team to enhance QA team workflows with snippets from their end-to-end test suite; and the QA team to enhance a developer's workflow by giving them business-specific sequences and tests that they can run as part of their Continuous Integration tooling.

Reflow works best when Engineering teams integrate the CLI tool directly into their codebase; executing QA-managed test sequences against locally running software.

When It Doesn't Work

Reflow will provide little value when a product is entirely tested by its development team. A development team can almost always manage a versioned end-to-end testing framework with greater ease than a separate tool driven from a web UI.

The value reflow can provide to an entirely developer-owned QA cycle is observability of end-to-end test runs, audit records, and screenshot capture/comparison. If these are not important to your product, it's worth looking at other tools.

Reflow is also currently only available via AWS Marketplace as a Usage Based Subscription Product, meaning paying users must have an AWS account.

Request for feedback

Reflow is a bootstrapped product, built by 1 engineer, with 1 engineer's insights into QA, and a small amount of external feedback from its growing user-base. We'd love more feedback to ensure we build the best product we can.

Tooling Category: Record/Replay Test Automation Tools

These tools are aimed towards product teams that want to use code-free automated testing: this enables non-developers to contribute towards automated testing of a product. They often have features like auto-healing, visual-regression-testing, data-driven-testing and cross-browser-testing/responsiveness-testing.

Differentiating these tools: Which one is best for your needs?

Whilst these tools have many commonalities, each has a set of capabilities that aligns it with different products and product teams.

  • Local Testing: Some tools work locally, some do not. Some partially support it through the use of a network tunnel.
    • Reflow supports local testing via an npm package, allowing the entire record/replay process to be executed on your Windows, Mac or Linux system.
  • Mobile Testing: Some tools support mobile device emulation, some use real mobile devices, some do not support this at all.
    • Reflow supports mobile device emulation, but does not run tests on real mobile devices.
  • Cross Browser Testing: Most tools support execution of tests in more than one browser engine, but some only support one browser.
    • Reflow wraps playwright, and thus allows for test execution in Edge, Chrome, Firefox and Safari. However, tests can only be recorded in Chromium-based browsers.
  • Selector Stabilisation: Some tools rely entirely on visual mechanisms to stabilize element selectors: some rely entirely on CSS selectors. Some do both.
    • Reflow stabilizes/auto-heals selectors with both CSS and visual data. However due to this it only supports web applications.
  • Customization: Most tools allow for the execution of JavaScript in the browser, but not all allow for deeper configuration of the test execution environment.
  • Data Export: Using a code-free tool usually means your test data will be held inside the tool, not in your codebase. Tools differ in how easy it is to export that data if you decide to leave.
  • Integrations: Some tools have vendor-specific integrations. Some are self-contained, and can be executed over a CLI in any server.
    • Reflow can be executed in a CLI runner after just an npm install reflowio command. This means it can be added directly into any existing CI/CD pipeline.
    • Reflow does not have vendor-specific integrations for tools like JIRA, Slack, or TestCafe
  • Cost: Some tools have a trial period, some have a free-tier. Many do not publish pricing details and may provide company-specific quotes after they understand what your business will afford.
  • API Testing: Some tools provide specific workflows for API Testing, and allow for it to be done using a code-free workflow
    • Reflow requires API testing be done via snippets, and lacks a code-free workflow for API testing.
  • Desktop Application Testing: Some tools support more than just Web UIs, also executing on desktop applications
    • Reflow only supports Web UIs
  • Isolation: Some tools are experimenting with test isolation, i.e. isolating the test execution from different parts of the application. For example, some may have first-class network mocking/replay capabilities that enable fully deterministic UI replay.
    • Reflow is experimenting with Unit Test Generation for React components using Runtime Execution Analysis via source maps. However, this isn't yet live, and we're not yet sure of its utility for test isolation.
    • Reflow snippets can enable network mocking/replay functionalities via playwright APIs.
  • Visual Regression Testing: Almost all tools have different approaches to visual regression testing. These approaches will work better for some products, but not all.
    • Reflow Visual Regression Testing is implemented with an SSIM-weighted pixel diff. This minimizes noise from minor rendering differences between browsers, whilst still highlighting significant rendering differences, particularly when the changed pixels are clustered together.

· 10 min read
Thomas Rooney

End-to-end testing is a Quality Assurance technique that involves exercising and validating an application's workflow from start to finish.

This technique aims to reproduce real user scenarios and validate expectations over application output.

Unfortunately, end-to-end testing has problems that make tests difficult to maintain:

  1. The testing process must generally assume that the system is in a consistent starting state. This might require seeding or wiping test data before/after the test is run.
  2. When application changes are made, there may need to be changes to the test sequence to handle the application update.
  3. Even when nothing changes, sometimes tests will fail anyway. Such tests are denoted as flaky.

In building reflow.io, we have accrued a toolkit of strategies to fix flaky tests, and automatically heal the test sequence when there are application changes. Reflow is a SaaS tool that allows your QA team to build/maintain your end-to-end test suite using a no-code UI. It has powerful primitives that cover the majority of automation scenarios, but can also be enhanced by importing code snippets from your end-to-end tests to power workflows specific to your application. This article is a summary of the techniques we use, and how to apply them into your test suite, regardless of whether you use reflow or not.

Why does this matter?

Time is a precious resource for all product teams. It must be protected and spent wisely. Flaky tests cost time.

  • Every time a test flakes, someone generally has to investigate whether the failure is a true failure or a false positive. The time cost of this can become very significant.
  • A test may have downstream dependencies that cannot run until the test is re-run, or fixed. This can often escalate the time cost.
  • The failed test may place your system into a non-deterministic state, which may then require manual effort to fix.

Emotional Damage

  • The very existence of an automated test implies that someone cared enough to automate away some manual effort.
  • When that manual effort returns through a test flake, there is an emotional toll.
  • If the flake cannot be removed, the team will be very aware that they are stuck maintaining it in perpetuity.

QA Death Cycle

  • If the tests are flaky, the tests will not be trusted.
  • If the tests are not trusted, they will become gradually deprecated and eventually removed.
  • If the product team does not replace these tests, the codebase will lose test coverage, and may become less trusted.
  • If the codebase is less trusted, productivity will suffer.

Mitigations

Strategy 1: Generic pre-action stability

The most common flake reason that we see is caused by interacting with the page too quickly. Page elements often get rendered before they can be successfully interacted with. This means just checking for their existence is often not enough.

There are a few generic actions that can be done to increase the likelihood that the page is fully loaded before it is interacted with.

In playwright, these stability methods are available via the waitForLoadState API.

await page.waitForLoadState('domcontentloaded', { timeout: 15000 });
await page.waitForLoadState('load', { timeout: 30000 });
await page.waitForLoadState('networkidle', { timeout: 5000 });

These 3 load events are:

  1. domcontentloaded - wait for the DOMContentLoaded event to be fired. This fires when the initial page document has been loaded and parsed. For SPAs, stylesheets, images and javascript will generally not be loaded when this event fires. Hence it is usually preferable to rely on the load event instead.
  2. load - wait for the load event to be fired. This event fires when all markup, stylesheet, javascript and all static assets like images and audio have been loaded. For many SPAs, which dynamically load data after the page renders its initial document, the load event may not be enough.
  3. networkidle - wait until there are no network connections for at least 500 ms. This is a useful event to hook into for SPAs which load data after they have their initial render. However, it may either be too early or too late for page interaction.

In reflow, there is an optional 4th event: screenshotstable. This event is fired when the page:

  1. Has not changed within 5s.
  2. Looks like the page in its most recent successful test.

We introduced this event as most applications, when loading, either show a loading animation or continuously re-render whilst they load data. It introduces a minor delay to test actions, but we believe the decrease in flakiness makes it worthwhile.

Similarly, for a deterministic end-to-end test sequence, the page almost always looks the same between test runs. Hence we can save a screenshot to S3 recording what the page looked like before an action executed, and compare against it during test execution, using the previous successful run for the same device/browser/operating system combination.

As always, a timeout needs to be set to avoid this waiting forever when the application has changed, or is not expected to stop rendering. In reflow, these timeouts are automatically configured based on how the page behaved when it was originally recorded or, if it exists, the most recent successful test execution.
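A simplified version of this check, using playwright screenshots and a naive byte comparison (a real implementation would diff against a stored baseline with a perceptual comparison), might look like the following sketch:

import type { Page } from 'playwright';

// Poll until two consecutive full-page screenshots are identical, or the timeout elapses.
async function screenshotStable(page: Page, timeoutMs = 15000, intervalMs = 500): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let previous = await page.screenshot({ fullPage: true });
  while (Date.now() < deadline) {
    await page.waitForTimeout(intervalMs);
    const current = await page.screenshot({ fullPage: true });
    if (current.equals(previous)) {
      return true; // the page has stopped changing
    }
    previous = current;
  }
  return false; // still changing when the timeout elapsed
}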

Strategy 2: Intelligent Waiting

If an action expects to be on a given page, we can wait until the navigation to that page is complete with page.waitForNavigation. This can be important, as multiple load events may fire during a navigation sequence: hence a page.waitForLoadState will not always be enough.
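For example, when a click is expected to trigger a navigation, set up the navigation wait before the click so the event cannot be missed; the selector below is illustrative:

// Start waiting for the navigation before triggering it.
await Promise.all([
  page.waitForNavigation({ waitUntil: 'load', timeout: 30000 }),
  page.click('[data-test-id="checkout"]'), // illustrative selector
]);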

If an action affects a specific element, we can take further action-specific steps. For example:

  1. Wait for the element to be Attached to the DOM.
  2. Wait for the element to be Visible.
  3. Wait for the element to be Stable, as in not animating, or having completed its animation
  4. Wait for the element to be able to receive events
  5. If the element is a clickable element, wait for the element to be enabled.
  6. If the element is a text-entry element, wait for the element to be editable.

In playwright, these pre-action stability statements are automatically applied based on the interaction type.
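If you need the same checks outside of an auto-waiting action (for example before an assertion), playwright exposes them directly; the selector here is again illustrative:

// Explicitly wait for an element to be attached and visible before using it.
const submit = page.locator('[data-test-id="submit"]');
await submit.waitFor({ state: 'visible', timeout: 10000 });
// ElementHandle-based APIs expose similar waits, e.g. waitForElementState('stable').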

In reflow, stability is further enhanced with:

  1. Waiting for the element to look the same as it did in the last successful run. I.e. continuously take screenshots of the element, and wait until they look the same (or a timeout).
  2. Apply a specific timeout to the element based on how long it took for the last DOM attribute update to be applied during the last successful run. E.g. if the button turned green via a class attribute change after 7s, wait at least 7s for the element to get DOM attribute changes, before further timeouts.

Strategy 3: Pick Good Selectors

When an application changes, the locators used to identify elements often also change. These locators are generally known as selectors.

A smart development team will combat this with a strongly consistent element attribute, such as data-test-id. E.g.

<button
  data-test-id={`test-actions-${testId}`}
/>

To click such a button we can use the data-test-id attribute to locate the element.

await page.click(`[data-test-id="test-actions-${testId}"]`);

Other good options, for scenarios where adding a specific element selector is undesirable, are:

  • placeholder="..." - Placeholders are often unique to the element
  • [aria-label="..."] - An attribute used by assistive technologies to help identify an element (e.g. for a screen reader)
  • img[alt="..."] - An attribute used to denote alternate text for an image, if the image cannot be displayed
  • role="..." - A role attribute is used to add semantic meaning to an element for assistive technologies (e.g. screen readers)
  • input[type="..."] - An input often has a unique/unchanging type that identifies it in short forms
  • nodeName - If an element node type is only used once, the nodeName (e.g. a, input, button) is often a good choice
  • #... - Elements are often assigned unique id attributes to aid in locating them by javascript.

Reflow will pick up such selectors automatically during test recording, and score them by uniqueness and type. It will also:

  1. Combine parent selectors with their children to reduce ambiguity. E.g. [data-test-id="foo"] >> [data-test-id="bar"].
  2. Calculate all the possible sets of selectors, rank them, and inspect the page for the most likely element based on how close they are to the most recent successful run's element.
  3. Compare all page elements made from partial selector matches to a screenshot of the most recent successful run's element, to enable auto-healing the locator when it cannot be found or is ambiguous. We call this a Visual Selector

Strategy 4: "Wait Until" checkpoints

This strategy relies on writing application-specific logic to be built to ensure the application is in a specific state.

For instance, if a page element represents a calculation, and we want to wait for that calculation to be completed before moving on, a developer could write an assertion like:

await page.waitForSelector('[aria-label="calculation"] >> text=29.76')

In reflow, such an assertion can be made with as long a timeout as desired. For instance, most of reflow's internal test suite (powered by reflow itself) starts by:

  1. Making a new reflow test
  2. Navigating to the recording UI of that test
  3. Waiting until the recording UI looks like a pre-recorded screenshot, up to 5 minutes.

The Visual Assertion in step [3] removes the flakiness of both server startup times and DNS propagation. It doesn't require any code, and works very reliably. When the starting URL has any visual changes, reflow will fail the test and offer an auto-healing option to quickly replace the baseline with the new visual snapshot.

Strategy 5: Zero-dependency Data Seeding

This strategy helps combat the problem "The testing process must generally assume that the system is in a consistent starting state", by breaking out an initial test suite that resets application data.

In reflow, this is done by adding a user to a team, accepting the invite, then removing them. After this sequence all the user's end-to-end tests are associated with the team that they just left, and they are in an empty team.

In other applications, this might be done via invoking a REST endpoint directly in the test suite to reset application data. E.g. invoking a REST handler that resets the database:

 await page.request.post(`${event.variables.url}/reset/scenario/empty?token=${event.variables.secret}`);

Conclusion: Is there a way to eliminate all flaky tests forever?

No.

An engineering team can be smart, and drastically reduce the amount of time spent maintaining end-to-end tests by applying these strategies. However, that time will never go to zero whilst an application is being actively developed.

Test Automation helps ensure that QA effort is placed on the boundaries of new feature development, rather than endlessly covering existing features on every application change.

Commercial low-code automation tools like reflow.io can allow product teams to work more effectively by reducing the cost of automation. They are not a magic pill, but can be a very valuable tool for teams where QA and development personnel want to work closer together on test automation, regardless of coding ability.

Please sign up if you'd like to try it, have a look at our README to understand its capabilities, or reach out if you'd like a demo.

· 13 min read
Thomas Rooney

With a small amount of code, you can expose your running Fargate Tasks to the internet directly on an individual subdomain. For example, ${taskid}.eu-west-2.browser.reflow.io.

This is almost always a bad idea. However, if you really need to do it, this guide should help.

Why you shouldn't do this

AWS provides multiple battle-tested patterns to operate containerized workloads. These patterns are much easier to configure, less likely to break, and will work for the vast majority of customer use-cases. For example:

  • Application Load Balanced Fargate Service: A Fargate service running on an ECS cluster fronted by an application load balancer. Benefits:
    1. Supports health-checks implicitly, diverting traffic away from unhealthy instances before re-creating them.
    2. On a deployment via CodePipeline, managed services monitor the stability of newly provisioned services before gradually moving traffic onto them.
    3. Traffic is automatically distributed across different Availability Zones to provide data-center resilience.
  • Queue Processing Fargate Service: A Fargate service auto-scaled to handle jobs in an SQS Queue. Benefits:
    1. Can automatically retry jobs upon a failure.
    2. Will scale up/down based on asynchronous workloads
    3. Can handle long-lived jobs.

Why we do this

At reflow, we run web browsers on which we record and execute end-to-end tests. To record tests, we need to be able to create a websocket connection to a server that holds transient state.

ECS with Fargate is a great choice for this, as it removes the risk and operational overhead of running servers, but without the complexity introduced by Kubernetes.

Our first design used the Application Load Balanced Fargate Service pattern, but we ran into the following problems:

  1. We wanted to run untrusted customer code on our servers, which requires both isolating customer workloads from each other and zero-privilege servers. Unfortunately, it is non-trivial to do customer-level physical isolation with ALB-fronted ECS Services.
  2. We wanted a multi-region architecture, but the cost of keeping one instance and a NAT Gateway warm in every region is significant for a bootstrapped startup. There didn't appear to be any way to allow clusters to scale to zero when not in use.
  3. We have a feature whereby multiple customers in the same team could share a single browser instance to collaborate on recording tests. However, this didn't work in our cloud instances, because we couldn't guarantee that multiple users in the same team would reach the same server.

Using the following pattern, we have:

  1. Clusters which scale to zero when not in use. This means we don't need to pay a warm-instance fee for going multi-region.
  2. All customer workloads physically isolated from each other, and the ability to share transient server state between multiple users in the same team.
  3. The ability to bake expected customer ids into the server process's environment variables, simplifying authentication
  4. No need to run a NAT Gateway in each availability zone.

With the main negatives:

  1. DNS propagation delays mean that the first time a customer uses a recording instance, they have to wait approximately 1 minute before the server is DNS-available.
  2. More moving parts to monitor.

Logical Components

  1. LetsEncrypt SSL wildcard certificates, on a relevant domain. E.g. *.eu-west-2.browser.reflow.io
  2. An AWS Lambda task to renew the above certificate automatically, and alert us if there are any problems. This is scheduled to run monthly.
  3. An AWS Lambda task to automatically create and destroy DNS records for every task registered in our cluster.
  4. An ECS Cluster and Task Configuration to run our service on demand, and automatically register a public IP address.
  5. Extra application logic to register the task hostnames and get them to the client. We use AppSync/DynamoDB for this: when a server boots, it writes its hostname to a DynamoDB record that a client can read. Finally, we bake a TeamId (a Cognito custom attribute that must be present in the signed client JWT for all web requests) into the server's environment variables.

Components [1], [2] and [3] are completely generic, so I'll describe them here. Once configured, any tasks created in the target ECS Cluster will be exposed via DNS. [4], [5] are domain-specific, so are unlikely to be useful or relevant for anyone else, but feel free to reach out if you have questions.

LetsEncrypt SSL Certificates

CDK

We use CDK to manage all our infrastructure. The following is a Construct we use to manage our LetsEncrypt Lambda function.

It creates:

  • An S3 bucket to hold our certificates
  • An SNS topic to notify us when our certificates renew
  • A Lambda function to actually renew everything. We have a wrapper ReamplifyLambdaFunction that allows us to pre-compile our code outside of CDK, but this could just as well be a NodejsFunction.

It references:

  • The hosted zone where we will place our task instance's DNS records, and the associated domain to suffix our tasks with.
  • A workspace parameter, e.g. dev / prod. This allows us to provision multiple instances of this construct within one AWS account.
  • An email address to send renewal notifications to.
  • The region/account to store everything in
cdk/certbot.ts
import { Construct } from 'constructs';
import { Duration, RemovalPolicy, StackProps, Tags } from 'aws-cdk-lib';
import { BlockPublicAccess, Bucket, BucketEncryption, ObjectOwnership } from 'aws-cdk-lib/aws-s3';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { EmailSubscription } from 'aws-cdk-lib/aws-sns-subscriptions';
import { ReamplifyLambdaFunction } from './reamplifyLambdaFunction';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
import { IHostedZone } from 'aws-cdk-lib/aws-route53';
import { Rule, Schedule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';

interface CertbotProps {
adminNotificationEmail: string;
hostedZone: IHostedZone;
domain: string;
workspace: string;
env: {
region: string;
account: string;
};
}

export class Certbot extends Construct {
public readonly certBucket: Bucket;
constructor(scope: Construct, id: string, props: StackProps & CertbotProps) {
super(scope, id);
Tags.of(this).add('construct', 'Certbot');
const certBucket = new Bucket(this, 'bucket', {
bucketName: `certs.${props.env.region}.${props.workspace}.reflow.io`,
objectOwnership: ObjectOwnership.BUCKET_OWNER_PREFERRED,
removalPolicy: RemovalPolicy.DESTROY,
autoDeleteObjects: true,
versioned: true,
lifecycleRules: [
{
enabled: true,
abortIncompleteMultipartUploadAfter: Duration.days(1),
},
],
encryption: BucketEncryption.S3_MANAGED,
enforceSSL: true,
blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
});
this.certBucket = certBucket;

const topic = new Topic(this, 'CertAdminTopic');
topic.addSubscription(new EmailSubscription(props.adminNotificationEmail));

const fn = new ReamplifyLambdaFunction(this, 'LambdaFn', {
workspace: props.workspace,
lambdaConfig: 'deploy/browserCerts.ts',
timeout: Duration.minutes(15),
environment: {
NOTIFY_EMAIL: props.adminNotificationEmail,
CERTIFICATES: JSON.stringify([
{
domains: [`*.${props.domain}`],
zoneId: props.hostedZone.hostedZoneId,
certStorageBucketName: certBucket.bucketName,
certStoragePrefix: 'browser/',
successSnsTopicArn: topic.topicArn,
failureSnsTopicArn: topic.topicArn,
},
]),
},
});

fn.addToRolePolicy(
new PolicyStatement({
actions: ['route53:ListHostedZones'],
resources: ['*'],
})
);
fn.addToRolePolicy(
new PolicyStatement({
actions: ['route53:GetChange', 'route53:ChangeResourceRecordSets'],
resources: ['arn:aws:route53:::change/*'].concat(props.hostedZone.hostedZoneArn),
})
);
fn.addToRolePolicy(
new PolicyStatement({
actions: ['ssm:GetParameter', 'ssm:PutParameter'],
resources: ['*'],
})
);
certBucket.grantWrite(fn);
topic.grantPublish(fn);

new Rule(this, 'trigger', {
schedule: Schedule.cron({ minute: '32', hour: '17', day: '3', month: '*', year: '*' }),
targets: [new LambdaFunction(fn)],
});
}
}

AWS Lambda Function

Dependencies:

  • acme-client: 4.2.3

This leans very heavily on acme-client to do all the heavy lifting, with a scattering of logic to:

  • Maintain SSM parameters to ensure that only one account is managed within LetsEncrypt, rather than creating a new account each time, but ensure that this can be run without any pre-dependencies when spinning up a new environment.
  • Answer the LetsEncrypt challenges with DNS records to prove we own the given domain.
  • Store the resultant certificates in S3.
  • Notify an admin that the certificate has been issued (or not, if there was a failure).
lambda/deploy/browserCerts.ts
import AWS from 'aws-sdk';
import acme from 'acme-client';

const route53 = new AWS.Route53();
const s3 = new AWS.S3();
const sns = new AWS.SNS();

export function assertEnv(key: string): string {
if (process.env[key] !== undefined) {
console.log('env', key, 'resolved by process.env as', process.env[key]!);
return process.env[key]!;
}
throw new Error(`expected environment variable ${key}`);
}

export const assertEnvOrSSM = async (key: string, shouldThrow = true): Promise<string> => {
const workspace = assertEnv('workspace');

if (process.env[key] !== undefined) {
console.log('env', key, 'resolved by process.env as', process.env[key]!);
return Promise.resolve(process.env[key]!);
} else {
const SSMLocation = `/${workspace}/${key}`;
console.log('env', key, 'resolving via SSM at', SSMLocation);

const SSM = new AWS.SSM();
try {
const ssmResponse = await SSM.getParameter({
Name: SSMLocation,
}).promise();
if (!ssmResponse.Parameter || !ssmResponse.Parameter.Value) {
throw new Error(`env ${key} missing`);
}
console.log('env', key, 'resolved by SSM as', ssmResponse.Parameter.Value);
process.env[key] = ssmResponse.Parameter.Value;
return ssmResponse.Parameter.Value;
} catch (e) {
console.error(`SSM.getParameter({Name: ${SSMLocation}}):`, e);
if (shouldThrow) {
throw e;
}
return '';
}
}
};

export const writeSSM = async (key: string, value: string): Promise<void> => {
const workspace = assertEnv('workspace');

const SSMLocation = `/${workspace}/${key}`;
console.log('env', key, 'writing to SSM at', SSMLocation, 'value', value);

const SSM = new AWS.SSM();
await SSM.putParameter({
Name: SSMLocation,
Value: value,
Overwrite: true,
DataType: 'text',
Tier: 'Standard',
Type: 'String',
}).promise();
};

async function getOrCreateAccountPrivateKey() {
let accountKey = await assertEnvOrSSM('LETSENCRYPT_ACCOUNT_KEY', false);
if (accountKey) {
return accountKey;
}
console.log('Generating Account Key');
accountKey = (await acme.forge.createPrivateKey()).toString();
await writeSSM('LETSENCRYPT_ACCOUNT_KEY', accountKey);
return accountKey;
}

export const handler = async function (event) {
const maintainerEmail = assertEnv('NOTIFY_EMAIL');
const accountURL = await assertEnvOrSSM('LETSENCRYPT_ACCOUNT_URL', false);
const certificates = JSON.parse(assertEnv('CERTIFICATES'));
const accountPrivateKey = await getOrCreateAccountPrivateKey();

acme.setLogger(console.log);
const client = new acme.Client({
directoryUrl: acme.directory.letsencrypt.production,
accountKey: accountPrivateKey,
accountUrl: accountURL ? accountURL : undefined,
});

const certificateRuns = certificates.map(async (certificate) => {
const { domains, zoneId, certStorageBucketName, certStoragePrefix, successSnsTopicArn, failureSnsTopicArn } =
certificate;

try {
const [certificateKey, certificateCsr] = await acme.forge.createCsr({
commonName: domains[0],
altNames: domains.slice(1),
});

const certificate = await client.auto({
csr: certificateCsr,
email: maintainerEmail,
termsOfServiceAgreed: true,
challengeCreateFn: async (authz, challenge, keyAuthorization) => {
console.log(authz, challenge, keyAuthorization);
const dnsRecord = `_acme-challenge.${authz.identifier.value}`;

if (challenge.type !== 'dns-01') {
throw new Error('Only DNS-01 challenges are supported');
}
const changeReq = {
ChangeBatch: {
Changes: [
{
Action: 'UPSERT',
ResourceRecordSet: {
Name: dnsRecord,
ResourceRecords: [
{
Value: '"' + keyAuthorization + '"',
},
],
TTL: 60,
Type: 'TXT',
},
},
],
},
HostedZoneId: zoneId,
};
console.log('Sending create request', JSON.stringify(changeReq));
const response = await route53.changeResourceRecordSets(changeReq).promise();
const changeId = response.ChangeInfo.Id;
console.log(`Create request sent for ${dnsRecord} (Change id ${changeId}); waiting for it to complete`);
const waitRequest = route53.waitFor('resourceRecordSetsChanged', { Id: changeId });
const waitResponse = await waitRequest.promise();
console.log(
`Create request complete for ${dnsRecord}: (Change id ${waitResponse.ChangeInfo.Id}) ${waitResponse.ChangeInfo.Status}`
);
},
challengeRemoveFn: async (authz, challenge, keyAuthorization) => {
const dnsRecord = `_acme-challenge.${authz.identifier.value}`;

const deleteReq = {
ChangeBatch: {
Changes: [
{
Action: 'DELETE',
ResourceRecordSet: {
Name: dnsRecord,
ResourceRecords: [
{
Value: '"' + keyAuthorization + '"',
},
],
TTL: 60,
Type: 'TXT',
},
},
],
},
HostedZoneId: zoneId,
};
console.log('Sending delete request', JSON.stringify(deleteReq));
const response = await route53.changeResourceRecordSets(deleteReq).promise();
const changeId = response.ChangeInfo.Id;
console.log(`Delete request sent for ${dnsRecord} (Change id ${changeId}); waiting for it to complete`);
const waitRequest = route53.waitFor('resourceRecordSetsChanged', { Id: changeId });
const waitResponse = await waitRequest.promise();
console.log(
`Delete request complete for ${dnsRecord}: (Change id ${waitResponse.ChangeInfo.Id}) ${waitResponse.ChangeInfo.Status}`
);
},
challengePriority: ['dns-01'],
});

// Write private key & certificate to S3
const certKeyWritingPromise = s3
.putObject({
Body: certificateKey.toString(),
Bucket: certStorageBucketName,
Key: certStoragePrefix + 'key.pem',
ServerSideEncryption: 'AES256',
})
.promise();
const certChainWritingPromise = s3
.putObject({
Body: certificate,
Bucket: certStorageBucketName,
Key: certStoragePrefix + 'cert.pem',
})
.promise();

await Promise.all([certKeyWritingPromise, certChainWritingPromise]);
console.log('Completed with certificate for ', domains);

// after client.auto, an account should be available
if (!accountURL) {
await writeSSM('LETSENCRYPT_ACCOUNT_URL', client.getAccountUrl());
}

if (successSnsTopicArn) {
await sns
.publish({
TopicArn: successSnsTopicArn,
Message: `Certificate for ${JSON.stringify(domains)} issued`,
Subject: 'Certificate Issue Success',
})
.promise();
}
} catch (err) {
console.log('Error ', err);
if (failureSnsTopicArn) {
await sns
.publish({
TopicArn: failureSnsTopicArn,
Message: `Certificate for ${JSON.stringify(domains)} issue failure\n${err}`,
Subject: 'Certificate Issue Failure',
})
.promise();
}
throw err;
}
});

await Promise.all(certificateRuns);
};

Automatic DNS Records

CDK

This references:

  • A clusterArn to collect ECS EventStream events for any task state changes in the cluster
  • The serviceDiscoveryTLD (in our case browser.${props.env.region}.reflow.io) to suffix DNS records
  • The route 53 hosted zone to create records in
cdk/browserCluster.ts
import { Rule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';
import { PolicyStatement } from 'aws-cdk-lib/aws-iam';

// ...

const eventRule = new Rule(this, 'ECSChangeRule', {
eventPattern: {
source: ['aws.ecs'],
detailType: ['ECS Task State Change'],
detail: {
clusterArn: [cluster.clusterArn],
},
},
});

const ecsChangeFn = new ReamplifyLambdaFunction(this, 'ECSStreamLambda', {
...props,
lambdaConfig: 'stream/ecsChangeStream.ts',
unreservedConcurrency: true,
memorySize: 128,
environment: {
DOMAIN_PREFIX: props.serviceDiscoveryTLD,
HOSTED_ZONE_ID: props.hostedZone.hostedZoneId,
},
});

eventRule.addTarget(new LambdaFunction(ecsChangeFn));

ecsChangeFn.addToRolePolicy(
new PolicyStatement({
actions: ['route53:GetChange', 'route53:ChangeResourceRecordSets', 'route53:ListResourceRecordSets'],
resources: ['arn:aws:route53:::change/*'].concat(props.hostedZone.hostedZoneArn),
})
);
ecsChangeFn.addToRolePolicy(
new PolicyStatement({
actions: ['ec2:DescribeNetworkInterfaces'],
resources: ['*'],
})
);

AWS Lambda

This function:

  • Does some sanity checks on whether the event should affect DNS records
  • If the task is both currently RUNNING and desired RUNNING:
    • Looks up the public IP of the task.
    • Upserts an A record pointing at the task's public IP, at ${taskId}.${DOMAIN_PREFIX}
  • Else:
    • Deletes the A record associated with the task.
lambda/stream/ecsChangeStream.ts
import type { EventBridgeHandler } from 'aws-lambda';
import AWS from 'aws-sdk';
import { Task } from 'aws-sdk/clients/ecs';

export function assertEnv(key: string): string {
if (process.env[key] !== undefined) {
console.log('env', key, 'resolved by process.env as', process.env[key]!);
return process.env[key]!;
}
throw new Error(`expected environment variable ${key}`);
}

const ec2 = new AWS.EC2();
const route53 = new AWS.Route53();
const DOMAIN_PREFIX = assertEnv('DOMAIN_PREFIX');
const HOSTED_ZONE_ID = assertEnv('HOSTED_ZONE_ID');

export const handler: EventBridgeHandler<string, Task, unknown> = async (event) => {
console.log('event', JSON.stringify(event));
const task = event.detail;
const clusterArn = task.clusterArn;
const lastStatus = task.lastStatus;
const desiredStatus = task.desiredStatus;

if (!clusterArn) {
return;
}

if (!lastStatus) {
return;
}

if (!desiredStatus) {
return;
}

const taskArn = task.taskArn;
if (!taskArn) {
return;
}
const taskId = taskArn.split('/').pop();
if (!taskId) {
return;
}

const clusterName = clusterArn.split(':cluster/')[1];
if (!clusterName) {
return;
}
const containerDomain = `${taskId}.${DOMAIN_PREFIX}`;

if (lastStatus === 'RUNNING' && desiredStatus === 'RUNNING') {
const eniId = getEniId(task);
if (!eniId) {
return;
}

const taskPublicIp = await fetchEniPublicIp(eniId);
if (!taskPublicIp) {
return;
}

const recordSet = createRecordSet(containerDomain, taskPublicIp);

await updateDnsRecord(clusterName, HOSTED_ZONE_ID, recordSet);

console.log(`DNS record update finished for ${taskId} (${taskPublicIp})`);
} else {
const recordSet = await route53
.listResourceRecordSets({
HostedZoneId: HOSTED_ZONE_ID,
StartRecordName: containerDomain,
StartRecordType: 'A',
})
.promise();
console.log('listRecordSets', JSON.stringify(recordSet));
const found = recordSet.ResourceRecordSets.find((record) => record.Name === containerDomain + '.');
if (found && found.ResourceRecords?.[0].Value) {
await route53
.changeResourceRecordSets({
HostedZoneId: HOSTED_ZONE_ID,
ChangeBatch: {
Changes: [
{
Action: 'DELETE',
ResourceRecordSet: {
Name: containerDomain,
Type: 'A',
ResourceRecords: [
{
Value: found.ResourceRecords[0].Value,
},
],
TTL: found.TTL,
},
},
],
},
})
.promise();
}
}
};

function getEniId(task): string | undefined {
const eniAttachment = task.attachments.find(function (attachment) {
return attachment.type === 'eni';
});
if (!eniAttachment) {
return undefined;
}
const networkInterfaceIdDetail = eniAttachment.details.find((detail) => detail.name === 'networkInterfaceId');
if (!networkInterfaceIdDetail) {
return undefined;
}
return networkInterfaceIdDetail.value;
}

async function fetchEniPublicIp(eniId): Promise<string | undefined> {
const data = await ec2
.describeNetworkInterfaces({
NetworkInterfaceIds: [eniId],
})
.promise();
console.log(data);

return data.NetworkInterfaces?.[0].PrivateIpAddresses?.[0].Association?.PublicIp;
}

function createRecordSet(domain, publicIp) {
return {
Action: 'UPSERT',
ResourceRecordSet: {
Name: domain,
Type: 'A',
TTL: 60,
ResourceRecords: [
{
Value: publicIp,
},
],
},
};
}

async function updateDnsRecord(clusterName, hostedZoneId, changeRecordSet) {
let param = {
ChangeBatch: {
Comment: `Auto generated Record for ECS Fargate cluster ${clusterName}`,
Changes: [changeRecordSet],
},
HostedZoneId: hostedZoneId,
};
await route53.changeResourceRecordSets(param).promise();
}

Running this in Production

This has been in production for two months now, and whilst it's not perfect, it's working well for us.

Things we unnecessarily worried about:

  • We were worried that DNS records would accumulate, as some error conditions would result in them not being removed. Many thousands of DNS records later, we haven't seen this become an issue.
  • We were worried about Route53 throttling our DNS change requests. Whilst we've seen this happen a few times, our lambdas do automatically retry and it eventually gets through.

Negatives:

  • We have seen some flakiness in our E2E tests where sometimes a browser will not use the new DNS records until a refresh, even when waiting beyond the TTL. We had to automate around this.
  • Server orchestration logic is a lot more complex when you are managing individual ECS Tasks.

· 10 min read
Thomas Rooney

Demo

[Demo: the generated product screenshots, rendered for each viewport and theme combination]

How to keep product images in documentation up to date

Reflow is a tool for end-to-end testing of web products. It auto-heals browser interactions when your product changes, which means it can also be used creatively in long-living process automation tasks.

In this tutorial, we will show you how to use reflow to automatically capture screenshots of your web products, such that you can regenerate them automatically as your product changes.

We'll also make a Themed Image: an adaptive image that renders differently depending on the user's viewport and dark mode preferences. The end result will be a single command to produce 6 sets of product images, one for each of the following scenarios, and a React component that dynamically imports the right one based on the user's viewport and theme.

Viewport    Dark mode     Light mode
Desktop     Scenario 1    Scenario 4
Mobile      Scenario 2    Scenario 5
Tablet      Scenario 3    Scenario 6

Why Do This?

Documentation with relevant and contextual product images is far better than documentation without images.

Unfortunately, there are a few problems with adding images into documentation:

  1. Your product is going to change. When it changes, you either need to update all your documentation images or allow for drift. When there's enough drift, your documentation may appear completely incorrect.
  2. Your users do not all have the same visual experience of your product. The simplest example is users on mobile or tablet browsers compared to those on desktop browsers. If you theme both your product and your documentation, the difference can be even more striking, as the images in your documentation will often be in a different theme from the user's preference.

The added benefit of generating images this way is that the process also works as a small test suite, which you can gradually extend until it covers your product's full functionality.

How long will this take?

Around 30 minutes.

Step 1 - Install and Run Reflow

Reflow is installed through npm and run from the command line. For this tutorial, we'll use Local Reflow, starting with the following commands:

npm install -g reflowio
reflowio dashboard

Step 2 - Create an account / login

When we access reflow via the web interface dashboard, we need to authenticate to synchronize our local configuration with any cloud configuration.


Step 3 - Create a new test

Click the "Create Test" button in the top right corner of the dashboard.


Step 4 - Test Configuration

We're going to be using Desktop Chrome for our initial dashboard sequence, though we'll be able to regenerate these screenshots with a different device after recording our initial run.

For our starting URL, we're using the reflowio dashboard running on localhost:3100 in this example; however, you can use any URL you like.

We'll use the "Test Name" attribute later, so make it something unique and relevant to the documentation images. In our case, we're going to use "Reflow Documentation Flow".


Step 5 - Configure Dark Mode

Our application is configured to be in either dark mode, or light mode, based on the user preferences. We want our documentation to show images with the same theme as the user preference, so we'll add a step to conditionally enable dark mode dependent on a variable passed in.

Click "Add Action to start", and select "Execute Browser JavaScript", then click Edit to bring up the step editor. The Test Editor uses Monaco.

Once here, we see two exports: a handler function and a dependsOn array.

We'll set dependsOn to ['darkMode'] and trigger a test execution. We see our log of the event parameter in the devtools pane, and can now fill in the darkMode variable.

We'll set it to false for the recording run, but by defining it we can trigger runs with different darkMode parameters. We'll also fill in the handler logic to conditionally set the darkMode flag; in our application, we can do this by setting it in localStorage and triggering a JavaScript storage event.
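
For reference, the finished step looks roughly like the following. This is a sketch: the exact export syntax is whatever the editor scaffolds for you, and the "theme" localStorage key and synthetic storage event are specific to our example application, so substitute whatever your app reads its theme from.

// Runs inside the browser as part of the "Execute Browser JavaScript" action.
// event.darkMode is supplied via the dependsOn declaration below.
export const dependsOn = ['darkMode'];

export const handler = async (event) => {
  const darkMode = event.darkMode === true || event.darkMode === 'true';
  // Our example app stores its theme preference in localStorage...
  window.localStorage.setItem('theme', darkMode ? 'dark' : 'light');
  // ...and listens for storage events to re-read it without a reload.
  window.dispatchEvent(new StorageEvent('storage', { key: 'theme' }));
};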

Use "Test Execute" to validate it works, and click "Add" to confirm it and add it to our test actions.

Step 6 - Take our first screenshot

For our first screenshot, we're going to use the entire page body, as we want to show what it's like for a user on first login.

We start by opening the custom action dialog in the Test pane, and selecting "Assert" type, then subtype "Visually Matches". Then, in the right-hand developer tools pane, we'll select the Elements subpanel and navigate to the body element. We should see the image appearing in the left hand pane once we've selected this.

We change the slider to "Raise Warning" as we're not expecting this to always remain the same on each run -- we want the flow to continue executing even when the screenshot differs.

Click "Add" to complete the action.

Step 7 - Add an action description to set the downloaded image name

Optionally, we can give actions a description. In most scenarios, this is unnecessary, but in this case it will help to simplify our image output filename.

For instance, by naming our image "login", the output we'll look for later will be login.element.actual.png. If we didn't name it, the output would be 1.element.actual.png, named after the fact that this is the first step in our test.

To give the action a description, open it, then select the "Pen" icon. Click "Save" to confirm.

Step 8 - Save the test

We're going to save the test now and start bringing the screenshots into our documentation. However, if you'd like, you can continue to add more product images to the test: just keep interacting with your site until you've captured all the images you want. "Save" the test with the bottom-right button in the test pane, and confirm. You will be taken back to the test dashboard.

Click "Play" to capture the initial execution of the test.

Step 9 - Synchronize the screenshots into your codebase

In this step, we're going to use the CLI runner to extract the most recent run from reflow into our codebase.

Navigate to a folder where you want the images stored, and run the following command. This will pull our images from the most recent run of this test into the directory "./images".

If you lack an API key, you will be prompted to log in to reflow after executing this command.

reflowio pull --image --test-name "Reflow Documentation Flow" --action-description "login" --output-directory "./images"

Step 10 - Full Automation

The next step is full automation: we're going to use the CLI runner to run the test, and then extract the images from the most recent run for all of our scenarios.

There's nothing clever here; we just run a series of commands to fully automate it.

Note: this step struggles without an API key to authenticate with. On the free version of the Reflow CLI you will need to manually log in via the Web UI; once you set up an API key, this can be fully automated.

reflowio test "Reflow Documentation Flow"  --device-emulation-profile "iPad Pro 11 (landscape)" --params "darkMode=false"
reflowio test "Reflow Documentation Flow" --device-emulation-profile "iPad Pro 11 (landscape)" --params "darkMode=true"
reflowio test "Reflow Documentation Flow" --device-emulation-profile "Chrome Desktop" --params "darkMode=false"
reflowio test "Reflow Documentation Flow" --device-emulation-profile "Chrome Desktop" --params "darkMode=true"
reflowio test "Reflow Documentation Flow" --device-emulation-profile "iPhone 13" --params "darkMode=false"
reflowio test "Reflow Documentation Flow" --device-emulation-profile "iPhone 13" --params "darkMode=true"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "iPad Pro 11 (landscape)" --params "darkMode=false" --out-dir "./tablet"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "iPad Pro 11 (landscape)" --params "darkMode=true" --out-dir "./tablet-dark"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "Chrome Desktop" --params "darkMode=false" --out-dir "./desktop"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "Chrome Desktop" --params "darkMode=true" --out-dir "./desktop-dark"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "iPhone 13" --params "darkMode=false" --out-dir "./mobile"
reflowio pull images "Reflow Documentation Flow" --device-emulation-profile "iPhone 13" --params "darkMode=true" --out-dir "./mobile-dark"

Step 11 - Import the images into your documentation depending on user theme

The last step is to dynamically import the images into your documentation, depending on the user's theme and viewport.

At reflow, we use Docusaurus / mdx files for our documentation. With a little swizzle, our images look like this:

import ThemedImage from '@theme/ThemedImage';
import useBaseUrl from '@docusaurus/useBaseUrl';

<ThemedImage
  alt="Docusaurus themed image"
  sources={{
    lightdesktop: useBaseUrl('/img/desktop/login.signals.visualComparisonSignal.actual.image.png'),
    lightmobile: useBaseUrl('/img/mobile/login.signals.visualComparisonSignal.actual.image.png'),
    lighttablet: useBaseUrl('/img/tablet/login.signals.visualComparisonSignal.actual.image.png'),
    darkdesktop: useBaseUrl('/img/desktop-dark/login.signals.visualComparisonSignal.actual.image.png'),
    darkmobile: useBaseUrl('/img/mobile-dark/login.signals.visualComparisonSignal.actual.image.png'),
    darktablet: useBaseUrl('/img/tablet-dark/login.signals.visualComparisonSignal.actual.image.png'),
  }}
/>

With the swizzle overriding ThemedImage with our new options:

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 * Modifications Copyright (c) Resilient Software Ltd
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */
import React from 'react';
import clsx from 'clsx';
import useIsBrowser from '@docusaurus/useIsBrowser';
import { useColorMode } from '@docusaurus/theme-common';
import useMediaQuery from '@mui/material/useMediaQuery';
import styles from './styles.module.css';

export default function ThemedImage(props) {
  const isBrowser = useIsBrowser();
  const { isDarkTheme } = useColorMode();
  const mobile = useMediaQuery('(max-width: 480px)');
  const tablet = useMediaQuery('(min-width: 481px) and (max-width: 1025px)');
  const desktop = useMediaQuery('(min-width: 1026px)');

  const { sources, className, alt = '', ...propsRest } = props;

  // If per-viewport sources are provided, pick one based on viewport and theme;
  // otherwise fall back to the plain light/dark behaviour.
  const isSplit = Boolean(sources['darkmobile']);
  let clientThemes = ['dark'];
  if (isSplit) {
    if (mobile && isDarkTheme) {
      clientThemes = ['darkmobile'];
    } else if (mobile && !isDarkTheme) {
      clientThemes = ['lightmobile'];
    } else if (tablet && isDarkTheme) {
      clientThemes = ['darktablet'];
    } else if (tablet && !isDarkTheme) {
      clientThemes = ['lighttablet'];
    } else if (desktop && isDarkTheme) {
      clientThemes = ['darkdesktop'];
    } else if (desktop && !isDarkTheme) {
      clientThemes = ['lightdesktop'];
    }
  } else {
    clientThemes = isDarkTheme ? ['dark'] : ['light'];
  }
  const renderedSourceNames = isBrowser
    ? clientThemes
    : // We need to render every image on the server to avoid a flash;
      // the CSS below hides all but the matching one.
      // See https://github.com/facebook/docusaurus/pull/3730
      isSplit
        ? ['lightmobile', 'lighttablet', 'lightdesktop', 'darkmobile', 'darktablet', 'darkdesktop']
        : ['light', 'dark'];
  return (
    <>
      {renderedSourceNames.map((sourceName) => (
        <img
          key={sourceName}
          src={sources[sourceName]}
          alt={alt}
          className={clsx(styles.themedImage, styles[`themedImage--${sourceName}`], className)}
          {...propsRest}
        />
      ))}
    </>
  );
}

And a set of media queries so this works with SSR:

/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 * Modifications Copyright (c) Resilient Software Ltd
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

.themedImage {
  display: none;
}

/* Media Query for Mobile Devices */
@media (max-width: 480px) {
  [data-theme='light'] .themedImage--lightmobile {
    display: initial;
  }

  [data-theme='dark'] .themedImage--darkmobile {
    display: initial;
  }
}

/* Media Query for Tablets */
@media (min-width: 481px) and (max-width: 1025px) {
  [data-theme='light'] .themedImage--lighttablet {
    display: initial;
  }

  [data-theme='dark'] .themedImage--darktablet {
    display: initial;
  }
}

/* Media Query for Desktops */
@media (min-width: 1026px) {
  [data-theme='light'] .themedImage--lightdesktop {
    display: initial;
  }

  [data-theme='dark'] .themedImage--darkdesktop {
    display: initial;
  }
}

[data-theme='light'] .themedImage--light {
  display: initial;
}

[data-theme='dark'] .themedImage--dark {
  display: initial;
}