Customizing the semantic HTML comparison
This library includes comparison and assert helpers that use the AngleSharp Diffing library to perform semantic HTML comparison.
On this page we will go through how the comparison works, and what options you have to affect the comparison process.
Why semantic comparison is needed for stable tests
Just performing string comparison of two strings containing HTML markup can break quite easily, even if the two markup strings are semantically equivalent. Some changes that can cause a regular string comparison to fail are as follows:
- Formatting of markup, e.g. with extra line breaks or indentions and changes to insignificant whitespace.
- Reordering of attributes. The order of attributes does not matter.
- Reordering of classes defined in the
class="..."
attribute. The order of classes does not matter. - Change of boolean attributes to be implicit or explicit, e.g. from
required="required"
torequired
. - Changes to insignificant whitespace inside
<style>
tags. - Changes to HTML comments and comments inside
<style>
tags.
The AngleSharp Diffing library handles all those cases, so your tests are more stable.
Customizing Options
The AngleSharp Diffing library also allows us to customize the comparison process by adding special attributes to the "control" markup, i.e. the expected markup we want to use in verification.
All the customization options below will match with the following markup:
<header>
<h1 id="head-1">
Hello <em>world</em>
</h1>
</header>
Here are the customization options you have available to you:
Ignore comments (enabled by default): Comments in markup and inside
<style>
tags are automatically ignored and not part of the comparison process.Ignore element: Use the
diff:ignore
attribute to ignore an element, all its attributes and its child nodes. For example, to ignore theh1
element, do the following:<header> <h1 diff:ignore></h1> </header>
Ignore attribute: To ignore an attribute during comparison, add the
:ignore
modifier to the attribute (no value is needed). For example, to ignore theid
attribute:<header> <h1 id:ignore>Hello <em>world</em></h1> </header>
Ignore children: Use the
diff:ignoreChildren
attribute (no value is needed) to ignore all child nodes/elements of an element. This does not include attributes. For example, to ignore all child nodes of theh1
element, do the following:<header> <h1 id="head-1" diff:ignoreChildren></h1> </header>
Ignore all attributes: Use the
diff:ignoreAttributes
attribute (no value is needed) to ignore all attributes of an element. For example:<header> <h1 diff:ignoreAttributes>Hello <em>world</em></h1> </header>
Note
The
diff:ignoreChildren
anddiff:ignoreAttributes
attributes can be combined to ignore all child nodes/element and attributes of an element, but still verify that the element itself exists. For example:<header> <h1 diff:ignoreChildren diff:ignoreAttributes></h1> </header>
Configure whitespace handling: By default, all nodes and elements are compared using the
Normalize
whitespace handling option. TheNormalize
option will trim all text nodes and replace two or more whitespace characters with a single space character. The other options arePreserve
, which will leave all whitespace unchanged, andRemoveWhitespaceNodes
, which will only remove empty text nodes.To override the default option, use the
diff:whitespace
attribute, and pass one of the three options to it. For example:<header> <h1 id="head-1" diff:whitespace="preserve"> Hello <em>world</em> </h1> </header>
Note
The default for
<pre>
and<script>
elements is thePreserve
option. To change that, use thediff:whitespace
attribute. For example:<header> <h1 id="head-1" diff:whitespace="RemoveWhitespaceNodes">Hello<em>world</em></pre> </header>
Perform case-insensitive comparison: By default, all text comparison is case sensitive, but if you want to perform a case-insensitive comparison of text inside elements or attributes, use the
diff:ignoreCase
attribute on elements and:ignoreCase
modifier on attributes. For example, to perform a case insensitive comparison of the text in the followingh1
element , do the following:<header> <h1 id="head-1" diff:ignoreCase>HeLLo <em>world</em></h1> </header
To perform case insensitive comparison of the text inside the
id
attribute, do the following:<header> <h1 id:ignoreCase="HeAD-1">Hello <em>world</em></h1> </header>
Use RegEx during comparison: To use a regular expression when comparing the text inside an element or inside an attribute, use the
diff:regex
attribute on elements and the:regex
modifier on attributes.For example, to use a regular expression during comparison of the text in the
em
element, add thediff:regex
attribute to the element and place the regular expression in the body of the element:<header> <h1 id="head-1">Hello <em diff:regex>\w</em></h1> </header>
To use a regular expression during comparison of the text inside the
id
attribute, add the:regex
modifier to the attribute and add the regular expression in the attribute's value:<h1 id:regex="head-\d{1}">...</h1>
Note
The attribute modifiers
:ignoreCase
and:regex
can be combined, for example, as:attr:ignoreCase:regex="FOO-\d{4}"
Examples
Let’s look at a few examples where we use the semantic comparison options listed above to modify the comparison. In tests, we have the MarkupMatches()
methods we can use to perform semantic comparison of the output from a rendered component. For example, we may have a component, <Heading>
, that renders the following markup:
<h3 id="heading-1337" required>
Heading text
<small class="text-muted mark">
Secondary text
</small>
</h3>
In this case, we want to verify that the markup is rendered correctly, using something such as RegEx to verify the id
attribute (it might be generated) and ignoring the <small>
element. In tests we can do this like so with the MarkupMatches()
method:
var cut = RenderComponent<Heading>();
// Assert
// Here we specify expected HTML from CUT.
var expectedHtml = @"<h3 id:regex=""heading-\d{4}"" required>
Heading text
<small diff:ignore></small>
</h3>";
// Here we use the HTML diffing library to assert that the rendered HTML
// from CUT is semantically the same as the expected HTML string above.
cut.MarkupMatches(expectedHtml);
}