Exploring The Interplay Between HTML Entities And TextContent In JavaScript

As I was playing around with inserting text at the last known caret location yesterday, I stumbled upon a large gap in my mental model of how HTML works. For years, I’ve been using HTML entities to generate web-safe HTML markup. However, I only just realized that if you read the textContent of an element that contains HTML entities, you don’t get the HTML markup of said element; you get the interpreted (decoded) text content. This means, for example, that if you render an emoji using hex-encoded HTML entities, reading the textContent out of that node gives you the actual emoji glyph! To see this in action, I put together a small JavaScript demo.
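To make that mental model concrete, here's a rough sketch of the decoding step that happens when the browser parses the markup, long before textContent is ever read. This is not how browsers actually implement it. The decodeEntities() helper and its five-entry lookup table are my own illustrative assumptions; the HTML spec defines more than two thousand named character references, plus edge cases (such as references with a missing semicolon) that this sketch ignores.

```javascript
// Illustrative subset of named entities (an assumption for this sketch).
// Browsers use the full HTML named character reference table.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  rarr: "\u2192", // →
};

// Replaces &name;, &#decimal; and &#xHEX; references with the characters they
// represent, in a single left-to-right pass. Unknown names are left untouched.
function decodeEntities(markup) {
  return markup.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, ref) => {
    if (ref[0] === "#") {
      const isHex = ref[1] === "x" || ref[1] === "X";
      const codePoint = isHex
        ? parseInt(ref.slice(2), 16)
        : parseInt(ref.slice(1), 10);
      // Like the HTML parser, substitute U+FFFD for null, surrogate, and
      // out-of-range code points (legacy remappings are not handled here).
      const invalid =
        codePoint === 0 ||
        codePoint > 0x10ffff ||
        (codePoint >= 0xd800 && codePoint <= 0xdfff);
      return invalid ? "\uFFFD" : String.fromCodePoint(codePoint);
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, ref)
      ? NAMED_ENTITIES[ref]
      : match;
  });
}

console.log(decodeEntities("&lt; &gt; &quot; &rarr; &#x1F642;")); // < > " → 🙂
```

Because the replacement runs in a single pass, "&amp;lt;" decodes to the literal text "&lt;" rather than to "<", which is also what the browser does when you double-encode an ampersand.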

Run this demo in my JavaScript Demos project on GitHub.

View this code in my JavaScript Demos project on GitHub.

To demonstrate, all we’re going to do is render a paragraph that is composed entirely of HTML entities. Then, we’re going to grab the textContent of that element and echo the value into both an input element and the browser’s console:

<!doctype html>
<html lang="en">
<head>
	<meta charset="utf-8" />
	<title>
		Exploring The Interplay Between HTML Entities And TextContent In JavaScript
	</title>
	<link rel="stylesheet" type="text/css" href="./demo.css" />
</head>
<body>

	<h1>
		Exploring The Interplay Between HTML Entities And TextContent In JavaScript
	</h1>

	<p id="encoded">
		<!-- Common HTML entities. -->
		&lt; &gt; &quot; &rarr;
		<!-- Slightly smiling face emoji. -->
		&#x1F642;
		<!-- Frowning face. -->
		&#x2639;&#xFE0F;
	</p>

	<input id="input" type="text" size="40" />

	<script type="text/javascript" src="../../vendor/jquery/3.6.0/jquery-3.6.0.min.js"></script>
	<script type="text/javascript">

		var encoded = $( "#encoded" );
		var input = $( "#input" );

		// Our encoded element contains text that we created using HTML entities; that
		// is, web-safe encodings that represent other values. When we then extract that
		// generated content, we get the RENDERED VALUE, not the ENCODED VALUE!
		// NOTE: HTML comments are not included in an element's textContent.
		var encodedValue = encoded
			.prop( "textContent" )
			.trim()
			.replace( /\s+/g, " " ); // Cleaning up the white-space.

		// Echo the textContent in the Input and the Console.
		input.val( encodedValue );
		console.log( ( "%c" + encodedValue ), "font-family: monospace ;" );

		// And, just as a test, let's make sure the jQuery .text() method is actually
		// matching the raw .textContent property.
		console.log( encoded.text() === encoded.prop( "textContent" ) );

	</script>

</body>
</html>


As you can see, our test paragraph contains some common HTML entities and some encoded emoji codepoint sequences. But, when we grab those values using textContent and echo them to other text-based outputs, we get the following output:

An element's textContent echoed into an input and the console using JavaScript.

Notice that the textContent property contains the decoded text which, in this case, includes actual emoji glyphs, not the entity-encoded Unicode codepoints that we used to define the HTML content.
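If you want to convince yourself that textContent holds the characters themselves, and not the entity text, it helps to look at the string's code points. The sketch below runs in plain JavaScript (no DOM needed). The entityText value and the codePointLabels() helper are hypothetical, and I'm assuming the frowning face is encoded as U+2639 followed by the emoji variation selector U+FE0F.

```javascript
// The entity text as it appears in the source markup (assumed value: the
// "slightly smiling face" entity).
const entityText = "&#x1F642;";

// What the HTML parser (and therefore textContent) yields for that entity:
// the single code point U+1F642, parsed out of the hex digits.
const hexDigits = entityText.slice(3, -1); // "1F642"
const decodedText = String.fromCodePoint(parseInt(hexDigits, 16));

// Hypothetical helper: label each code point in a string as U+XXXX.
function codePointLabels(text) {
  // Array.from() iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (char) =>
    "U+" + char.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(entityText.length); // 9: nine ASCII characters of markup
console.log(decodedText === "\u{1F642}"); // true: it is the emoji itself
console.log(decodedText.length); // 2: one code point stored as a UTF-16 surrogate pair
console.log(codePointLabels(decodedText).join(" ")); // U+1F642
console.log(codePointLabels("\u2639\uFE0F").join(" ")); // U+2639 U+FE0F
```

The surrogate-pair length is a nice reminder that once the entity is decoded, you're dealing with a real Unicode character, along with all the usual string-length quirks that come with it.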

I can’t believe I didn’t know that the browser DOM (Document Object Model) worked this way. But, learning this is better late than never. I can definitely see this being helpful (unless you are one of those die-hards who believe “state” should never be stored in the DOM).
