at path：ROOT / wp-includes / html-api / class-wp-html-tag-processor.php
run：R W Run
class-wp-html-active-formatting-elements.php
7.09 KB
2026-03-11 16:18:52
R W Run
class-wp-html-attribute-token.php
2.71 KB
2026-03-11 16:18:52
R W Run
class-wp-html-decoder.php
16.3 KB
2026-03-11 16:18:52
R W Run
class-wp-html-doctype-info.php
24.79 KB
2026-03-11 16:18:52
R W Run
class-wp-html-open-elements.php
21.95 KB
2026-03-11 16:18:52
R W Run
class-wp-html-processor-state.php
11.07 KB
2026-03-11 16:18:52
R W Run
class-wp-html-processor.php
208.44 KB
2026-03-11 16:18:52
R W Run
class-wp-html-span.php
1.07 KB
2026-03-11 16:18:52
R W Run
class-wp-html-stack-event.php
1.6 KB
2026-03-11 16:18:52
R W Run
class-wp-html-tag-processor.php
147.75 KB
2026-03-11 16:18:52
R W Run
class-wp-html-text-replacement.php
1.38 KB
2026-03-11 16:18:52
R W Run
class-wp-html-token.php
3.33 KB
2026-03-11 16:18:52
R W Run
class-wp-html-unsupported-exception.php
3.52 KB
2026-03-11 16:18:52
R W Run
html5-named-character-references.php
78.28 KB
2026-03-11 16:18:52
R W Run
error_log
📄class-wp-html-tag-processor.php
  1<?php
  2/**
  3 * HTML API: WP_HTML_Tag_Processor class
  4 *
  5 * Scans through an HTML document to find specific tags, then
  6 * transforms those tags by adding, removing, or updating the
  7 * values of the HTML attributes within that tag (opener).
  8 *
  9 * Does not fully parse HTML or _recurse_ into the HTML structure
 10 * Instead this scans linearly through a document and only parses
 11 * the HTML tag openers.
 12 *
 13 * ### Possible future direction for this module
 14 *
 15 *  - Prune the whitespace when removing classes/attributes: e.g. "a b c" -> "c" not " c".
 16 *    This would increase the size of the changes for some operations but leave more
 17 *    natural-looking output HTML.
 18 *
 19 * @package WordPress
 20 * @subpackage HTML-API
 21 * @since 6.2.0
 22 */
 23
 24/**
 25 * Core class used to modify attributes in an HTML document for tags matching a query.
 26 *
 27 * ## Usage
 28 *
 29 * Use of this class requires three steps:
 30 *
 31 *  1. Create a new class instance with your input HTML document.
 32 *  2. Find the tag(s) you are looking for.
 33 *  3. Request changes to the attributes in those tag(s).
 34 *
 35 * Example:
 36 *
 37 *     $tags = new WP_HTML_Tag_Processor( $html );
 38 *     if ( $tags->next_tag( 'option' ) ) {
 39 *         $tags->set_attribute( 'selected', true );
 40 *     }
 41 *
 42 * ### Finding tags
 43 *
 44 * The `next_tag()` function moves the internal cursor through
 45 * your input HTML document until it finds a tag meeting any of
 46 * the supplied restrictions in the optional query argument. If
 47 * no argument is provided then it will find the next HTML tag,
 48 * regardless of what kind it is.
 49 *
 50 * If you want to _find whatever the next tag is_:
 51 *
 52 *     $tags->next_tag();
 53 *
 54 * | Goal                                                      | Query                                                                           |
 55 * |-----------------------------------------------------------|---------------------------------------------------------------------------------|
 56 * | Find any tag.                                             | `$tags->next_tag();`                                                            |
 57 * | Find next image tag.                                      | `$tags->next_tag( array( 'tag_name' => 'img' ) );`                              |
 58 * | Find next image tag (without passing the array).          | `$tags->next_tag( 'img' );`                                                     |
 59 * | Find next tag containing the `fullwidth` CSS class.       | `$tags->next_tag( array( 'class_name' => 'fullwidth' ) );`                      |
 60 * | Find next image tag containing the `fullwidth` CSS class. | `$tags->next_tag( array( 'tag_name' => 'img', 'class_name' => 'fullwidth' ) );` |
 61 *
 62 * If a tag was found meeting your criteria then `next_tag()`
 63 * will return `true` and you can proceed to modify it. If it
 64 * returns `false`, however, it failed to find the tag and
 65 * moved the cursor to the end of the file.
 66 *
 67 * Once the cursor reaches the end of the file the processor
 68 * is done and if you want to reach an earlier tag you will
 69 * need to recreate the processor and start over, as it's
 70 * unable to back up or move in reverse.
 71 *
 72 * See the section on bookmarks for an exception to this
 73 * no-backing-up rule.
 74 *
 75 * #### Custom queries
 76 *
 77 * Sometimes it's necessary to further inspect an HTML tag than
 78 * the query syntax here permits. In these cases one may further
 79 * inspect the search results using the read-only functions
 80 * provided by the processor or external state or variables.
 81 *
 82 * Example:
 83 *
 84 *     // Paint up to the first five DIV or SPAN tags marked with the "jazzy" style.
 85 *     $remaining_count = 5;
 86 *     while ( $remaining_count > 0 && $tags->next_tag() ) {
 87 *         if (
 88 *              ( 'DIV' === $tags->get_tag() || 'SPAN' === $tags->get_tag() ) &&
 89 *              'jazzy' === $tags->get_attribute( 'data-style' )
 90 *         ) {
 91 *             $tags->add_class( 'theme-style-everest-jazz' );
 92 *             $remaining_count--;
 93 *         }
 94 *     }
 95 *
 96 * `get_attribute()` will return `null` if the attribute wasn't present
 97 * on the tag when it was called. It may return `""` (the empty string)
 98 * in cases where the attribute was present but its value was empty.
 99 * For boolean attributes, those whose name is present but no value is
100 * given, it will return `true` (the only way to set `false` for an
101 * attribute is to remove it).
102 *
103 * #### When matching fails
104 *
105 * When `next_tag()` returns `false` it could mean different things:
106 *
107 *  - The requested tag wasn't found in the input document.
108 *  - The input document ended in the middle of an HTML syntax element.
109 *
110 * When a document ends in the middle of a syntax element it will pause
111 * the processor. This is to make it possible in the future to extend the
112 * input document and proceed - an important requirement for chunked
113 * streaming parsing of a document.
114 *
115 * Example:
116 *
117 *     $processor = new WP_HTML_Tag_Processor( 'This <div is="a" partial="token' );
118 *     false === $processor->next_tag();
119 *
120 * If a special element (see next section) is encountered but no closing tag
121 * is found it will count as an incomplete tag. The parser will pause as if
122 * the opening tag were incomplete.
123 *
124 * Example:
125 *
126 *     $processor = new WP_HTML_Tag_Processor( '<style>// there could be more styling to come' );
127 *     false === $processor->next_tag();
128 *
129 *     $processor = new WP_HTML_Tag_Processor( '<style>// this is everything</style><div>' );
130 *     true === $processor->next_tag( 'DIV' );
131 *
132 * #### Special self-contained elements
133 *
134 * Some HTML elements are handled in a special way; their start and end tags
135 * act like a void tag. These are special because their contents can't contain
136 * HTML markup. Everything inside these elements is handled in a special way
137 * and content that _appears_ like HTML tags inside of them isn't. There can
138 * be no nesting in these elements.
139 *
140 * In the following list, "raw text" means that all of the content in the HTML
141 * until the matching closing tag is treated verbatim without any replacements
142 * and without any parsing.
143 *
144 *  - IFRAME allows no content but requires a closing tag.
145 *  - NOEMBED (deprecated) content is raw text.
146 *  - NOFRAMES (deprecated) content is raw text.
147 *  - SCRIPT content is plaintext apart from legacy rules allowing `</script>` inside an HTML comment.
148 *  - STYLE content is raw text.
149 *  - TITLE content is plain text but character references are decoded.
150 *  - TEXTAREA content is plain text but character references are decoded.
151 *  - XMP (deprecated) content is raw text.
152 *
153 * ### Modifying HTML attributes for a found tag
154 *
155 * Once you've found the start of an opening tag you can modify
156 * any number of the attributes on that tag. You can set a new
157 * value for an attribute, remove the entire attribute, or do
158 * nothing and move on to the next opening tag.
159 *
160 * Example:
161 *
162 *     if ( $tags->next_tag( array( 'class_name' => 'wp-group-block' ) ) ) {
163 *         $tags->set_attribute( 'title', 'This groups the contained content.' );
164 *         $tags->remove_attribute( 'data-test-id' );
165 *     }
166 *
167 * If `set_attribute()` is called for an existing attribute it will
168 * overwrite the existing value. Similarly, calling `remove_attribute()`
169 * for a non-existing attribute has no effect on the document. Both
170 * of these methods are safe to call without knowing if a given attribute
171 * exists beforehand.
172 *
173 * ### Modifying CSS classes for a found tag
174 *
175 * The tag processor treats the `class` attribute as a special case.
176 * Because it's a common operation to add or remove CSS classes, this
177 * interface adds helper methods to make that easier.
178 *
179 * As with attribute values, adding or removing CSS classes is a safe
180 * operation that doesn't require checking if the attribute or class
181 * exists before making changes. If removing the only class then the
182 * entire `class` attribute will be removed.
183 *
184 * Example:
185 *
186 *     // from `<span>Yippee!</span>`
187 *     //   to `<span class="is-active">Yippee!</span>`
188 *     $tags->add_class( 'is-active' );
189 *
190 *     // from `<span class="excited">Yippee!</span>`
191 *     //   to `<span class="excited is-active">Yippee!</span>`
192 *     $tags->add_class( 'is-active' );
193 *
194 *     // from `<span class="is-active heavy-accent">Yippee!</span>`
195 *     //   to `<span class="is-active heavy-accent">Yippee!</span>`
196 *     $tags->add_class( 'is-active' );
197 *
198 *     // from `<input type="text" class="is-active rugby not-disabled" length="24">`
199 *     //   to `<input type="text" class="is-active not-disabled" length="24">
200 *     $tags->remove_class( 'rugby' );
201 *
202 *     // from `<input type="text" class="rugby" length="24">`
203 *     //   to `<input type="text" length="24">
204 *     $tags->remove_class( 'rugby' );
205 *
206 *     // from `<input type="text" length="24">`
207 *     //   to `<input type="text" length="24">
208 *     $tags->remove_class( 'rugby' );
209 *
210 * When class changes are enqueued but a direct change to `class` is made via
211 * `set_attribute` then the changes to `set_attribute` (or `remove_attribute`)
212 * will take precedence over those made through `add_class` and `remove_class`.
213 *
214 * ### Bookmarks
215 *
216 * While scanning through the input HTMl document it's possible to set
217 * a named bookmark when a particular tag is found. Later on, after
218 * continuing to scan other tags, it's possible to `seek` to one of
219 * the set bookmarks and then proceed again from that point forward.
220 *
221 * Because bookmarks create processing overhead one should avoid
222 * creating too many of them. As a rule, create only bookmarks
223 * of known string literal names; avoid creating "mark_{$index}"
224 * and so on. It's fine from a performance standpoint to create a
225 * bookmark and update it frequently, such as within a loop.
226 *
227 *     $total_todos = 0;
228 *     while ( $p->next_tag( array( 'tag_name' => 'UL', 'class_name' => 'todo' ) ) ) {
229 *         $p->set_bookmark( 'list-start' );
230 *         while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
231 *             if ( 'UL' === $p->get_tag() && $p->is_tag_closer() ) {
232 *                 $p->set_bookmark( 'list-end' );
233 *                 $p->seek( 'list-start' );
234 *                 $p->set_attribute( 'data-contained-todos', (string) $total_todos );
235 *                 $total_todos = 0;
236 *                 $p->seek( 'list-end' );
237 *                 break;
238 *             }
239 *
240 *             if ( 'LI' === $p->get_tag() && ! $p->is_tag_closer() ) {
241 *                 $total_todos++;
242 *             }
243 *         }
244 *     }
245 *
246 * ## Tokens and finer-grained processing.
247 *
248 * It's possible to scan through every lexical token in the
249 * HTML document using the `next_token()` function. This
250 * alternative form takes no argument and provides no built-in
251 * query syntax.
252 *
253 * Example:
254 *
255 *      $title = '(untitled)';
256 *      $text  = '';
257 *      while ( $processor->next_token() ) {
258 *          switch ( $processor->get_token_name() ) {
259 *              case '#text':
260 *                  $text .= $processor->get_modifiable_text();
261 *                  break;
262 *
263 *              case 'BR':
264 *                  $text .= "\n";
265 *                  break;
266 *
267 *              case 'TITLE':
268 *                  $title = $processor->get_modifiable_text();
269 *                  break;
270 *          }
271 *      }
272 *      return trim( "# {$title}\n\n{$text}" );
273 *
274 * ### Tokens and _modifiable text_.
275 *
276 * #### Special "atomic" HTML elements.
277 *
278 * Not all HTML elements are able to contain other elements inside of them.
279 * For instance, the contents inside a TITLE element are plaintext (except
280 * that character references like &amp; will be decoded). This means that
281 * if the string `<img>` appears inside a TITLE element, then it's not an
282 * image tag, but rather it's text describing an image tag. Likewise, the
283 * contents of a SCRIPT or STYLE element are handled entirely separately in
284 * a browser than the contents of other elements because they represent a
285 * different language than HTML.
286 *
287 * For these elements the Tag Processor treats the entire sequence as one,
288 * from the opening tag, including its contents, through its closing tag.
289 * This means that the it's not possible to match the closing tag for a
290 * SCRIPT element unless it's unexpected; the Tag Processor already matched
291 * it when it found the opening tag.
292 *
293 * The inner contents of these elements are that element's _modifiable text_.
294 *
295 * The special elements are:
296 *  - `SCRIPT` whose contents are treated as raw plaintext but supports a legacy
297 *    style of including JavaScript inside of HTML comments to avoid accidentally
298 *    closing the SCRIPT from inside a JavaScript string. E.g. `console.log( '</script>' )`.
299 *  - `TITLE` and `TEXTAREA` whose contents are treated as plaintext and then any
300 *    character references are decoded. E.g. `1 &lt; 2 < 3` becomes `1 < 2 < 3`.
301 *  - `IFRAME`, `NOSCRIPT`, `NOEMBED`, `NOFRAME`, `STYLE` whose contents are treated as
302 *    raw plaintext and left as-is. E.g. `1 &lt; 2 < 3` remains `1 &lt; 2 < 3`.
303 *
304 * #### Other tokens with modifiable text.
305 *
306 * There are also non-elements which are void/self-closing in nature and contain
307 * modifiable text that is part of that individual syntax token itself.
308 *
309 *  - `#text` nodes, whose entire token _is_ the modifiable text.
310 *  - HTML comments and tokens that become comments due to some syntax error. The
311 *    text for these tokens is the portion of the comment inside of the syntax.
312 *    E.g. for `<!-- comment -->` the text is `" comment "` (note the spaces are included).
313 *  - `CDATA` sections, whose text is the content inside of the section itself. E.g. for
314 *    `<![CDATA[some content]]>` the text is `"some content"` (with restrictions [1]).
315 *  - "Funky comments," which are a special case of invalid closing tags whose name is
316 *    invalid. The text for these nodes is the text that a browser would transform into
317 *    an HTML comment when parsing. E.g. for `</%post_author>` the text is `%post_author`.
318 *  - `DOCTYPE` declarations like `<DOCTYPE html>` which have no closing tag.
319 *  - XML Processing instruction nodes like `<?wp __( "Like" ); ?>` (with restrictions [2]).
320 *  - The empty end tag `</>` which is ignored in the browser and DOM.
321 *
322 * [1]: There are no CDATA sections in HTML. When encountering `<![CDATA[`, everything
323 *      until the next `>` becomes a bogus HTML comment, meaning there can be no CDATA
324 *      section in an HTML document containing `>`. The Tag Processor will first find
325 *      all valid and bogus HTML comments, and then if the comment _would_ have been a
326 *      CDATA section _were they to exist_, it will indicate this as the type of comment.
327 *
328 * [2]: XML allows a broader range of characters in a processing instruction's target name
329 *      and disallows "xml" as a name, since it's special. The Tag Processor only recognizes
330 *      target names with an ASCII-representable subset of characters. It also exhibits the
331 *      same constraint as with CDATA sections, in that `>` cannot exist within the token
332 *      since Processing Instructions do no exist within HTML and their syntax transforms
333 *      into a bogus comment in the DOM.
334 *
335 * ## Design and limitations
336 *
337 * The Tag Processor is designed to linearly scan HTML documents and tokenize
338 * HTML tags and their attributes. It's designed to do this as efficiently as
339 * possible without compromising parsing integrity. Therefore it will be
340 * slower than some methods of modifying HTML, such as those incorporating
341 * over-simplified PCRE patterns, but will not introduce the defects and
342 * failures that those methods bring in, which lead to broken page renders
343 * and often to security vulnerabilities. On the other hand, it will be faster
344 * than full-blown HTML parsers such as DOMDocument and use considerably
345 * less memory. It requires a negligible memory overhead, enough to consider
346 * it a zero-overhead system.
347 *
348 * The performance characteristics are maintained by avoiding tree construction
349 * and semantic cleanups which are specified in HTML5. Because of this, for
350 * example, it's not possible for the Tag Processor to associate any given
351 * opening tag with its corresponding closing tag, or to return the inner markup
352 * inside an element. Systems may be built on top of the Tag Processor to do
353 * this, but the Tag Processor is and should be constrained so it can remain an
354 * efficient, low-level, and reliable HTML scanner.
355 *
356 * The Tag Processor's design incorporates a "garbage-in-garbage-out" philosophy.
357 * HTML5 specifies that certain invalid content be transformed into different forms
358 * for display, such as removing null bytes from an input document and replacing
359 * invalid characters with the Unicode replacement character `U+FFFD` (visually "�").
360 * Where errors or transformations exist within the HTML5 specification, the Tag Processor
361 * leaves those invalid inputs untouched, passing them through to the final browser
362 * to handle. While this implies that certain operations will be non-spec-compliant,
363 * such as reading the value of an attribute with invalid content, it also preserves a
364 * simplicity and efficiency for handling those error cases.
365 *
366 * Most operations within the Tag Processor are designed to minimize the difference
367 * between an input and output document for any given change. For example, the
368 * `add_class` and `remove_class` methods preserve whitespace and the class ordering
369 * within the `class` attribute; and when encountering tags with duplicated attributes,
370 * the Tag Processor will leave those invalid duplicate attributes where they are but
371 * update the proper attribute which the browser will read for parsing its value. An
372 * exception to this rule is that all attribute updates store their values as
373 * double-quoted strings, meaning that attributes on input with single-quoted or
374 * unquoted values will appear in the output with double-quotes.
375 *
376 * ### Scripting Flag
377 *
378 * The Tag Processor parses HTML with the "scripting flag" disabled. This means
379 * that it doesn't run any scripts while parsing the page. In a browser with
380 * JavaScript enabled, for example, the script can change the parse of the
381 * document as it loads. On the server, however, evaluating JavaScript is not
382 * only impractical, but also unwanted.
383 *
384 * Practically this means that the Tag Processor will descend into NOSCRIPT
385 * elements and process its child tags. Were the scripting flag enabled, such
386 * as in a typical browser, the contents of NOSCRIPT are skipped entirely.
387 *
388 * This allows the HTML API to process the content that will be presented in
389 * a browser when scripting is disabled, but it offers a different view of a
390 * page than most browser sessions will experience. E.g. the tags inside the
391 * NOSCRIPT disappear.
392 *
393 * ### Text Encoding
394 *
395 * The Tag Processor assumes that the input HTML document is encoded with a
396 * text encoding compatible with 7-bit ASCII's '<', '>', '&', ';', '/', '=',
397 * "'", '"', 'a' - 'z', 'A' - 'Z', and the whitespace characters ' ', tab,
398 * carriage-return, newline, and form-feed.
399 *
400 * In practice, this includes almost every single-byte encoding as well as
401 * UTF-8. Notably, however, it does not include UTF-16. If providing input
402 * that's incompatible, then convert the encoding beforehand.
403 *
404 * @since 6.2.0
405 * @since 6.2.1 Fix: Support for various invalid comments; attribute updates are case-insensitive.
406 * @since 6.3.2 Fix: Skip HTML-like content inside rawtext elements such as STYLE.
407 * @since 6.5.0 Pauses processor when input ends in an incomplete syntax token.
408 *              Introduces "special" elements which act like void elements, e.g. TITLE, STYLE.
409 *              Allows scanning through all tokens and processing modifiable text, where applicable.
410 */
411class WP_HTML_Tag_Processor {
412	/**
413	 * The maximum number of bookmarks allowed to exist at
414	 * any given time.
415	 *
416	 * @since 6.2.0
417	 * @var int
418	 *
419	 * @see WP_HTML_Tag_Processor::set_bookmark()
420	 */
421	const MAX_BOOKMARKS = 10;
422
423	/**
424	 * Maximum number of times seek() can be called.
425	 * Prevents accidental infinite loops.
426	 *
427	 * @since 6.2.0
428	 * @var int
429	 *
430	 * @see WP_HTML_Tag_Processor::seek()
431	 */
432	const MAX_SEEK_OPS = 1000;
433
434	/**
435	 * The HTML document to parse.
436	 *
437	 * @since 6.2.0
438	 * @var string
439	 */
440	protected $html;
441
442	/**
443	 * The last query passed to next_tag().
444	 *
445	 * @since 6.2.0
446	 * @var array|null
447	 */
448	private $last_query;
449
450	/**
451	 * The tag name this processor currently scans for.
452	 *
453	 * @since 6.2.0
454	 * @var string|null
455	 */
456	private $sought_tag_name;
457
458	/**
459	 * The CSS class name this processor currently scans for.
460	 *
461	 * @since 6.2.0
462	 * @var string|null
463	 */
464	private $sought_class_name;
465
466	/**
467	 * The match offset this processor currently scans for.
468	 *
469	 * @since 6.2.0
470	 * @var int|null
471	 */
472	private $sought_match_offset;
473
474	/**
475	 * Whether to visit tag closers, e.g. </div>, when walking an input document.
476	 *
477	 * @since 6.2.0
478	 * @var bool
479	 */
480	private $stop_on_tag_closers;
481
482	/**
483	 * Specifies mode of operation of the parser at any given time.
484	 *
485	 * | State           | Meaning                                                              |
486	 * | ----------------|----------------------------------------------------------------------|
487	 * | *Ready*         | The parser is ready to run.                                          |
488	 * | *Complete*      | There is nothing left to parse.                                      |
489	 * | *Incomplete*    | The HTML ended in the middle of a token; nothing more can be parsed. |
490	 * | *Matched tag*   | Found an HTML tag; it's possible to modify its attributes.           |
491	 * | *Text node*     | Found a #text node; this is plaintext and modifiable.                |
492	 * | *CDATA node*    | Found a CDATA section; this is modifiable.                           |
493	 * | *Comment*       | Found a comment or bogus comment; this is modifiable.                |
494	 * | *Presumptuous*  | Found an empty tag closer: `</>`.                                    |
495	 * | *Funky comment* | Found a tag closer with an invalid tag name; this is modifiable.     |
496	 *
497	 * @since 6.5.0
498	 *
499	 * @see WP_HTML_Tag_Processor::STATE_READY
500	 * @see WP_HTML_Tag_Processor::STATE_COMPLETE
501	 * @see WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT
502	 * @see WP_HTML_Tag_Processor::STATE_MATCHED_TAG
503	 * @see WP_HTML_Tag_Processor::STATE_TEXT_NODE
504	 * @see WP_HTML_Tag_Processor::STATE_CDATA_NODE
505	 * @see WP_HTML_Tag_Processor::STATE_COMMENT
506	 * @see WP_HTML_Tag_Processor::STATE_DOCTYPE
507	 * @see WP_HTML_Tag_Processor::STATE_PRESUMPTUOUS_TAG
508	 * @see WP_HTML_Tag_Processor::STATE_FUNKY_COMMENT
509	 *
510	 * @var string
511	 */
512	protected $parser_state = self::STATE_READY;
513
514	/**
515	 * Indicates if the document is in quirks mode or no-quirks mode.
516	 *
517	 *  Impact on HTML parsing:
518	 *
519	 *   - In `NO_QUIRKS_MODE` (also known as "standard mode"):
520	 *       - CSS class and ID selectors match byte-for-byte (case-sensitively).
521	 *       - A TABLE start tag `<table>` implicitly closes any open `P` element.
522	 *
523	 *   - In `QUIRKS_MODE`:
524	 *       - CSS class and ID selectors match match in an ASCII case-insensitive manner.
525	 *       - A TABLE start tag `<table>` opens a `TABLE` element as a child of a `P`
526	 *         element if one is open.
527	 *
528	 * Quirks and no-quirks mode are thus mostly about styling, but have an impact when
529	 * tables are found inside paragraph elements.
530	 *
531	 * @see self::QUIRKS_MODE
532	 * @see self::NO_QUIRKS_MODE
533	 *
534	 * @since 6.7.0
535	 *
536	 * @var string
537	 */
538	protected $compat_mode = self::NO_QUIRKS_MODE;
539
540	/**
541	 * Indicates whether the parser is inside foreign content,
542	 * e.g. inside an SVG or MathML element.
543	 *
544	 * One of 'html', 'svg', or 'math'.
545	 *
546	 * Several parsing rules change based on whether the parser
547	 * is inside foreign content, including whether CDATA sections
548	 * are allowed and whether a self-closing flag indicates that
549	 * an element has no content.
550	 *
551	 * @since 6.7.0
552	 *
553	 * @var string
554	 */
555	private $parsing_namespace = 'html';
556
557	/**
558	 * What kind of syntax token became an HTML comment.
559	 *
560	 * Since there are many ways in which HTML syntax can create an HTML comment,
561	 * this indicates which of those caused it. This allows the Tag Processor to
562	 * represent more from the original input document than would appear in the DOM.
563	 *
564	 * @since 6.5.0
565	 *
566	 * @var string|null
567	 */
568	protected $comment_type = null;
569
570	/**
571	 * What kind of text the matched text node represents, if it was subdivided.
572	 *
573	 * @see self::TEXT_IS_NULL_SEQUENCE
574	 * @see self::TEXT_IS_WHITESPACE
575	 * @see self::TEXT_IS_GENERIC
576	 * @see self::subdivide_text_appropriately
577	 *
578	 * @since 6.7.0
579	 *
580	 * @var string
581	 */
582	protected $text_node_classification = self::TEXT_IS_GENERIC;
583
584	/**
585	 * How many bytes from the original HTML document have been read and parsed.
586	 *
587	 * This value points to the latest byte offset in the input document which
588	 * has been already parsed. It is the internal cursor for the Tag Processor
589	 * and updates while scanning through the HTML tokens.
590	 *
591	 * @since 6.2.0
592	 * @var int
593	 */
594	private $bytes_already_parsed = 0;
595
596	/**
597	 * Byte offset in input document where current token starts.
598	 *
599	 * Example:
600	 *
601	 *     <div id="test">...
602	 *     01234
603	 *     - token starts at 0
604	 *
605	 * @since 6.5.0
606	 *
607	 * @var int|null
608	 */
609	private $token_starts_at;
610
611	/**
612	 * Byte length of current token.
613	 *
614	 * Example:
615	 *
616	 *     <div id="test">...
617	 *     012345678901234
618	 *     - token length is 14 - 0 = 14
619	 *
620	 *     a <!-- comment --> is a token.
621	 *     0123456789 123456789 123456789
622	 *     - token length is 17 - 2 = 15
623	 *
624	 * @since 6.5.0
625	 *
626	 * @var int|null
627	 */
628	private $token_length;
629
630	/**
631	 * Byte offset in input document where current tag name starts.
632	 *
633	 * Example:
634	 *
635	 *     <div id="test">...
636	 *     01234
637	 *      - tag name starts at 1
638	 *
639	 * @since 6.2.0
640	 *
641	 * @var int|null
642	 */
643	private $tag_name_starts_at;
644
645	/**
646	 * Byte length of current tag name.
647	 *
648	 * Example:
649	 *
650	 *     <div id="test">...
651	 *     01234
652	 *      --- tag name length is 3
653	 *
654	 * @since 6.2.0
655	 *
656	 * @var int|null
657	 */
658	private $tag_name_length;
659
660	/**
661	 * Byte offset into input document where current modifiable text starts.
662	 *
663	 * @since 6.5.0
664	 *
665	 * @var int
666	 */
667	private $text_starts_at;
668
669	/**
670	 * Byte length of modifiable text.
671	 *
672	 * @since 6.5.0
673	 *
674	 * @var int
675	 */
676	private $text_length;
677
678	/**
679	 * Whether the current tag is an opening tag, e.g. <div>, or a closing tag, e.g. </div>.
680	 *
681	 * @var bool
682	 */
683	private $is_closing_tag;
684
685	/**
686	 * Lazily-built index of attributes found within an HTML tag, keyed by the attribute name.
687	 *
688	 * Example:
689	 *
690	 *     // Supposing the parser is working through this content
691	 *     // and stops after recognizing the `id` attribute.
692	 *     // <div id="test-4" class=outline title="data:text/plain;base64=asdk3nk1j3fo8">
693	 *     //                 ^ parsing will continue from this point.
694	 *     $this->attributes = array(
695	 *         'id' => new WP_HTML_Attribute_Token( 'id', 9, 6, 5, 11, false )
696	 *     );
697	 *
698	 *     // When picking up parsing again, or when asking to find the
699	 *     // `class` attribute we will continue and add to this array.
700	 *     $this->attributes = array(
701	 *         'id'    => new WP_HTML_Attribute_Token( 'id', 9, 6, 5, 11, false ),
702	 *         'class' => new WP_HTML_Attribute_Token( 'class', 23, 7, 17, 13, false )
703	 *     );
704	 *
705	 *     // Note that only the `class` attribute value is stored in the index.
706	 *     // That's because it is the only value used by this class at the moment.
707	 *
708	 * @since 6.2.0
709	 * @var WP_HTML_Attribute_Token[]
710	 */
711	private $attributes = array();
712
713	/**
714	 * Tracks spans of duplicate attributes on a given tag, used for removing
715	 * all copies of an attribute when calling `remove_attribute()`.
716	 *
717	 * @since 6.3.2
718	 *
719	 * @var (WP_HTML_Span[])[]|null
720	 */
721	private $duplicate_attributes = null;
722
723	/**
724	 * Which class names to add or remove from a tag.
725	 *
726	 * These are tracked separately from attribute updates because they are
727	 * semantically distinct, whereas this interface exists for the common
728	 * case of adding and removing class names while other attributes are
729	 * generally modified as with DOM `setAttribute` calls.
730	 *
731	 * When modifying an HTML document these will eventually be collapsed
732	 * into a single `set_attribute( 'class', $changes )` call.
733	 *
734	 * Example:
735	 *
736	 *     // Add the `wp-block-group` class, remove the `wp-group` class.
737	 *     $classname_updates = array(
738	 *         // Indexed by a comparable class name.
739	 *         'wp-block-group' => WP_HTML_Tag_Processor::ADD_CLASS,
740	 *         'wp-group'       => WP_HTML_Tag_Processor::REMOVE_CLASS
741	 *     );
742	 *
743	 * @since 6.2.0
744	 * @var bool[]
745	 */
746	private $classname_updates = array();
747
748	/**
749	 * Tracks a semantic location in the original HTML which
750	 * shifts with updates as they are applied to the document.
751	 *
752	 * @since 6.2.0
753	 * @var WP_HTML_Span[]
754	 */
755	protected $bookmarks = array();
756
757	const ADD_CLASS    = true;
758	const REMOVE_CLASS = false;
759	const SKIP_CLASS   = null;
760
761	/**
762	 * Lexical replacements to apply to input HTML document.
763	 *
764	 * "Lexical" in this class refers to the part of this class which
765	 * operates on pure text _as text_ and not as HTML. There's a line
766	 * between the public interface, with HTML-semantic methods like
767	 * `set_attribute` and `add_class`, and an internal state that tracks
768	 * text offsets in the input document.
769	 *
770	 * When higher-level HTML methods are called, those have to transform their
771	 * operations (such as setting an attribute's value) into text diffing
772	 * operations (such as replacing the sub-string from indices A to B with
773	 * some given new string). These text-diffing operations are the lexical
774	 * updates.
775	 *
776	 * As new higher-level methods are added they need to collapse their
777	 * operations into these lower-level lexical updates since that's the
778	 * Tag Processor's internal language of change. Any code which creates
779	 * these lexical updates must ensure that they do not cross HTML syntax
780	 * boundaries, however, so these should never be exposed outside of this
781	 * class or any classes which intentionally expand its functionality.
782	 *
783	 * These are enqueued while editing the document instead of being immediately
784	 * applied to avoid processing overhead, string allocations, and string
785	 * copies when applying many updates to a single document.
786	 *
787	 * Example:
788	 *
789	 *     // Replace an attribute stored with a new value, indices
790	 *     // sourced from the lazily-parsed HTML recognizer.
791	 *     $start  = $attributes['src']->start;
792	 *     $length = $attributes['src']->length;
793	 *     $modifications[] = new WP_HTML_Text_Replacement( $start, $length, $new_value );
794	 *
795	 *     // Correspondingly, something like this will appear in this array.
796	 *     $lexical_updates = array(
797	 *         WP_HTML_Text_Replacement( 14, 28, 'https://my-site.my-domain/wp-content/uploads/2014/08/kittens.jpg' )
798	 *     );
799	 *
800	 * @since 6.2.0
801	 * @var WP_HTML_Text_Replacement[]
802	 */
803	protected $lexical_updates = array();
804
805	/**
806	 * Tracks and limits `seek()` calls to prevent accidental infinite loops.
807	 *
808	 * @since 6.2.0
809	 * @var int
810	 *
811	 * @see WP_HTML_Tag_Processor::seek()
812	 */
813	protected $seek_count = 0;
814
815	/**
816	 * Whether the parser should skip over an immediately-following linefeed
817	 * character, as is the case with LISTING, PRE, and TEXTAREA.
818	 *
819	 * > If the next token is a U+000A LINE FEED (LF) character token, then
820	 * > ignore that token and move on to the next one. (Newlines at the start
821	 * > of [these] elements are ignored as an authoring convenience.)
822	 *
823	 * @since 6.7.0
824	 *
825	 * @var int|null
826	 */
827	private $skip_newline_at = null;
828
829	/**
830	 * Constructor.
831	 *
832	 * @since 6.2.0
833	 *
834	 * @param string $html HTML to process.
835	 */
836	public function __construct( $html ) {
837		if ( ! is_string( $html ) ) {
838			_doing_it_wrong(
839				__METHOD__,
840				__( 'The HTML parameter must be a string.' ),
841				'6.9.0'
842			);
843			$html = '';
844		}
845		$this->html = $html;
846	}
847
848	/**
849	 * Switches parsing mode into a new namespace, such as when
850	 * encountering an SVG tag and entering foreign content.
851	 *
852	 * @since 6.7.0
853	 *
854	 * @param string $new_namespace One of 'html', 'svg', or 'math' indicating into what
855	 *                              namespace the next tokens will be processed.
856	 * @return bool Whether the namespace was valid and changed.
857	 */
858	public function change_parsing_namespace( string $new_namespace ): bool {
859		if ( ! in_array( $new_namespace, array( 'html', 'math', 'svg' ), true ) ) {
860			return false;
861		}
862
863		$this->parsing_namespace = $new_namespace;
864		return true;
865	}
866
867	/**
868	 * Finds the next tag matching the $query.
869	 *
870	 * @since 6.2.0
871	 * @since 6.5.0 No longer processes incomplete tokens at end of document; pauses the processor at start of token.
872	 *
873	 * @param array|string|null $query {
874	 *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
875	 *
876	 *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
877	 *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
878	 *                                     1 for "first" tag, 3 for "third," etc.
879	 *                                     Defaults to first tag.
880	 *     @type string|null $class_name   Tag must contain this whole class name to match.
881	 *     @type string|null $tag_closers  "visit" or "skip": whether to stop on tag closers, e.g. </div>.
882	 * }
883	 * @return bool Whether a tag was matched.
884	 */
885	public function next_tag( $query = null ): bool {
886		$this->parse_query( $query );
887		$already_found = 0;
888
889		do {
890			if ( false === $this->next_token() ) {
891				return false;
892			}
893
894			if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
895				continue;
896			}
897
898			if ( $this->matches() ) {
899				++$already_found;
900			}
901		} while ( $already_found < $this->sought_match_offset );
902
903		return true;
904	}
905
906	/**
907	 * Finds the next token in the HTML document.
908	 *
909	 * An HTML document can be viewed as a stream of tokens,
910	 * where tokens are things like HTML tags, HTML comments,
911	 * text nodes, etc. This method finds the next token in
912	 * the HTML document and returns whether it found one.
913	 *
914	 * If it starts parsing a token and reaches the end of the
915	 * document then it will seek to the start of the last
916	 * token and pause, returning `false` to indicate that it
917	 * failed to find a complete token.
918	 *
919	 * Possible token types, based on the HTML specification:
920	 *
921	 *  - an HTML tag, whether opening, closing, or void.
922	 *  - a text node - the plaintext inside tags.
923	 *  - an HTML comment.
924	 *  - a DOCTYPE declaration.
925	 *  - a processing instruction, e.g. `<?xml version="1.0" ?>`.
926	 *
927	 * The Tag Processor currently only supports the tag token.
928	 *
929	 * @since 6.5.0
930	 * @since 6.7.0 Recognizes CDATA sections within foreign content.
931	 *
932	 * @return bool Whether a token was parsed.
933	 */
934	public function next_token(): bool {
935		return $this->base_class_next_token();
936	}
937
938	/**
939	 * Internal method which finds the next token in the HTML document.
940	 *
941	 * This method is a protected internal function which implements the logic for
942	 * finding the next token in a document. It exists so that the parser can update
943	 * its state without affecting the location of the cursor in the document and
944	 * without triggering subclass methods for things like `next_token()`, e.g. when
945	 * applying patches before searching for the next token.
946	 *
947	 * @since 6.5.0
948	 *
949	 * @access private
950	 *
951	 * @return bool Whether a token was parsed.
952	 */
953	private function base_class_next_token(): bool {
954		$was_at = $this->bytes_already_parsed;
955		$this->after_tag();
956
957		// Don't proceed if there's nothing more to scan.
958		if (
959			self::STATE_COMPLETE === $this->parser_state ||
960			self::STATE_INCOMPLETE_INPUT === $this->parser_state
961		) {
962			return false;
963		}
964
965		/*
966		 * The next step in the parsing loop determines the parsing state;
967		 * clear it so that state doesn't linger from the previous step.
968		 */
969		$this->parser_state = self::STATE_READY;
970
971		if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
972			$this->parser_state = self::STATE_COMPLETE;
973			return false;
974		}
975
976		// Find the next tag if it exists.
977		if ( false === $this->parse_next_tag() ) {
978			if ( self::STATE_INCOMPLETE_INPUT === $this->parser_state ) {
979				$this->bytes_already_parsed = $was_at;
980			}
981
982			return false;
983		}
984
985		/*
986		 * For legacy reasons the rest of this function handles tags and their
987		 * attributes. If the processor has reached the end of the document
988		 * or if it matched any other token then it should return here to avoid
989		 * attempting to process tag-specific syntax.
990		 */
991		if (
992			self::STATE_INCOMPLETE_INPUT !== $this->parser_state &&
993			self::STATE_COMPLETE !== $this->parser_state &&
994			self::STATE_MATCHED_TAG !== $this->parser_state
995		) {
996			return true;
997		}
998
999		// Parse all of its attributes.
1000		while ( $this->parse_next_attribute() ) {
1001			continue;
1002		}
1003
1004		// Ensure that the tag closes before the end of the document.
1005		if (
1006			self::STATE_INCOMPLETE_INPUT === $this->parser_state ||
1007			$this->bytes_already_parsed >= strlen( $this->html )
1008		) {
1009			// Does this appropriately clear state (parsed attributes)?
1010			$this->parser_state         = self::STATE_INCOMPLETE_INPUT;
1011			$this->bytes_already_parsed = $was_at;
1012
1013			return false;
1014		}
1015
1016		$tag_ends_at = strpos( $this->html, '>', $this->bytes_already_parsed );
1017		if ( false === $tag_ends_at ) {
1018			$this->parser_state         = self::STATE_INCOMPLETE_INPUT;
1019			$this->bytes_already_parsed = $was_at;
1020
1021			return false;
1022		}
1023		$this->parser_state         = self::STATE_MATCHED_TAG;
1024		$this->bytes_already_parsed = $tag_ends_at + 1;
1025		$this->token_length         = $this->bytes_already_parsed - $this->token_starts_at;
1026
1027		/*
1028		 * Certain tags require additional processing. The first-letter pre-check
1029		 * avoids unnecessary string allocation when comparing the tag names.
1030		 *
1031		 *  - IFRAME
1032		 *  - LISTING (deprecated)
1033		 *  - NOEMBED (deprecated)
1034		 *  - NOFRAMES (deprecated)
1035		 *  - PRE
1036		 *  - SCRIPT
1037		 *  - STYLE
1038		 *  - TEXTAREA
1039		 *  - TITLE
1040		 *  - XMP (deprecated)
1041		 */
1042		if (
1043			$this->is_closing_tag ||
1044			'html' !== $this->parsing_namespace ||
1045			1 !== strspn( $this->html, 'iIlLnNpPsStTxX', $this->tag_name_starts_at, 1 )
1046		) {
1047			return true;
1048		}
1049
1050		$tag_name = $this->get_tag();
1051
1052		/*
1053		 * For LISTING, PRE, and TEXTAREA, the first linefeed of an immediately-following
1054		 * text node is ignored as an authoring convenience.
1055		 *
1056		 * @see static::skip_newline_at
1057		 */
1058		if ( 'LISTING' === $tag_name || 'PRE' === $tag_name ) {
1059			$this->skip_newline_at = $this->bytes_already_parsed;
1060			return true;
1061		}
1062
1063		/*
1064		 * There are certain elements whose children are not DATA but are instead
1065		 * RCDATA or RAWTEXT. These cannot contain other elements, and the contents
1066		 * are parsed as plaintext, with character references decoded in RCDATA but
1067		 * not in RAWTEXT.
1068		 *
1069		 * These elements are described here as "self-contained" or special atomic
1070		 * elements whose end tag is consumed with the opening tag, and they will
1071		 * contain modifiable text inside of them.
1072		 *
1073		 * Preserve the opening tag pointers, as these will be overwritten
1074		 * when finding the closing tag. They will be reset after finding
1075		 * the closing to tag to point to the opening of the special atomic
1076		 * tag sequence.
1077		 */
1078		$tag_name_starts_at   = $this->tag_name_starts_at;
1079		$tag_name_length      = $this->tag_name_length;
1080		$tag_ends_at          = $this->token_starts_at + $this->token_length;
1081		$attributes           = $this->attributes;
1082		$duplicate_attributes = $this->duplicate_attributes;
1083
1084		// Find the closing tag if necessary.
1085		switch ( $tag_name ) {
1086			case 'SCRIPT':
1087				$found_closer = $this->skip_script_data();
1088				break;
1089
1090			case 'TEXTAREA':
1091			case 'TITLE':
1092				$found_closer = $this->skip_rcdata( $tag_name );
1093				break;
1094
1095			/*
1096			 * In the browser this list would include the NOSCRIPT element,
1097			 * but the Tag Processor is an environment with the scripting
1098			 * flag disabled, meaning that it needs to descend into the
1099			 * NOSCRIPT element to be able to properly process what will be
1100			 * sent to a browser.
1101			 *
1102			 * Note that this rule makes HTML5 syntax incompatible with XML,
1103			 * because the parsing of this token depends on client application.
1104			 * The NOSCRIPT element cannot be represented in the XHTML syntax.
1105			 */
1106			case 'IFRAME':
1107			case 'NOEMBED':
1108			case 'NOFRAMES':
1109			case 'STYLE':
1110			case 'XMP':
1111				$found_closer = $this->skip_rawtext( $tag_name );
1112				break;
1113
1114			// No other tags should be treated in their entirety here.
1115			default:
1116				return true;
1117		}
1118
1119		if ( ! $found_closer ) {
1120			$this->parser_state         = self::STATE_INCOMPLETE_INPUT;
1121			$this->bytes_already_parsed = $was_at;
1122			return false;
1123		}
1124
1125		/*
1126		 * The values here look like they reference the opening tag but they reference
1127		 * the closing tag instead. This is why the opening tag values were stored
1128		 * above in a variable. It reads confusingly here, but that's because the
1129		 * functions that skip the contents have moved all the internal cursors past
1130		 * the inner content of the tag.
1131		 */
1132		$this->token_starts_at      = $was_at;
1133		$this->token_length         = $this->bytes_already_parsed - $this->token_starts_at;
1134		$this->text_starts_at       = $tag_ends_at;
1135		$this->text_length          = $this->tag_name_starts_at - $this->text_starts_at;
1136		$this->tag_name_starts_at   = $tag_name_starts_at;
1137		$this->tag_name_length      = $tag_name_length;
1138		$this->attributes           = $attributes;
1139		$this->duplicate_attributes = $duplicate_attributes;
1140
1141		return true;
1142	}
1143
1144	/**
1145	 * Whether the processor paused because the input HTML document ended
1146	 * in the middle of a syntax element, such as in the middle of a tag.
1147	 *
1148	 * Example:
1149	 *
1150	 *     $processor = new WP_HTML_Tag_Processor( '<input type="text" value="Th' );
1151	 *     false      === $processor->get_next_tag();
1152	 *     true       === $processor->paused_at_incomplete_token();
1153	 *
1154	 * @since 6.5.0
1155	 *
1156	 * @return bool Whether the parse paused at the start of an incomplete token.
1157	 */
1158	public function paused_at_incomplete_token(): bool {
1159		return self::STATE_INCOMPLETE_INPUT === $this->parser_state;
1160	}
1161
1162	/**
1163	 * Generator for a foreach loop to step through each class name for the matched tag.
1164	 *
1165	 * This generator function is designed to be used inside a "foreach" loop.
1166	 *
1167	 * Example:
1168	 *
1169	 *     $p = new WP_HTML_Tag_Processor( "<div class='free &lt;egg&lt;\tlang-en'>" );
1170	 *     $p->next_tag();
1171	 *     foreach ( $p->class_list() as $class_name ) {
1172	 *         echo "{$class_name} ";
1173	 *     }
1174	 *     // Outputs: "free <egg> lang-en "
1175	 *
1176	 * @since 6.4.0
1177	 */
1178	public function class_list() {
1179		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
1180			return;
1181		}
1182
1183		/** @var string $class contains the string value of the class attribute, with character references decoded. */
1184		$class = $this->get_attribute( 'class' );
1185
1186		if ( ! is_string( $class ) ) {
1187			return;
1188		}
1189
1190		$seen = array();
1191
1192		$is_quirks = self::QUIRKS_MODE === $this->compat_mode;
1193
1194		$at = 0;
1195		while ( $at < strlen( $class ) ) {
1196			// Skip past any initial boundary characters.
1197			$at += strspn( $class, " \t\f\r\n", $at );
1198			if ( $at >= strlen( $class ) ) {
1199				return;
1200			}
1201
1202			// Find the byte length until the next boundary.
1203			$length = strcspn( $class, " \t\f\r\n", $at );
1204			if ( 0 === $length ) {
1205				return;
1206			}
1207
1208			$name = str_replace( "\x00", "\u{FFFD}", substr( $class, $at, $length ) );
1209			if ( $is_quirks ) {
1210				$name = strtolower( $name );
1211			}
1212			$at += $length;
1213
1214			/*
1215			 * It's expected that the number of class names for a given tag is relatively small.
1216			 * Given this, it is probably faster overall to scan an array for a value rather
1217			 * than to use the class name as a key and check if it's a key of $seen.
1218			 */
1219			if ( in_array( $name, $seen, true ) ) {
1220				continue;
1221			}
1222
1223			$seen[] = $name;
1224			yield $name;
1225		}
1226	}
1227
1228
1229	/**
1230	 * Returns if a matched tag contains the given ASCII case-insensitive class name.
1231	 *
1232	 * @since 6.4.0
1233	 *
1234	 * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
1235	 * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
1236	 */
1237	public function has_class( $wanted_class ): ?bool {
1238		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
1239			return null;
1240		}
1241
1242		$case_insensitive = self::QUIRKS_MODE === $this->compat_mode;
1243
1244		$wanted_length = strlen( $wanted_class );
1245		foreach ( $this->class_list() as $class_name ) {
1246			if (
1247				strlen( $class_name ) === $wanted_length &&
1248				0 === substr_compare( $class_name, $wanted_class, 0, strlen( $wanted_class ), $case_insensitive )
1249			) {
1250				return true;
1251			}
1252		}
1253
1254		return false;
1255	}
1256
1257
1258	/**
1259	 * Sets a bookmark in the HTML document.
1260	 *
1261	 * Bookmarks represent specific places or tokens in the HTML
1262	 * document, such as a tag opener or closer. When applying
1263	 * edits to a document, such as setting an attribute, the
1264	 * text offsets of that token may shift; the bookmark is
1265	 * kept updated with those shifts and remains stable unless
1266	 * the entire span of text in which the token sits is removed.
1267	 *
1268	 * Release bookmarks when they are no longer needed.
1269	 *
1270	 * Example:
1271	 *
1272	 *     <main><h2>Surprising fact you may not know!</h2></main>
1273	 *           ^  ^
1274	 *            \-|-- this `H2` opener bookmark tracks the token
1275	 *
1276	 *     <main class="clickbait"><h2>Surprising fact you may no…
1277	 *                             ^  ^
1278	 *                              \-|-- it shifts with edits
1279	 *
1280	 * Bookmarks provide the ability to seek to a previously-scanned
1281	 * place in the HTML document. This avoids the need to re-scan
1282	 * the entire document.
1283	 *
1284	 * Example:
1285	 *
1286	 *     <ul><li>One</li><li>Two</li><li>Three</li></ul>
1287	 *                                 ^^^^
1288	 *                                 want to note this last item
1289	 *
1290	 *     $p = new WP_HTML_Tag_Processor( $html );
1291	 *     $in_list = false;
1292	 *     while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
1293	 *         if ( 'UL' === $p->get_tag() ) {
1294	 *             if ( $p->is_tag_closer() ) {
1295	 *                 $in_list = false;
1296	 *                 $p->set_bookmark( 'resume' );
1297	 *                 if ( $p->seek( 'last-li' ) ) {
1298	 *                     $p->add_class( 'last-li' );
1299	 *                 }
1300	 *                 $p->seek( 'resume' );
1301	 *                 $p->release_bookmark( 'last-li' );
1302	 *                 $p->release_bookmark( 'resume' );
1303	 *             } else {
1304	 *                 $in_list = true;
1305	 *             }
1306	 *         }
1307	 *
1308	 *         if ( 'LI' === $p->get_tag() ) {
1309	 *             $p->set_bookmark( 'last-li' );
1310	 *         }
1311	 *     }
1312	 *
1313	 * Bookmarks intentionally hide the internal string offsets
1314	 * to which they refer. They are maintained internally as
1315	 * updates are applied to the HTML document and therefore
1316	 * retain their "position" - the location to which they
1317	 * originally pointed. The inability to use bookmarks with
1318	 * functions like `substr` is therefore intentional to guard
1319	 * against accidentally breaking the HTML.
1320	 *
1321	 * Because bookmarks allocate memory and require processing
1322	 * for every applied update, they are limited and require
1323	 * a name. They should not be created with programmatically-made
1324	 * names, such as "li_{$index}" with some loop. As a general
1325	 * rule they should only be created with string-literal names
1326	 * like "start-of-section" or "last-paragraph".
1327	 *
1328	 * Bookmarks are a powerful tool to enable complicated behavior.
1329	 * Consider double-checking that you need this tool if you are
1330	 * reaching for it, as inappropriate use could lead to broken
1331	 * HTML structure or unwanted processing overhead.
1332	 *
1333	 * @since 6.2.0
1334	 *
1335	 * @param string $name Identifies this particular bookmark.
1336	 * @return bool Whether the bookmark was successfully created.
1337	 */
1338	public function set_bookmark( $name ): bool {
1339		// It only makes sense to set a bookmark if the parser has paused on a concrete token.
1340		if (
1341			self::STATE_COMPLETE === $this->parser_state ||
1342			self::STATE_INCOMPLETE_INPUT === $this->parser_state
1343		) {
1344			return false;
1345		}
1346
1347		if ( ! array_key_exists( $name, $this->bookmarks ) && count( $this->bookmarks ) >= static::MAX_BOOKMARKS ) {
1348			_doing_it_wrong(
1349				__METHOD__,
1350				__( 'Too many bookmarks: cannot create any more.' ),
1351				'6.2.0'
1352			);
1353			return false;
1354		}
1355
1356		$this->bookmarks[ $name ] = new WP_HTML_Span( $this->token_starts_at, $this->token_length );
1357
1358		return true;
1359	}
1360
1361
1362	/**
1363	 * Removes a bookmark that is no longer needed.
1364	 *
1365	 * Releasing a bookmark frees up the small
1366	 * performance overhead it requires.
1367	 *
1368	 * @param string $name Name of the bookmark to remove.
1369	 * @return bool Whether the bookmark already existed before removal.
1370	 */
1371	public function release_bookmark( $name ): bool {
1372		if ( ! array_key_exists( $name, $this->bookmarks ) ) {
1373			return false;
1374		}
1375
1376		unset( $this->bookmarks[ $name ] );
1377
1378		return true;
1379	}
1380
1381	/**
1382	 * Skips contents of generic rawtext elements.
1383	 *
1384	 * @since 6.3.2
1385	 *
1386	 * @see https://html.spec.whatwg.org/#generic-raw-text-element-parsing-algorithm
1387	 *
1388	 * @param string $tag_name The uppercase tag name which will close the RAWTEXT region.
1389	 * @return bool Whether an end to the RAWTEXT region was found before the end of the document.
1390	 */
1391	private function skip_rawtext( string $tag_name ): bool {
1392		/*
1393		 * These two functions distinguish themselves on whether character references are
1394		 * decoded, and since functionality to read the inner markup isn't supported, it's
1395		 * not necessary to implement these two functions separately.
1396		 */
1397		return $this->skip_rcdata( $tag_name );
1398	}
1399
1400	/**
1401	 * Skips contents of RCDATA elements, namely title and textarea tags.
1402	 *
1403	 * @since 6.2.0
1404	 *
1405	 * @see https://html.spec.whatwg.org/multipage/parsing.html#rcdata-state
1406	 *
1407	 * @param string $tag_name The uppercase tag name which will close the RCDATA region.
1408	 * @return bool Whether an end to the RCDATA region was found before the end of the document.
1409	 */
1410	private function skip_rcdata( string $tag_name ): bool {
1411		$html       = $this->html;
1412		$doc_length = strlen( $html );
1413		$tag_length = strlen( $tag_name );
1414
1415		$at = $this->bytes_already_parsed;
1416
1417		while ( false !== $at && $at < $doc_length ) {
1418			$at                       = strpos( $this->html, '</', $at );
1419			$this->tag_name_starts_at = $at;
1420
1421			// Fail if there is no possible tag closer.
1422			if ( false === $at || ( $at + $tag_length ) >= $doc_length ) {
1423				return false;
1424			}
1425
1426			$at += 2;
1427
1428			/*
1429			 * Find a case-insensitive match to the tag name.
1430			 *
1431			 * Because tag names are limited to US-ASCII there is no
1432			 * need to perform any kind of Unicode normalization when
1433			 * comparing; any character which could be impacted by such
1434			 * normalization could not be part of a tag name.
1435			 */
1436			for ( $i = 0; $i < $tag_length; $i++ ) {
1437				$tag_char  = $tag_name[ $i ];
1438				$html_char = $html[ $at + $i ];
1439
1440				if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
1441					$at += $i;
1442					continue 2;
1443				}
1444			}
1445
1446			$at                        += $tag_length;
1447			$this->bytes_already_parsed = $at;
1448
1449			if ( $at >= strlen( $html ) ) {
1450				return false;
1451			}
1452
1453			/*
1454			 * Ensure that the tag name terminates to avoid matching on
1455			 * substrings of a longer tag name. For example, the sequence
1456			 * "</textarearug" should not match for "</textarea" even
1457			 * though "textarea" is found within the text.
1458			 */
1459			$c = $html[ $at ];
1460			if ( ' ' !== $c && "\t" !== $c && "\r" !== $c && "\n" !== $c && '/' !== $c && '>' !== $c ) {
1461				continue;
1462			}
1463
1464			while ( $this->parse_next_attribute() ) {
1465				continue;
1466			}
1467
1468			$at = $this->bytes_already_parsed;
1469			if ( $at >= strlen( $this->html ) ) {
1470				return false;
1471			}
1472
1473			if ( '>' === $html[ $at ] ) {
1474				$this->bytes_already_parsed = $at + 1;
1475				return true;
1476			}
1477
1478			if ( $at + 1 >= strlen( $this->html ) ) {
1479				return false;
1480			}
1481
1482			if ( '/' === $html[ $at ] && '>' === $html[ $at + 1 ] ) {
1483				$this->bytes_already_parsed = $at + 2;
1484				return true;
1485			}
1486		}
1487
1488		return false;
1489	}
1490
1491	/**
1492	 * Skips contents of script tags.
1493	 *
1494	 * @since 6.2.0
1495	 *
1496	 * @return bool Whether the script tag was closed before the end of the document.
1497	 */
1498	private function skip_script_data(): bool {
1499		$state      = 'unescaped';
1500		$html       = $this->html;
1501		$doc_length = strlen( $html );
1502		$at         = $this->bytes_already_parsed;
1503
1504		while ( false !== $at && $at < $doc_length ) {
1505			$at += strcspn( $html, '-<', $at );
1506
1507			/*
1508			 * Optimization: Terminating a complete script element requires at least eight
1509			 * additional bytes in the document. Some checks below may cause local escaped
1510			 * state transitions when processing shorter strings, but those transitions are
1511			 * irrelevant if the script tag is incomplete and the function must return false.
1512			 *
1513			 * This may need updating if those transitions become significant or exported from
1514			 * this function in some way, such as when building safe methods to embed JavaScript
1515			 * or data inside a SCRIPT element.
1516			 *
1517			 *     $at may be here.
1518			 *        ↓
1519			 *     ...</script>
1520			 *         ╰──┬───╯
1521			 *     $at + 8 additional bytes are required for a non-false return value.
1522			 *
1523			 * This single check eliminates the need to check lengths for the shorter spans:
1524			 *
1525			 *           $at may be here.
1526			 *                  ↓
1527			 *     <script><!-- --></script>
1528			 *                   ├╯
1529			 *             $at + 2 additional characters does not require a length check.
1530			 *
1531			 * The transition from "escaped" to "unescaped" is not relevant if the document ends:
1532			 *
1533			 *           $at may be here.
1534			 *                  ↓
1535			 *     <script><!-- -->[[END-OF-DOCUMENT]]
1536			 *                   ╰──┬───╯
1537			 *             $at + 8 additional bytes is not satisfied, return false.
1538			 */
1539			if ( $at + 8 >= $doc_length ) {
1540				return false;
1541			}
1542
1543			/*
1544			 * For all script states a "-->"  transitions
1545			 * back into the normal unescaped script mode,
1546			 * even if that's the current state.
1547			 */
1548			if (
1549				'-' === $html[ $at ] &&
1550				'-' === $html[ $at + 1 ] &&
1551				'>' === $html[ $at + 2 ]
1552			) {
1553				$at   += 3;
1554				$state = 'unescaped';
1555				continue;
1556			}
1557
1558			/*
1559			 * Everything of interest past here starts with "<".
1560			 * Check this character and advance position regardless.
1561			 */
1562			if ( '<' !== $html[ $at++ ] ) {
1563				continue;
1564			}
1565
1566			/*
1567			 * "<!--" only transitions from _unescaped_ to _escaped_. This byte sequence is only
1568			 * significant in the _unescaped_ state and is ignored in any other state.
1569			 */
1570			if (
1571				'unescaped' === $state &&
1572				'!' === $html[ $at ] &&
1573				'-' === $html[ $at + 1 ] &&
1574				'-' === $html[ $at + 2 ]
1575			) {
1576				$at += 3;
1577
1578				/*
1579				 * The parser is ready to enter the _escaped_ state, but may remain in the
1580				 * _unescaped_ state. This occurs when "<!--" is immediately followed by a
1581				 * sequence of 0 or more "-" followed by ">". This is similar to abruptly closed
1582				 * HTML comments like "<!-->" or "<!--->".
1583				 *
1584				 * Note that this check may advance the position significantly and requires a
1585				 * length check to prevent bad offsets on inputs like `<script><!---------`.
1586				 */
1587				$at += strspn( $html, '-', $at );
1588				if ( $at < $doc_length && '>' === $html[ $at ] ) {
1589					++$at;
1590					continue;
1591				}
1592
1593				$state = 'escaped';
1594				continue;
1595			}
1596
1597			if ( '/' === $html[ $at ] ) {
1598				$closer_potentially_starts_at = $at - 1;
1599				$is_closing                   = true;
1600				++$at;
1601			} else {
1602				$is_closing = false;
1603			}
1604
1605			/*
1606			 * At this point the only remaining state-changes occur with the
1607			 * <script> and </script> tags; unless one of these appears next,
1608			 * proceed scanning to the next potential token in the text.
1609			 */
1610			if ( ! (
1611				( 's' === $html[ $at ] || 'S' === $html[ $at ] ) &&
1612				( 'c' === $html[ $at + 1 ] || 'C' === $html[ $at + 1 ] ) &&
1613				( 'r' === $html[ $at + 2 ] || 'R' === $html[ $at + 2 ] ) &&
1614				( 'i' === $html[ $at + 3 ] || 'I' === $html[ $at + 3 ] ) &&
1615				( 'p' === $html[ $at + 4 ] || 'P' === $html[ $at + 4 ] ) &&
1616				( 't' === $html[ $at + 5 ] || 'T' === $html[ $at + 5 ] )
1617			) ) {
1618				++$at;
1619				continue;
1620			}
1621
1622			/*
1623			 * Ensure that the script tag terminates to avoid matching on
1624			 * substrings of a non-match. For example, the sequence
1625			 * "<script123" should not end a script region even though
1626			 * "<script" is found within the text.
1627			 */
1628			$at += 6;
1629			$c   = $html[ $at ];
1630			if (
1631				/**
1632				 * These characters trigger state transitions of interest:
1633				 *
1634				 * - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-end-tag-name-state}
1635				 * - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-escaped-end-tag-name-state}
1636				 * - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-start-state}
1637				 * - @see {https://html.spec.whatwg.org/multipage/parsing.html#script-data-double-escape-end-state}
1638				 *
1639				 * The "\r" character is not present in the above references. However, "\r" must be
1640				 * treated the same as "\n". This is because the HTML Standard requires newline
1641				 * normalization during preprocessing which applies this replacement.
1642				 *
1643				 * - @see https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
1644				 * - @see https://infra.spec.whatwg.org/#normalize-newlines
1645				 */
1646				'>' !== $c &&
1647				' ' !== $c &&
1648				"\n" !== $c &&
1649				'/' !== $c &&
1650				"\t" !== $c &&
1651				"\f" !== $c &&
1652				"\r" !== $c
1653			) {
1654				continue;
1655			}
1656
1657			if ( 'escaped' === $state && ! $is_closing ) {
1658				$state = 'double-escaped';
1659				continue;
1660			}
1661
1662			if ( 'double-escaped' === $state && $is_closing ) {
1663				$state = 'escaped';
1664				continue;
1665			}
1666
1667			if ( $is_closing ) {
1668				$this->bytes_already_parsed = $closer_potentially_starts_at;
1669				$this->tag_name_starts_at   = $closer_potentially_starts_at;
1670				if ( $this->bytes_already_parsed >= $doc_length ) {
1671					return false;
1672				}
1673
1674				while ( $this->parse_next_attribute() ) {
1675					continue;
1676				}
1677
1678				if ( $this->bytes_already_parsed >= $doc_length ) {
1679					return false;
1680				}
1681
1682				if ( '>' === $html[ $this->bytes_already_parsed ] ) {
1683					++$this->bytes_already_parsed;
1684					return true;
1685				}
1686			}
1687
1688			++$at;
1689		}
1690
1691		return false;
1692	}
1693
1694	/**
1695	 * Parses the next tag.
1696	 *
1697	 * This will find and start parsing the next tag, including
1698	 * the opening `<`, the potential closer `/`, and the tag
1699	 * name. It does not parse the attributes or scan to the
1700	 * closing `>`; these are left for other methods.
1701	 *
1702	 * @since 6.2.0
1703	 * @since 6.2.1 Support abruptly-closed comments, invalid-tag-closer-comments, and empty elements.
1704	 *
1705	 * @return bool Whether a tag was found before the end of the document.
1706	 */
1707	private function parse_next_tag(): bool {
1708		$this->after_tag();
1709
1710		$html       = $this->html;
1711		$doc_length = strlen( $html );
1712		$was_at     = $this->bytes_already_parsed;
1713		$at         = $was_at;
1714
1715		while ( $at < $doc_length ) {
1716			$at = strpos( $html, '<', $at );
1717			if ( false === $at ) {
1718				break;
1719			}
1720
1721			if ( $at > $was_at ) {
1722				/*
1723				 * A "<" normally starts a new HTML tag or syntax token, but in cases where the
1724				 * following character can't produce a valid token, the "<" is instead treated
1725				 * as plaintext and the parser should skip over it. This avoids a problem when
1726				 * following earlier practices of typing emoji with text, e.g. "<3". This
1727				 * should be a heart, not a tag. It's supposed to be rendered, not hidden.
1728				 *
1729				 * At this point the parser checks if this is one of those cases and if it is
1730				 * will continue searching for the next "<" in search of a token boundary.
1731				 *
1732				 * @see https://html.spec.whatwg.org/#tag-open-state
1733				 */
1734				if ( 1 !== strspn( $html, '!/?abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', $at + 1, 1 ) ) {
1735					++$at;
1736					continue;
1737				}
1738
1739				$this->parser_state         = self::STATE_TEXT_NODE;
1740				$this->token_starts_at      = $was_at;
1741				$this->token_length         = $at - $was_at;
1742				$this->text_starts_at       = $was_at;
1743				$this->text_length          = $this->token_length;
1744				$this->bytes_already_parsed = $at;
1745				return true;
1746			}
1747
1748			$this->token_starts_at = $at;
1749
1750			if ( $at + 1 < $doc_length && '/' === $this->html[ $at + 1 ] ) {
1751				$this->is_closing_tag = true;
1752				++$at;
1753			} else {
1754				$this->is_closing_tag = false;
1755			}
1756
1757			/*
1758			 * HTML tag names must start with [a-zA-Z] otherwise they are not tags.
1759			 * For example, "<3" is rendered as text, not a tag opener. If at least
1760			 * one letter follows the "<" then _it is_ a tag, but if the following
1761			 * character is anything else it _is not a tag_.
1762			 *
1763			 * It's not uncommon to find non-tags starting with `<` in an HTML
1764			 * document, so it's good for performance to make this pre-check before
1765			 * continuing to attempt to parse a tag name.
1766			 *
1767			 * Reference:
1768			 * * https://html.spec.whatwg.org/multipage/parsing.html#data-state
1769			 * * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
1770			 */
1771			$tag_name_prefix_length = strspn( $html, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', $at + 1 );
1772			if ( $tag_name_prefix_length > 0 ) {
1773				++$at;
1774				$this->parser_state         = self::STATE_MATCHED_TAG;
1775				$this->tag_name_starts_at   = $at;
1776				$this->tag_name_length      = $tag_name_prefix_length + strcspn( $html, " \t\f\r\n/>", $at + $tag_name_prefix_length );
1777				$this->bytes_already_parsed = $at + $this->tag_name_length;
1778				return true;
1779			}
1780
1781			/*
1782			 * Abort if no tag is found before the end of
1783			 * the document. There is nothing left to parse.
1784			 */
1785			if ( $at + 1 >= $doc_length ) {
1786				$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1787
1788				return false;
1789			}
1790
1791			/*
1792			 * `<!` transitions to markup declaration open state
1793			 * https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state
1794			 */
1795			if ( ! $this->is_closing_tag && '!' === $html[ $at + 1 ] ) {
1796				/*
1797				 * `<!--` transitions to a comment state – apply further comment rules.
1798				 * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
1799				 */
1800				if ( 0 === substr_compare( $html, '--', $at + 2, 2 ) ) {
1801					$closer_at = $at + 4;
1802					// If it's not possible to close the comment then there is nothing more to scan.
1803					if ( $doc_length <= $closer_at ) {
1804						$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1805
1806						return false;
1807					}
1808
1809					// Abruptly-closed empty comments are a sequence of dashes followed by `>`.
1810					$span_of_dashes = strspn( $html, '-', $closer_at );
1811					if ( '>' === $html[ $closer_at + $span_of_dashes ] ) {
1812						/*
1813						 * @todo When implementing `set_modifiable_text()` ensure that updates to this token
1814						 *       don't break the syntax for short comments, e.g. `<!--->`. Unlike other comment
1815						 *       and bogus comment syntax, these leave no clear insertion point for text and
1816						 *       they need to be modified specially in order to contain text. E.g. to store
1817						 *       `?` as the modifiable text, the `<!--->` needs to become `<!--?-->`, which
1818						 *       involves inserting an additional `-` into the token after the modifiable text.
1819						 */
1820						$this->parser_state = self::STATE_COMMENT;
1821						$this->comment_type = self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT;
1822						$this->token_length = $closer_at + $span_of_dashes + 1 - $this->token_starts_at;
1823
1824						// Only provide modifiable text if the token is long enough to contain it.
1825						if ( $span_of_dashes >= 2 ) {
1826							$this->comment_type   = self::COMMENT_AS_HTML_COMMENT;
1827							$this->text_starts_at = $this->token_starts_at + 4;
1828							$this->text_length    = $span_of_dashes - 2;
1829						}
1830
1831						$this->bytes_already_parsed = $closer_at + $span_of_dashes + 1;
1832						return true;
1833					}
1834
1835					/*
1836					 * Comments may be closed by either a --> or an invalid --!>.
1837					 * The first occurrence closes the comment.
1838					 *
1839					 * See https://html.spec.whatwg.org/#parse-error-incorrectly-closed-comment
1840					 */
1841					--$closer_at; // Pre-increment inside condition below reduces risk of accidental infinite looping.
1842					while ( ++$closer_at < $doc_length ) {
1843						$closer_at = strpos( $html, '--', $closer_at );
1844						if ( false === $closer_at ) {
1845							$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1846
1847							return false;
1848						}
1849
1850						if ( $closer_at + 2 < $doc_length && '>' === $html[ $closer_at + 2 ] ) {
1851							$this->parser_state         = self::STATE_COMMENT;
1852							$this->comment_type         = self::COMMENT_AS_HTML_COMMENT;
1853							$this->token_length         = $closer_at + 3 - $this->token_starts_at;
1854							$this->text_starts_at       = $this->token_starts_at + 4;
1855							$this->text_length          = $closer_at - $this->text_starts_at;
1856							$this->bytes_already_parsed = $closer_at + 3;
1857							return true;
1858						}
1859
1860						if (
1861							$closer_at + 3 < $doc_length &&
1862							'!' === $html[ $closer_at + 2 ] &&
1863							'>' === $html[ $closer_at + 3 ]
1864						) {
1865							$this->parser_state         = self::STATE_COMMENT;
1866							$this->comment_type         = self::COMMENT_AS_HTML_COMMENT;
1867							$this->token_length         = $closer_at + 4 - $this->token_starts_at;
1868							$this->text_starts_at       = $this->token_starts_at + 4;
1869							$this->text_length          = $closer_at - $this->text_starts_at;
1870							$this->bytes_already_parsed = $closer_at + 4;
1871							return true;
1872						}
1873					}
1874				}
1875
1876				/*
1877				 * `<!DOCTYPE` transitions to DOCTYPE state – skip to the nearest >
1878				 * These are ASCII-case-insensitive.
1879				 * https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
1880				 */
1881				if (
1882					$doc_length > $at + 8 &&
1883					( 'D' === $html[ $at + 2 ] || 'd' === $html[ $at + 2 ] ) &&
1884					( 'O' === $html[ $at + 3 ] || 'o' === $html[ $at + 3 ] ) &&
1885					( 'C' === $html[ $at + 4 ] || 'c' === $html[ $at + 4 ] ) &&
1886					( 'T' === $html[ $at + 5 ] || 't' === $html[ $at + 5 ] ) &&
1887					( 'Y' === $html[ $at + 6 ] || 'y' === $html[ $at + 6 ] ) &&
1888					( 'P' === $html[ $at + 7 ] || 'p' === $html[ $at + 7 ] ) &&
1889					( 'E' === $html[ $at + 8 ] || 'e' === $html[ $at + 8 ] )
1890				) {
1891					$closer_at = strpos( $html, '>', $at + 9 );
1892					if ( false === $closer_at ) {
1893						$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1894
1895						return false;
1896					}
1897
1898					$this->parser_state         = self::STATE_DOCTYPE;
1899					$this->token_length         = $closer_at + 1 - $this->token_starts_at;
1900					$this->text_starts_at       = $this->token_starts_at + 9;
1901					$this->text_length          = $closer_at - $this->text_starts_at;
1902					$this->bytes_already_parsed = $closer_at + 1;
1903					return true;
1904				}
1905
1906				if (
1907					'html' !== $this->parsing_namespace &&
1908					strlen( $html ) > $at + 8 &&
1909					'[' === $html[ $at + 2 ] &&
1910					'C' === $html[ $at + 3 ] &&
1911					'D' === $html[ $at + 4 ] &&
1912					'A' === $html[ $at + 5 ] &&
1913					'T' === $html[ $at + 6 ] &&
1914					'A' === $html[ $at + 7 ] &&
1915					'[' === $html[ $at + 8 ]
1916				) {
1917					$closer_at = strpos( $html, ']]>', $at + 9 );
1918					if ( false === $closer_at ) {
1919						$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1920
1921						return false;
1922					}
1923
1924					$this->parser_state         = self::STATE_CDATA_NODE;
1925					$this->text_starts_at       = $at + 9;
1926					$this->text_length          = $closer_at - $this->text_starts_at;
1927					$this->token_length         = $closer_at + 3 - $this->token_starts_at;
1928					$this->bytes_already_parsed = $closer_at + 3;
1929					return true;
1930				}
1931
1932				/*
1933				 * Anything else here is an incorrectly-opened comment and transitions
1934				 * to the bogus comment state - skip to the nearest >. If no closer is
1935				 * found then the HTML was truncated inside the markup declaration.
1936				 */
1937				$closer_at = strpos( $html, '>', $at + 1 );
1938				if ( false === $closer_at ) {
1939					$this->parser_state = self::STATE_INCOMPLETE_INPUT;
1940
1941					return false;
1942				}
1943
1944				$this->parser_state         = self::STATE_COMMENT;
1945				$this->comment_type         = self::COMMENT_AS_INVALID_HTML;
1946				$this->token_length         = $closer_at + 1 - $this->token_starts_at;
1947				$this->text_starts_at       = $this->token_starts_at + 2;
1948				$this->text_length          = $closer_at - $this->text_starts_at;
1949				$this->bytes_already_parsed = $closer_at + 1;
1950
1951				/*
1952				 * Identify nodes that would be CDATA if HTML had CDATA sections.
1953				 *
1954				 * This section must occur after identifying the bogus comment end
1955				 * because in an HTML parser it will span to the nearest `>`, even
1956				 * if there's no `]]>` as would be required in an XML document. It
1957				 * is therefore not possible to parse a CDATA section containing
1958				 * a `>` in the HTML syntax.
1959				 *
1960				 * Inside foreign elements there is a discrepancy between browsers
1961				 * and the specification on this.
1962				 *
1963				 * @todo Track whether the Tag Processor is inside a foreign element
1964				 *       and require the proper closing `]]>` in those cases.
1965				 */
1966				if (
1967					$this->token_length >= 10 &&
1968					'[' === $html[ $this->token_starts_at + 2 ] &&
1969					'C' === $html[ $this->token_starts_at + 3 ] &&
1970					'D' === $html[ $this->token_starts_at + 4 ] &&
1971					'A' === $html[ $this->token_starts_at + 5 ] &&
1972					'T' === $html[ $this->token_starts_at + 6 ] &&
1973					'A' === $html[ $this->token_starts_at + 7 ] &&
1974					'[' === $html[ $this->token_starts_at + 8 ] &&
1975					']' === $html[ $closer_at - 1 ] &&
1976					']' === $html[ $closer_at - 2 ]
1977				) {
1978					$this->parser_state    = self::STATE_COMMENT;
1979					$this->comment_type    = self::COMMENT_AS_CDATA_LOOKALIKE;
1980					$this->text_starts_at += 7;
1981					$this->text_length    -= 9;
1982				}
1983
1984				return true;
1985			}
1986
1987			/*
1988			 * </> is a missing end tag name, which is ignored.
1989			 *
1990			 * This was also known as the "presumptuous empty tag"
1991			 * in early discussions as it was proposed to close
1992			 * the nearest previous opening tag.
1993			 *
1994			 * See https://html.spec.whatwg.org/#parse-error-missing-end-tag-name
1995			 */
1996			if ( '>' === $html[ $at + 1 ] ) {
1997				// `<>` is interpreted as plaintext.
1998				if ( ! $this->is_closing_tag ) {
1999					++$at;
2000					continue;
2001				}
2002
2003				$this->parser_state         = self::STATE_PRESUMPTUOUS_TAG;
2004				$this->token_length         = $at + 2 - $this->token_starts_at;
2005				$this->bytes_already_parsed = $at + 2;
2006				return true;
2007			}
2008
2009			/*
2010			 * `<?` transitions to a bogus comment state – skip to the nearest >
2011			 * See https://html.spec.whatwg.org/multipage/parsing.html#tag-open-state
2012			 */
2013			if ( ! $this->is_closing_tag && '?' === $html[ $at + 1 ] ) {
2014				$closer_at = strpos( $html, '>', $at + 2 );
2015				if ( false === $closer_at ) {
2016					$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2017
2018					return false;
2019				}
2020
2021				$this->parser_state         = self::STATE_COMMENT;
2022				$this->comment_type         = self::COMMENT_AS_INVALID_HTML;
2023				$this->token_length         = $closer_at + 1 - $this->token_starts_at;
2024				$this->text_starts_at       = $this->token_starts_at + 2;
2025				$this->text_length          = $closer_at - $this->text_starts_at;
2026				$this->bytes_already_parsed = $closer_at + 1;
2027
2028				/*
2029				 * Identify a Processing Instruction node were HTML to have them.
2030				 *
2031				 * This section must occur after identifying the bogus comment end
2032				 * because in an HTML parser it will span to the nearest `>`, even
2033				 * if there's no `?>` as would be required in an XML document. It
2034				 * is therefore not possible to parse a Processing Instruction node
2035				 * containing a `>` in the HTML syntax.
2036				 *
2037				 * XML allows for more target names, but this code only identifies
2038				 * those with ASCII-representable target names. This means that it
2039				 * may identify some Processing Instruction nodes as bogus comments,
2040				 * but it will not misinterpret the HTML structure. By limiting the
2041				 * identification to these target names the Tag Processor can avoid
2042				 * the need to start parsing UTF-8 sequences.
2043				 *
2044				 * > NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
2045				 *                     [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
2046				 *                     [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
2047				 *                     [#x10000-#xEFFFF]
2048				 * > NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
2049				 *
2050				 * @todo Processing instruction nodes in SGML may contain any kind of markup. XML defines a
2051				 *       special case with `<?xml ... ?>` syntax, but the `?` is part of the bogus comment.
2052				 *
2053				 * @see https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget
2054				 */
2055				if ( $this->token_length >= 5 && '?' === $html[ $closer_at - 1 ] ) {
2056					$comment_text     = substr( $html, $this->token_starts_at + 2, $this->token_length - 4 );
2057					$pi_target_length = strspn( $comment_text, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:_' );
2058
2059					if ( 0 < $pi_target_length ) {
2060						$pi_target_length += strspn( $comment_text, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789:_-.', $pi_target_length );
2061
2062						$this->comment_type       = self::COMMENT_AS_PI_NODE_LOOKALIKE;
2063						$this->tag_name_starts_at = $this->token_starts_at + 2;
2064						$this->tag_name_length    = $pi_target_length;
2065						$this->text_starts_at    += $pi_target_length;
2066						$this->text_length       -= $pi_target_length + 1;
2067					}
2068				}
2069
2070				return true;
2071			}
2072
2073			/*
2074			 * If a non-alpha starts the tag name in a tag closer it's a comment.
2075			 * Find the first `>`, which closes the comment.
2076			 *
2077			 * This parser classifies these particular comments as special "funky comments"
2078			 * which are made available for further processing.
2079			 *
2080			 * See https://html.spec.whatwg.org/#parse-error-invalid-first-character-of-tag-name
2081			 */
2082			if ( $this->is_closing_tag ) {
2083				// No chance of finding a closer.
2084				if ( $at + 3 > $doc_length ) {
2085					$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2086
2087					return false;
2088				}
2089
2090				$closer_at = strpos( $html, '>', $at + 2 );
2091				if ( false === $closer_at ) {
2092					$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2093
2094					return false;
2095				}
2096
2097				$this->parser_state         = self::STATE_FUNKY_COMMENT;
2098				$this->token_length         = $closer_at + 1 - $this->token_starts_at;
2099				$this->text_starts_at       = $this->token_starts_at + 2;
2100				$this->text_length          = $closer_at - $this->text_starts_at;
2101				$this->bytes_already_parsed = $closer_at + 1;
2102				return true;
2103			}
2104
2105			++$at;
2106		}
2107
2108		/*
2109		 * This does not imply an incomplete parse; it indicates that there
2110		 * can be nothing left in the document other than a #text node.
2111		 */
2112		$this->parser_state         = self::STATE_TEXT_NODE;
2113		$this->token_starts_at      = $was_at;
2114		$this->token_length         = $doc_length - $was_at;
2115		$this->text_starts_at       = $was_at;
2116		$this->text_length          = $this->token_length;
2117		$this->bytes_already_parsed = $doc_length;
2118		return true;
2119	}
2120
2121	/**
2122	 * Parses the next attribute.
2123	 *
2124	 * @since 6.2.0
2125	 *
2126	 * @return bool Whether an attribute was found before the end of the document.
2127	 */
2128	private function parse_next_attribute(): bool {
2129		$doc_length = strlen( $this->html );
2130
2131		// Skip whitespace and slashes.
2132		$this->bytes_already_parsed += strspn( $this->html, " \t\f\r\n/", $this->bytes_already_parsed );
2133		if ( $this->bytes_already_parsed >= $doc_length ) {
2134			$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2135
2136			return false;
2137		}
2138
2139		/*
2140		 * Treat the equal sign as a part of the attribute
2141		 * name if it is the first encountered byte.
2142		 *
2143		 * @see https://html.spec.whatwg.org/multipage/parsing.html#before-attribute-name-state
2144		 */
2145		$name_length = '=' === $this->html[ $this->bytes_already_parsed ]
2146			? 1 + strcspn( $this->html, "=/> \t\f\r\n", $this->bytes_already_parsed + 1 )
2147			: strcspn( $this->html, "=/> \t\f\r\n", $this->bytes_already_parsed );
2148
2149		// No attribute, just tag closer.
2150		if ( 0 === $name_length || $this->bytes_already_parsed + $name_length >= $doc_length ) {
2151			return false;
2152		}
2153
2154		$attribute_start             = $this->bytes_already_parsed;
2155		$attribute_name              = substr( $this->html, $attribute_start, $name_length );
2156		$this->bytes_already_parsed += $name_length;
2157		if ( $this->bytes_already_parsed >= $doc_length ) {
2158			$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2159
2160			return false;
2161		}
2162
2163		$this->skip_whitespace();
2164		if ( $this->bytes_already_parsed >= $doc_length ) {
2165			$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2166
2167			return false;
2168		}
2169
2170		$has_value = '=' === $this->html[ $this->bytes_already_parsed ];
2171		if ( $has_value ) {
2172			++$this->bytes_already_parsed;
2173			$this->skip_whitespace();
2174			if ( $this->bytes_already_parsed >= $doc_length ) {
2175				$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2176
2177				return false;
2178			}
2179
2180			switch ( $this->html[ $this->bytes_already_parsed ] ) {
2181				case "'":
2182				case '"':
2183					$quote                      = $this->html[ $this->bytes_already_parsed ];
2184					$value_start                = $this->bytes_already_parsed + 1;
2185					$end_quote_at               = strpos( $this->html, $quote, $value_start );
2186					$end_quote_at               = false === $end_quote_at ? $doc_length : $end_quote_at;
2187					$value_length               = $end_quote_at - $value_start;
2188					$attribute_end              = $end_quote_at + 1;
2189					$this->bytes_already_parsed = $attribute_end;
2190					break;
2191
2192				default:
2193					$value_start                = $this->bytes_already_parsed;
2194					$value_length               = strcspn( $this->html, "> \t\f\r\n", $value_start );
2195					$attribute_end              = $value_start + $value_length;
2196					$this->bytes_already_parsed = $attribute_end;
2197			}
2198		} else {
2199			$value_start   = $this->bytes_already_parsed;
2200			$value_length  = 0;
2201			$attribute_end = $attribute_start + $name_length;
2202		}
2203
2204		if ( $attribute_end >= $doc_length ) {
2205			$this->parser_state = self::STATE_INCOMPLETE_INPUT;
2206
2207			return false;
2208		}
2209
2210		if ( $this->is_closing_tag ) {
2211			return true;
2212		}
2213
2214		/*
2215		 * > There must never be two or more attributes on
2216		 * > the same start tag whose names are an ASCII
2217		 * > case-insensitive match for each other.
2218		 *     - HTML 5 spec
2219		 *
2220		 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
2221		 */
2222		$comparable_name = strtolower( $attribute_name );
2223
2224		// If an attribute is listed many times, only use the first declaration and ignore the rest.
2225		if ( ! isset( $this->attributes[ $comparable_name ] ) ) {
2226			$this->attributes[ $comparable_name ] = new WP_HTML_Attribute_Token(
2227				$attribute_name,
2228				$value_start,
2229				$value_length,
2230				$attribute_start,
2231				$attribute_end - $attribute_start,
2232				! $has_value
2233			);
2234
2235			return true;
2236		}
2237
2238		/*
2239		 * Track the duplicate attributes so if we remove it, all disappear together.
2240		 *
2241		 * While `$this->duplicated_attributes` could always be stored as an `array()`,
2242		 * which would simplify the logic here, storing a `null` and only allocating
2243		 * an array when encountering duplicates avoids needless allocations in the
2244		 * normative case of parsing tags with no duplicate attributes.
2245		 */
2246		$duplicate_span = new WP_HTML_Span( $attribute_start, $attribute_end - $attribute_start );
2247		if ( null === $this->duplicate_attributes ) {
2248			$this->duplicate_attributes = array( $comparable_name => array( $duplicate_span ) );
2249		} elseif ( ! isset( $this->duplicate_attributes[ $comparable_name ] ) ) {
2250			$this->duplicate_attributes[ $comparable_name ] = array( $duplicate_span );
2251		} else {
2252			$this->duplicate_attributes[ $comparable_name ][] = $duplicate_span;
2253		}
2254
2255		return true;
2256	}
2257
2258	/**
2259	 * Move the internal cursor past any immediate successive whitespace.
2260	 *
2261	 * @since 6.2.0
2262	 */
2263	private function skip_whitespace(): void {
2264		$this->bytes_already_parsed += strspn( $this->html, " \t\f\r\n", $this->bytes_already_parsed );
2265	}
2266
2267	/**
2268	 * Applies attribute updates and cleans up once a tag is fully parsed.
2269	 *
2270	 * @since 6.2.0
2271	 */
2272	private function after_tag(): void {
2273		/*
2274		 * There could be lexical updates enqueued for an attribute that
2275		 * also exists on the next tag. In order to avoid conflating the
2276		 * attributes across the two tags, lexical updates with names
2277		 * need to be flushed to raw lexical updates.
2278		 */
2279		$this->class_name_updates_to_attributes_updates();
2280
2281		/*
2282		 * Purge updates if there are too many. The actual count isn't
2283		 * scientific, but a few values from 100 to a few thousand were
2284		 * tests to find a practically-useful limit.
2285		 *
2286		 * If the update queue grows too big, then the Tag Processor
2287		 * will spend more time iterating through them and lose the
2288		 * efficiency gains of deferring applying them.
2289		 */
2290		if ( 1000 < count( $this->lexical_updates ) ) {
2291			$this->get_updated_html();
2292		}
2293
2294		foreach ( $this->lexical_updates as $name => $update ) {
2295			/*
2296			 * Any updates appearing after the cursor should be applied
2297			 * before proceeding, otherwise they may be overlooked.
2298			 */
2299			if ( $update->start >= $this->bytes_already_parsed ) {
2300				$this->get_updated_html();
2301				break;
2302			}
2303
2304			if ( is_int( $name ) ) {
2305				continue;
2306			}
2307
2308			$this->lexical_updates[] = $update;
2309			unset( $this->lexical_updates[ $name ] );
2310		}
2311
2312		$this->token_starts_at          = null;
2313		$this->token_length             = null;
2314		$this->tag_name_starts_at       = null;
2315		$this->tag_name_length          = null;
2316		$this->text_starts_at           = 0;
2317		$this->text_length              = 0;
2318		$this->is_closing_tag           = null;
2319		$this->attributes               = array();
2320		$this->comment_type             = null;
2321		$this->text_node_classification = self::TEXT_IS_GENERIC;
2322		$this->duplicate_attributes     = null;
2323	}
2324
2325	/**
2326	 * Converts class name updates into tag attributes updates
2327	 * (they are accumulated in different data formats for performance).
2328	 *
2329	 * @since 6.2.0
2330	 *
2331	 * @see WP_HTML_Tag_Processor::$lexical_updates
2332	 * @see WP_HTML_Tag_Processor::$classname_updates
2333	 */
2334	private function class_name_updates_to_attributes_updates(): void {
2335		if ( count( $this->classname_updates ) === 0 ) {
2336			return;
2337		}
2338
2339		$existing_class = $this->get_enqueued_attribute_value( 'class' );
2340		if ( null === $existing_class || true === $existing_class ) {
2341			$existing_class = '';
2342		}
2343
2344		if ( false === $existing_class && isset( $this->attributes['class'] ) ) {
2345			$existing_class = WP_HTML_Decoder::decode_attribute(
2346				substr(
2347					$this->html,
2348					$this->attributes['class']->value_starts_at,
2349					$this->attributes['class']->value_length
2350				)
2351			);
2352		}
2353
2354		if ( false === $existing_class ) {
2355			$existing_class = '';
2356		}
2357
2358		/**
2359		 * Updated "class" attribute value.
2360		 *
2361		 * This is incrementally built while scanning through the existing class
2362		 * attribute, skipping removed classes on the way, and then appending
2363		 * added classes at the end. Only when finished processing will the
2364		 * value contain the final new value.
2365
2366		 * @var string $class
2367		 */
2368		$class = '';
2369
2370		/**
2371		 * Tracks the cursor position in the existing
2372		 * class attribute value while parsing.
2373		 *
2374		 * @var int $at
2375		 */
2376		$at = 0;
2377
2378		/**
2379		 * Indicates if there's any need to modify the existing class attribute.
2380		 *
2381		 * If a call to `add_class()` and `remove_class()` wouldn't impact
2382		 * the `class` attribute value then there's no need to rebuild it.
2383		 * For example, when adding a class that's already present or
2384		 * removing one that isn't.
2385		 *
2386		 * This flag enables a performance optimization when none of the enqueued
2387		 * class updates would impact the `class` attribute; namely, that the
2388		 * processor can continue without modifying the input document, as if
2389		 * none of the `add_class()` or `remove_class()` calls had been made.
2390		 *
2391		 * This flag is set upon the first change that requires a string update.
2392		 *
2393		 * @var bool $modified
2394		 */
2395		$modified = false;
2396
2397		$seen      = array();
2398		$to_remove = array();
2399		$is_quirks = self::QUIRKS_MODE === $this->compat_mode;
2400		if ( $is_quirks ) {
2401			foreach ( $this->classname_updates as $updated_name => $action ) {
2402				if ( self::REMOVE_CLASS === $action ) {
2403					$to_remove[] = strtolower( $updated_name );
2404				}
2405			}
2406		} else {
2407			foreach ( $this->classname_updates as $updated_name => $action ) {
2408				if ( self::REMOVE_CLASS === $action ) {
2409					$to_remove[] = $updated_name;
2410				}
2411			}
2412		}
2413
2414		// Remove unwanted classes by only copying the new ones.
2415		$existing_class_length = strlen( $existing_class );
2416		while ( $at < $existing_class_length ) {
2417			// Skip to the first non-whitespace character.
2418			$ws_at     = $at;
2419			$ws_length = strspn( $existing_class, " \t\f\r\n", $ws_at );
2420			$at       += $ws_length;
2421
2422			// Capture the class name – it's everything until the next whitespace.
2423			$name_length = strcspn( $existing_class, " \t\f\r\n", $at );
2424			if ( 0 === $name_length ) {
2425				// If no more class names are found then that's the end.
2426				break;
2427			}
2428
2429			$name                  = substr( $existing_class, $at, $name_length );
2430			$comparable_class_name = $is_quirks ? strtolower( $name ) : $name;
2431			$at                   += $name_length;
2432
2433			// If this class is marked for removal, remove it and move on to the next one.
2434			if ( in_array( $comparable_class_name, $to_remove, true ) ) {
2435				$modified = true;
2436				continue;
2437			}
2438
2439			// If a class has already been seen then skip it; it should not be added twice.
2440			if ( in_array( $comparable_class_name, $seen, true ) ) {
2441				continue;
2442			}
2443
2444			$seen[] = $comparable_class_name;
2445
2446			/*
2447			 * Otherwise, append it to the new "class" attribute value.
2448			 *
2449			 * There are options for handling whitespace between tags.
2450			 * Preserving the existing whitespace produces fewer changes
2451			 * to the HTML content and should clarify the before/after
2452			 * content when debugging the modified output.
2453			 *
2454			 * This approach contrasts normalizing the inter-class
2455			 * whitespace to a single space, which might appear cleaner
2456			 * in the output HTML but produce a noisier change.
2457			 */
2458			if ( '' !== $class ) {
2459				$class .= substr( $existing_class, $ws_at, $ws_length );
2460			}
2461			$class .= $name;
2462		}
2463
2464		// Add new classes by appending those which haven't already been seen.
2465		foreach ( $this->classname_updates as $name => $operation ) {
2466			$comparable_name = $is_quirks ? strtolower( $name ) : $name;
2467			if ( self::ADD_CLASS === $operation && ! in_array( $comparable_name, $seen, true ) ) {
2468				$modified = true;
2469
2470				$class .= strlen( $class ) > 0 ? ' ' : '';
2471				$class .= $name;
2472			}
2473		}
2474
2475		$this->classname_updates = array();
2476		if ( ! $modified ) {
2477			return;
2478		}
2479
2480		if ( strlen( $class ) > 0 ) {
2481			$this->set_attribute( 'class', $class );
2482		} else {
2483			$this->remove_attribute( 'class' );
2484		}
2485	}
2486
2487	/**
2488	 * Applies attribute updates to HTML document.
2489	 *
2490	 * @since 6.2.0
2491	 * @since 6.2.1 Accumulates shift for internal cursor and passed pointer.
2492	 * @since 6.3.0 Invalidate any bookmarks whose targets are overwritten.
2493	 *
2494	 * @param int $shift_this_point Accumulate and return shift for this position.
2495	 * @return int How many bytes the given pointer moved in response to the updates.
2496	 */
2497	private function apply_attributes_updates( int $shift_this_point ): int {
2498		if ( ! count( $this->lexical_updates ) ) {
2499			return 0;
2500		}
2501
2502		$accumulated_shift_for_given_point = 0;
2503
2504		/*
2505		 * Attribute updates can be enqueued in any order but updates
2506		 * to the document must occur in lexical order; that is, each
2507		 * replacement must be made before all others which follow it
2508		 * at later string indices in the input document.
2509		 *
2510		 * Sorting avoid making out-of-order replacements which
2511		 * can lead to mangled output, partially-duplicated
2512		 * attributes, and overwritten attributes.
2513		 */
2514		usort( $this->lexical_updates, array( self::class, 'sort_start_ascending' ) );
2515
2516		$bytes_already_copied = 0;
2517		$output_buffer        = '';
2518		foreach ( $this->lexical_updates as $diff ) {
2519			$shift = strlen( $diff->text ) - $diff->length;
2520
2521			// Adjust the cursor position by however much an update affects it.
2522			if ( $diff->start < $this->bytes_already_parsed ) {
2523				$this->bytes_already_parsed += $shift;
2524			}
2525
2526			// Accumulate shift of the given pointer within this function call.
2527			if ( $diff->start < $shift_this_point ) {
2528				$accumulated_shift_for_given_point += $shift;
2529			}
2530
2531			$output_buffer       .= substr( $this->html, $bytes_already_copied, $diff->start - $bytes_already_copied );
2532			$output_buffer       .= $diff->text;
2533			$bytes_already_copied = $diff->start + $diff->length;
2534		}
2535
2536		$this->html = $output_buffer . substr( $this->html, $bytes_already_copied );
2537
2538		/*
2539		 * Adjust bookmark locations to account for how the text
2540		 * replacements adjust offsets in the input document.
2541		 */
2542		foreach ( $this->bookmarks as $bookmark_name => $bookmark ) {
2543			$bookmark_end = $bookmark->start + $bookmark->length;
2544
2545			/*
2546			 * Each lexical update which appears before the bookmark's endpoints
2547			 * might shift the offsets for those endpoints. Loop through each change
2548			 * and accumulate the total shift for each bookmark, then apply that
2549			 * shift after tallying the full delta.
2550			 */
2551			$head_delta = 0;
2552			$tail_delta = 0;
2553
2554			foreach ( $this->lexical_updates as $diff ) {
2555				$diff_end = $diff->start + $diff->length;
2556
2557				if ( $bookmark->start < $diff->start && $bookmark_end < $diff->start ) {
2558					break;
2559				}
2560
2561				if ( $bookmark->start >= $diff->start && $bookmark_end < $diff_end ) {
2562					$this->release_bookmark( $bookmark_name );
2563					continue 2;
2564				}
2565
2566				$delta = strlen( $diff->text ) - $diff->length;
2567
2568				if ( $bookmark->start >= $diff->start ) {
2569					$head_delta += $delta;
2570				}
2571
2572				if ( $bookmark_end >= $diff_end ) {
2573					$tail_delta += $delta;
2574				}
2575			}
2576
2577			$bookmark->start  += $head_delta;
2578			$bookmark->length += $tail_delta - $head_delta;
2579		}
2580
2581		$this->lexical_updates = array();
2582
2583		return $accumulated_shift_for_given_point;
2584	}
2585
2586	/**
2587	 * Checks whether a bookmark with the given name exists.
2588	 *
2589	 * @since 6.3.0
2590	 *
2591	 * @param string $bookmark_name Name to identify a bookmark that potentially exists.
2592	 * @return bool Whether that bookmark exists.
2593	 */
2594	public function has_bookmark( $bookmark_name ): bool {
2595		return array_key_exists( $bookmark_name, $this->bookmarks );
2596	}
2597
2598	/**
2599	 * Move the internal cursor in the Tag Processor to a given bookmark's location.
2600	 *
2601	 * In order to prevent accidental infinite loops, there's a
2602	 * maximum limit on the number of times seek() can be called.
2603	 *
2604	 * @since 6.2.0
2605	 *
2606	 * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
2607	 * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
2608	 */
2609	public function seek( $bookmark_name ): bool {
2610		if ( ! array_key_exists( $bookmark_name, $this->bookmarks ) ) {
2611			_doing_it_wrong(
2612				__METHOD__,
2613				__( 'Unknown bookmark name.' ),
2614				'6.2.0'
2615			);
2616			return false;
2617		}
2618
2619		$existing_bookmark = $this->bookmarks[ $bookmark_name ];
2620
2621		if (
2622			$this->token_starts_at === $existing_bookmark->start &&
2623			$this->token_length === $existing_bookmark->length
2624		) {
2625			return true;
2626		}
2627
2628		if ( ++$this->seek_count > static::MAX_SEEK_OPS ) {
2629			_doing_it_wrong(
2630				__METHOD__,
2631				__( 'Too many calls to seek() - this can lead to performance issues.' ),
2632				'6.2.0'
2633			);
2634			return false;
2635		}
2636
2637		// Flush out any pending updates to the document.
2638		$this->get_updated_html();
2639
2640		// Point this tag processor before the sought tag opener and consume it.
2641		$this->bytes_already_parsed = $this->bookmarks[ $bookmark_name ]->start;
2642		$this->parser_state         = self::STATE_READY;
2643		return $this->next_token();
2644	}
2645
2646	/**
2647	 * Compare two WP_HTML_Text_Replacement objects.
2648	 *
2649	 * @since 6.2.0
2650	 *
2651	 * @param WP_HTML_Text_Replacement $a First attribute update.
2652	 * @param WP_HTML_Text_Replacement $b Second attribute update.
2653	 * @return int Comparison value for string order.
2654	 */
2655	private static function sort_start_ascending( WP_HTML_Text_Replacement $a, WP_HTML_Text_Replacement $b ): int {
2656		$by_start = $a->start - $b->start;
2657		if ( 0 !== $by_start ) {
2658			return $by_start;
2659		}
2660
2661		$by_text = isset( $a->text, $b->text ) ? strcmp( $a->text, $b->text ) : 0;
2662		if ( 0 !== $by_text ) {
2663			return $by_text;
2664		}
2665
2666		/*
2667		 * This code should be unreachable, because it implies the two replacements
2668		 * start at the same location and contain the same text.
2669		 */
2670		return $a->length - $b->length;
2671	}
2672
2673	/**
2674	 * Return the enqueued value for a given attribute, if one exists.
2675	 *
2676	 * Enqueued updates can take different data types:
2677	 *  - If an update is enqueued and is boolean, the return will be `true`
2678	 *  - If an update is otherwise enqueued, the return will be the string value of that update.
2679	 *  - If an attribute is enqueued to be removed, the return will be `null` to indicate that.
2680	 *  - If no updates are enqueued, the return will be `false` to differentiate from "removed."
2681	 *
2682	 * @since 6.2.0
2683	 *
2684	 * @param string $comparable_name The attribute name in its comparable form.
2685	 * @return string|boolean|null Value of enqueued update if present, otherwise false.
2686	 */
2687	private function get_enqueued_attribute_value( string $comparable_name ) {
2688		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
2689			return false;
2690		}
2691
2692		if ( ! isset( $this->lexical_updates[ $comparable_name ] ) ) {
2693			return false;
2694		}
2695
2696		$enqueued_text = $this->lexical_updates[ $comparable_name ]->text;
2697
2698		// Removed attributes erase the entire span.
2699		if ( '' === $enqueued_text ) {
2700			return null;
2701		}
2702
2703		/*
2704		 * Boolean attribute updates are just the attribute name without a corresponding value.
2705		 *
2706		 * This value might differ from the given comparable name in that there could be leading
2707		 * or trailing whitespace, and that the casing follows the name given in `set_attribute`.
2708		 *
2709		 * Example:
2710		 *
2711		 *     $p->set_attribute( 'data-TEST-id', 'update' );
2712		 *     'update' === $p->get_enqueued_attribute_value( 'data-test-id' );
2713		 *
2714		 * Detect this difference based on the absence of the `=`, which _must_ exist in any
2715		 * attribute containing a value, e.g. `<input type="text" enabled />`.
2716		 *                                            ¹           ²
2717		 *                                       1. Attribute with a string value.
2718		 *                                       2. Boolean attribute whose value is `true`.
2719		 */
2720		$equals_at = strpos( $enqueued_text, '=' );
2721		if ( false === $equals_at ) {
2722			return true;
2723		}
2724
2725		/*
2726		 * Finally, a normal update's value will appear after the `=` and
2727		 * be double-quoted, as performed incidentally by `set_attribute`.
2728		 *
2729		 * e.g. `type="text"`
2730		 *           ¹²    ³
2731		 *        1. Equals is here.
2732		 *        2. Double-quoting starts one after the equals sign.
2733		 *        3. Double-quoting ends at the last character in the update.
2734		 */
2735		$enqueued_value = substr( $enqueued_text, $equals_at + 2, -1 );
2736		return WP_HTML_Decoder::decode_attribute( $enqueued_value );
2737	}
2738
2739	/**
2740	 * Returns the value of a requested attribute from a matched tag opener if that attribute exists.
2741	 *
2742	 * Example:
2743	 *
2744	 *     $p = new WP_HTML_Tag_Processor( '<div enabled class="test" data-test-id="14">Test</div>' );
2745	 *     $p->next_tag( array( 'class_name' => 'test' ) ) === true;
2746	 *     $p->get_attribute( 'data-test-id' ) === '14';
2747	 *     $p->get_attribute( 'enabled' ) === true;
2748	 *     $p->get_attribute( 'aria-label' ) === null;
2749	 *
2750	 *     $p->next_tag() === false;
2751	 *     $p->get_attribute( 'class' ) === null;
2752	 *
2753	 * @since 6.2.0
2754	 *
2755	 * @param string $name Name of attribute whose value is requested.
2756	 * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
2757	 */
2758	public function get_attribute( $name ) {
2759		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
2760			return null;
2761		}
2762
2763		$comparable = strtolower( $name );
2764
2765		/*
2766		 * For every attribute other than `class` it's possible to perform a quick check if
2767		 * there's an enqueued lexical update whose value takes priority over what's found in
2768		 * the input document.
2769		 *
2770		 * The `class` attribute is special though because of the exposed helpers `add_class`
2771		 * and `remove_class`. These form a builder for the `class` attribute, so an additional
2772		 * check for enqueued class changes is required in addition to the check for any enqueued
2773		 * attribute values. If any exist, those enqueued class changes must first be flushed out
2774		 * into an attribute value update.
2775		 */
2776		if ( 'class' === $name ) {
2777			$this->class_name_updates_to_attributes_updates();
2778		}
2779
2780		// Return any enqueued attribute value updates if they exist.
2781		$enqueued_value = $this->get_enqueued_attribute_value( $comparable );
2782		if ( false !== $enqueued_value ) {
2783			return $enqueued_value;
2784		}
2785
2786		if ( ! isset( $this->attributes[ $comparable ] ) ) {
2787			return null;
2788		}
2789
2790		$attribute = $this->attributes[ $comparable ];
2791
2792		/*
2793		 * This flag distinguishes an attribute with no value
2794		 * from an attribute with an empty string value. For
2795		 * unquoted attributes this could look very similar.
2796		 * It refers to whether an `=` follows the name.
2797		 *
2798		 * e.g. <div boolean-attribute empty-attribute=></div>
2799		 *           ¹                 ²
2800		 *        1. Attribute `boolean-attribute` is `true`.
2801		 *        2. Attribute `empty-attribute` is `""`.
2802		 */
2803		if ( true === $attribute->is_true ) {
2804			return true;
2805		}
2806
2807		$raw_value = substr( $this->html, $attribute->value_starts_at, $attribute->value_length );
2808
2809		return WP_HTML_Decoder::decode_attribute( $raw_value );
2810	}
2811
2812	/**
2813	 * Gets lowercase names of all attributes matching a given prefix in the current tag.
2814	 *
2815	 * Note that matching is case-insensitive. This is in accordance with the spec:
2816	 *
2817	 * > There must never be two or more attributes on
2818	 * > the same start tag whose names are an ASCII
2819	 * > case-insensitive match for each other.
2820	 *     - HTML 5 spec
2821	 *
2822	 * Example:
2823	 *
2824	 *     $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
2825	 *     $p->next_tag( array( 'class_name' => 'test' ) ) === true;
2826	 *     $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );
2827	 *
2828	 *     $p->next_tag() === false;
2829	 *     $p->get_attribute_names_with_prefix( 'data-' ) === null;
2830	 *
2831	 * @since 6.2.0
2832	 *
2833	 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
2834	 *
2835	 * @param string $prefix Prefix of requested attribute names.
2836	 * @return array|null List of attribute names, or `null` when no tag opener is matched.
2837	 */
2838	public function get_attribute_names_with_prefix( $prefix ): ?array {
2839		if (
2840			self::STATE_MATCHED_TAG !== $this->parser_state ||
2841			$this->is_closing_tag
2842		) {
2843			return null;
2844		}
2845
2846		$comparable = strtolower( $prefix );
2847
2848		$matches = array();
2849		foreach ( array_keys( $this->attributes ) as $attr_name ) {
2850			if ( str_starts_with( $attr_name, $comparable ) ) {
2851				$matches[] = $attr_name;
2852			}
2853		}
2854		return $matches;
2855	}
2856
2857	/**
2858	 * Returns the namespace of the matched token.
2859	 *
2860	 * @since 6.7.0
2861	 *
2862	 * @return string One of 'html', 'math', or 'svg'.
2863	 */
2864	public function get_namespace(): string {
2865		return $this->parsing_namespace;
2866	}
2867
2868	/**
2869	 * Returns the uppercase name of the matched tag.
2870	 *
2871	 * Example:
2872	 *
2873	 *     $p = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
2874	 *     $p->next_tag() === true;
2875	 *     $p->get_tag() === 'DIV';
2876	 *
2877	 *     $p->next_tag() === false;
2878	 *     $p->get_tag() === null;
2879	 *
2880	 * @since 6.2.0
2881	 *
2882	 * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
2883	 */
2884	public function get_tag(): ?string {
2885		if ( null === $this->tag_name_starts_at ) {
2886			return null;
2887		}
2888
2889		$tag_name = substr( $this->html, $this->tag_name_starts_at, $this->tag_name_length );
2890
2891		if ( self::STATE_MATCHED_TAG === $this->parser_state ) {
2892			return strtoupper( $tag_name );
2893		}
2894
2895		if (
2896			self::STATE_COMMENT === $this->parser_state &&
2897			self::COMMENT_AS_PI_NODE_LOOKALIKE === $this->get_comment_type()
2898		) {
2899			return $tag_name;
2900		}
2901
2902		return null;
2903	}
2904
2905	/**
2906	 * Returns the adjusted tag name for a given token, taking into
2907	 * account the current parsing context, whether HTML, SVG, or MathML.
2908	 *
2909	 * @since 6.7.0
2910	 *
2911	 * @return string|null Name of current tag name.
2912	 */
2913	public function get_qualified_tag_name(): ?string {
2914		$tag_name = $this->get_tag();
2915		if ( null === $tag_name ) {
2916			return null;
2917		}
2918
2919		if ( 'html' === $this->get_namespace() ) {
2920			return $tag_name;
2921		}
2922
2923		$lower_tag_name = strtolower( $tag_name );
2924		if ( 'math' === $this->get_namespace() ) {
2925			return $lower_tag_name;
2926		}
2927
2928		if ( 'svg' === $this->get_namespace() ) {
2929			switch ( $lower_tag_name ) {
2930				case 'altglyph':
2931					return 'altGlyph';
2932
2933				case 'altglyphdef':
2934					return 'altGlyphDef';
2935
2936				case 'altglyphitem':
2937					return 'altGlyphItem';
2938
2939				case 'animatecolor':
2940					return 'animateColor';
2941
2942				case 'animatemotion':
2943					return 'animateMotion';
2944
2945				case 'animatetransform':
2946					return 'animateTransform';
2947
2948				case 'clippath':
2949					return 'clipPath';
2950
2951				case 'feblend':
2952					return 'feBlend';
2953
2954				case 'fecolormatrix':
2955					return 'feColorMatrix';
2956
2957				case 'fecomponenttransfer':
2958					return 'feComponentTransfer';
2959
2960				case 'fecomposite':
2961					return 'feComposite';
2962
2963				case 'feconvolvematrix':
2964					return 'feConvolveMatrix';
2965
2966				case 'fediffuselighting':
2967					return 'feDiffuseLighting';
2968
2969				case 'fedisplacementmap':
2970					return 'feDisplacementMap';
2971
2972				case 'fedistantlight':
2973					return 'feDistantLight';
2974
2975				case 'fedropshadow':
2976					return 'feDropShadow';
2977
2978				case 'feflood':
2979					return 'feFlood';
2980
2981				case 'fefunca':
2982					return 'feFuncA';
2983
2984				case 'fefuncb':
2985					return 'feFuncB';
2986
2987				case 'fefuncg':
2988					return 'feFuncG';
2989
2990				case 'fefuncr':
2991					return 'feFuncR';
2992
2993				case 'fegaussianblur':
2994					return 'feGaussianBlur';
2995
2996				case 'feimage':
2997					return 'feImage';
2998
2999				case 'femerge':
3000					return 'feMerge';
3001
3002				case 'femergenode':
3003					return 'feMergeNode';
3004
3005				case 'femorphology':
3006					return 'feMorphology';
3007
3008				case 'feoffset':
3009					return 'feOffset';
3010
3011				case 'fepointlight':
3012					return 'fePointLight';
3013
3014				case 'fespecularlighting':
3015					return 'feSpecularLighting';
3016
3017				case 'fespotlight':
3018					return 'feSpotLight';
3019
3020				case 'fetile':
3021					return 'feTile';
3022
3023				case 'feturbulence':
3024					return 'feTurbulence';
3025
3026				case 'foreignobject':
3027					return 'foreignObject';
3028
3029				case 'glyphref':
3030					return 'glyphRef';
3031
3032				case 'lineargradient':
3033					return 'linearGradient';
3034
3035				case 'radialgradient':
3036					return 'radialGradient';
3037
3038				case 'textpath':
3039					return 'textPath';
3040
3041				default:
3042					return $lower_tag_name;
3043			}
3044		}
3045
3046		// This unnecessary return prevents tools from inaccurately reporting type errors.
3047		return $tag_name;
3048	}
3049
3050	/**
3051	 * Returns the adjusted attribute name for a given attribute, taking into
3052	 * account the current parsing context, whether HTML, SVG, or MathML.
3053	 *
3054	 * @since 6.7.0
3055	 *
3056	 * @param string $attribute_name Which attribute to adjust.
3057	 *
3058	 * @return string|null
3059	 */
3060	public function get_qualified_attribute_name( $attribute_name ): ?string {
3061		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
3062			return null;
3063		}
3064
3065		$namespace  = $this->get_namespace();
3066		$lower_name = strtolower( $attribute_name );
3067
3068		if ( 'math' === $namespace && 'definitionurl' === $lower_name ) {
3069			return 'definitionURL';
3070		}
3071
3072		if ( 'svg' === $this->get_namespace() ) {
3073			switch ( $lower_name ) {
3074				case 'attributename':
3075					return 'attributeName';
3076
3077				case 'attributetype':
3078					return 'attributeType';
3079
3080				case 'basefrequency':
3081					return 'baseFrequency';
3082
3083				case 'baseprofile':
3084					return 'baseProfile';
3085
3086				case 'calcmode':
3087					return 'calcMode';
3088
3089				case 'clippathunits':
3090					return 'clipPathUnits';
3091
3092				case 'diffuseconstant':
3093					return 'diffuseConstant';
3094
3095				case 'edgemode':
3096					return 'edgeMode';
3097
3098				case 'filterunits':
3099					return 'filterUnits';
3100
3101				case 'glyphref':
3102					return 'glyphRef';
3103
3104				case 'gradienttransform':
3105					return 'gradientTransform';
3106
3107				case 'gradientunits':
3108					return 'gradientUnits';
3109
3110				case 'kernelmatrix':
3111					return 'kernelMatrix';
3112
3113				case 'kernelunitlength':
3114					return 'kernelUnitLength';
3115
3116				case 'keypoints':
3117					return 'keyPoints';
3118
3119				case 'keysplines':
3120					return 'keySplines';
3121
3122				case 'keytimes':
3123					return 'keyTimes';
3124
3125				case 'lengthadjust':
3126					return 'lengthAdjust';
3127
3128				case 'limitingconeangle':
3129					return 'limitingConeAngle';
3130
3131				case 'markerheight':
3132					return 'markerHeight';
3133
3134				case 'markerunits':
3135					return 'markerUnits';
3136
3137				case 'markerwidth':
3138					return 'markerWidth';
3139
3140				case 'maskcontentunits':
3141					return 'maskContentUnits';
3142
3143				case 'maskunits':
3144					return 'maskUnits';
3145
3146				case 'numoctaves':
3147					return 'numOctaves';
3148
3149				case 'pathlength':
3150					return 'pathLength';
3151
3152				case 'patterncontentunits':
3153					return 'patternContentUnits';
3154
3155				case 'patterntransform':
3156					return 'patternTransform';
3157
3158				case 'patternunits':
3159					return 'patternUnits';
3160
3161				case 'pointsatx':
3162					return 'pointsAtX';
3163
3164				case 'pointsaty':
3165					return 'pointsAtY';
3166
3167				case 'pointsatz':
3168					return 'pointsAtZ';
3169
3170				case 'preservealpha':
3171					return 'preserveAlpha';
3172
3173				case 'preserveaspectratio':
3174					return 'preserveAspectRatio';
3175
3176				case 'primitiveunits':
3177					return 'primitiveUnits';
3178
3179				case 'refx':
3180					return 'refX';
3181
3182				case 'refy':
3183					return 'refY';
3184
3185				case 'repeatcount':
3186					return 'repeatCount';
3187
3188				case 'repeatdur':
3189					return 'repeatDur';
3190
3191				case 'requiredextensions':
3192					return 'requiredExtensions';
3193
3194				case 'requiredfeatures':
3195					return 'requiredFeatures';
3196
3197				case 'specularconstant':
3198					return 'specularConstant';
3199
3200				case 'specularexponent':
3201					return 'specularExponent';
3202
3203				case 'spreadmethod':
3204					return 'spreadMethod';
3205
3206				case 'startoffset':
3207					return 'startOffset';
3208
3209				case 'stddeviation':
3210					return 'stdDeviation';
3211
3212				case 'stitchtiles':
3213					return 'stitchTiles';
3214
3215				case 'surfacescale':
3216					return 'surfaceScale';
3217
3218				case 'systemlanguage':
3219					return 'systemLanguage';
3220
3221				case 'tablevalues':
3222					return 'tableValues';
3223
3224				case 'targetx':
3225					return 'targetX';
3226
3227				case 'targety':
3228					return 'targetY';
3229
3230				case 'textlength':
3231					return 'textLength';
3232
3233				case 'viewbox':
3234					return 'viewBox';
3235
3236				case 'viewtarget':
3237					return 'viewTarget';
3238
3239				case 'xchannelselector':
3240					return 'xChannelSelector';
3241
3242				case 'ychannelselector':
3243					return 'yChannelSelector';
3244
3245				case 'zoomandpan':
3246					return 'zoomAndPan';
3247			}
3248		}
3249
3250		if ( 'html' !== $namespace ) {
3251			switch ( $lower_name ) {
3252				case 'xlink:actuate':
3253					return 'xlink actuate';
3254
3255				case 'xlink:arcrole':
3256					return 'xlink arcrole';
3257
3258				case 'xlink:href':
3259					return 'xlink href';
3260
3261				case 'xlink:role':
3262					return 'xlink role';
3263
3264				case 'xlink:show':
3265					return 'xlink show';
3266
3267				case 'xlink:title':
3268					return 'xlink title';
3269
3270				case 'xlink:type':
3271					return 'xlink type';
3272
3273				case 'xml:lang':
3274					return 'xml lang';
3275
3276				case 'xml:space':
3277					return 'xml space';
3278
3279				case 'xmlns':
3280					return 'xmlns';
3281
3282				case 'xmlns:xlink':
3283					return 'xmlns xlink';
3284			}
3285		}
3286
3287		return $attribute_name;
3288	}
3289
3290	/**
3291	 * Indicates if the currently matched tag contains the self-closing flag.
3292	 *
3293	 * No HTML elements ought to have the self-closing flag and for those, the self-closing
3294	 * flag will be ignored. For void elements this is benign because they "self close"
3295	 * automatically. For non-void HTML elements though problems will appear if someone
3296	 * intends to use a self-closing element in place of that element with an empty body.
3297	 * For HTML foreign elements and custom elements the self-closing flag determines if
3298	 * they self-close or not.
3299	 *
3300	 * This function does not determine if a tag is self-closing,
3301	 * but only if the self-closing flag is present in the syntax.
3302	 *
3303	 * @since 6.3.0
3304	 *
3305	 * @return bool Whether the currently matched tag contains the self-closing flag.
3306	 */
3307	public function has_self_closing_flag(): bool {
3308		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
3309			return false;
3310		}
3311
3312		/*
3313		 * The self-closing flag is the solidus at the _end_ of the tag, not the beginning.
3314		 *
3315		 * Example:
3316		 *
3317		 *     <figure />
3318		 *             ^ this appears one character before the end of the closing ">".
3319		 */
3320		return '/' === $this->html[ $this->token_starts_at + $this->token_length - 2 ];
3321	}
3322
3323	/**
3324	 * Indicates if the current tag token is a tag closer.
3325	 *
3326	 * Example:
3327	 *
3328	 *     $p = new WP_HTML_Tag_Processor( '<div></div>' );
3329	 *     $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
3330	 *     $p->is_tag_closer() === false;
3331	 *
3332	 *     $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
3333	 *     $p->is_tag_closer() === true;
3334	 *
3335	 * @since 6.2.0
3336	 * @since 6.7.0 Reports all BR tags as opening tags.
3337	 *
3338	 * @return bool Whether the current tag is a tag closer.
3339	 */
3340	public function is_tag_closer(): bool {
3341		return (
3342			self::STATE_MATCHED_TAG === $this->parser_state &&
3343			$this->is_closing_tag &&
3344
3345			/*
3346			 * The BR tag can only exist as an opening tag. If something like `</br>`
3347			 * appears then the HTML parser will treat it as an opening tag with no
3348			 * attributes. The BR tag is unique in this way.
3349			 *
3350			 * @see https://html.spec.whatwg.org/#parsing-main-inbody
3351			 */
3352			'BR' !== $this->get_tag()
3353		);
3354	}
3355
3356	/**
3357	 * Indicates the kind of matched token, if any.
3358	 *
3359	 * This differs from `get_token_name()` in that it always
3360	 * returns a static string indicating the type, whereas
3361	 * `get_token_name()` may return values derived from the
3362	 * token itself, such as a tag name or processing
3363	 * instruction tag.
3364	 *
3365	 * Possible values:
3366	 *  - `#tag` when matched on a tag.
3367	 *  - `#text` when matched on a text node.
3368	 *  - `#cdata-section` when matched on a CDATA node.
3369	 *  - `#comment` when matched on a comment.
3370	 *  - `#doctype` when matched on a DOCTYPE declaration.
3371	 *  - `#presumptuous-tag` when matched on an empty tag closer.
3372	 *  - `#funky-comment` when matched on a funky comment.
3373	 *
3374	 * @since 6.5.0
3375	 *
3376	 * @return string|null What kind of token is matched, or null.
3377	 */
3378	public function get_token_type(): ?string {
3379		switch ( $this->parser_state ) {
3380			case self::STATE_MATCHED_TAG:
3381				return '#tag';
3382
3383			case self::STATE_DOCTYPE:
3384				return '#doctype';
3385
3386			default:
3387				return $this->get_token_name();
3388		}
3389	}
3390
3391	/**
3392	 * Returns the node name represented by the token.
3393	 *
3394	 * This matches the DOM API value `nodeName`. Some values
3395	 * are static, such as `#text` for a text node, while others
3396	 * are dynamically generated from the token itself.
3397	 *
3398	 * Dynamic names:
3399	 *  - Uppercase tag name for tag matches.
3400	 *  - `html` for DOCTYPE declarations.
3401	 *
3402	 * Note that if the Tag Processor is not matched on a token
3403	 * then this function will return `null`, either because it
3404	 * hasn't yet found a token or because it reached the end
3405	 * of the document without matching a token.
3406	 *
3407	 * @since 6.5.0
3408	 *
3409	 * @return string|null Name of the matched token.
3410	 */
3411	public function get_token_name(): ?string {
3412		switch ( $this->parser_state ) {
3413			case self::STATE_MATCHED_TAG:
3414				return $this->get_tag();
3415
3416			case self::STATE_TEXT_NODE:
3417				return '#text';
3418
3419			case self::STATE_CDATA_NODE:
3420				return '#cdata-section';
3421
3422			case self::STATE_COMMENT:
3423				return '#comment';
3424
3425			case self::STATE_DOCTYPE:
3426				return 'html';
3427
3428			case self::STATE_PRESUMPTUOUS_TAG:
3429				return '#presumptuous-tag';
3430
3431			case self::STATE_FUNKY_COMMENT:
3432				return '#funky-comment';
3433		}
3434
3435		return null;
3436	}
3437
3438	/**
3439	 * Indicates what kind of comment produced the comment node.
3440	 *
3441	 * Because there are different kinds of HTML syntax which produce
3442	 * comments, the Tag Processor tracks and exposes this as a type
3443	 * for the comment. Nominally only regular HTML comments exist as
3444	 * they are commonly known, but a number of unrelated syntax errors
3445	 * also produce comments.
3446	 *
3447	 * @see self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT
3448	 * @see self::COMMENT_AS_CDATA_LOOKALIKE
3449	 * @see self::COMMENT_AS_INVALID_HTML
3450	 * @see self::COMMENT_AS_HTML_COMMENT
3451	 * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
3452	 *
3453	 * @since 6.5.0
3454	 *
3455	 * @return string|null
3456	 */
3457	public function get_comment_type(): ?string {
3458		if ( self::STATE_COMMENT !== $this->parser_state ) {
3459			return null;
3460		}
3461
3462		return $this->comment_type;
3463	}
3464
3465	/**
3466	 * Returns the text of a matched comment or null if not on a comment type node.
3467	 *
3468	 * This method returns the entire text content of a comment node as it
3469	 * would appear in the browser.
3470	 *
3471	 * This differs from {@see ::get_modifiable_text()} in that certain comment
3472	 * types in the HTML API cannot allow their entire comment text content to
3473	 * be modified. Namely, "bogus comments" of the form `<?not allowed in html>`
3474	 * will create a comment whose text content starts with `?`. Note that if
3475	 * that character were modified, it would be possible to change the node
3476	 * type.
3477	 *
3478	 * @since 6.7.0
3479	 *
3480	 * @return string|null The comment text as it would appear in the browser or null
3481	 *                     if not on a comment type node.
3482	 */
3483	public function get_full_comment_text(): ?string {
3484		if ( self::STATE_FUNKY_COMMENT === $this->parser_state ) {
3485			return $this->get_modifiable_text();
3486		}
3487
3488		if ( self::STATE_COMMENT !== $this->parser_state ) {
3489			return null;
3490		}
3491
3492		switch ( $this->get_comment_type() ) {
3493			case self::COMMENT_AS_HTML_COMMENT:
3494			case self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT:
3495				return $this->get_modifiable_text();
3496
3497			case self::COMMENT_AS_CDATA_LOOKALIKE:
3498				return "[CDATA[{$this->get_modifiable_text()}]]";
3499
3500			case self::COMMENT_AS_PI_NODE_LOOKALIKE:
3501				return "?{$this->get_tag()}{$this->get_modifiable_text()}?";
3502
3503			/*
3504			 * This represents "bogus comments state" from HTML tokenization.
3505			 * This can be entered by `<?` or `<!`, where `?` is included in
3506			 * the comment text but `!` is not.
3507			 */
3508			case self::COMMENT_AS_INVALID_HTML:
3509				$preceding_character = $this->html[ $this->text_starts_at - 1 ];
3510				$comment_start       = '?' === $preceding_character ? '?' : '';
3511				return "{$comment_start}{$this->get_modifiable_text()}";
3512		}
3513
3514		return null;
3515	}
3516
3517	/**
3518	 * Subdivides a matched text node, splitting NULL byte sequences and decoded whitespace as
3519	 * distinct nodes prefixes.
3520	 *
3521	 * Note that once anything that's neither a NULL byte nor decoded whitespace is
3522	 * encountered, then the remainder of the text node is left intact as generic text.
3523	 *
3524	 *  - The HTML Processor uses this to apply distinct rules for different kinds of text.
3525	 *  - Inter-element whitespace can be detected and skipped with this method.
3526	 *
3527	 * Text nodes aren't eagerly subdivided because there's no need to split them unless
3528	 * decisions are being made on NULL byte sequences or whitespace-only text.
3529	 *
3530	 * Example:
3531	 *
3532	 *     $processor = new WP_HTML_Tag_Processor( "\x00Apples & Oranges" );
3533	 *     true  === $processor->next_token();                   // Text is "Apples & Oranges".
3534	 *     true  === $processor->subdivide_text_appropriately(); // Text is "".
3535	 *     true  === $processor->next_token();                   // Text is "Apples & Oranges".
3536	 *     false === $processor->subdivide_text_appropriately();
3537	 *
3538	 *     $processor = new WP_HTML_Tag_Processor( "&#x13; \r\n\tMore" );
3539	 *     true  === $processor->next_token();                   // Text is "␤ ␤␉More".
3540	 *     true  === $processor->subdivide_text_appropriately(); // Text is "␤ ␤␉".
3541	 *     true  === $processor->next_token();                   // Text is "More".
3542	 *     false === $processor->subdivide_text_appropriately();
3543	 *
3544	 * @since 6.7.0
3545	 *
3546	 * @return bool Whether the text node was subdivided.
3547	 */
3548	public function subdivide_text_appropriately(): bool {
3549		if ( self::STATE_TEXT_NODE !== $this->parser_state ) {
3550			return false;
3551		}
3552
3553		$this->text_node_classification = self::TEXT_IS_GENERIC;
3554
3555		/*
3556		 * NULL bytes are treated categorically different than numeric character
3557		 * references whose number is zero. `&#x00;` is not the same as `"\x00"`.
3558		 */
3559		$leading_nulls = strspn( $this->html, "\x00", $this->text_starts_at, $this->text_length );
3560		if ( $leading_nulls > 0 ) {
3561			$this->token_length             = $leading_nulls;
3562			$this->text_length              = $leading_nulls;
3563			$this->bytes_already_parsed     = $this->token_starts_at + $leading_nulls;
3564			$this->text_node_classification = self::TEXT_IS_NULL_SEQUENCE;
3565			return true;
3566		}
3567
3568		/*
3569		 * Start a decoding loop to determine the point at which the
3570		 * text subdivides. This entails raw whitespace bytes and any
3571		 * character reference that decodes to the same.
3572		 */
3573		$at  = $this->text_starts_at;
3574		$end = $this->text_starts_at + $this->text_length;
3575		while ( $at < $end ) {
3576			$skipped = strspn( $this->html, " \t\f\r\n", $at, $end - $at );
3577			$at     += $skipped;
3578
3579			if ( $at < $end && '&' === $this->html[ $at ] ) {
3580				$matched_byte_length = null;
3581				$replacement         = WP_HTML_Decoder::read_character_reference( 'data', $this->html, $at, $matched_byte_length );
3582				if ( isset( $replacement ) && 1 === strspn( $replacement, " \t\f\r\n" ) ) {
3583					$at += $matched_byte_length;
3584					continue;
3585				}
3586			}
3587
3588			break;
3589		}
3590
3591		if ( $at > $this->text_starts_at ) {
3592			$new_length                     = $at - $this->text_starts_at;
3593			$this->text_length              = $new_length;
3594			$this->token_length             = $new_length;
3595			$this->bytes_already_parsed     = $at;
3596			$this->text_node_classification = self::TEXT_IS_WHITESPACE;
3597			return true;
3598		}
3599
3600		return false;
3601	}
3602
3603	/**
3604	 * Returns the modifiable text for a matched token, or an empty string.
3605	 *
3606	 * Modifiable text is text content that may be read and changed without
3607	 * changing the HTML structure of the document around it. This includes
3608	 * the contents of `#text` nodes in the HTML as well as the inner
3609	 * contents of HTML comments, Processing Instructions, and others, even
3610	 * though these nodes aren't part of a parsed DOM tree. They also contain
3611	 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
3612	 * other section in an HTML document which cannot contain HTML markup (DATA).
3613	 *
3614	 * If a token has no modifiable text then an empty string is returned to
3615	 * avoid needless crashing or type errors. An empty string does not mean
3616	 * that a token has modifiable text, and a token with modifiable text may
3617	 * have an empty string (e.g. a comment with no contents).
3618	 *
3619	 * Limitations:
3620	 *
3621	 *  - This function will not strip the leading newline appropriately
3622	 *    after seeking into a LISTING or PRE element. To ensure that the
3623	 *    newline is treated properly, seek to the LISTING or PRE opening
3624	 *    tag instead of to the first text node inside the element.
3625	 *
3626	 * @since 6.5.0
3627	 * @since 6.7.0 Replaces NULL bytes (U+0000) and newlines appropriately.
3628	 *
3629	 * @return string
3630	 */
3631	public function get_modifiable_text(): string {
3632		$has_enqueued_update = isset( $this->lexical_updates['modifiable text'] );
3633
3634		if ( ! $has_enqueued_update && ( null === $this->text_starts_at || 0 === $this->text_length ) ) {
3635			return '';
3636		}
3637
3638		$text = $has_enqueued_update
3639			? $this->lexical_updates['modifiable text']->text
3640			: substr( $this->html, $this->text_starts_at, $this->text_length );
3641
3642		/*
3643		 * Pre-processing the input stream would normally happen before
3644		 * any parsing is done, but deferring it means it's possible to
3645		 * skip in most cases. When getting the modifiable text, however
3646		 * it's important to apply the pre-processing steps, which is
3647		 * normalizing newlines.
3648		 *
3649		 * @see https://html.spec.whatwg.org/#preprocessing-the-input-stream
3650		 * @see https://infra.spec.whatwg.org/#normalize-newlines
3651		 */
3652		$text = str_replace( "\r\n", "\n", $text );
3653		$text = str_replace( "\r", "\n", $text );
3654
3655		// Comment data is not decoded.
3656		if (
3657			self::STATE_CDATA_NODE === $this->parser_state ||
3658			self::STATE_COMMENT === $this->parser_state ||
3659			self::STATE_DOCTYPE === $this->parser_state ||
3660			self::STATE_FUNKY_COMMENT === $this->parser_state
3661		) {
3662			return str_replace( "\x00", "\u{FFFD}", $text );
3663		}
3664
3665		$tag_name = $this->get_token_name();
3666		if (
3667			// Script data is not decoded.
3668			'SCRIPT' === $tag_name ||
3669
3670			// RAWTEXT data is not decoded.
3671			'IFRAME' === $tag_name ||
3672			'NOEMBED' === $tag_name ||
3673			'NOFRAMES' === $tag_name ||
3674			'STYLE' === $tag_name ||
3675			'XMP' === $tag_name
3676		) {
3677			return str_replace( "\x00", "\u{FFFD}", $text );
3678		}
3679
3680		$decoded = WP_HTML_Decoder::decode_text_node( $text );
3681
3682		/*
3683		 * Skip the first line feed after LISTING, PRE, and TEXTAREA opening tags.
3684		 *
3685		 * Note that this first newline may come in the form of a character
3686		 * reference, such as `&#x0a;`, and so it's important to perform
3687		 * this transformation only after decoding the raw text content.
3688		 */
3689		if (
3690			( "\n" === ( $decoded[0] ?? '' ) ) &&
3691			( ( $this->skip_newline_at === $this->token_starts_at && '#text' === $tag_name ) || 'TEXTAREA' === $tag_name )
3692		) {
3693			$decoded = substr( $decoded, 1 );
3694		}
3695
3696		/*
3697		 * Only in normative text nodes does the NULL byte (U+0000) get removed.
3698		 * In all other contexts it's replaced by the replacement character (U+FFFD)
3699		 * for security reasons (to avoid joining together strings that were safe
3700		 * when separated, but not when joined).
3701		 *
3702		 * @todo Inside HTML integration points and MathML integration points, the
3703		 *       text is processed according to the insertion mode, not according
3704		 *       to the foreign content rules. This should strip the NULL bytes.
3705		 */
3706		return ( '#text' === $tag_name && 'html' === $this->get_namespace() )
3707			? str_replace( "\x00", '', $decoded )
3708			: str_replace( "\x00", "\u{FFFD}", $decoded );
3709	}
3710
3711	/**
3712	 * Sets the modifiable text for the matched token, if matched.
3713	 *
3714	 * Modifiable text is text content that may be read and changed without
3715	 * changing the HTML structure of the document around it. This includes
3716	 * the contents of `#text` nodes in the HTML as well as the inner
3717	 * contents of HTML comments, Processing Instructions, and others, even
3718	 * though these nodes aren't part of a parsed DOM tree. They also contain
3719	 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
3720	 * other section in an HTML document which cannot contain HTML markup (DATA).
3721	 *
3722	 * Not all modifiable text may be set by this method, and not all content
3723	 * may be set as modifiable text. In the case that this fails it will return
3724	 * `false` indicating as much. For instance, it will not allow inserting the
3725	 * string `</script` into a SCRIPT element, because the rules for escaping
3726	 * that safely are complicated. Similarly, it will not allow setting content
3727	 * into a comment which would prematurely terminate the comment.
3728	 *
3729	 * Example:
3730	 *
3731	 *     // Add a preface to all STYLE contents.
3732	 *     while ( $processor->next_tag( 'STYLE' ) ) {
3733	 *         $style = $processor->get_modifiable_text();
3734	 *         $processor->set_modifiable_text( "// Made with love on the World Wide Web\n{$style}" );
3735	 *     }
3736	 *
3737	 *     // Replace smiley text with Emoji smilies.
3738	 *     while ( $processor->next_token() ) {
3739	 *         if ( '#text' !== $processor->get_token_name() ) {
3740	 *             continue;
3741	 *         }
3742	 *
3743	 *         $chunk = $processor->get_modifiable_text();
3744	 *         if ( ! str_contains( $chunk, ':)' ) ) {
3745	 *             continue;
3746	 *         }
3747	 *
3748	 *         $processor->set_modifiable_text( str_replace( ':)', '🙂', $chunk ) );
3749	 *     }
3750	 *
3751	 * This function handles all necessary HTML encoding. Provide normal, unescaped string values.
3752	 * The HTML API will encode the strings appropriately so that the browser will interpret them
3753	 * as the intended value.
3754	 *
3755	 * Example:
3756	 *
3757	 *     // Renders as “Eggs & Milk” in a browser, encoded as `<p>Eggs &amp; Milk</p>`.
3758	 *     $processor->set_modifiable_text( 'Eggs & Milk' );
3759	 *
3760	 *     // Renders as “Eggs &amp; Milk” in a browser, encoded as `<p>Eggs &amp;amp; Milk</p>`.
3761	 *     $processor->set_modifiable_text( 'Eggs &amp; Milk' );
3762	 *
3763	 * @since 6.7.0
3764	 * @since 6.9.0 Escapes all character references instead of trying to avoid double-escaping.
3765	 *
3766	 * @param string $plaintext_content New text content to represent in the matched token.
3767	 * @return bool Whether the text was able to update.
3768	 */
3769	public function set_modifiable_text( string $plaintext_content ): bool {
3770		if ( self::STATE_TEXT_NODE === $this->parser_state ) {
3771			$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
3772				$this->text_starts_at,
3773				$this->text_length,
3774				strtr(
3775					$plaintext_content,
3776					array(
3777						'<' => '&lt;',
3778						'>' => '&gt;',
3779						'&' => '&amp;',
3780						'"' => '&quot;',
3781						"'" => '&apos;',
3782					)
3783				)
3784			);
3785
3786			return true;
3787		}
3788
3789		// Comment data is not encoded.
3790		if (
3791			self::STATE_COMMENT === $this->parser_state &&
3792			self::COMMENT_AS_HTML_COMMENT === $this->comment_type
3793		) {
3794			// Check if the text could close the comment.
3795			if ( 1 === preg_match( '/--!?>/', $plaintext_content ) ) {
3796				return false;
3797			}
3798
3799			$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
3800				$this->text_starts_at,
3801				$this->text_length,
3802				$plaintext_content
3803			);
3804
3805			return true;
3806		}
3807
3808		if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
3809			return false;
3810		}
3811
3812		switch ( $this->get_tag() ) {
3813			case 'SCRIPT':
3814				/**
3815				 * This is over-protective, but ensures the update doesn't break
3816				 * the HTML structure of the SCRIPT element.
3817				 *
3818				 * More thorough analysis could track the HTML tokenizer states
3819				 * and to ensure that the SCRIPT element closes at the expected
3820				 * SCRIPT close tag as is done in {@see ::skip_script_data()}.
3821				 *
3822				 * A SCRIPT element could be closed prematurely by contents
3823				 * like `</script>`. A SCRIPT element could be prevented from
3824				 * closing by contents like `<!--<script>`.
3825				 *
3826				 * The following strings are essential for dangerous content,
3827				 * although they are insufficient on their own. This trade-off
3828				 * prevents dangerous scripts from being sent to the browser.
3829				 * It is also unlikely to produce HTML that may confuse more
3830				 * basic HTML tooling.
3831				 */
3832				if (
3833					false !== stripos( $plaintext_content, '</script' ) ||
3834					false !== stripos( $plaintext_content, '<script' )
3835				) {
3836					return false;
3837				}
3838
3839				$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
3840					$this->text_starts_at,
3841					$this->text_length,
3842					$plaintext_content
3843				);
3844
3845				return true;
3846
3847			case 'STYLE':
3848				$plaintext_content = preg_replace_callback(
3849					'~</(?P<TAG_NAME>style)~i',
3850					static function ( $tag_match ) {
3851						return "\\3c\\2f{$tag_match['TAG_NAME']}";
3852					},
3853					$plaintext_content
3854				);
3855
3856				$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
3857					$this->text_starts_at,
3858					$this->text_length,
3859					$plaintext_content
3860				);
3861
3862				return true;
3863
3864			case 'TEXTAREA':
3865			case 'TITLE':
3866				$plaintext_content = preg_replace_callback(
3867					"~</(?P<TAG_NAME>{$this->get_tag()})~i",
3868					static function ( $tag_match ) {
3869						return "&lt;/{$tag_match['TAG_NAME']}";
3870					},
3871					$plaintext_content
3872				);
3873
3874				/*
3875				 * These don't _need_ to be escaped, but since they are decoded it's
3876				 * safe to leave them escaped and this can prevent other code from
3877				 * naively detecting tags within the contents.
3878				 *
3879				 * @todo It would be useful to prefix a multiline replacement text
3880				 *       with a newline, but not necessary. This is for aesthetics.
3881				 */
3882				$this->lexical_updates['modifiable text'] = new WP_HTML_Text_Replacement(
3883					$this->text_starts_at,
3884					$this->text_length,
3885					$plaintext_content
3886				);
3887
3888				return true;
3889		}
3890
3891		return false;
3892	}
3893
3894	/**
3895	 * Updates or creates a new attribute on the currently matched tag with the passed value.
3896	 *
3897	 * This function handles all necessary HTML encoding. Provide normal, unescaped string values.
3898	 * The HTML API will encode the strings appropriately so that the browser will interpret them
3899	 * as the intended value.
3900	 *
3901	 * Example:
3902	 *
3903	 *     // Renders “Eggs & Milk” in a browser, encoded as `<abbr title="Eggs &amp; Milk">`.
3904	 *     $processor->set_attribute( 'title', 'Eggs & Milk' );
3905	 *
3906	 *     // Renders “Eggs &amp; Milk” in a browser, encoded as `<abbr title="Eggs &amp;amp; Milk">`.
3907	 *     $processor->set_attribute( 'title', 'Eggs &amp; Milk' );
3908	 *
3909	 *     // Renders `true` as `<abbr title>`.
3910	 *     $processor->set_attribute( 'title', true );
3911	 *
3912	 *     // Renders without the attribute for `false` as `<abbr>`.
3913	 *     $processor->set_attribute( 'title', false );
3914	 *
3915	 * Special handling is provided for boolean attribute values:
3916	 *  - When `true` is passed as the value, then only the attribute name is added to the tag.
3917	 *  - When `false` is passed, the attribute gets removed if it existed before.
3918	 *
3919	 * @since 6.2.0
3920	 * @since 6.2.1 Fix: Only create a single update for multiple calls with case-variant attribute names.
3921	 * @since 6.9.0 Escapes all character references instead of trying to avoid double-escaping.
3922	 *
3923	 * @param string      $name  The attribute name to target.
3924	 * @param string|bool $value The new attribute value.
3925	 * @return bool Whether an attribute value was set.
3926	 */
3927	public function set_attribute( $name, $value ): bool {
3928		if (
3929			self::STATE_MATCHED_TAG !== $this->parser_state ||
3930			$this->is_closing_tag
3931		) {
3932			return false;
3933		}
3934
3935		$name_length = strlen( $name );
3936
3937		/**
3938		 * WordPress rejects more characters than are strictly forbidden
3939		 * in HTML5. This is to prevent additional security risks deeper
3940		 * in the WordPress and plugin stack. Specifically the following
3941		 * are not allowed to be set as part of an HTML attribute name:
3942		 *
3943		 *  - greater-than “>”
3944		 *  - ampersand “&”
3945		 *
3946		 * @see https://html.spec.whatwg.org/#attributes-2
3947		 */
3948		if (
3949			0 === $name_length ||
3950			// Syntax-like characters.
3951			strcspn( $name, '"\'>&</ =' ) !== $name_length ||
3952			// Control characters.
3953			strcspn(
3954				$name,
3955				"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F" .
3956				"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F"
3957			) !== $name_length ||
3958			// Unicode noncharacters.
3959			wp_has_noncharacters( $name )
3960		) {
3961			_doing_it_wrong(
3962				__METHOD__,
3963				__( 'Invalid attribute name.' ),
3964				'6.2.0'
3965			);
3966
3967			return false;
3968		}
3969
3970		/*
3971		 * > The values "true" and "false" are not allowed on boolean attributes.
3972		 * > To represent a false value, the attribute has to be omitted altogether.
3973		 *     - HTML5 spec, https://html.spec.whatwg.org/#boolean-attributes
3974		 */
3975		if ( false === $value ) {
3976			return $this->remove_attribute( $name );
3977		}
3978
3979		if ( true === $value ) {
3980			$updated_attribute = $name;
3981		} else {
3982			$comparable_name = strtolower( $name );
3983
3984			/**
3985			 * Escape attribute values appropriately.
3986			 *
3987			 * @see https://html.spec.whatwg.org/#attributes-3
3988			 */
3989			$escaped_new_value = in_array( $comparable_name, wp_kses_uri_attributes(), true )
3990				? esc_url( $value )
3991				: strtr(
3992					$value,
3993					array(
3994						'<' => '&lt;',
3995						'>' => '&gt;',
3996						'&' => '&amp;',
3997						'"' => '&quot;',
3998						"'" => '&apos;',
3999					)
4000				);
4001
4002			// If the escaping functions wiped out the update, reject it and indicate it was rejected.
4003			if ( '' === $escaped_new_value && '' !== $value ) {
4004				return false;
4005			}
4006
4007			$updated_attribute = "{$name}=\"{$escaped_new_value}\"";
4008		}
4009
4010		/*
4011		 * > There must never be two or more attributes on
4012		 * > the same start tag whose names are an ASCII
4013		 * > case-insensitive match for each other.
4014		 *     - HTML 5 spec
4015		 *
4016		 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
4017		 */
4018		$comparable_name = strtolower( $name );
4019
4020		if ( isset( $this->attributes[ $comparable_name ] ) ) {
4021			/*
4022			 * Update an existing attribute.
4023			 *
4024			 * Example – set attribute id to "new" in <div id="initial_id" />:
4025			 *
4026			 *     <div id="initial_id"/>
4027			 *          ^-------------^
4028			 *          start         end
4029			 *     replacement: `id="new"`
4030			 *
4031			 *     Result: <div id="new"/>
4032			 */
4033			$existing_attribute                        = $this->attributes[ $comparable_name ];
4034			$this->lexical_updates[ $comparable_name ] = new WP_HTML_Text_Replacement(
4035				$existing_attribute->start,
4036				$existing_attribute->length,
4037				$updated_attribute
4038			);
4039		} else {
4040			/*
4041			 * Create a new attribute at the tag's name end.
4042			 *
4043			 * Example – add attribute id="new" to <div />:
4044			 *
4045			 *     <div/>
4046			 *         ^
4047			 *         start and end
4048			 *     replacement: ` id="new"`
4049			 *
4050			 *     Result: <div id="new"/>
4051			 */
4052			$this->lexical_updates[ $comparable_name ] = new WP_HTML_Text_Replacement(
4053				$this->tag_name_starts_at + $this->tag_name_length,
4054				0,
4055				' ' . $updated_attribute
4056			);
4057		}
4058
4059		/*
4060		 * Any calls to update the `class` attribute directly should wipe out any
4061		 * enqueued class changes from `add_class` and `remove_class`.
4062		 */
4063		if ( 'class' === $comparable_name && ! empty( $this->classname_updates ) ) {
4064			$this->classname_updates = array();
4065		}
4066
4067		return true;
4068	}
4069
4070	/**
4071	 * Remove an attribute from the currently-matched tag.
4072	 *
4073	 * @since 6.2.0
4074	 *
4075	 * @param string $name The attribute name to remove.
4076	 * @return bool Whether an attribute was removed.
4077	 */
4078	public function remove_attribute( $name ): bool {
4079		if (
4080			self::STATE_MATCHED_TAG !== $this->parser_state ||
4081			$this->is_closing_tag
4082		) {
4083			return false;
4084		}
4085
4086		/*
4087		 * > There must never be two or more attributes on
4088		 * > the same start tag whose names are an ASCII
4089		 * > case-insensitive match for each other.
4090		 *     - HTML 5 spec
4091		 *
4092		 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
4093		 */
4094		$name = strtolower( $name );
4095
4096		/*
4097		 * Any calls to update the `class` attribute directly should wipe out any
4098		 * enqueued class changes from `add_class` and `remove_class`.
4099		 */
4100		if ( 'class' === $name && count( $this->classname_updates ) !== 0 ) {
4101			$this->classname_updates = array();
4102		}
4103
4104		/*
4105		 * If updating an attribute that didn't exist in the input
4106		 * document, then remove the enqueued update and move on.
4107		 *
4108		 * For example, this might occur when calling `remove_attribute()`
4109		 * after calling `set_attribute()` for the same attribute
4110		 * and when that attribute wasn't originally present.
4111		 */
4112		if ( ! isset( $this->attributes[ $name ] ) ) {
4113			if ( isset( $this->lexical_updates[ $name ] ) ) {
4114				unset( $this->lexical_updates[ $name ] );
4115			}
4116			return false;
4117		}
4118
4119		/*
4120		 * Removes an existing tag attribute.
4121		 *
4122		 * Example – remove the attribute id from <div id="main"/>:
4123		 *    <div id="initial_id"/>
4124		 *         ^-------------^
4125		 *         start         end
4126		 *    replacement: ``
4127		 *
4128		 *    Result: <div />
4129		 */
4130		$this->lexical_updates[ $name ] = new WP_HTML_Text_Replacement(
4131			$this->attributes[ $name ]->start,
4132			$this->attributes[ $name ]->length,
4133			''
4134		);
4135
4136		// Removes any duplicated attributes if they were also present.
4137		foreach ( $this->duplicate_attributes[ $name ] ?? array() as $attribute_token ) {
4138			$this->lexical_updates[] = new WP_HTML_Text_Replacement(
4139				$attribute_token->start,
4140				$attribute_token->length,
4141				''
4142			);
4143		}
4144
4145		return true;
4146	}
4147
4148	/**
4149	 * Adds a new class name to the currently matched tag.
4150	 *
4151	 * @since 6.2.0
4152	 *
4153	 * @param string $class_name The class name to add.
4154	 * @return bool Whether the class was set to be added.
4155	 */
4156	public function add_class( $class_name ): bool {
4157		if (
4158			self::STATE_MATCHED_TAG !== $this->parser_state ||
4159			$this->is_closing_tag
4160		) {
4161			return false;
4162		}
4163
4164		if ( self::QUIRKS_MODE !== $this->compat_mode ) {
4165			$this->classname_updates[ $class_name ] = self::ADD_CLASS;
4166			return true;
4167		}
4168
4169		/*
4170		 * Because class names are matched ASCII-case-insensitively in quirks mode,
4171		 * this needs to see if a case variant of the given class name is already
4172		 * enqueued and update that existing entry, if so. This picks the casing of
4173		 * the first-provided class name for all lexical variations.
4174		 */
4175		$class_name_length = strlen( $class_name );
4176		foreach ( $this->classname_updates as $updated_name => $action ) {
4177			if (
4178				strlen( $updated_name ) === $class_name_length &&
4179				0 === substr_compare( $updated_name, $class_name, 0, $class_name_length, true )
4180			) {
4181				$this->classname_updates[ $updated_name ] = self::ADD_CLASS;
4182				return true;
4183			}
4184		}
4185
4186		$this->classname_updates[ $class_name ] = self::ADD_CLASS;
4187		return true;
4188	}
4189
4190	/**
4191	 * Removes a class name from the currently matched tag.
4192	 *
4193	 * @since 6.2.0
4194	 *
4195	 * @param string $class_name The class name to remove.
4196	 * @return bool Whether the class was set to be removed.
4197	 */
4198	public function remove_class( $class_name ): bool {
4199		if (
4200			self::STATE_MATCHED_TAG !== $this->parser_state ||
4201			$this->is_closing_tag
4202		) {
4203			return false;
4204		}
4205
4206		if ( self::QUIRKS_MODE !== $this->compat_mode ) {
4207			$this->classname_updates[ $class_name ] = self::REMOVE_CLASS;
4208			return true;
4209		}
4210
4211		/*
4212		 * Because class names are matched ASCII-case-insensitively in quirks mode,
4213		 * this needs to see if a case variant of the given class name is already
4214		 * enqueued and update that existing entry, if so. This picks the casing of
4215		 * the first-provided class name for all lexical variations.
4216		 */
4217		$class_name_length = strlen( $class_name );
4218		foreach ( $this->classname_updates as $updated_name => $action ) {
4219			if (
4220				strlen( $updated_name ) === $class_name_length &&
4221				0 === substr_compare( $updated_name, $class_name, 0, $class_name_length, true )
4222			) {
4223				$this->classname_updates[ $updated_name ] = self::REMOVE_CLASS;
4224				return true;
4225			}
4226		}
4227
4228		$this->classname_updates[ $class_name ] = self::REMOVE_CLASS;
4229		return true;
4230	}
4231
4232	/**
4233	 * Returns the string representation of the HTML Tag Processor.
4234	 *
4235	 * @since 6.2.0
4236	 *
4237	 * @see WP_HTML_Tag_Processor::get_updated_html()
4238	 *
4239	 * @return string The processed HTML.
4240	 */
4241	public function __toString(): string {
4242		return $this->get_updated_html();
4243	}
4244
4245	/**
4246	 * Returns the string representation of the HTML Tag Processor.
4247	 *
4248	 * @since 6.2.0
4249	 * @since 6.2.1 Shifts the internal cursor corresponding to the applied updates.
4250	 * @since 6.4.0 No longer calls subclass method `next_tag()` after updating HTML.
4251	 *
4252	 * @return string The processed HTML.
4253	 */
4254	public function get_updated_html(): string {
4255		$requires_no_updating = 0 === count( $this->classname_updates ) && 0 === count( $this->lexical_updates );
4256
4257		/*
4258		 * When there is nothing more to update and nothing has already been
4259		 * updated, return the original document and avoid a string copy.
4260		 */
4261		if ( $requires_no_updating ) {
4262			return $this->html;
4263		}
4264
4265		/*
4266		 * Keep track of the position right before the current tag. This will
4267		 * be necessary for reparsing the current tag after updating the HTML.
4268		 */
4269		$before_current_tag = $this->token_starts_at ?? 0;
4270
4271		/*
4272		 * 1. Apply the enqueued edits and update all the pointers to reflect those changes.
4273		 */
4274		$this->class_name_updates_to_attributes_updates();
4275		$before_current_tag += $this->apply_attributes_updates( $before_current_tag );
4276
4277		/*
4278		 * 2. Rewind to before the current tag and reparse to get updated attributes.
4279		 *
4280		 * At this point the internal cursor points to the end of the tag name.
4281		 * Rewind before the tag name starts so that it's as if the cursor didn't
4282		 * move; a call to `next_tag()` will reparse the recently-updated attributes
4283		 * and additional calls to modify the attributes will apply at this same
4284		 * location, but in order to avoid issues with subclasses that might add
4285		 * behaviors to `next_tag()`, the internal methods should be called here
4286		 * instead.
4287		 *
4288		 * It's important to note that in this specific place there will be no change
4289		 * because the processor was already at a tag when this was called and it's
4290		 * rewinding only to the beginning of this very tag before reprocessing it
4291		 * and its attributes.
4292		 *
4293		 * <p>Previous HTML<em>More HTML</em></p>
4294		 *                 ↑  │ back up by the length of the tag name plus the opening <
4295		 *                 └←─┘ back up by strlen("em") + 1 ==> 3
4296		 */
4297		$this->bytes_already_parsed = $before_current_tag;
4298		$this->base_class_next_token();
4299
4300		return $this->html;
4301	}
4302
4303	/**
4304	 * Parses tag query input into internal search criteria.
4305	 *
4306	 * @since 6.2.0
4307	 *
4308	 * @param array|string|null $query {
4309	 *     Optional. Which tag name to find, having which class, etc. Default is to find any tag.
4310	 *
4311	 *     @type string|null $tag_name     Which tag to find, or `null` for "any tag."
4312	 *     @type int|null    $match_offset Find the Nth tag matching all search criteria.
4313	 *                                     1 for "first" tag, 3 for "third," etc.
4314	 *                                     Defaults to first tag.
4315	 *     @type string|null $class_name   Tag must contain this class name to match.
4316	 *     @type string      $tag_closers  "visit" or "skip": whether to stop on tag closers, e.g. </div>.
4317	 * }
4318	 */
4319	private function parse_query( $query ) {
4320		if ( null !== $query && $query === $this->last_query ) {
4321			return;
4322		}
4323
4324		$this->last_query          = $query;
4325		$this->sought_tag_name     = null;
4326		$this->sought_class_name   = null;
4327		$this->sought_match_offset = 1;
4328		$this->stop_on_tag_closers = false;
4329
4330		// A single string value means "find the tag of this name".
4331		if ( is_string( $query ) ) {
4332			$this->sought_tag_name = $query;
4333			return;
4334		}
4335
4336		// An empty query parameter applies no restrictions on the search.
4337		if ( null === $query ) {
4338			return;
4339		}
4340
4341		// If not using the string interface, an associative array is required.
4342		if ( ! is_array( $query ) ) {
4343			_doing_it_wrong(
4344				__METHOD__,
4345				__( 'The query argument must be an array or a tag name.' ),
4346				'6.2.0'
4347			);
4348			return;
4349		}
4350
4351		if ( isset( $query['tag_name'] ) && is_string( $query['tag_name'] ) ) {
4352			$this->sought_tag_name = $query['tag_name'];
4353		}
4354
4355		if ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) ) {
4356			$this->sought_class_name = $query['class_name'];
4357		}
4358
4359		if ( isset( $query['match_offset'] ) && is_int( $query['match_offset'] ) && 0 < $query['match_offset'] ) {
4360			$this->sought_match_offset = $query['match_offset'];
4361		}
4362
4363		if ( isset( $query['tag_closers'] ) ) {
4364			$this->stop_on_tag_closers = 'visit' === $query['tag_closers'];
4365		}
4366	}
4367
4368
4369	/**
4370	 * Checks whether a given tag and its attributes match the search criteria.
4371	 *
4372	 * @since 6.2.0
4373	 *
4374	 * @return bool Whether the given tag and its attribute match the search criteria.
4375	 */
4376	private function matches(): bool {
4377		if ( $this->is_closing_tag && ! $this->stop_on_tag_closers ) {
4378			return false;
4379		}
4380
4381		// Does the tag name match the requested tag name in a case-insensitive manner?
4382		if (
4383			isset( $this->sought_tag_name ) &&
4384			(
4385				strlen( $this->sought_tag_name ) !== $this->tag_name_length ||
4386				0 !== substr_compare( $this->html, $this->sought_tag_name, $this->tag_name_starts_at, $this->tag_name_length, true )
4387			)
4388		) {
4389			return false;
4390		}
4391
4392		if ( null !== $this->sought_class_name && ! $this->has_class( $this->sought_class_name ) ) {
4393			return false;
4394		}
4395
4396		return true;
4397	}
4398
4399	/**
4400	 * Gets DOCTYPE declaration info from a DOCTYPE token.
4401	 *
4402	 * DOCTYPE tokens may appear in many places in an HTML document. In most places, they are
4403	 * simply ignored. The main parsing functions find the basic shape of DOCTYPE tokens but
4404	 * do not perform detailed parsing.
4405	 *
4406	 * This method can be called to perform a full parse of the DOCTYPE token and retrieve
4407	 * its information.
4408	 *
4409	 * @return WP_HTML_Doctype_Info|null The DOCTYPE declaration information or `null` if not
4410	 *                                   currently at a DOCTYPE node.
4411	 */
4412	public function get_doctype_info(): ?WP_HTML_Doctype_Info {
4413		if ( self::STATE_DOCTYPE !== $this->parser_state ) {
4414			return null;
4415		}
4416
4417		return WP_HTML_Doctype_Info::from_doctype_token( substr( $this->html, $this->token_starts_at, $this->token_length ) );
4418	}
4419
4420	/**
4421	 * Parser Ready State.
4422	 *
4423	 * Indicates that the parser is ready to run and waiting for a state transition.
4424	 * It may not have started yet, or it may have just finished parsing a token and
4425	 * is ready to find the next one.
4426	 *
4427	 * @since 6.5.0
4428	 *
4429	 * @access private
4430	 */
4431	const STATE_READY = 'STATE_READY';
4432
4433	/**
4434	 * Parser Complete State.
4435	 *
4436	 * Indicates that the parser has reached the end of the document and there is
4437	 * nothing left to scan. It finished parsing the last token completely.
4438	 *
4439	 * @since 6.5.0
4440	 *
4441	 * @access private
4442	 */
4443	const STATE_COMPLETE = 'STATE_COMPLETE';
4444
4445	/**
4446	 * Parser Incomplete Input State.
4447	 *
4448	 * Indicates that the parser has reached the end of the document before finishing
4449	 * a token. It started parsing a token but there is a possibility that the input
4450	 * HTML document was truncated in the middle of a token.
4451	 *
4452	 * The parser is reset at the start of the incomplete token and has paused. There
4453	 * is nothing more than can be scanned unless provided a more complete document.
4454	 *
4455	 * @since 6.5.0
4456	 *
4457	 * @access private
4458	 */
4459	const STATE_INCOMPLETE_INPUT = 'STATE_INCOMPLETE_INPUT';
4460
4461	/**
4462	 * Parser Matched Tag State.
4463	 *
4464	 * Indicates that the parser has found an HTML tag and it's possible to get
4465	 * the tag name and read or modify its attributes (if it's not a closing tag).
4466	 *
4467	 * @since 6.5.0
4468	 *
4469	 * @access private
4470	 */
4471	const STATE_MATCHED_TAG = 'STATE_MATCHED_TAG';
4472
4473	/**
4474	 * Parser Text Node State.
4475	 *
4476	 * Indicates that the parser has found a text node and it's possible
4477	 * to read and modify that text.
4478	 *
4479	 * @since 6.5.0
4480	 *
4481	 * @access private
4482	 */
4483	const STATE_TEXT_NODE = 'STATE_TEXT_NODE';
4484
4485	/**
4486	 * Parser CDATA Node State.
4487	 *
4488	 * Indicates that the parser has found a CDATA node and it's possible
4489	 * to read and modify its modifiable text. Note that in HTML there are
4490	 * no CDATA nodes outside of foreign content (SVG and MathML). Outside
4491	 * of foreign content, they are treated as HTML comments.
4492	 *
4493	 * @since 6.5.0
4494	 *
4495	 * @access private
4496	 */
4497	const STATE_CDATA_NODE = 'STATE_CDATA_NODE';
4498
4499	/**
4500	 * Indicates that the parser has found an HTML comment and it's
4501	 * possible to read and modify its modifiable text.
4502	 *
4503	 * @since 6.5.0
4504	 *
4505	 * @access private
4506	 */
4507	const STATE_COMMENT = 'STATE_COMMENT';
4508
4509	/**
4510	 * Indicates that the parser has found a DOCTYPE node and it's
4511	 * possible to read its DOCTYPE information via `get_doctype_info()`.
4512	 *
4513	 * @since 6.5.0
4514	 *
4515	 * @access private
4516	 */
4517	const STATE_DOCTYPE = 'STATE_DOCTYPE';
4518
4519	/**
4520	 * Indicates that the parser has found an empty tag closer `</>`.
4521	 *
4522	 * Note that in HTML there are no empty tag closers, and they
4523	 * are ignored. Nonetheless, the Tag Processor still
4524	 * recognizes them as they appear in the HTML stream.
4525	 *
4526	 * These were historically discussed as a "presumptuous tag
4527	 * closer," which would close the nearest open tag, but were
4528	 * dismissed in favor of explicitly-closing tags.
4529	 *
4530	 * @since 6.5.0
4531	 *
4532	 * @access private
4533	 */
4534	const STATE_PRESUMPTUOUS_TAG = 'STATE_PRESUMPTUOUS_TAG';
4535
4536	/**
4537	 * Indicates that the parser has found a "funky comment"
4538	 * and it's possible to read and modify its modifiable text.
4539	 *
4540	 * Example:
4541	 *
4542	 *     </%url>
4543	 *     </{"wp-bit":"query/post-author"}>
4544	 *     </2>
4545	 *
4546	 * Funky comments are tag closers with invalid tag names. Note
4547	 * that in HTML these are turn into bogus comments. Nonetheless,
4548	 * the Tag Processor recognizes them in a stream of HTML and
4549	 * exposes them for inspection and modification.
4550	 *
4551	 * @since 6.5.0
4552	 *
4553	 * @access private
4554	 */
4555	const STATE_FUNKY_COMMENT = 'STATE_WP_FUNKY';
4556
4557	/**
4558	 * Indicates that a comment was created when encountering abruptly-closed HTML comment.
4559	 *
4560	 * Example:
4561	 *
4562	 *     <!-->
4563	 *     <!--->
4564	 *
4565	 * @since 6.5.0
4566	 */
4567	const COMMENT_AS_ABRUPTLY_CLOSED_COMMENT = 'COMMENT_AS_ABRUPTLY_CLOSED_COMMENT';
4568
4569	/**
4570	 * Indicates that a comment would be parsed as a CDATA node,
4571	 * were HTML to allow CDATA nodes outside of foreign content.
4572	 *
4573	 * Example:
4574	 *
4575	 *     <![CDATA[This is a CDATA node.]]>
4576	 *
4577	 * This is an HTML comment, but it looks like a CDATA node.
4578	 *
4579	 * @since 6.5.0
4580	 */
4581	const COMMENT_AS_CDATA_LOOKALIKE = 'COMMENT_AS_CDATA_LOOKALIKE';
4582
4583	/**
4584	 * Indicates that a comment was created when encountering
4585	 * normative HTML comment syntax.
4586	 *
4587	 * Example:
4588	 *
4589	 *     <!-- this is a comment -->
4590	 *
4591	 * @since 6.5.0
4592	 */
4593	const COMMENT_AS_HTML_COMMENT = 'COMMENT_AS_HTML_COMMENT';
4594
4595	/**
4596	 * Indicates that a comment would be parsed as a Processing
4597	 * Instruction node, were they to exist within HTML.
4598	 *
4599	 * Example:
4600	 *
4601	 *     <?wp __( 'Like' ) ?>
4602	 *
4603	 * This is an HTML comment, but it looks like a CDATA node.
4604	 *
4605	 * @since 6.5.0
4606	 */
4607	const COMMENT_AS_PI_NODE_LOOKALIKE = 'COMMENT_AS_PI_NODE_LOOKALIKE';
4608
4609	/**
4610	 * Indicates that a comment was created when encountering invalid
4611	 * HTML input, a so-called "bogus comment."
4612	 *
4613	 * Example:
4614	 *
4615	 *     <?nothing special>
4616	 *     <!{nothing special}>
4617	 *
4618	 * @since 6.5.0
4619	 */
4620	const COMMENT_AS_INVALID_HTML = 'COMMENT_AS_INVALID_HTML';
4621
4622	/**
4623	 * No-quirks mode document compatibility mode.
4624	 *
4625	 * > In no-quirks mode, the behavior is (hopefully) the desired behavior
4626	 * > described by the modern HTML and CSS specifications.
4627	 *
4628	 * @see self::$compat_mode
4629	 * @see https://developer.mozilla.org/en-US/docs/Web/HTML/Quirks_Mode_and_Standards_Mode
4630	 *
4631	 * @since 6.7.0
4632	 *
4633	 * @var string
4634	 */
4635	const NO_QUIRKS_MODE = 'no-quirks-mode';
4636
4637	/**
4638	 * Quirks mode document compatibility mode.
4639	 *
4640	 * > In quirks mode, layout emulates behavior in Navigator 4 and Internet
4641	 * > Explorer 5. This is essential in order to support websites that were
4642	 * > built before the widespread adoption of web standards.
4643	 *
4644	 * @see self::$compat_mode
4645	 * @see https://developer.mozilla.org/en-US/docs/Web/HTML/Quirks_Mode_and_Standards_Mode
4646	 *
4647	 * @since 6.7.0
4648	 *
4649	 * @var string
4650	 */
4651	const QUIRKS_MODE = 'quirks-mode';
4652
4653	/**
4654	 * Indicates that a span of text may contain any combination of significant
4655	 * kinds of characters: NULL bytes, whitespace, and others.
4656	 *
4657	 * @see self::$text_node_classification
4658	 * @see self::subdivide_text_appropriately
4659	 *
4660	 * @since 6.7.0
4661	 */
4662	const TEXT_IS_GENERIC = 'TEXT_IS_GENERIC';
4663
4664	/**
4665	 * Indicates that a span of text comprises a sequence only of NULL bytes.
4666	 *
4667	 * @see self::$text_node_classification
4668	 * @see self::subdivide_text_appropriately
4669	 *
4670	 * @since 6.7.0
4671	 */
4672	const TEXT_IS_NULL_SEQUENCE = 'TEXT_IS_NULL_SEQUENCE';
4673
4674	/**
4675	 * Indicates that a span of decoded text comprises only whitespace.
4676	 *
4677	 * @see self::$text_node_classification
4678	 * @see self::subdivide_text_appropriately
4679	 *
4680	 * @since 6.7.0
4681	 */
4682	const TEXT_IS_WHITESPACE = 'TEXT_IS_WHITESPACE';
4683
4684	/**
4685	 * Wakeup magic method.
4686	 *
4687	 * @since 6.9.2
4688	 */
4689	public function __wakeup() {
4690		throw new \LogicException( __CLASS__ . ' should never be unserialized' );
4691	}
4692}
4693
Ui Ux Design – Teachers Night Out
info@cardgames4educators.com
8:00 am - 7:00 pm
DISCOUNT OFFER: Get 25% off using "TEACHERS" Coupon Code
By admin
(1) comments
April 20, 2026
Masters In English How English Speaker

Lorem ipsum dolor sit amet consectetur. Morbi nibh porttitor in ut tristique mi at eget. Aliquam praesent nibh in ut enim habitasse. Nulla neque netus pellentesque dignissim nisi proin nisl. Fermentum ut aliquam vitae a aenean
Masters In English How English Speaker

User Links

Informations

Get in Touch

© 2024 Teachers Night Out. All Rights Reserved.

Category: Ui Ux Design

Masters In English How English Speaker

User Links

Informations

Get in Touch

© 2024 Teachers Night Out. All Rights Reserved.