📄class-wp-html-processor.php
1<?php
2/**
3 * HTML API: WP_HTML_Processor class
4 *
5 * @package WordPress
6 * @subpackage HTML-API
7 * @since 6.4.0
8 */
9
10/**
11 * Core class used to safely parse and modify an HTML document.
12 *
13 * The HTML Processor class properly parses and modifies HTML5 documents.
14 *
15 * It supports a subset of the HTML5 specification, and when it encounters
16 * unsupported markup, it aborts early to avoid unintentionally breaking
17 * the document. The HTML Processor should never break an HTML document.
18 *
19 * While the `WP_HTML_Tag_Processor` is a valuable tool for modifying
20 * attributes on individual HTML tags, the HTML Processor is more capable
21 * and useful for the following operations:
22 *
23 * - Querying based on nested HTML structure.
24 *
25 * Eventually the HTML Processor will also support:
26 * - Wrapping a tag in surrounding HTML.
27 * - Unwrapping a tag by removing its parent.
28 * - Inserting and removing nodes.
29 * - Reading and changing inner content.
30 * - Navigating up or around HTML structure.
31 *
32 * ## Usage
33 *
34 * Use of this class requires three steps:
35 *
36 * 1. Call a static creator method with your input HTML document.
37 * 2. Find the location in the document you are looking for.
38 * 3. Request changes to the document at that location.
39 *
40 * Example:
41 *
42 * $processor = WP_HTML_Processor::create_fragment( $html );
43 * if ( $processor->next_tag( array( 'breadcrumbs' => array( 'DIV', 'FIGURE', 'IMG' ) ) ) ) {
44 * $processor->add_class( 'responsive-image' );
45 * }
46 *
47 * #### Breadcrumbs
48 *
49 * Breadcrumbs represent the stack of open elements from the root
50 * of the document or fragment down to the currently-matched node,
51 * if one is currently selected. Call WP_HTML_Processor::get_breadcrumbs()
52 * to inspect the breadcrumbs for a matched tag.
53 *
54 * Breadcrumbs can specify nested HTML structure and are equivalent
55 * to a CSS selector comprising tag names separated by the child
56 * combinator, such as "DIV > FIGURE > IMG".
57 *
58 * Since all elements find themselves inside a full HTML document
59 * when parsed, the return value from `get_breadcrumbs()` will always
60 * contain any implicit outermost elements. For example, when parsing
61 * with `create_fragment()` in the `BODY` context (the default), any
62 * tag in the given HTML document will contain `array( 'HTML', 'BODY', … )`
63 * in its breadcrumbs.
64 *
65 * Despite containing the implied outermost elements in their breadcrumbs,
66 * tags may be found with the shortest-matching breadcrumb query. That is,
67 * `array( 'IMG' )` matches all IMG elements and `array( 'P', 'IMG' )`
68 * matches all IMG elements directly inside a P element. To ensure that no
69 * partial matches erroneously match, it's possible to specify in a query
70 * the full breadcrumb match all the way down from the root HTML element.
71 *
72 * Example:
73 *
74 * $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
75 * // ----- Matches here.
76 * $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'IMG' ) ) );
77 *
78 * $html = '<figure><img><figcaption>A <em>lovely</em> day outside</figcaption></figure>';
79 * // ---- Matches here.
80 * $processor->next_tag( array( 'breadcrumbs' => array( 'FIGURE', 'FIGCAPTION', 'EM' ) ) );
81 *
82 * $html = '<div><img></div><img>';
83 * // ----- Matches here, because IMG must be a direct child of the implicit BODY.
84 * $processor->next_tag( array( 'breadcrumbs' => array( 'BODY', 'IMG' ) ) );
85 *
86 * ## HTML Support
87 *
88 * This class implements a small part of the HTML5 specification.
89 * It's designed to operate within its support and abort early whenever
90 * encountering circumstances it can't properly handle. This is
91 * the principal way in which this class remains as simple as possible
92 * without cutting corners and breaking compliance.
93 *
94 * ### Supported elements
95 *
96 * If any unsupported element appears in the HTML input the HTML Processor
97 * will abort early and stop all processing. This draconian measure ensures
98 * that the HTML Processor won't break any HTML it doesn't fully understand.
99 *
100 * The HTML Processor supports all elements other than a specific set:
101 *
102 * - Any element inside a TABLE.
103 * - Any element inside foreign content, including SVG and MATH.
104 * - Any element outside the IN BODY insertion mode, e.g. doctype declarations, meta, links.
105 *
106 * ### Supported markup
107 *
108 * Some kinds of non-normative HTML involve reconstruction of formatting elements and
109 * re-parenting of mis-nested elements. For example, a DIV tag found inside a TABLE
110 * may in fact belong _before_ the table in the DOM. If the HTML Processor encounters
111 * such a case it will stop processing.
112 *
113 * The following list illustrates some common examples of unexpected HTML inputs that
114 * the HTML Processor properly parses and represents:
115 *
116 * - HTML with optional tags omitted, e.g. `<p>one<p>two`.
117 * - HTML with unexpected tag closers, e.g. `<p>one </span> more</p>`.
118 * - Non-void tags with self-closing flag, e.g. `<div/>the DIV is still open.</div>`.
119 * - Heading elements which close open heading elements of another level, e.g. `<h1>Closed by </h2>`.
120 * - Elements containing text that looks like other tags but isn't, e.g. `<title>The <img> is plaintext</title>`.
121 * - SCRIPT and STYLE tags containing text that looks like HTML but isn't, e.g. `<script>document.write('<p>Hi</p>');</script>`.
122 * - SCRIPT content which has been escaped, e.g. `<script><!-- document.write('<script>console.log("hi")</script>') --></script>`.
123 *
124 * ### Unsupported Features
125 *
126 * This parser does not report parse errors.
127 *
128 * Normally, when additional HTML or BODY tags are encountered in a document, if there
129 * are any additional attributes on them that aren't found on the previous elements,
130 * the existing HTML and BODY elements adopt those missing attribute values. This
131 * parser does not add those additional attributes.
132 *
133 * In certain situations, elements are moved to a different part of the document in
134 * processes called "adoption" and "fostering." Because the nodes move to a location
135 * in the document that the parser had already processed, this parser does not support
136 * these situations and will bail.
137 *
138 * @since 6.4.0
139 *
140 * @see WP_HTML_Tag_Processor
141 * @see https://html.spec.whatwg.org/
142 */
143class WP_HTML_Processor extends WP_HTML_Tag_Processor {
144 /**
145 * The maximum number of bookmarks allowed to exist at any given time.
146 *
147 * HTML processing requires more bookmarks than basic tag processing,
148 * so this class constant from the Tag Processor is overridden.
149 *
150 * @since 6.4.0
151 *
152 * @var int
153 */
154 const MAX_BOOKMARKS = 100;
155
156 /**
157 * Holds the working state of the parser, including the stack of
158 * open elements and the stack of active formatting elements.
159 *
160 * Initialized in the constructor.
161 *
162 * @since 6.4.0
163 *
164 * @var WP_HTML_Processor_State
165 */
166 private $state;
167
168 /**
169 * Used to create unique bookmark names.
170 *
171 * This class sets a bookmark for every tag in the HTML document that it encounters.
172 * The bookmark name is auto-generated and increments, starting with `1`. These are
173 * internal bookmarks and are automatically released when the referring WP_HTML_Token
174 * goes out of scope and is garbage-collected.
175 *
176 * @since 6.4.0
177 *
178 * @see WP_HTML_Processor::$release_internal_bookmark_on_destruct
179 *
180 * @var int
181 */
182 private $bookmark_counter = 0;
183
184 /**
185 * Stores an explanation for why something failed, if it did.
186 *
187 * @see self::get_last_error
188 *
189 * @since 6.4.0
190 *
191 * @var string|null
192 */
193 private $last_error = null;
194
195 /**
196 * Stores context for why the parser bailed on unsupported HTML, if it did.
197 *
198 * @see self::get_unsupported_exception
199 *
200 * @since 6.7.0
201 *
202 * @var WP_HTML_Unsupported_Exception|null
203 */
204 private $unsupported_exception = null;
205
206 /**
207 * Releases a bookmark when PHP garbage-collects its wrapping WP_HTML_Token instance.
208 *
209 * This function is created inside the class constructor so that it can be passed to
210 * the stack of open elements and the stack of active formatting elements without
211 * exposing it as a public method on the class.
212 *
213 * @since 6.4.0
214 *
215 * @var Closure|null
216 */
217 private $release_internal_bookmark_on_destruct = null;
218
219 /**
220 * Stores stack events which arise during parsing of the
221 * HTML document, which will then supply the "match" events.
222 *
223 * @since 6.6.0
224 *
225 * @var WP_HTML_Stack_Event[]
226 */
227 private $element_queue = array();
228
229 /**
230 * Stores the current breadcrumbs.
231 *
232 * @since 6.7.0
233 *
234 * @var string[]
235 */
236 private $breadcrumbs = array();
237
238 /**
239 * Current stack event, if set, representing a matched token.
240 *
241 * Because the parser may internally point to a place further along in a document
242 * than the nodes which have already been processed (some "virtual" nodes may have
243 * appeared while scanning the HTML document), this will point at the "current" node
244 * being processed. It comes from the front of the element queue.
245 *
246 * @since 6.6.0
247 *
248 * @var WP_HTML_Stack_Event|null
249 */
250 private $current_element = null;
251
252 /**
253 * Context node if created as a fragment parser.
254 *
255 * @var WP_HTML_Token|null
256 */
257 private $context_node = null;
258
259 /*
260 * Public Interface Functions
261 */
262
263 /**
264 * Creates an HTML processor in the fragment parsing mode.
265 *
266 * Use this for cases where you are processing chunks of HTML that
267 * will be found within a bigger HTML document, such as rendered
268 * block output that exists within a post, or `the_content` inside a
269 * rendered site layout.
270 *
271 * Fragment parsing occurs within a context, which is an HTML element
272 * that the document will eventually be placed in. It becomes important
273 * when special elements have different rules than others, such as inside
274 * a TEXTAREA or a TITLE tag where things that look like tags are text,
275 * or inside a SCRIPT tag where things that look like HTML syntax are JS.
276 *
277 * The context value should be a representation of the tag in which the
278 * HTML is found. For most cases this will be the body element. The HTML
279 * form is provided because a context element may have attributes that
280 * impact the parse, such as with a SCRIPT tag and its `type` attribute.
281 *
282 * ## Current HTML Support
283 *
284 * - The only supported context is `<body>`, which is the default value.
285 * - The only supported document encoding is `UTF-8`, which is the default value.
286 *
287 * @since 6.4.0
288 * @since 6.6.0 Returns `static` instead of `self` so it can create subclass instances.
289 *
290 * @param string $html Input HTML fragment to process.
291 * @param string $context Context element for the fragment, must be default of `<body>`.
292 * @param string $encoding Text encoding of the document; must be default of 'UTF-8'.
293 * @return static|null The created processor if successful, otherwise null.
294 */
295 public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
296 if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
297 return null;
298 }
299
300 if ( ! is_string( $html ) ) {
301 _doing_it_wrong(
302 __METHOD__,
303 __( 'The HTML parameter must be a string.' ),
304 '6.9.0'
305 );
306 return null;
307 }
308
309 $context_processor = static::create_full_parser( "<!DOCTYPE html>{$context}", $encoding );
310 if ( null === $context_processor ) {
311 return null;
312 }
313
314 while ( $context_processor->next_tag() ) {
315 if ( ! $context_processor->is_virtual() ) {
316 $context_processor->set_bookmark( 'final_node' );
317 }
318 }
319
320 if (
321 ! $context_processor->has_bookmark( 'final_node' ) ||
322 ! $context_processor->seek( 'final_node' )
323 ) {
324 _doing_it_wrong( __METHOD__, __( 'No valid context element was detected.' ), '6.8.0' );
325 return null;
326 }
327
328 return $context_processor->create_fragment_at_current_node( $html );
329 }
330
331 /**
332 * Creates an HTML processor in the full parsing mode.
333 *
334 * It's likely that a fragment parser is more appropriate, unless sending an
335 * entire HTML document from start to finish. Consider a fragment parser with
336 * a context node of `<body>`.
337 *
338 * UTF-8 is the only allowed encoding. If working with a document that
339 * isn't UTF-8, first convert the document to UTF-8, then pass in the
340 * converted HTML.
341 *
342 * @param string $html Input HTML document to process.
343 * @param string|null $known_definite_encoding Optional. If provided, specifies the charset used
344 * in the input byte stream. Currently must be UTF-8.
345 * @return static|null The created processor if successful, otherwise null.
346 */
347 public static function create_full_parser( $html, $known_definite_encoding = 'UTF-8' ) {
348 if ( 'UTF-8' !== $known_definite_encoding ) {
349 return null;
350 }
351 if ( ! is_string( $html ) ) {
352 _doing_it_wrong(
353 __METHOD__,
354 __( 'The HTML parameter must be a string.' ),
355 '6.9.0'
356 );
357 return null;
358 }
359
360 $processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
361 $processor->state->encoding = $known_definite_encoding;
362 $processor->state->encoding_confidence = 'certain';
363
364 return $processor;
365 }
366
367 /**
368 * Constructor.
369 *
370 * Do not use this method. Use the static creator methods instead.
371 *
372 * @access private
373 *
374 * @since 6.4.0
375 *
376 * @see WP_HTML_Processor::create_fragment()
377 *
378 * @param string $html HTML to process.
379 * @param string|null $use_the_static_create_methods_instead This constructor should not be called manually.
380 */
381 public function __construct( $html, $use_the_static_create_methods_instead = null ) {
382 parent::__construct( $html );
383
384 if ( self::CONSTRUCTOR_UNLOCK_CODE !== $use_the_static_create_methods_instead ) {
385 _doing_it_wrong(
386 __METHOD__,
387 sprintf(
388 /* translators: %s: WP_HTML_Processor::create_fragment(). */
389 __( 'Call %s to create an HTML Processor instead of calling the constructor directly.' ),
390 '<code>WP_HTML_Processor::create_fragment()</code>'
391 ),
392 '6.4.0'
393 );
394 }
395
396 $this->state = new WP_HTML_Processor_State();
397
398 $this->state->stack_of_open_elements->set_push_handler(
399 function ( WP_HTML_Token $token ): void {
400 $is_virtual = ! isset( $this->state->current_token ) || $this->is_tag_closer();
401 $same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
402 $provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
403 $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH, $provenance );
404
405 $this->change_parsing_namespace( $token->integration_node_type ? 'html' : $token->namespace );
406 }
407 );
408
409 $this->state->stack_of_open_elements->set_pop_handler(
410 function ( WP_HTML_Token $token ): void {
411 $is_virtual = ! isset( $this->state->current_token ) || ! $this->is_tag_closer();
412 $same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name;
413 $provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real';
414 $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::POP, $provenance );
415
416 $adjusted_current_node = $this->get_adjusted_current_node();
417
418 if ( $adjusted_current_node ) {
419 $this->change_parsing_namespace( $adjusted_current_node->integration_node_type ? 'html' : $adjusted_current_node->namespace );
420 } else {
421 $this->change_parsing_namespace( 'html' );
422 }
423 }
424 );
425
426 /*
427 * Create this wrapper so that it's possible to pass
428 * a private method into WP_HTML_Token classes without
429 * exposing it to any public API.
430 */
431 $this->release_internal_bookmark_on_destruct = function ( string $name ): void {
432 parent::release_bookmark( $name );
433 };
434 }
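/*
Illustration only, not part of this class. The constructor above wraps a
non-public release method in a closure so that tokens can trigger it when
they are destroyed. The following is a minimal sketch of that pattern,
assuming two hypothetical classes (`Sketch_Token` and `Sketch_Owner`) that
do not exist in WordPress. The real closure calls
`parent::release_bookmark()` rather than a private method.

```php
<?php
// Hypothetical classes for illustration only; these are not WordPress APIs.
final class Sketch_Token {
	public $bookmark_name;
	public $on_destroy;

	public function __construct( string $bookmark_name, ?callable $on_destroy = null ) {
		$this->bookmark_name = $bookmark_name;
		$this->on_destroy    = $on_destroy;
	}

	// Runs when the last reference to the token goes away.
	public function __destruct() {
		if ( is_callable( $this->on_destroy ) ) {
			call_user_func( $this->on_destroy, $this->bookmark_name );
		}
	}
}

final class Sketch_Owner {
	private $bookmarks = array();
	private $release_on_destruct;

	public function __construct() {
		// The closure is bound to $this, so it may call the private
		// method even when it is invoked from outside the class.
		$this->release_on_destruct = function ( string $name ): void {
			$this->release( $name );
		};
	}

	public function make_token( string $name ): Sketch_Token {
		$this->bookmarks[ $name ] = true;
		return new Sketch_Token( $name, $this->release_on_destruct );
	}

	public function count_bookmarks(): int {
		return count( $this->bookmarks );
	}

	private function release( string $name ): void {
		unset( $this->bookmarks[ $name ] );
	}
}

$owner  = new Sketch_Owner();
$token  = $owner->make_token( 'mark-1' );
$before = $owner->count_bookmarks(); // Counted while the token is alive.
unset( $token );                     // Destroying the token fires the callback.
$after  = $owner->count_bookmarks(); // Counted after the token is gone.
```

The design keeps the release method off the public API while still letting
short-lived token objects clean up after themselves.
*/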
435
436 /**
437 * Creates a fragment processor at the current node.
438 *
439 * HTML Fragment parsing always happens with a context node. HTML Fragment Processors can be
440 * instantiated with a `BODY` context node via `WP_HTML_Processor::create_fragment( $html )`.
441 *
442 * The context node may impact how a fragment of HTML is parsed. For example, consider the HTML
443 * fragment `<td />Inside TD?</td>`.
444 *
445 * A BODY context node will produce the following tree:
446 *
447 * └─#text Inside TD?
448 *
449 * Notice that the `<td>` tags are completely ignored.
450 *
451 * Compare that with an SVG context node that produces the following tree:
452 *
453 * ├─svg:td
454 * └─#text Inside TD?
455 *
456 * Here, a `td` node in the `svg` namespace is created, and its self-closing flag is respected.
457 * This is a peculiarity of parsing HTML in foreign content like SVG.
458 *
459 * Finally, consider the tree produced with a TABLE context node:
460 *
461 * └─TBODY
462 * └─TR
463 * └─TD
464 * └─#text Inside TD?
465 *
466 * These examples demonstrate how important the context node may be when processing an HTML
467 * fragment. Special care must be taken when processing fragments that are expected to appear
468 * in specific contexts. SVG and TABLE are good examples, but there are others.
469 *
470 * @see https://html.spec.whatwg.org/multipage/parsing.html#html-fragment-parsing-algorithm
471 *
472 * @since 6.8.0
473 *
474 * @param string $html Input HTML fragment to process.
475 * @return static|null The created processor if successful, otherwise null.
476 */
477 private function create_fragment_at_current_node( string $html ) {
478 if ( $this->get_token_type() !== '#tag' || $this->is_tag_closer() ) {
479 _doing_it_wrong(
480 __METHOD__,
481 __( 'The context element must be a start tag.' ),
482 '6.8.0'
483 );
484 return null;
485 }
486
487 $tag_name = $this->current_element->token->node_name;
488 $namespace = $this->current_element->token->namespace;
489
490 if ( 'html' === $namespace && self::is_void( $tag_name ) ) {
491 _doing_it_wrong(
492 __METHOD__,
493 sprintf(
494 // translators: %s: A tag name like INPUT or BR.
495 __( 'The context element cannot be a void element, found "%s".' ),
496 $tag_name
497 ),
498 '6.8.0'
499 );
500 return null;
501 }
502
503 /*
504 * Prevent creating fragments at nodes that require a special tokenizer state.
505 * This is unsupported by the HTML Processor.
506 */
507 if (
508 'html' === $namespace &&
509 in_array( $tag_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP', 'PLAINTEXT' ), true )
510 ) {
511 _doing_it_wrong(
512 __METHOD__,
513 sprintf(
514 // translators: %s: A tag name like IFRAME or TEXTAREA.
515 __( 'The context element "%s" is not supported.' ),
516 $tag_name
517 ),
518 '6.8.0'
519 );
520 return null;
521 }
522
523 $fragment_processor = new static( $html, self::CONSTRUCTOR_UNLOCK_CODE );
524
525 $fragment_processor->compat_mode = $this->compat_mode;
526
527 // @todo Create "fake" bookmarks for non-existent but implied nodes.
528 $fragment_processor->bookmarks['root-node'] = new WP_HTML_Span( 0, 0 );
529 $root_node = new WP_HTML_Token(
530 'root-node',
531 'HTML',
532 false
533 );
534 $fragment_processor->state->stack_of_open_elements->push( $root_node );
535
536 $fragment_processor->bookmarks['context-node'] = new WP_HTML_Span( 0, 0 );
537 $fragment_processor->context_node = clone $this->current_element->token;
538 $fragment_processor->context_node->bookmark_name = 'context-node';
539 $fragment_processor->context_node->on_destroy = null;
540
541 $fragment_processor->breadcrumbs = array( 'HTML', $fragment_processor->context_node->node_name );
542
543 if ( 'TEMPLATE' === $fragment_processor->context_node->node_name ) {
544 $fragment_processor->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
545 }
546
547 $fragment_processor->reset_insertion_mode_appropriately();
548
549 /*
550 * > Set the parser's form element pointer to the nearest node to the context element that
551 * > is a form element (going straight up the ancestor chain, and including the element
552 * > itself, if it is a form element), if any. (If there is no such form element, the
553 * > form element pointer keeps its initial value, null.)
554 */
555 foreach ( $this->state->stack_of_open_elements->walk_up() as $element ) {
556 if ( 'FORM' === $element->node_name && 'html' === $element->namespace ) {
557 $fragment_processor->state->form_element = clone $element;
558 $fragment_processor->state->form_element->bookmark_name = null;
559 $fragment_processor->state->form_element->on_destroy = null;
560 break;
561 }
562 }
563
564 $fragment_processor->state->encoding_confidence = 'irrelevant';
565
566 /*
567 * Update the parsing namespace near the end of the process.
568 * This is important so that any push/pop from the stack of open
569 * elements does not change the parsing namespace.
570 */
571 $fragment_processor->change_parsing_namespace(
572 $this->current_element->token->integration_node_type ? 'html' : $namespace
573 );
574
575 return $fragment_processor;
576 }
577
578 /**
579 * Stops the parser and terminates its execution when encountering unsupported markup.
580 *
581 * @throws WP_HTML_Unsupported_Exception Halts execution of the parser.
582 *
583 * @since 6.7.0
584 *
585 * @param string $message Explains which support is missing in order to parse the current node.
586 */
587 private function bail( string $message ) {
588 $here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
589 $token = substr( $this->html, $here->start, $here->length );
590
591 $open_elements = array();
592 foreach ( $this->state->stack_of_open_elements->stack as $item ) {
593 $open_elements[] = $item->node_name;
594 }
595
596 $active_formats = array();
597 foreach ( $this->state->active_formatting_elements->walk_down() as $item ) {
598 $active_formats[] = $item->node_name;
599 }
600
601 $this->last_error = self::ERROR_UNSUPPORTED;
602
603 $this->unsupported_exception = new WP_HTML_Unsupported_Exception(
604 $message,
605 $this->state->current_token->node_name,
606 $here->start,
607 $token,
608 $open_elements,
609 $active_formats
610 );
611
612 throw $this->unsupported_exception;
613 }
614
615 /**
616 * Returns the last error, if any.
617 *
618 * Various situations lead to parsing failure but this class will
619 * return `false` in all those cases. To determine why something
620 * failed it's possible to request the last error. This helps
621 * distinguish whether a given tag couldn't be found or whether
622 * content in the document caused the processor to give up and
623 * abort processing.
624 *
625 * Example:
626 *
627 * $processor = WP_HTML_Processor::create_fragment( '<template><strong><button><em><p><em>' );
628 * false === $processor->next_tag();
629 * WP_HTML_Processor::ERROR_UNSUPPORTED === $processor->get_last_error();
630 *
631 * @since 6.4.0
632 *
633 * @see self::ERROR_UNSUPPORTED
634 * @see self::ERROR_EXCEEDED_MAX_BOOKMARKS
635 *
636 * @return string|null The last error, if one exists, otherwise null.
637 */
638 public function get_last_error(): ?string {
639 return $this->last_error;
640 }
641
642 /**
643 * Returns context for why the parser aborted due to unsupported HTML, if it did.
644 *
645 * This is meant for debugging purposes, not for production use.
646 *
647 * @since 6.7.0
648 *
649 * @see self::$unsupported_exception
650 *
651 * @return WP_HTML_Unsupported_Exception|null
652 */
653 public function get_unsupported_exception() {
654 return $this->unsupported_exception;
655 }
656
657 /**
658 * Finds the next tag matching the $query.
659 *
660 * @todo Support matching the class name and tag name.
661 *
662 * @since 6.4.0
663 * @since 6.6.0 Visits all tokens, including virtual ones.
664 *
665 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
666 *
667 * @param array|string|null $query {
668 * Optional. Which tag name to find, having which class, etc. Default is to find any tag.
669 *
670 * @type string|null $tag_name Which tag to find, or `null` for "any tag."
671 * @type string $tag_closers 'visit' to pause at tag closers, 'skip' or unset to only visit openers.
672 * @type int|null $match_offset Find the Nth tag matching all search criteria.
673 * 1 for "first" tag, 3 for "third," etc.
674 * Defaults to first tag.
675 * @type string|null $class_name Tag must contain this whole class name to match.
676 * @type string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
677 * May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
678 * }
679 * @return bool Whether a tag was matched.
680 */
681 public function next_tag( $query = null ): bool {
682 $visit_closers = isset( $query['tag_closers'] ) && 'visit' === $query['tag_closers'];
683
684 if ( null === $query ) {
685 while ( $this->next_token() ) {
686 if ( '#tag' !== $this->get_token_type() ) {
687 continue;
688 }
689
690 if ( ! $this->is_tag_closer() || $visit_closers ) {
691 return true;
692 }
693 }
694
695 return false;
696 }
697
698 if ( is_string( $query ) ) {
699 $query = array( 'breadcrumbs' => array( $query ) );
700 }
701
702 if ( ! is_array( $query ) ) {
703 _doing_it_wrong(
704 __METHOD__,
705 __( 'Please pass a query array to this function.' ),
706 '6.4.0'
707 );
708 return false;
709 }
710
711 if ( isset( $query['tag_name'] ) ) {
712 $query['tag_name'] = strtoupper( $query['tag_name'] );
713 }
714
715 $needs_class = ( isset( $query['class_name'] ) && is_string( $query['class_name'] ) )
716 ? $query['class_name']
717 : null;
718
719 if ( ! ( array_key_exists( 'breadcrumbs', $query ) && is_array( $query['breadcrumbs'] ) ) ) {
720 while ( $this->next_token() ) {
721 if ( '#tag' !== $this->get_token_type() ) {
722 continue;
723 }
724
725 if ( isset( $query['tag_name'] ) && $query['tag_name'] !== $this->get_token_name() ) {
726 continue;
727 }
728
729 if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
730 continue;
731 }
732
733 if ( ! $this->is_tag_closer() || $visit_closers ) {
734 return true;
735 }
736 }
737
738 return false;
739 }
740
741 $breadcrumbs = $query['breadcrumbs'];
742 $match_offset = isset( $query['match_offset'] ) ? (int) $query['match_offset'] : 1;
743
744 while ( $match_offset > 0 && $this->next_token() ) {
745 if ( '#tag' !== $this->get_token_type() || $this->is_tag_closer() ) {
746 continue;
747 }
748
749 if ( isset( $needs_class ) && ! $this->has_class( $needs_class ) ) {
750 continue;
751 }
752
753 if ( $this->matches_breadcrumbs( $breadcrumbs ) && 0 === --$match_offset ) {
754 return true;
755 }
756 }
757
758 return false;
759 }
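/*
Illustration only, not part of this class. The breadcrumb branch of
`next_tag()` above counts `match_offset` down on every match and returns
once the counter reaches zero. A minimal sketch of that countdown follows,
assuming a hypothetical helper `sketch_find_nth_match()` that scans a plain
array of tag names instead of a token stream.

```php
<?php
// Hypothetical helper for illustration; not part of WordPress.
// Returns the index of the Nth candidate accepted by $matches, or null.
function sketch_find_nth_match( array $candidates, callable $matches, int $match_offset = 1 ): ?int {
	foreach ( $candidates as $index => $candidate ) {
		// A non-positive offset can never be satisfied.
		if ( $match_offset <= 0 ) {
			break;
		}

		// Decrement only on a match; stop when the counter reaches zero.
		if ( $matches( $candidate ) && 0 === --$match_offset ) {
			return $index;
		}
	}

	return null;
}

$tags   = array( 'DIV', 'IMG', 'P', 'IMG', 'IMG' );
$is_img = function ( string $tag ): bool {
	return 'IMG' === $tag;
};

$first = sketch_find_nth_match( $tags, $is_img );    // First IMG.
$third = sketch_find_nth_match( $tags, $is_img, 3 ); // Third IMG.
$none  = sketch_find_nth_match( $tags, $is_img, 4 ); // No fourth IMG exists.
```

Because the counter is decremented only when a candidate matches,
non-matching tags never consume the offset.
*/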
760
761 /**
762 * Finds the next token in the HTML document.
763 *
764 * This doesn't currently have a way to represent non-tags and doesn't process
765 * semantic rules for text nodes. For access to the raw tokens consider using
766 * WP_HTML_Tag_Processor instead.
767 *
768 * @since 6.5.0 Added for internal support; do not use.
769 * @since 6.7.2 Refactored so subclasses may extend.
770 *
771 * @return bool Whether a token was parsed.
772 */
773 public function next_token(): bool {
774 return $this->next_visitable_token();
775 }
776
777 /**
778 * Ensures internal accounting is maintained for HTML semantic rules while
779 * the underlying Tag Processor class is seeking to a bookmark.
780 *
781 * This doesn't currently have a way to represent non-tags and doesn't process
782 * semantic rules for text nodes. For access to the raw tokens consider using
783 * WP_HTML_Tag_Processor instead.
784 *
785 * Note that this method may call itself recursively. This is why it is not
786 * implemented as {@see WP_HTML_Processor::next_token()}, which instead calls
787 * this method similarly to how {@see WP_HTML_Tag_Processor::next_token()}
788 * calls the {@see WP_HTML_Tag_Processor::base_class_next_token()} method.
789 *
790 * @since 6.7.2 Added for internal support.
791 *
792 * @access private
793 *
794 * @return bool
795 */
796 private function next_visitable_token(): bool {
797 $this->current_element = null;
798
799 if ( isset( $this->last_error ) ) {
800 return false;
801 }
802
803 /*
804 * Prime the events if there are none.
805 *
806 * @todo In some cases, probably related to the adoption agency
807 * algorithm, this call to step() doesn't create any new
808 * events. Calling it again creates them. Figure out why
809 * this is and if it's inherent or if it's a bug. Looping
810 * until there are events or until there are no more
811 * tokens works in the meantime and isn't obviously wrong.
812 */
813 if ( empty( $this->element_queue ) && $this->step() ) {
814 return $this->next_visitable_token();
815 }
816
817 // Process the next event on the queue.
818 $this->current_element = array_shift( $this->element_queue );
819 if ( ! isset( $this->current_element ) ) {
820 // There are no tokens left, so close all remaining open elements.
821 while ( $this->state->stack_of_open_elements->pop() ) {
822 continue;
823 }
824
825 return empty( $this->element_queue ) ? false : $this->next_visitable_token();
826 }
827
828 $is_pop = WP_HTML_Stack_Event::POP === $this->current_element->operation;
829
830 /*
831 * The root node only exists in the fragment parser, and closing it
832 * indicates that the parse is complete. Stop before popping it from
833 * the breadcrumbs.
834 */
835 if ( 'root-node' === $this->current_element->token->bookmark_name ) {
836 return $this->next_visitable_token();
837 }
838
839 // Adjust the breadcrumbs for this event.
840 if ( $is_pop ) {
841 array_pop( $this->breadcrumbs );
842 } else {
843 $this->breadcrumbs[] = $this->current_element->token->node_name;
844 }
845
846 // Avoid sending close events for elements which don't expect a closing.
847 if ( $is_pop && ! $this->expects_closer( $this->current_element->token ) ) {
848 return $this->next_visitable_token();
849 }
850
851 return true;
852 }
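/*
Illustration only, not part of this class. `next_visitable_token()` above
keeps the breadcrumbs in step with the stack events by appending on a push
and removing the last entry on a pop. A minimal sketch of that bookkeeping
follows, assuming a hypothetical `sketch_replay_breadcrumbs()` function and
plain arrays in place of `WP_HTML_Stack_Event` objects. The event list is
an assumed example, not output captured from the parser.

```php
<?php
// Hypothetical function for illustration; not part of WordPress.
// Each event is array( 'push' | 'pop', node name ).
function sketch_replay_breadcrumbs( array $events, array $breadcrumbs = array() ): array {
	$snapshots = array();

	foreach ( $events as $event ) {
		list( $operation, $node_name ) = $event;

		if ( 'pop' === $operation ) {
			array_pop( $breadcrumbs );
		} else {
			$breadcrumbs[] = $node_name;
		}

		// Record the path as it stands right after this event.
		$snapshots[] = implode( ' > ', $breadcrumbs );
	}

	return $snapshots;
}

// Assumed events for input like `<div><p></div>`, where the P closer is implied.
$events = array(
	array( 'push', 'DIV' ),
	array( 'push', 'P' ),
	array( 'pop', 'P' ),
	array( 'pop', 'DIV' ),
);

$paths = sketch_replay_breadcrumbs( $events, array( 'HTML', 'BODY' ) );
```

Starting from `array( 'HTML', 'BODY' )` mirrors how a fragment parser seeds
its breadcrumbs with the root and the context node.
*/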
853
854 /**
855 * Indicates if the current tag token is a tag closer.
856 *
857 * Example:
858 *
859 * $p = WP_HTML_Processor::create_fragment( '<div></div>' );
860 * $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
861 * $p->is_tag_closer() === false;
862 *
863 * $p->next_tag( array( 'tag_name' => 'div', 'tag_closers' => 'visit' ) );
864 * $p->is_tag_closer() === true;
865 *
866 * @since 6.6.0 Subclassed for HTML Processor.
867 *
868 * @return bool Whether the current tag is a tag closer.
869 */
870 public function is_tag_closer(): bool {
871 return $this->is_virtual()
872 ? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() )
873 : parent::is_tag_closer();
874 }
875
876 /**
877 * Indicates if the currently-matched token is virtual, created by a stack operation
878 * while processing HTML, rather than a token found in the HTML text itself.
879 *
880 * @since 6.6.0
881 *
882 * @return bool Whether the current token is virtual.
883 */
884 private function is_virtual(): bool {
885 return (
886 isset( $this->current_element->provenance ) &&
887 'virtual' === $this->current_element->provenance
888 );
889 }
890
891 /**
892 * Indicates if the currently-matched tag matches the given breadcrumbs.
893 *
894 * A "*" is a single-tag wildcard: it matches any one tag, but never the absence of a tag.
895 *
896 * At some point this function _may_ support a `**` syntax for matching any number
897 * of unspecified tags in the breadcrumb stack. This has been intentionally left
898 * out, however, to keep this function simple and to avoid introducing backtracking,
899 * which could open up surprising performance breakdowns.
900 *
901 * Example:
902 *
903 * $processor = WP_HTML_Processor::create_fragment( '<div><span><figure><img></figure></span></div>' );
904 * $processor->next_tag( 'img' );
905 * true === $processor->matches_breadcrumbs( array( 'figure', 'img' ) );
906 * true === $processor->matches_breadcrumbs( array( 'span', 'figure', 'img' ) );
907 * false === $processor->matches_breadcrumbs( array( 'span', 'img' ) );
908 * true === $processor->matches_breadcrumbs( array( 'span', '*', 'img' ) );
909 *
910 * @since 6.4.0
911 *
912 * @param string[] $breadcrumbs DOM sub-path at which element is found, e.g. `array( 'FIGURE', 'IMG' )`.
913 * May also contain the wildcard `*` which matches a single element, e.g. `array( 'SECTION', '*' )`.
914 * @return bool Whether the currently-matched tag is found at the given nested structure.
915 */
916 public function matches_breadcrumbs( $breadcrumbs ): bool {
917 // Everything matches when there are zero constraints.
918 if ( 0 === count( $breadcrumbs ) ) {
919 return true;
920 }
921
922 // Start at the last crumb.
923 $crumb = end( $breadcrumbs );
924
925 if ( '*' !== $crumb && $this->get_tag() !== strtoupper( $crumb ) ) {
926 return false;
927 }
928
929 for ( $i = count( $this->breadcrumbs ) - 1; $i >= 0; $i-- ) {
930 $node = $this->breadcrumbs[ $i ];
931 $crumb = strtoupper( current( $breadcrumbs ) );
932
933 if ( '*' !== $crumb && $node !== $crumb ) {
934 return false;
935 }
936
937 if ( false === prev( $breadcrumbs ) ) {
938 return true;
939 }
940 }
941
942 return false;
943 }
944
945 /**
946 * Indicates if the currently-matched node expects a closing
947 * token, or if it will self-close on the next step.
948 *
949 * Most HTML elements expect a closer, such as a P element or
950 * a DIV element. Others, like an IMG element, are void and don't
951 * have a closing tag. Special elements, such as SCRIPT and STYLE,
952 * are treated just like void tags. Text nodes and self-closing
953 * foreign content will also act just like a void tag, immediately
954 * closing as soon as the processor advances to the next token.
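 *
 * Example (an illustrative sketch; the input markup is an assumption
 * chosen for the demonstration):
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<p><img></p>' );
 *     $processor->next_tag( 'P' );
 *     true === $processor->expects_closer();
 *
 *     // IMG is a void element, so it closes as soon as the processor advances.
 *     $processor->next_tag( 'IMG' );
 *     false === $processor->expects_closer();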
955 *
956 * @since 6.6.0
957 *
958 * @param WP_HTML_Token|null $node Optional. Node to examine, if provided.
959 * Default is to examine current node.
960 * @return bool|null Whether to expect a closer for the currently-matched node,
961 * or `null` if not matched on any token.
962 */
963 public function expects_closer( ?WP_HTML_Token $node = null ): ?bool {
964 $token_name = $node->node_name ?? $this->get_token_name();
965
966 if ( ! isset( $token_name ) ) {
967 return null;
968 }
969
970 $token_namespace = $node->namespace ?? $this->get_namespace();
971 $token_has_self_closing = $node->has_self_closing_flag ?? $this->has_self_closing_flag();
972
973 return ! (
974 // Comments, text nodes, and other atomic tokens.
975 '#' === $token_name[0] ||
976 // Doctype declarations.
977 'html' === $token_name ||
978 // Void elements.
979 ( 'html' === $token_namespace && self::is_void( $token_name ) ) ||
980 // Special atomic elements.
981 ( 'html' === $token_namespace && in_array( $token_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) ||
982 // Self-closing elements in foreign content.
983 ( 'html' !== $token_namespace && $token_has_self_closing )
984 );
985 }
986
987 /**
988 * Steps through the HTML document and stops at the next tag, if any.
989 *
990 * @since 6.4.0
991 *
992 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
993 *
994 * @see self::PROCESS_NEXT_NODE
995 * @see self::REPROCESS_CURRENT_NODE
996 *
997 * @param string $node_to_process Whether to parse the next node or reprocess the current node.
998 * @return bool Whether a tag was matched.
999 */
1000 public function step( $node_to_process = self::PROCESS_NEXT_NODE ): bool {
1001 // Refuse to proceed if there was a previous error.
1002 if ( null !== $this->last_error ) {
1003 return false;
1004 }
1005
1006 if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
1007 /*
1008 * Void elements still hop onto the stack of open elements even though
1009 * there's no corresponding closing tag. This is important for managing
1010 * stack-based operations such as "navigate to parent node" or checking
1011 * on an element's breadcrumbs.
1012 *
1013 * When moving on to the next node, therefore, if the bottom-most element
1014 * on the stack is a void element, it must be closed.
1015 */
1016 $top_node = $this->state->stack_of_open_elements->current_node();
1017 if ( isset( $top_node ) && ! $this->expects_closer( $top_node ) ) {
1018 $this->state->stack_of_open_elements->pop();
1019 }
1020 }
1021
1022 if ( self::PROCESS_NEXT_NODE === $node_to_process ) {
1023 parent::next_token();
1024 if ( WP_HTML_Tag_Processor::STATE_TEXT_NODE === $this->parser_state ) {
1025 parent::subdivide_text_appropriately();
1026 }
1027 }
1028
1029 // Finish stepping when there are no more tokens in the document.
1030 if (
1031 WP_HTML_Tag_Processor::STATE_INCOMPLETE_INPUT === $this->parser_state ||
1032 WP_HTML_Tag_Processor::STATE_COMPLETE === $this->parser_state
1033 ) {
1034 return false;
1035 }
1036
1037 $adjusted_current_node = $this->get_adjusted_current_node();
1038 $is_closer = $this->is_tag_closer();
1039 $is_start_tag = WP_HTML_Tag_Processor::STATE_MATCHED_TAG === $this->parser_state && ! $is_closer;
1040 $token_name = $this->get_token_name();
1041
1042 if ( self::REPROCESS_CURRENT_NODE !== $node_to_process ) {
1043 $this->state->current_token = new WP_HTML_Token(
1044 $this->bookmark_token(),
1045 $token_name,
1046 $this->has_self_closing_flag(),
1047 $this->release_internal_bookmark_on_destruct
1048 );
1049 }
1050
1051 $parse_in_current_insertion_mode = (
1052 0 === $this->state->stack_of_open_elements->count() ||
1053 'html' === $adjusted_current_node->namespace ||
1054 (
1055 'math' === $adjusted_current_node->integration_node_type &&
1056 (
1057 ( $is_start_tag && ! in_array( $token_name, array( 'MGLYPH', 'MALIGNMARK' ), true ) ) ||
1058 '#text' === $token_name
1059 )
1060 ) ||
1061 (
1062 'math' === $adjusted_current_node->namespace &&
1063 'ANNOTATION-XML' === $adjusted_current_node->node_name &&
1064 $is_start_tag && 'SVG' === $token_name
1065 ) ||
1066 (
1067 'html' === $adjusted_current_node->integration_node_type &&
1068 ( $is_start_tag || '#text' === $token_name )
1069 )
1070 );
1071
1072 try {
1073 if ( ! $parse_in_current_insertion_mode ) {
1074 return $this->step_in_foreign_content();
1075 }
1076
1077 switch ( $this->state->insertion_mode ) {
1078 case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
1079 return $this->step_initial();
1080
1081 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
1082 return $this->step_before_html();
1083
1084 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
1085 return $this->step_before_head();
1086
1087 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
1088 return $this->step_in_head();
1089
1090 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
1091 return $this->step_in_head_noscript();
1092
1093 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
1094 return $this->step_after_head();
1095
1096 case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
1097 return $this->step_in_body();
1098
1099 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
1100 return $this->step_in_table();
1101
1102 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
1103 return $this->step_in_table_text();
1104
1105 case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
1106 return $this->step_in_caption();
1107
1108 case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
1109 return $this->step_in_column_group();
1110
1111 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
1112 return $this->step_in_table_body();
1113
1114 case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
1115 return $this->step_in_row();
1116
1117 case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
1118 return $this->step_in_cell();
1119
1120 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
1121 return $this->step_in_select();
1122
1123 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
1124 return $this->step_in_select_in_table();
1125
1126 case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
1127 return $this->step_in_template();
1128
1129 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
1130 return $this->step_after_body();
1131
1132 case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
1133 return $this->step_in_frameset();
1134
1135 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
1136 return $this->step_after_frameset();
1137
1138 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
1139 return $this->step_after_after_body();
1140
1141 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
1142 return $this->step_after_after_frameset();
1143
1144 // This should be unreachable but PHP doesn't have total type checking on switch.
1145 default:
1146 $this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
1147 }
1148 } catch ( WP_HTML_Unsupported_Exception $e ) {
1149 /*
1150 * Exceptions are used in this class to escape deep call stacks that
1151 * otherwise might involve messier calling and return conventions.
1152 */
1153 return false;
1154 }
1155 }
1156
1157 /**
1158 * Computes the HTML breadcrumbs for the currently-matched node, if matched.
1159 *
1160 * Breadcrumbs start at the outermost parent and descend toward the matched element.
1161 * They always include the entire path from the root HTML node to the matched element.
1162 *
1163 * Example:
1164 *
1165 * $processor = WP_HTML_Processor::create_fragment( '<p><strong><em><img></em></strong></p>' );
1166 * $processor->next_tag( 'IMG' );
1167 * $processor->get_breadcrumbs() === array( 'HTML', 'BODY', 'P', 'STRONG', 'EM', 'IMG' );
1168 *
1169 * @since 6.4.0
1170 *
1171 * @return string[] Array of tag names representing path to matched node.
1172 */
1173 public function get_breadcrumbs(): array {
1174 return $this->breadcrumbs;
1175 }
1176
1177 /**
1178 * Returns the nesting depth of the current location in the document.
1179 *
1180 * Example:
1181 *
1182 * $processor = WP_HTML_Processor::create_fragment( '<div><p></p></div>' );
1183 * // The processor starts in the BODY context, meaning it has depth from the start: HTML > BODY.
1184 * 2 === $processor->get_current_depth();
1185 *
1186 * // Opening the DIV element increases the depth.
1187 * $processor->next_token();
1188 * 3 === $processor->get_current_depth();
1189 *
1190 * // Opening the P element increases the depth.
1191 * $processor->next_token();
1192 * 4 === $processor->get_current_depth();
1193 *
1194 * // The P element is closed during `next_token()` so the depth is decreased to reflect that.
1195 * $processor->next_token();
1196 * 3 === $processor->get_current_depth();
1197 *
1198 * @since 6.6.0
1199 *
1200 * @return int Nesting-depth of current location in the document.
1201 */
1202 public function get_current_depth(): int {
1203 return count( $this->breadcrumbs );
1204 }
1205
1206 /**
1207 * Normalizes an HTML fragment by serializing it.
1208 *
1209 * This method assumes that the given HTML snippet is found in BODY context.
1210 * For normalizing full documents or fragments found in other contexts, create
1211 * a new processor using {@see WP_HTML_Processor::create_fragment} or
1212 * {@see WP_HTML_Processor::create_full_parser} and call {@see WP_HTML_Processor::serialize}
1213 * on the created instances.
1214 *
1215 * Many aspects of an input HTML fragment may be changed during normalization.
1216 *
1217 * - Attribute values will be double-quoted.
1218 * - Duplicate attributes will be removed.
1219 * - Omitted tags will be added.
1220 * - Tag and attribute name casing will be lower-cased,
1221 * except for specific SVG and MathML tags or attributes.
1222 * - Text will be re-encoded, null bytes handled,
1223 * and invalid UTF-8 replaced with U+FFFD.
1224 * - Any incomplete syntax trailing at the end will be omitted,
1225 * for example, an unclosed comment opener will be removed.
1226 *
1227 * Example:
1228 *
1229 * echo WP_HTML_Processor::normalize( '<a href=#anchor v=5 href="/" enabled>One</a another v=5><!--' );
1230 * // <a href="#anchor" v="5" enabled>One</a>
1231 *
1232 * echo WP_HTML_Processor::normalize( '<div></p>fun<table><td>cell</div>' );
1233 * // <div><p></p>fun<table><tbody><tr><td>cell</td></tr></tbody></table></div>
1234 *
1235 * echo WP_HTML_Processor::normalize( '<![CDATA[invalid comment]]> syntax < <> "oddities"' );
1236 * // <!--[CDATA[invalid comment]]--> syntax &lt; &lt;&gt; &quot;oddities&quot;
1237 *
1238 * @since 6.7.0
1239 *
1240 * @param string $html Input HTML to normalize.
1241 *
1242 * @return string|null Normalized output, or `null` if unable to normalize.
1243 */
1244 public static function normalize( string $html ): ?string {
1245 return static::create_fragment( $html )->serialize();
1246 }
1247
1248 /**
1249 * Returns normalized HTML for a fragment by serializing it.
1250 *
1251 * This differs from {@see WP_HTML_Processor::normalize} in that it starts with
1252 * a specific HTML Processor, which _must_ not have already started scanning;
1253 * it must be in the initial ready state and will be in the completed state once
1254 * serialization is complete.
1255 *
1256 * Many aspects of an input HTML fragment may be changed during normalization.
1257 *
1258 * - Attribute values will be double-quoted.
1259 * - Duplicate attributes will be removed.
1260 * - Omitted tags will be added.
1261 * - Tag and attribute name casing will be lower-cased,
1262 * except for specific SVG and MathML tags or attributes.
1263 * - Text will be re-encoded, null bytes handled,
1264 * and invalid UTF-8 replaced with U+FFFD.
1265 * - Any incomplete syntax trailing at the end will be omitted,
1266 * for example, an unclosed comment opener will be removed.
1267 *
1268 * Example:
1269 *
1270 * $processor = WP_HTML_Processor::create_fragment( '<a href=#anchor v=5 href="/" enabled>One</a another v=5><!--' );
1271 * echo $processor->serialize();
1272 * // <a href="#anchor" v="5" enabled>One</a>
1273 *
1274 * $processor = WP_HTML_Processor::create_fragment( '<div></p>fun<table><td>cell</div>' );
1275 * echo $processor->serialize();
1276 * // <div><p></p>fun<table><tbody><tr><td>cell</td></tr></tbody></table></div>
1277 *
1278 * $processor = WP_HTML_Processor::create_fragment( '<![CDATA[invalid comment]]> syntax < <> "oddities"' );
1279 * echo $processor->serialize();
1280 * // <!--[CDATA[invalid comment]]--> syntax &lt; &lt;&gt; &quot;oddities&quot;
1281 *
1282 * @since 6.7.0
1283 *
1284 * @return string|null Normalized HTML markup represented by processor,
1285 * or `null` if unable to generate serialization.
1286 */
1287 public function serialize(): ?string {
1288 if ( WP_HTML_Tag_Processor::STATE_READY !== $this->parser_state ) {
1289 wp_trigger_error(
1290 __METHOD__,
1291 'An HTML Processor which has already started processing cannot serialize its contents. Serialize immediately after creating the instance.',
1292 E_USER_WARNING
1293 );
1294 return null;
1295 }
1296
1297 $html = '';
1298 while ( $this->next_token() ) {
1299 $html .= $this->serialize_token();
1300 }
1301
1302 if ( null !== $this->get_last_error() ) {
1303 wp_trigger_error(
1304 __METHOD__,
1305 "Cannot serialize HTML Processor with parsing error: {$this->get_last_error()}.",
1306 E_USER_WARNING
1307 );
1308 return null;
1309 }
1310
1311 return $html;
1312 }
1313
1314 /**
1315 * Serializes the currently-matched token.
1316 *
1317 * This method produces a fully-normative HTML string for the currently-matched token,
1318 * if able. If not matched at any token, or if the token doesn't correspond to any HTML,
1319 * it will return an empty string (for example, presumptuous end tags are ignored).
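 *
 * Example (an illustrative sketch; the input markup is an assumption
 * chosen for the demonstration):
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<IMG SRC=photo.jpg>' );
 *     $processor->next_tag();
 *     echo $processor->serialize_token();
 *     // <img src="photo.jpg">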
1320 *
1321 * @see static::serialize()
1322 *
1323 * @since 6.7.0
1324 * @since 6.9.0 Converted from protected to public method.
1325 *
1326 * @return string Serialization of token, or empty string if no serialization exists.
1327 */
1328 public function serialize_token(): string {
1329 $html = '';
1330 $token_type = $this->get_token_type();
1331
1332 switch ( $token_type ) {
1333 case '#doctype':
1334 $doctype = $this->get_doctype_info();
1335 if ( null === $doctype ) {
1336 break;
1337 }
1338
1339 $html .= '<!DOCTYPE';
1340
1341 if ( $doctype->name ) {
1342 $html .= " {$doctype->name}";
1343 }
1344
1345 if ( null !== $doctype->public_identifier ) {
1346 $quote = str_contains( $doctype->public_identifier, '"' ) ? "'" : '"';
1347 $html .= " PUBLIC {$quote}{$doctype->public_identifier}{$quote}";
1348 }
1349 if ( null !== $doctype->system_identifier ) {
1350 if ( null === $doctype->public_identifier ) {
1351 $html .= ' SYSTEM';
1352 }
1353 $quote = str_contains( $doctype->system_identifier, '"' ) ? "'" : '"';
1354 $html .= " {$quote}{$doctype->system_identifier}{$quote}";
1355 }
1356
1357 $html .= '>';
1358 break;
1359
1360 case '#text':
1361 $html .= htmlspecialchars( $this->get_modifiable_text(), ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8' );
1362 break;
1363
1364 // Unlike `<>`, which is interpreted as plaintext, this is ignored entirely.
1365 case '#presumptuous-tag':
1366 break;
1367
1368 case '#funky-comment':
1369 case '#comment':
1370 $html .= "<!--{$this->get_full_comment_text()}-->";
1371 break;
1372
1373 case '#cdata-section':
1374 $html .= "<![CDATA[{$this->get_modifiable_text()}]]>";
1375 break;
1376 }
1377
1378 if ( '#tag' !== $token_type ) {
1379 return $html;
1380 }
1381
1382 $tag_name = str_replace( "\x00", "\u{FFFD}", $this->get_tag() );
1383 $in_html = 'html' === $this->get_namespace();
1384 $qualified_name = $in_html ? strtolower( $tag_name ) : $this->get_qualified_tag_name();
1385
1386 if ( $this->is_tag_closer() ) {
1387 $html .= "</{$qualified_name}>";
1388 return $html;
1389 }
1390
1391 $attribute_names = $this->get_attribute_names_with_prefix( '' );
1392 if ( ! isset( $attribute_names ) ) {
1393 $html .= "<{$qualified_name}>";
1394 return $html;
1395 }
1396
1397 $html .= "<{$qualified_name}";
1398 foreach ( $attribute_names as $attribute_name ) {
1399 $html .= " {$this->get_qualified_attribute_name( $attribute_name )}";
1400 $value = $this->get_attribute( $attribute_name );
1401
1402 if ( is_string( $value ) ) {
1403 $html .= '="' . htmlspecialchars( $value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5 ) . '"';
1404 }
1405
1406 $html = str_replace( "\x00", "\u{FFFD}", $html );
1407 }
1408
1409 if ( ! $in_html && $this->has_self_closing_flag() ) {
1410 $html .= ' /';
1411 }
1412
1413 $html .= '>';
1414
1415 // Flush out self-contained elements.
1416 if ( $in_html && in_array( $tag_name, array( 'IFRAME', 'NOEMBED', 'NOFRAMES', 'SCRIPT', 'STYLE', 'TEXTAREA', 'TITLE', 'XMP' ), true ) ) {
1417 $text = $this->get_modifiable_text();
1418
1419 switch ( $tag_name ) {
1420 case 'IFRAME':
1421 case 'NOEMBED':
1422 case 'NOFRAMES':
1423 $text = '';
1424 break;
1425
1426 case 'SCRIPT':
1427 case 'STYLE':
1428 break;
1429
1430 default:
1431 $text = htmlspecialchars( $text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8' );
1432 }
1433
1434 $html .= "{$text}</{$qualified_name}>";
1435 }
1436
1437 return $html;
1438 }
1439
1440 /**
1441 * Parses next element in the 'initial' insertion mode.
1442 *
1443 * This internal function performs the 'initial' insertion mode
1444 * logic for the generalized WP_HTML_Processor::step() function.
1445 *
1446 * @since 6.7.0
1447 *
1448 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1449 *
1450 * @see https://html.spec.whatwg.org/#the-initial-insertion-mode
1451 * @see WP_HTML_Processor::step
1452 *
1453 * @return bool Whether an element was found.
1454 */
1455 private function step_initial(): bool {
1456 $token_name = $this->get_token_name();
1457 $token_type = $this->get_token_type();
1458 $op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
1459 $op = "{$op_sigil}{$token_name}";
1460
1461 switch ( $op ) {
1462 /*
1463 * > A character token that is one of U+0009 CHARACTER TABULATION,
1464 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1465 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1466 *
1467 * Parse error: ignore the token.
1468 */
1469 case '#text':
1470 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1471 return $this->step();
1472 }
1473 goto initial_anything_else;
1474 break;
1475
1476 /*
1477 * > A comment token
1478 */
1479 case '#comment':
1480 case '#funky-comment':
1481 case '#presumptuous-tag':
1482 $this->insert_html_element( $this->state->current_token );
1483 return true;
1484
1485 /*
1486 * > A DOCTYPE token
1487 */
1488 case 'html':
1489 $doctype = $this->get_doctype_info();
1490 if ( null !== $doctype && 'quirks' === $doctype->indicated_compatibility_mode ) {
1491 $this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
1492 }
1493
1494 /*
1495 * > Then, switch the insertion mode to "before html".
1496 */
1497 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
1498 $this->insert_html_element( $this->state->current_token );
1499 return true;
1500 }
1501
1502 /*
1503 * > Anything else
1504 */
1505 initial_anything_else:
1506 $this->compat_mode = WP_HTML_Tag_Processor::QUIRKS_MODE;
1507 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML;
1508 return $this->step( self::REPROCESS_CURRENT_NODE );
1509 }
1510
1511 /**
1512 * Parses next element in the 'before html' insertion mode.
1513 *
1514 * This internal function performs the 'before html' insertion mode
1515 * logic for the generalized WP_HTML_Processor::step() function.
1516 *
1517 * @since 6.7.0
1518 *
1519 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1520 *
1521 * @see https://html.spec.whatwg.org/#the-before-html-insertion-mode
1522 * @see WP_HTML_Processor::step
1523 *
1524 * @return bool Whether an element was found.
1525 */
1526 private function step_before_html(): bool {
1527 $token_name = $this->get_token_name();
1528 $token_type = $this->get_token_type();
1529 $is_closer = parent::is_tag_closer();
1530 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1531 $op = "{$op_sigil}{$token_name}";
1532
1533 switch ( $op ) {
1534 /*
1535 * > A DOCTYPE token
1536 */
1537 case 'html':
1538 // Parse error: ignore the token.
1539 return $this->step();
1540
1541 /*
1542 * > A comment token
1543 */
1544 case '#comment':
1545 case '#funky-comment':
1546 case '#presumptuous-tag':
1547 $this->insert_html_element( $this->state->current_token );
1548 return true;
1549
1550 /*
1551 * > A character token that is one of U+0009 CHARACTER TABULATION,
1552 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1553 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1554 *
1555 * Parse error: ignore the token.
1556 */
1557 case '#text':
1558 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1559 return $this->step();
1560 }
1561 goto before_html_anything_else;
1562 break;
1563
1564 /*
1565 * > A start tag whose tag name is "html"
1566 */
1567 case '+HTML':
1568 $this->insert_html_element( $this->state->current_token );
1569 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
1570 return true;
1571
1572 /*
1573 * > An end tag whose tag name is one of: "head", "body", "html", "br"
1574 *
1575 * Closing BR tags are always reported by the Tag Processor as opening tags.
1576 */
1577 case '-HEAD':
1578 case '-BODY':
1579 case '-HTML':
1580 /*
1581 * > Act as described in the "anything else" entry below.
1582 */
1583 goto before_html_anything_else;
1584 break;
1585 }
1586
1587 /*
1588 * > Any other end tag
1589 */
1590 if ( $is_closer ) {
1591 // Parse error: ignore the token.
1592 return $this->step();
1593 }
1594
1595 /*
1596 * > Anything else.
1597 *
1598 * > Create an html element whose node document is the Document object.
1599 * > Append it to the Document object. Put this element in the stack of open elements.
1600 * > Switch the insertion mode to "before head", then reprocess the token.
1601 */
1602 before_html_anything_else:
1603 $this->insert_virtual_node( 'HTML' );
1604 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
1605 return $this->step( self::REPROCESS_CURRENT_NODE );
1606 }
1607
1608 /**
1609 * Parses next element in the 'before head' insertion mode.
1610 *
1611 * This internal function performs the 'before head' insertion mode
1612 * logic for the generalized WP_HTML_Processor::step() function.
1613 *
1614 * @since 6.7.0 Stub implementation.
1615 *
1616 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1617 *
1618 * @see https://html.spec.whatwg.org/#the-before-head-insertion-mode
1619 * @see WP_HTML_Processor::step
1620 *
1621 * @return bool Whether an element was found.
1622 */
1623 private function step_before_head(): bool {
1624 $token_name = $this->get_token_name();
1625 $token_type = $this->get_token_type();
1626 $is_closer = parent::is_tag_closer();
1627 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1628 $op = "{$op_sigil}{$token_name}";
1629
1630 switch ( $op ) {
1631 /*
1632 * > A character token that is one of U+0009 CHARACTER TABULATION,
1633 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1634 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1635 *
1636 * Parse error: ignore the token.
1637 */
1638 case '#text':
1639 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1640 return $this->step();
1641 }
1642 goto before_head_anything_else;
1643 break;
1644
1645 /*
1646 * > A comment token
1647 */
1648 case '#comment':
1649 case '#funky-comment':
1650 case '#presumptuous-tag':
1651 $this->insert_html_element( $this->state->current_token );
1652 return true;
1653
1654 /*
1655 * > A DOCTYPE token
1656 */
1657 case 'html':
1658 // Parse error: ignore the token.
1659 return $this->step();
1660
1661 /*
1662 * > A start tag whose tag name is "html"
1663 */
1664 case '+HTML':
1665 return $this->step_in_body();
1666
1667 /*
1668 * > A start tag whose tag name is "head"
1669 */
1670 case '+HEAD':
1671 $this->insert_html_element( $this->state->current_token );
1672 $this->state->head_element = $this->state->current_token;
1673 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
1674 return true;
1675
1676 /*
1677 * > An end tag whose tag name is one of: "head", "body", "html", "br"
1678 * > Act as described in the "anything else" entry below.
1679 *
1680 * Closing BR tags are always reported by the Tag Processor as opening tags.
1681 */
1682 case '-HEAD':
1683 case '-BODY':
1684 case '-HTML':
1685 goto before_head_anything_else;
1686 break;
1687 }
1688
1689 if ( $is_closer ) {
1690 // Parse error: ignore the token.
1691 return $this->step();
1692 }
1693
1694 /*
1695 * > Anything else
1696 *
1697 * > Insert an HTML element for a "head" start tag token with no attributes.
1698 */
1699 before_head_anything_else:
1700 $this->state->head_element = $this->insert_virtual_node( 'HEAD' );
1701 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
1702 return $this->step( self::REPROCESS_CURRENT_NODE );
1703 }
1704
1705 /**
1706 * Parses next element in the 'in head' insertion mode.
1707 *
1708 * This internal function performs the 'in head' insertion mode
1709 * logic for the generalized WP_HTML_Processor::step() function.
1710 *
1711 * @since 6.7.0
1712 *
1713 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1714 *
1715 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inhead
1716 * @see WP_HTML_Processor::step
1717 *
1718 * @return bool Whether an element was found.
1719 */
1720 private function step_in_head(): bool {
1721 $token_name = $this->get_token_name();
1722 $token_type = $this->get_token_type();
1723 $is_closer = parent::is_tag_closer();
1724 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1725 $op = "{$op_sigil}{$token_name}";
1726
1727 switch ( $op ) {
1728 case '#text':
1729 /*
1730 * > A character token that is one of U+0009 CHARACTER TABULATION,
1731 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1732 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1733 */
1734 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1735 // Insert the character.
1736 $this->insert_html_element( $this->state->current_token );
1737 return true;
1738 }
1739
1740 goto in_head_anything_else;
1741 break;
1742
1743 /*
1744 * > A comment token
1745 */
1746 case '#comment':
1747 case '#funky-comment':
1748 case '#presumptuous-tag':
1749 $this->insert_html_element( $this->state->current_token );
1750 return true;
1751
1752 /*
1753 * > A DOCTYPE token
1754 */
1755 case 'html':
1756 // Parse error: ignore the token.
1757 return $this->step();
1758
1759 /*
1760 * > A start tag whose tag name is "html"
1761 */
1762 case '+HTML':
1763 return $this->step_in_body();
1764
1765 /*
1766 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link"
1767 */
1768 case '+BASE':
1769 case '+BASEFONT':
1770 case '+BGSOUND':
1771 case '+LINK':
1772 $this->insert_html_element( $this->state->current_token );
1773 return true;
1774
1775 /*
1776 * > A start tag whose tag name is "meta"
1777 */
1778 case '+META':
1779 $this->insert_html_element( $this->state->current_token );
1780
1781 // All following conditions depend on "tentative" encoding confidence.
1782 if ( 'tentative' !== $this->state->encoding_confidence ) {
1783 return true;
1784 }
1785
1786 /*
1787 * > If the active speculative HTML parser is null, then:
1788 * > - If the element has a charset attribute, and getting an encoding from
1789 * > its value results in an encoding, and the confidence is currently
1790 * > tentative, then change the encoding to the resulting encoding.
1791 */
1792 $charset = $this->get_attribute( 'charset' );
1793 if ( is_string( $charset ) ) {
1794 $this->bail( 'Cannot yet process META tags with charset to determine encoding.' );
1795 }
1796
1797 /*
1798 * > - Otherwise, if the element has an http-equiv attribute whose value is
1799 * > an ASCII case-insensitive match for the string "Content-Type", and
1800 * > the element has a content attribute, and applying the algorithm for
1801 * > extracting a character encoding from a meta element to that attribute's
1802 * > value returns an encoding, and the confidence is currently tentative,
1803 * > then change the encoding to the extracted encoding.
1804 */
1805 $http_equiv = $this->get_attribute( 'http-equiv' );
1806 $content = $this->get_attribute( 'content' );
1807 if (
1808 is_string( $http_equiv ) &&
1809 is_string( $content ) &&
1810 0 === strcasecmp( $http_equiv, 'Content-Type' )
1811 ) {
1812 $this->bail( 'Cannot yet process META tags with http-equiv Content-Type to determine encoding.' );
1813 }
1814
1815 return true;
1816
1817 /*
1818 * > A start tag whose tag name is "title"
1819 */
1820 case '+TITLE':
1821 $this->insert_html_element( $this->state->current_token );
1822 return true;
1823
1824 /*
1825 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
1826 * > A start tag whose tag name is one of: "noframes", "style"
1827 *
1828 * The scripting flag is never enabled in this parser.
1829 */
1830 case '+NOFRAMES':
1831 case '+STYLE':
1832 $this->insert_html_element( $this->state->current_token );
1833 return true;
1834
1835 /*
1836 * > A start tag whose tag name is "noscript", if the scripting flag is disabled
1837 */
1838 case '+NOSCRIPT':
1839 $this->insert_html_element( $this->state->current_token );
1840 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT;
1841 return true;
1842
1843 /*
1844 * > A start tag whose tag name is "script"
1845 *
1846 * @todo Could the adjusted insertion location be anything other than the current location?
1847 */
1848 case '+SCRIPT':
1849 $this->insert_html_element( $this->state->current_token );
1850 return true;
1851
1852 /*
1853 * > An end tag whose tag name is "head"
1854 */
1855 case '-HEAD':
1856 $this->state->stack_of_open_elements->pop();
1857 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
1858 return true;
1859
1860 /*
1861 * > An end tag whose tag name is one of: "body", "html", "br"
1862 *
1863 * Closing BR tags are always reported by the Tag Processor as opening tags.
1864 */
1865 case '-BODY':
1866 case '-HTML':
1867 /*
1868 * > Act as described in the "anything else" entry below.
1869 */
1870 goto in_head_anything_else;
1871 break;
1872
1873 /*
1874 * > A start tag whose tag name is "template"
1875 *
1876 * @todo Could the adjusted insertion location be anything other than the current location?
1877 */
1878 case '+TEMPLATE':
1879 $this->state->active_formatting_elements->insert_marker();
1880 $this->state->frameset_ok = false;
1881
1882 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
1883 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
1884
1885 $this->insert_html_element( $this->state->current_token );
1886 return true;
1887
1888 /*
1889 * > An end tag whose tag name is "template"
1890 */
1891 case '-TEMPLATE':
1892 if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
1893 // @todo Indicate a parse error once it's possible.
1894 return $this->step();
1895 }
1896
1897 $this->generate_implied_end_tags_thoroughly();
1898 if ( ! $this->state->stack_of_open_elements->current_node_is( 'TEMPLATE' ) ) {
1899 // @todo Indicate a parse error once it's possible.
1900 }
1901
1902 $this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
1903 $this->state->active_formatting_elements->clear_up_to_last_marker();
1904 array_pop( $this->state->stack_of_template_insertion_modes );
1905 $this->reset_insertion_mode_appropriately();
1906 return true;
1907 }
1908
1909 /*
1910 * > A start tag whose tag name is "head"
1911 * > Any other end tag
1912 */
1913 if ( '+HEAD' === $op || $is_closer ) {
1914 // Parse error: ignore the token.
1915 return $this->step();
1916 }
1917
1918 /*
1919 * > Anything else
1920 */
1921 in_head_anything_else:
1922 $this->state->stack_of_open_elements->pop();
1923 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD;
1924 return $this->step( self::REPROCESS_CURRENT_NODE );
1925 }
1926
1927 /**
1928 * Parses next element in the 'in head noscript' insertion mode.
1929 *
1930 * This internal function performs the 'in head noscript' insertion mode
1931 * logic for the generalized WP_HTML_Processor::step() function.
1932 *
1933 * @since 6.7.0 Stub implementation.
1934 *
1935 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
1936 *
1937 * @see https://html.spec.whatwg.org/#parsing-main-inheadnoscript
1938 * @see WP_HTML_Processor::step
1939 *
1940 * @return bool Whether an element was found.
1941 */
1942 private function step_in_head_noscript(): bool {
1943 $token_name = $this->get_token_name();
1944 $token_type = $this->get_token_type();
1945 $is_closer = parent::is_tag_closer();
1946 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
1947 $op = "{$op_sigil}{$token_name}";
1948
1949 switch ( $op ) {
1950 /*
1951 * > A character token that is one of U+0009 CHARACTER TABULATION,
1952 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
1953 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
1954 *
1955 * Whitespace-only text is processed using the rules for the "in head" insertion mode.
1956 */
1957 case '#text':
1958 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
1959 return $this->step_in_head();
1960 }
1961
1962 goto in_head_noscript_anything_else;
1963 break;
1964
1965 /*
1966 * > A DOCTYPE token
1967 */
1968 case 'html':
1969 // Parse error: ignore the token.
1970 return $this->step();
1971
1972 /*
1973 * > A start tag whose tag name is "html"
1974 */
1975 case '+HTML':
1976 return $this->step_in_body();
1977
1978 /*
1979 * > An end tag whose tag name is "noscript"
1980 */
1981 case '-NOSCRIPT':
1982 $this->state->stack_of_open_elements->pop();
1983 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
1984 return true;
1985
1986 /*
1987 * > A comment token
1988 * >
1989 * > A start tag whose tag name is one of: "basefont", "bgsound",
1990 * > "link", "meta", "noframes", "style"
1991 */
1992 case '#comment':
1993 case '#funky-comment':
1994 case '#presumptuous-tag':
1995 case '+BASEFONT':
1996 case '+BGSOUND':
1997 case '+LINK':
1998 case '+META':
1999 case '+NOFRAMES':
2000 case '+STYLE':
2001 return $this->step_in_head();
2002
2003 /*
2004 * > An end tag whose tag name is "br"
2005 *
2006 * This should never happen: the Tag Processor reports closing BR tags as opening tags.
2007 */
2008 }
2009
2010 /*
2011 * > A start tag whose tag name is one of: "head", "noscript"
2012 * > Any other end tag
2013 */
2014 if ( '+HEAD' === $op || '+NOSCRIPT' === $op || $is_closer ) {
2015 // Parse error: ignore the token.
2016 return $this->step();
2017 }
2018
2019 /*
2020 * > Anything else
2021 *
2022 * Anything here is a parse error.
2023 */
2024 in_head_noscript_anything_else:
2025 $this->state->stack_of_open_elements->pop();
2026 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
2027 return $this->step( self::REPROCESS_CURRENT_NODE );
2028 }
2029
2030 /**
2031 * Parses next element in the 'after head' insertion mode.
2032 *
2033 * This internal function performs the 'after head' insertion mode
2034 * logic for the generalized WP_HTML_Processor::step() function.
2035 *
2036 * @since 6.7.0 Stub implementation.
2037 *
2038 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
2039 *
2040 * @see https://html.spec.whatwg.org/#the-after-head-insertion-mode
2041 * @see WP_HTML_Processor::step
2042 *
2043 * @return bool Whether an element was found.
2044 */
2045 private function step_after_head(): bool {
2046 $token_name = $this->get_token_name();
2047 $token_type = $this->get_token_type();
2048 $is_closer = parent::is_tag_closer();
2049 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
2050 $op = "{$op_sigil}{$token_name}";
2051
2052 switch ( $op ) {
2053 /*
2054 * > A character token that is one of U+0009 CHARACTER TABULATION,
2055 * > U+000A LINE FEED (LF), U+000C FORM FEED (FF),
2056 * > U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
2057 */
2058 case '#text':
2059 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
2060 // Insert the character.
2061 $this->insert_html_element( $this->state->current_token );
2062 return true;
2063 }
2064 goto after_head_anything_else;
2065 break;
2066
2067 /*
2068 * > A comment token
2069 */
2070 case '#comment':
2071 case '#funky-comment':
2072 case '#presumptuous-tag':
2073 $this->insert_html_element( $this->state->current_token );
2074 return true;
2075
2076 /*
2077 * > A DOCTYPE token
2078 */
2079 case 'html':
2080 // Parse error: ignore the token.
2081 return $this->step();
2082
2083 /*
2084 * > A start tag whose tag name is "html"
2085 */
2086 case '+HTML':
2087 return $this->step_in_body();
2088
2089 /*
2090 * > A start tag whose tag name is "body"
2091 */
2092 case '+BODY':
2093 $this->insert_html_element( $this->state->current_token );
2094 $this->state->frameset_ok = false;
2095 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
2096 return true;
2097
2098 /*
2099 * > A start tag whose tag name is "frameset"
2100 */
2101 case '+FRAMESET':
2102 $this->insert_html_element( $this->state->current_token );
2103 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
2104 return true;
2105
2106 /*
2107 * > A start tag whose tag name is one of: "base", "basefont", "bgsound",
2108 * > "link", "meta", "noframes", "script", "style", "template", "title"
2109 *
2110 * Anything here is a parse error.
2111 */
2112 case '+BASE':
2113 case '+BASEFONT':
2114 case '+BGSOUND':
2115 case '+LINK':
2116 case '+META':
2117 case '+NOFRAMES':
2118 case '+SCRIPT':
2119 case '+STYLE':
2120 case '+TEMPLATE':
2121 case '+TITLE':
2122 /*
2123 * > Push the node pointed to by the head element pointer onto the stack of open elements.
2124 * > Process the token using the rules for the "in head" insertion mode.
2125 * > Remove the node pointed to by the head element pointer from the stack of open elements. (It might not be the current node at this point.)
2126 */
2127 $this->bail( 'Cannot process elements after HEAD which reopen the HEAD element.' );
2128 /*
2129 * Do not leave this break in when adding support; it's here to prevent
2130 * WPCS from getting confused at the switch structure without a return,
2131 * because it doesn't know that `bail()` always throws.
2132 */
2133 break;
2134
2135 /*
2136 * > An end tag whose tag name is "template"
2137 */
2138 case '-TEMPLATE':
2139 return $this->step_in_head();
2140
2141 /*
2142 * > An end tag whose tag name is one of: "body", "html", "br"
2143 *
2144 * Closing BR tags are always reported by the Tag Processor as opening tags.
2145 */
2146 case '-BODY':
2147 case '-HTML':
2148 /*
2149 * > Act as described in the "anything else" entry below.
2150 */
2151 goto after_head_anything_else;
2152 break;
2153 }
2154
2155 /*
2156 * > A start tag whose tag name is "head"
2157 * > Any other end tag
2158 */
2159 if ( '+HEAD' === $op || $is_closer ) {
2160 // Parse error: ignore the token.
2161 return $this->step();
2162 }
2163
2164 /*
2165 * > Anything else
2166 * > Insert an HTML element for a "body" start tag token with no attributes.
2167 */
2168 after_head_anything_else:
2169 $this->insert_virtual_node( 'BODY' );
2170 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
2171 return $this->step( self::REPROCESS_CURRENT_NODE );
2172 }
2173
2174 /**
2175 * Parses next element in the 'in body' insertion mode.
2176 *
2177 * This internal function performs the 'in body' insertion mode
2178 * logic for the generalized WP_HTML_Processor::step() function.
2179 *
2180 * @since 6.4.0
2181 *
2182 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
2183 *
2184 * @see https://html.spec.whatwg.org/#parsing-main-inbody
2185 * @see WP_HTML_Processor::step
2186 *
2187 * @return bool Whether an element was found.
2188 */
2189 private function step_in_body(): bool {
2190 $token_name = $this->get_token_name();
2191 $token_type = $this->get_token_type();
2192 $op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
2193 $op = "{$op_sigil}{$token_name}";
2194
2195 switch ( $op ) {
2196 case '#text':
2197 /*
2198 * > A character token that is U+0000 NULL
2199 *
2200 * Any successive sequence of NULL bytes is ignored and won't
2201 * trigger active format reconstruction. Therefore, if the text
2202 * only comprises NULL bytes then the token should be ignored
2203 * here, but if there are any other characters in the stream
2204 * the active formats should be reconstructed.
2205 */
2206 if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
2207 // Parse error: ignore the token.
2208 return $this->step();
2209 }
2210
2211 $this->reconstruct_active_formatting_elements();
2212
2213 /*
2214 * Whitespace-only text does not affect the frameset-ok flag.
2215 * It is probably inter-element whitespace, but it may also
2216 * contain character references which decode only to whitespace.
2217 */
2218 if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
2219 $this->state->frameset_ok = false;
2220 }
2221
2222 $this->insert_html_element( $this->state->current_token );
2223 return true;
2224
2225 case '#comment':
2226 case '#funky-comment':
2227 case '#presumptuous-tag':
2228 $this->insert_html_element( $this->state->current_token );
2229 return true;
2230
2231 /*
2232 * > A DOCTYPE token
2233 * > Parse error. Ignore the token.
2234 */
2235 case 'html':
2236 return $this->step();
2237
2238 /*
2239 * > A start tag whose tag name is "html"
2240 */
2241 case '+HTML':
2242 if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
2243 /*
2244 * > Otherwise, for each attribute on the token, check to see if the attribute
2245 * > is already present on the top element of the stack of open elements. If
2246 * > it is not, add the attribute and its corresponding value to that element.
2247 *
2248 * This parser does not currently support this behavior: ignore the token.
2249 */
2250 }
2251
2252 // Ignore the token.
2253 return $this->step();
2254
2255 /*
2256 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
2257 * > "meta", "noframes", "script", "style", "template", "title"
2258 * >
2259 * > An end tag whose tag name is "template"
2260 */
2261 case '+BASE':
2262 case '+BASEFONT':
2263 case '+BGSOUND':
2264 case '+LINK':
2265 case '+META':
2266 case '+NOFRAMES':
2267 case '+SCRIPT':
2268 case '+STYLE':
2269 case '+TEMPLATE':
2270 case '+TITLE':
2271 case '-TEMPLATE':
2272 return $this->step_in_head();
2273
2274 /*
2275 * > A start tag whose tag name is "body"
2276 *
2277 * This tag in the IN BODY insertion mode is a parse error.
2278 */
2279 case '+BODY':
2280 if (
2281 1 === $this->state->stack_of_open_elements->count() ||
2282 'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
2283 $this->state->stack_of_open_elements->contains( 'TEMPLATE' )
2284 ) {
2285 // Ignore the token.
2286 return $this->step();
2287 }
2288
2289 /*
2290 * > Otherwise, set the frameset-ok flag to "not ok"; then, for each attribute
2291 * > on the token, check to see if the attribute is already present on the body
2292 * > element (the second element) on the stack of open elements, and if it is
2293 * > not, add the attribute and its corresponding value to that element.
2294 *
2295 * This parser does not currently support this behavior: ignore the token.
2296 */
2297 $this->state->frameset_ok = false;
2298 return $this->step();
2299
2300 /*
2301 * > A start tag whose tag name is "frameset"
2302 *
2303 * This tag in the IN BODY insertion mode is a parse error.
2304 */
2305 case '+FRAMESET':
2306 if (
2307 1 === $this->state->stack_of_open_elements->count() ||
2308 'BODY' !== ( $this->state->stack_of_open_elements->at( 2 )->node_name ?? null ) ||
2309 false === $this->state->frameset_ok
2310 ) {
2311 // Ignore the token.
2312 return $this->step();
2313 }
2314
2315 /*
2316 * > Otherwise, run the following steps:
2317 */
2318 $this->bail( 'Cannot process non-ignored FRAMESET tags.' );
2319 break;
2320
2321 /*
2322 * > An end tag whose tag name is "body"
2323 */
2324 case '-BODY':
2325 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
2326 // Parse error: ignore the token.
2327 return $this->step();
2328 }
2329
2330 /*
2331 * > Otherwise, if there is a node in the stack of open elements that is not either a
2332 * > dd element, a dt element, an li element, an optgroup element, an option element,
2333 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
2334 * > element, a td element, a tfoot element, a th element, a thead element, a tr
2335 * > element, the body element, or the html element, then this is a parse error.
2336 *
2337 * There is nothing to do for this parse error, so don't check for it.
2338 */
2339
2340 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
2341 /*
2342 * The BODY element is not removed from the stack of open elements.
2343 * Only internal state has changed; this does not qualify as a "step"
2344 * in terms of advancing through the document to another token.
2345 * Nothing has been pushed or popped.
2346 * Proceed to parse the next item.
2347 */
2348 return $this->step();
2349
2350 /*
2351 * > An end tag whose tag name is "html"
2352 */
2353 case '-HTML':
2354 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'BODY' ) ) {
2355 // Parse error: ignore the token.
2356 return $this->step();
2357 }
2358
2359 /*
2360 * > Otherwise, if there is a node in the stack of open elements that is not either a
2361 * > dd element, a dt element, an li element, an optgroup element, an option element,
2362 * > a p element, an rb element, an rp element, an rt element, an rtc element, a tbody
2363 * > element, a td element, a tfoot element, a th element, a thead element, a tr
2364 * > element, the body element, or the html element, then this is a parse error.
2365 *
2366 * There is nothing to do for this parse error, so don't check for it.
2367 */
2368
2369 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY;
2370 return $this->step( self::REPROCESS_CURRENT_NODE );
2371
2372 /*
2373 * > A start tag whose tag name is one of: "address", "article", "aside",
2374 * > "blockquote", "center", "details", "dialog", "dir", "div", "dl",
2375 * > "fieldset", "figcaption", "figure", "footer", "header", "hgroup",
2376 * > "main", "menu", "nav", "ol", "p", "search", "section", "summary", "ul"
2377 */
2378 case '+ADDRESS':
2379 case '+ARTICLE':
2380 case '+ASIDE':
2381 case '+BLOCKQUOTE':
2382 case '+CENTER':
2383 case '+DETAILS':
2384 case '+DIALOG':
2385 case '+DIR':
2386 case '+DIV':
2387 case '+DL':
2388 case '+FIELDSET':
2389 case '+FIGCAPTION':
2390 case '+FIGURE':
2391 case '+FOOTER':
2392 case '+HEADER':
2393 case '+HGROUP':
2394 case '+MAIN':
2395 case '+MENU':
2396 case '+NAV':
2397 case '+OL':
2398 case '+P':
2399 case '+SEARCH':
2400 case '+SECTION':
2401 case '+SUMMARY':
2402 case '+UL':
2403 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2404 $this->close_a_p_element();
2405 }
2406
2407 $this->insert_html_element( $this->state->current_token );
2408 return true;
2409
2410 /*
2411 * > A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
2412 */
2413 case '+H1':
2414 case '+H2':
2415 case '+H3':
2416 case '+H4':
2417 case '+H5':
2418 case '+H6':
2419 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2420 $this->close_a_p_element();
2421 }
2422
2423 if (
2424 in_array(
2425 $this->state->stack_of_open_elements->current_node()->node_name,
2426 array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' ),
2427 true
2428 )
2429 ) {
2430 // @todo Indicate a parse error once it's possible.
2431 $this->state->stack_of_open_elements->pop();
2432 }
2433
2434 $this->insert_html_element( $this->state->current_token );
2435 return true;
2436
2437 /*
2438 * > A start tag whose tag name is one of: "pre", "listing"
2439 */
2440 case '+PRE':
2441 case '+LISTING':
2442 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2443 $this->close_a_p_element();
2444 }
2445
2446 /*
2447 * > If the next token is a U+000A LINE FEED (LF) character token,
2448 * > then ignore that token and move on to the next one. (Newlines
2449 * > at the start of pre blocks are ignored as an authoring convenience.)
2450 *
2451 * This is handled in `get_modifiable_text()`.
2452 */
2453
2454 $this->insert_html_element( $this->state->current_token );
2455 $this->state->frameset_ok = false;
2456 return true;
2457
2458 /*
2459 * > A start tag whose tag name is "form"
2460 */
2461 case '+FORM':
2462 $stack_contains_template = $this->state->stack_of_open_elements->contains( 'TEMPLATE' );
2463
2464 if ( isset( $this->state->form_element ) && ! $stack_contains_template ) {
2465 // Parse error: ignore the token.
2466 return $this->step();
2467 }
2468
2469 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2470 $this->close_a_p_element();
2471 }
2472
2473 $this->insert_html_element( $this->state->current_token );
2474 if ( ! $stack_contains_template ) {
2475 $this->state->form_element = $this->state->current_token;
2476 }
2477
2478 return true;
2479
2480 /*
2481 * > A start tag whose tag name is "li"
2482 * > A start tag whose tag name is one of: "dd", "dt"
2483 */
2484 case '+DD':
2485 case '+DT':
2486 case '+LI':
2487 $this->state->frameset_ok = false;
2488 $node = $this->state->stack_of_open_elements->current_node();
2489 $is_li = 'LI' === $token_name;
2490
2491 in_body_list_loop:
2492 /*
2493 * The logic for LI and DT/DD is the same except for one point: LI elements _only_
2494 * close other LI elements, but a DT or DD element closes _any_ open DT or DD element.
2495 */
2496 if ( $is_li ? 'LI' === $node->node_name : ( 'DD' === $node->node_name || 'DT' === $node->node_name ) ) {
2497 $node_name = $is_li ? 'LI' : $node->node_name;
2498 $this->generate_implied_end_tags( $node_name );
2499 if ( ! $this->state->stack_of_open_elements->current_node_is( $node_name ) ) {
2500 // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2501 }
2502
2503 $this->state->stack_of_open_elements->pop_until( $node_name );
2504 goto in_body_list_done;
2505 }
2506
2507 if (
2508 'ADDRESS' !== $node->node_name &&
2509 'DIV' !== $node->node_name &&
2510 'P' !== $node->node_name &&
2511 self::is_special( $node )
2512 ) {
2513 /*
2514 * > If node is in the special category, but is not an address, div,
2515 * > or p element, then jump to the step labeled done below.
2516 */
2517 goto in_body_list_done;
2518 } else {
2519 /*
2520 * > Otherwise, set node to the previous entry in the stack of open elements
2521 * > and return to the step labeled loop.
2522 */
2523 foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
2524 $node = $item;
2525 break;
2526 }
2527 goto in_body_list_loop;
2528 }
2529
2530 in_body_list_done:
2531 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2532 $this->close_a_p_element();
2533 }
2534
2535 $this->insert_html_element( $this->state->current_token );
2536 return true;
2537
2538 case '+PLAINTEXT':
2539 if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
2540 $this->close_a_p_element();
2541 }
2542
2543 /*
2544 * @todo This may need to be handled in the Tag Processor and turn into
2545 * a single self-contained tag like TEXTAREA, whose modifiable text
2546 * is the rest of the input document as plaintext.
2547 */
2548 $this->bail( 'Cannot process PLAINTEXT elements.' );
2549 break;
2550
2551 /*
2552 * > A start tag whose tag name is "button"
2553 */
2554 case '+BUTTON':
2555 if ( $this->state->stack_of_open_elements->has_element_in_scope( 'BUTTON' ) ) {
2556 // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2557 $this->generate_implied_end_tags();
2558 $this->state->stack_of_open_elements->pop_until( 'BUTTON' );
2559 }
2560
2561 $this->reconstruct_active_formatting_elements();
2562 $this->insert_html_element( $this->state->current_token );
2563 $this->state->frameset_ok = false;
2564
2565 return true;
2566
2567 /*
2568 * > An end tag whose tag name is one of: "address", "article", "aside", "blockquote",
2569 * > "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset",
2570 * > "figcaption", "figure", "footer", "header", "hgroup", "listing", "main",
2571 * > "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
2572 */
2573 case '-ADDRESS':
2574 case '-ARTICLE':
2575 case '-ASIDE':
2576 case '-BLOCKQUOTE':
2577 case '-BUTTON':
2578 case '-CENTER':
2579 case '-DETAILS':
2580 case '-DIALOG':
2581 case '-DIR':
2582 case '-DIV':
2583 case '-DL':
2584 case '-FIELDSET':
2585 case '-FIGCAPTION':
2586 case '-FIGURE':
2587 case '-FOOTER':
2588 case '-HEADER':
2589 case '-HGROUP':
2590 case '-LISTING':
2591 case '-MAIN':
2592 case '-MENU':
2593 case '-NAV':
2594 case '-OL':
2595 case '-PRE':
2596 case '-SEARCH':
2597 case '-SECTION':
2598 case '-SUMMARY':
2599 case '-UL':
2600 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
2601 // @todo Report parse error.
2602 // Ignore the token.
2603 return $this->step();
2604 }
2605
2606 $this->generate_implied_end_tags();
2607 if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2608 // @todo Record parse error: this error doesn't impact parsing.
2609 }
2610 $this->state->stack_of_open_elements->pop_until( $token_name );
2611 return true;
2612
2613 /*
2614 * > An end tag whose tag name is "form"
2615 */
2616 case '-FORM':
2617 if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
2618 $node = $this->state->form_element;
2619 $this->state->form_element = null;
2620
2621 /*
2622 * > If node is null or if the stack of open elements does not have node
2623 * > in scope, then this is a parse error; return and ignore the token.
2624 *
2625 * @todo It's necessary to check if the form token itself is in scope, not
2626 * simply whether any FORM is in scope.
2627 */
2628 if (
2629 null === $node ||
2630 ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' )
2631 ) {
2632 // Parse error: ignore the token.
2633 return $this->step();
2634 }
2635
2636 $this->generate_implied_end_tags();
2637 if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
2638 // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2639 $this->bail( 'Cannot close a FORM when other elements remain open as this would throw off the breadcrumbs for the following tokens.' );
2640 }
2641
2642 $this->state->stack_of_open_elements->remove_node( $node );
2643 return true;
2644 } else {
2645 /*
2646 * > If the stack of open elements does not have a form element in scope,
2647 * > then this is a parse error; return and ignore the token.
2648 *
2649 * Note that unlike in the clause above, this is checking for any FORM in scope.
2650 */
2651 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( 'FORM' ) ) {
2652 // Parse error: ignore the token.
2653 return $this->step();
2654 }
2655
2656 $this->generate_implied_end_tags();
2657
2658 if ( ! $this->state->stack_of_open_elements->current_node_is( 'FORM' ) ) {
2659 // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2660 }
2661
2662 $this->state->stack_of_open_elements->pop_until( 'FORM' );
2663 return true;
2664 }
2665 break;
2666
2667 /*
2668 * > An end tag whose tag name is "p"
2669 */
2670 case '-P':
2671 if ( ! $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
// Parse error: insert an HTML element for a "p" start tag token with no attributes.
2672 $this->insert_html_element( $this->state->current_token );
2673 }
2674
2675 $this->close_a_p_element();
2676 return true;
2677
2678 /*
2679 * > An end tag whose tag name is "li"
2680 * > An end tag whose tag name is one of: "dd", "dt"
2681 */
2682 case '-DD':
2683 case '-DT':
2684 case '-LI':
2685 if (
2686 /*
2687 * An end tag whose tag name is "li":
2688 * If the stack of open elements does not have an li element in list item scope,
2689 * then this is a parse error; ignore the token.
2690 */
2691 (
2692 'LI' === $token_name &&
2693 ! $this->state->stack_of_open_elements->has_element_in_list_item_scope( 'LI' )
2694 ) ||
2695 /*
2696 * An end tag whose tag name is one of: "dd", "dt":
2697 * If the stack of open elements does not have an element in scope that is an
2698 * HTML element with the same tag name as that of the token, then this is a
2699 * parse error; ignore the token.
2700 */
2701 (
2702 'LI' !== $token_name &&
2703 ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name )
2704 )
2705 ) {
2706 /*
2707 * This is a parse error; ignore the token.
2708 *
2709 * @todo Indicate a parse error once it's possible.
2710 */
2711 return $this->step();
2712 }
2713
2714 $this->generate_implied_end_tags( $token_name );
2715
2716 if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2717 // @todo Indicate a parse error once it's possible. This error does not impact the logic here.
2718 }
2719
2720 $this->state->stack_of_open_elements->pop_until( $token_name );
2721 return true;
2722
2723 /*
2724 * > An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
2725 */
2726 case '-H1':
2727 case '-H2':
2728 case '-H3':
2729 case '-H4':
2730 case '-H5':
2731 case '-H6':
2732 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( '(internal: H1 through H6 - do not use)' ) ) {
2733 /*
2734 * This is a parse error; ignore the token.
2735 *
2736 * @todo Indicate a parse error once it's possible.
2737 */
2738 return $this->step();
2739 }
2740
2741 $this->generate_implied_end_tags();
2742
2743 if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
2744 // @todo Record parse error: this error doesn't impact parsing.
2745 }
2746
2747 $this->state->stack_of_open_elements->pop_until( '(internal: H1 through H6 - do not use)' );
2748 return true;
2749
2750 /*
2751 * > A start tag whose tag name is "a"
2752 */
2753 case '+A':
2754 foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
2755 switch ( $item->node_name ) {
2756 case 'marker':
2757 break 2;
2758
2759 case 'A':
2760 $this->run_adoption_agency_algorithm();
2761 $this->state->active_formatting_elements->remove_node( $item );
2762 $this->state->stack_of_open_elements->remove_node( $item );
2763 break 2;
2764 }
2765 }
2766
2767 $this->reconstruct_active_formatting_elements();
2768 $this->insert_html_element( $this->state->current_token );
2769 $this->state->active_formatting_elements->push( $this->state->current_token );
2770 return true;
2771
2772 /*
2773 * > A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i",
2774 * > "s", "small", "strike", "strong", "tt", "u"
2775 */
2776 case '+B':
2777 case '+BIG':
2778 case '+CODE':
2779 case '+EM':
2780 case '+FONT':
2781 case '+I':
2782 case '+S':
2783 case '+SMALL':
2784 case '+STRIKE':
2785 case '+STRONG':
2786 case '+TT':
2787 case '+U':
2788 $this->reconstruct_active_formatting_elements();
2789 $this->insert_html_element( $this->state->current_token );
2790 $this->state->active_formatting_elements->push( $this->state->current_token );
2791 return true;
2792
2793 /*
2794 * > A start tag whose tag name is "nobr"
2795 */
2796 case '+NOBR':
2797 $this->reconstruct_active_formatting_elements();
2798
2799 if ( $this->state->stack_of_open_elements->has_element_in_scope( 'NOBR' ) ) {
2800 // Parse error.
2801 $this->run_adoption_agency_algorithm();
2802 $this->reconstruct_active_formatting_elements();
2803 }
2804
2805 $this->insert_html_element( $this->state->current_token );
2806 $this->state->active_formatting_elements->push( $this->state->current_token );
2807 return true;
2808
2809 /*
2810 * > An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i",
2811 * > "nobr", "s", "small", "strike", "strong", "tt", "u"
2812 */
2813 case '-A':
2814 case '-B':
2815 case '-BIG':
2816 case '-CODE':
2817 case '-EM':
2818 case '-FONT':
2819 case '-I':
2820 case '-NOBR':
2821 case '-S':
2822 case '-SMALL':
2823 case '-STRIKE':
2824 case '-STRONG':
2825 case '-TT':
2826 case '-U':
2827 $this->run_adoption_agency_algorithm();
2828 return true;
2829
2830 /*
2831 * > A start tag whose tag name is one of: "applet", "marquee", "object"
2832 */
			case '+APPLET':
			case '+MARQUEE':
			case '+OBJECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->active_formatting_elements->insert_marker();
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > An end tag token whose tag name is one of: "applet", "marquee", "object"
			 */
			case '-APPLET':
			case '-MARQUEE':
			case '-OBJECT':
				if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $token_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
					// This is a parse error.
				}

				$this->state->stack_of_open_elements->pop_until( $token_name );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				return true;

			/*
			 * > A start tag whose tag name is "table"
			 */
			case '+TABLE':
				/*
				 * > If the Document is not set to quirks mode, and the stack of open elements
				 * > has a p element in button scope, then close a p element.
				 */
				if (
					WP_HTML_Tag_Processor::QUIRKS_MODE !== $this->compat_mode &&
					$this->state->stack_of_open_elements->has_p_in_button_scope()
				) {
					$this->close_a_p_element();
				}

				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok    = false;
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "br"
			 *
			 * This is prevented from happening because the Tag Processor
			 * reports all closing BR tags as if they were opening tags.
			 */

			/*
			 * > A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
			 */
			case '+AREA':
			case '+BR':
			case '+EMBED':
			case '+IMG':
			case '+KEYGEN':
			case '+WBR':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "input"
			 */
			case '+INPUT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the token does not have an attribute with the name "type", or if it does,
				 * > but that attribute's value is not an ASCII case-insensitive match for the
				 * > string "hidden", then: set the frameset-ok flag to "not ok".
				 */
				$type_attribute = $this->get_attribute( 'type' );
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					$this->state->frameset_ok = false;
				}

				return true;

			/*
			 * > A start tag whose tag name is one of: "param", "source", "track"
			 */
			case '+PARAM':
			case '+SOURCE':
			case '+TRACK':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "hr"
			 */
			case '+HR':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;
				return true;

			/*
			 * > A start tag whose tag name is "image"
			 */
			case '+IMAGE':
				/*
				 * > Parse error. Change the token's tag name to "img" and reprocess it. (Don't ask.)
				 *
				 * Note that this is handled elsewhere, so it should not be possible to reach this code.
				 */
				$this->bail( "Cannot process an IMAGE tag. (Don't ask.)" );
				break;

			/*
			 * > A start tag whose tag name is "textarea"
			 */
			case '+TEXTAREA':
				$this->insert_html_element( $this->state->current_token );

				/*
				 * > If the next token is a U+000A LINE FEED (LF) character token, then ignore
				 * > that token and move on to the next one. (Newlines at the start of
				 * > textarea elements are ignored as an authoring convenience.)
				 *
				 * This is handled in `get_modifiable_text()`.
				 */

				$this->state->frameset_ok = false;

				/*
				 * > Switch the insertion mode to "text".
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				return true;

			/*
			 * > A start tag whose tag name is "xmp"
			 */
			case '+XMP':
				if ( $this->state->stack_of_open_elements->has_p_in_button_scope() ) {
					$this->close_a_p_element();
				}

				$this->reconstruct_active_formatting_elements();
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "iframe"
			 */
			case '+IFRAME':
				$this->state->frameset_ok = false;

				/*
				 * > Follow the generic raw text element parsing algorithm.
				 *
				 * As a self-contained node, this behavior is handled in the Tag Processor.
				 */
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "noembed"
			 * > A start tag whose tag name is "noscript", if the scripting flag is enabled
			 *
			 * The scripting flag is never enabled in this parser.
			 */
			case '+NOEMBED':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "select"
			 */
			case '+SELECT':
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				$this->state->frameset_ok = false;

				switch ( $this->state->insertion_mode ) {
					/*
					 * > If the insertion mode is one of "in table", "in caption", "in table body", "in row",
					 * > or "in cell", then switch the insertion mode to "in select in table".
					 */
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
					case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
						break;

					/*
					 * > Otherwise, switch the insertion mode to "in select".
					 */
					default:
						$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
						break;
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "optgroup", "option"
			 */
			case '+OPTGROUP':
			case '+OPTION':
				if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
					$this->state->stack_of_open_elements->pop();
				}
				$this->reconstruct_active_formatting_elements();
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rb", "rtc"
			 */
			case '+RB':
			case '+RTC':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags();

					// > If the current node is not now a ruby element, this is a parse error.
					if ( ! $this->state->stack_of_open_elements->current_node_is( 'RUBY' ) ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is one of: "rp", "rt"
			 */
			case '+RP':
			case '+RT':
				if ( $this->state->stack_of_open_elements->has_element_in_scope( 'RUBY' ) ) {
					$this->generate_implied_end_tags( 'RTC' );

					// > If the current node is not now a rtc element or a ruby element, this is a parse error.
					$current_node_name = $this->state->stack_of_open_elements->current_node()->node_name;
					if ( 'RTC' !== $current_node_name && 'RUBY' !== $current_node_name ) {
						// @todo Indicate a parse error once it's possible.
					}
				}

				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "math"
			 */
			case '+MATH':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'math';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is "svg"
			 */
			case '+SVG':
				$this->reconstruct_active_formatting_elements();

				/*
				 * @todo Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
				 * @todo Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
				 *
				 * These ought to be handled in the attribute methods.
				 */
				$this->state->current_token->namespace = 'svg';
				$this->insert_html_element( $this->state->current_token );
				if ( $this->state->current_token->has_self_closing_flag ) {
					$this->state->stack_of_open_elements->pop();
				}
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup",
			 * > "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+FRAME':
			case '+HEAD':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
				// Parse error. Ignore the token.
				return $this->step();
		}

		if ( ! parent::is_tag_closer() ) {
			/*
			 * > Any other start tag
			 */
			$this->reconstruct_active_formatting_elements();
			$this->insert_html_element( $this->state->current_token );
			return true;
		} else {
			/*
			 * > Any other end tag
			 */

			/*
			 * Find the corresponding tag opener in the stack of open elements, if
			 * it exists before reaching a special element, which provides a kind
			 * of boundary in the stack. For example, a `</custom-tag>` should not
			 * close anything beyond its containing `P` or `DIV` element.
			 */
			foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
				if ( 'html' === $node->namespace && $token_name === $node->node_name ) {
					break;
				}

				if ( self::is_special( $node ) ) {
					// This is a parse error, ignore the token.
					return $this->step();
				}
			}

			$this->generate_implied_end_tags( $token_name );
			if ( $node !== $this->state->stack_of_open_elements->current_node() ) {
				// @todo Record parse error: this error doesn't impact parsing.
			}

			foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
				$this->state->stack_of_open_elements->pop();
				if ( $node === $item ) {
					return true;
				}
			}
		}

		$this->bail( 'Should not have been able to reach end of IN BODY processing. Check HTML API code.' );
		// This unnecessary return prevents tools from inaccurately reporting type errors.
		return false;
	}

	/**
	 * Parses next element in the 'in table' insertion mode.
	 *
	 * This internal function performs the 'in table' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intable
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token, if the current node is table,
			 * > tbody, template, tfoot, thead, or tr element
			 */
			case '#text':
				$current_node      = $this->state->stack_of_open_elements->current_node();
				$current_node_name = $current_node ? $current_node->node_name : null;
				if (
					$current_node_name && (
						'TABLE' === $current_node_name ||
						'TBODY' === $current_node_name ||
						'TEMPLATE' === $current_node_name ||
						'TFOOT' === $current_node_name ||
						'THEAD' === $current_node_name ||
						'TR' === $current_node_name
					)
				) {
					/*
					 * If the text is empty after processing HTML entities and stripping
					 * U+0000 NULL bytes then ignore the token.
					 */
					if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
						return $this->step();
					}

					/*
					 * This follows the rules for "in table text" insertion mode.
					 *
					 * Whitespace-only text nodes are inserted in-place. Otherwise
					 * foster parenting is enabled and the nodes would be
					 * inserted out-of-place.
					 *
					 * > If any of the tokens in the pending table character tokens
					 * > list are character tokens that are not ASCII whitespace,
					 * > then this is a parse error: reprocess the character tokens
					 * > in the pending table character tokens list using the rules
					 * > given in the "anything else" entry in the "in table"
					 * > insertion mode.
					 * >
					 * > Otherwise, insert the characters given by the pending table
					 * > character tokens list.
					 *
					 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
					 */
					if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
						$this->insert_html_element( $this->state->current_token );
						return true;
					}

					// Non-whitespace would trigger fostering, unsupported at this time.
					$this->bail( 'Foster parenting is not supported.' );
					break;
				}
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "caption"
			 */
			case '+CAPTION':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->state->active_formatting_elements->insert_marker();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
				return true;

			/*
			 * > A start tag whose tag name is "colgroup"
			 */
			case '+COLGROUP':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return true;

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->state->stack_of_open_elements->clear_to_table_context();

				/*
				 * > Insert an HTML element for a "colgroup" start tag token with no attributes,
				 * > then switch the insertion mode to "in column group".
				 */
				$this->insert_virtual_node( 'COLGROUP' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
				$this->state->stack_of_open_elements->clear_to_table_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "td", "th", "tr"
			 */
			case '+TD':
			case '+TH':
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_context();
				/*
				 * > Insert an HTML element for a "tbody" start tag token with no attributes,
				 * > then switch the insertion mode to "in table body".
				 */
				$this->insert_virtual_node( 'TBODY' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > A start tag whose tag name is "table"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is "table"
			 */
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TABLE' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}

				$this->state->stack_of_open_elements->pop_until( 'TABLE' );
				$this->reset_insertion_mode_appropriately();
				return true;

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is one of: "style", "script", "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+STYLE':
			case '+SCRIPT':
			case '+TEMPLATE':
			case '-TEMPLATE':
				/*
				 * > Process the token using the rules for the "in head" insertion mode.
				 */
				return $this->step_in_head();

			/*
			 * > A start tag whose tag name is "input"
			 *
			 * > If the token does not have an attribute with the name "type", or if it does, but
			 * > that attribute's value is not an ASCII case-insensitive match for the string
			 * > "hidden", then: act as described in the "anything else" entry below.
			 */
			case '+INPUT':
				$type_attribute = $this->get_attribute( 'type' );
				if ( ! is_string( $type_attribute ) || 'hidden' !== strtolower( $type_attribute ) ) {
					goto anything_else;
				}
				// @todo Indicate a parse error once it's possible.
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A start tag whose tag name is "form"
			 *
			 * This tag in the IN TABLE insertion mode is a parse error.
			 */
			case '+FORM':
				if (
					$this->state->stack_of_open_elements->has_element_in_scope( 'TEMPLATE' ) ||
					isset( $this->state->form_element )
				) {
					return $this->step();
				}

				// This FORM is special because it immediately closes and cannot have other children.
				$this->insert_html_element( $this->state->current_token );
				$this->state->form_element = $this->state->current_token;
				$this->state->stack_of_open_elements->pop();
				return true;
		}

		/*
		 * > Anything else
		 * > Parse error. Enable foster parenting, process the token using the rules for the
		 * > "in body" insertion mode, and then disable foster parenting.
		 *
		 * @todo Indicate a parse error once it's possible.
		 */
		anything_else:
		$this->bail( 'Foster parenting is not supported.' );
	}

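	/*
	 * Illustrative sketch only, not part of this class. Assuming WordPress 6.7
	 * or newer is loaded, the 'in table' rules should be observable through the
	 * public API: a TR placed directly inside a TABLE gets an implied TBODY, and
	 * non-whitespace text directly inside a TABLE makes the processor abort
	 * because foster parenting is unsupported. The markup is made up for the
	 * illustration.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<table><tr><td>Cell' );
	 *     if ( $processor->next_tag( 'TD' ) ) {
	 *         // Should be array( 'HTML', 'BODY', 'TABLE', 'TBODY', 'TR', 'TD' ).
	 *         $breadcrumbs = $processor->get_breadcrumbs();
	 *     }
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<table>stray text' );
	 *     while ( $processor->next_token() ) {
	 *         // Visits the TABLE opener, then stops at the text node.
	 *     }
	 *     // Should be WP_HTML_Processor::ERROR_UNSUPPORTED.
	 *     $error = $processor->get_last_error();
	 */
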
	/**
	 * Parses next element in the 'in table text' insertion mode.
	 *
	 * This internal function performs the 'in table text' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0 Stub implementation.
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intabletext
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_text(): bool {
		$this->bail( 'No support for parsing in the ' . WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT . ' state.' );
	}

	/**
	 * Parses next element in the 'in caption' insertion mode.
	 *
	 * This internal function performs the 'in caption' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incaption
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_caption(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > An end tag whose tag name is "caption"
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 *
			 * These tag handling rules are identical except for the final instruction.
			 * Handle them in a single block.
			 */
			case '-CAPTION':
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TD':
			case '+TFOOT':
			case '+TH':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'CAPTION' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->generate_implied_end_tags();
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'CAPTION' ) ) {
					// @todo Indicate a parse error once it's possible.
				}

				$this->state->stack_of_open_elements->pop_until( 'CAPTION' );
				$this->state->active_formatting_elements->clear_up_to_last_marker();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;

				// If this is not a CAPTION end tag, the token should be reprocessed.
				if ( '-CAPTION' === $op ) {
					return true;
				}
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
			 */
			case '-BODY':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TBODY':
			case '-TD':
			case '-TFOOT':
			case '-TH':
			case '-THEAD':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in body" insertion mode.
		 */
		return $this->step_in_body();
	}

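	/*
	 * Illustrative sketch only, not part of this class. Assuming WordPress 6.7
	 * or newer is loaded, a table-structure start tag found inside an open
	 * CAPTION should close the CAPTION before being reprocessed under the
	 * 'in table' rules, so CAPTION should not appear among the breadcrumbs of
	 * the following cell. The markup is made up for the illustration.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<table><caption>Title<tr><td>Cell' );
	 *     if ( $processor->next_tag( 'TD' ) ) {
	 *         // Should be array( 'HTML', 'BODY', 'TABLE', 'TBODY', 'TR', 'TD' ).
	 *         $breadcrumbs = $processor->get_breadcrumbs();
	 *     }
	 */
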
	/**
	 * Parses next element in the 'in column group' insertion mode.
	 *
	 * This internal function performs the 'in column group' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-incolgroup
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_column_group(): bool {
		$token_name = $this->get_token_name();
		$token_type = $this->get_token_type();
		$op_sigil   = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
		$op         = "{$op_sigil}{$token_name}";

		switch ( $op ) {
			/*
			 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
			 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
			 */
			case '#text':
				if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
					// Insert the character.
					$this->insert_html_element( $this->state->current_token );
					return true;
				}

				goto in_column_group_anything_else;
				break;

			/*
			 * > A comment token
			 */
			case '#comment':
			case '#funky-comment':
			case '#presumptuous-tag':
				$this->insert_html_element( $this->state->current_token );
				return true;

			/*
			 * > A DOCTYPE token
			 */
			case 'html':
				// @todo Indicate a parse error once it's possible.
				return $this->step();

			/*
			 * > A start tag whose tag name is "html"
			 */
			case '+HTML':
				return $this->step_in_body();

			/*
			 * > A start tag whose tag name is "col"
			 */
			case '+COL':
				$this->insert_html_element( $this->state->current_token );
				$this->state->stack_of_open_elements->pop();
				return true;

			/*
			 * > An end tag whose tag name is "colgroup"
			 */
			case '-COLGROUP':
				if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
					// @todo Indicate a parse error once it's possible.
					return $this->step();
				}
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > An end tag whose tag name is "col"
			 */
			case '-COL':
				// Parse error: ignore the token.
				return $this->step();

			/*
			 * > A start tag whose tag name is "template"
			 * > An end tag whose tag name is "template"
			 */
			case '+TEMPLATE':
			case '-TEMPLATE':
				return $this->step_in_head();
		}

		in_column_group_anything_else:
		/*
		 * > Anything else
		 */
		if ( ! $this->state->stack_of_open_elements->current_node_is( 'COLGROUP' ) ) {
			// @todo Indicate a parse error once it's possible.
			return $this->step();
		}
		$this->state->stack_of_open_elements->pop();
		$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
		return $this->step( self::REPROCESS_CURRENT_NODE );
	}

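	/*
	 * Illustrative sketch only, not part of this class. Assuming WordPress 6.7
	 * or newer is loaded, a COL placed directly inside a TABLE should be wrapped
	 * in an implied COLGROUP, and the next token that is not valid in a column
	 * group should close that COLGROUP before being reprocessed under the
	 * 'in table' rules. The markup is made up for the illustration.
	 *
	 * Example:
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<table><col><tr><td>Cell' );
	 *     if ( $processor->next_tag( 'COL' ) ) {
	 *         // Should be array( 'HTML', 'BODY', 'TABLE', 'COLGROUP', 'COL' ).
	 *         $col_breadcrumbs = $processor->get_breadcrumbs();
	 *     }
	 *     if ( $processor->next_tag( 'TD' ) ) {
	 *         // Should be array( 'HTML', 'BODY', 'TABLE', 'TBODY', 'TR', 'TD' ).
	 *         $cell_breadcrumbs = $processor->get_breadcrumbs();
	 *     }
	 */
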
	/**
	 * Parses next element in the 'in table body' insertion mode.
	 *
	 * This internal function performs the 'in table body' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intbody
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_table_body(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is "tr"
			 */
			case '+TR':
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return true;

			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				// @todo Indicate a parse error once it's possible.
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->insert_virtual_node( 'TR' );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '-TABLE':
				if (
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TBODY' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'THEAD' ) &&
					! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TFOOT' )
				) {
					// Parse error: ignore the token.
					return $this->step();
				}
				$this->state->stack_of_open_elements->clear_to_table_body_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
			case '-TR':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

	/**
	 * Parses next element in the 'in row' insertion mode.
	 *
	 * This internal function performs the 'in row' insertion mode
	 * logic for the generalized WP_HTML_Processor::step() function.
	 *
	 * @since 6.7.0
	 *
	 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
	 *
	 * @see https://html.spec.whatwg.org/#parsing-main-intr
	 * @see WP_HTML_Processor::step
	 *
	 * @return bool Whether an element was found.
	 */
	private function step_in_row(): bool {
		$tag_name = $this->get_tag();
		$op_sigil = $this->is_tag_closer() ? '-' : '+';
		$op       = "{$op_sigil}{$tag_name}";

		switch ( $op ) {
			/*
			 * > A start tag whose tag name is one of: "th", "td"
			 */
			case '+TH':
			case '+TD':
				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->insert_html_element( $this->state->current_token );
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
				$this->state->active_formatting_elements->insert_marker();
				return true;

			/*
			 * > An end tag whose tag name is "tr"
			 */
			case '-TR':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return true;

			/*
			 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"
			 * > An end tag whose tag name is "table"
			 */
			case '+CAPTION':
			case '+COL':
			case '+COLGROUP':
			case '+TBODY':
			case '+TFOOT':
			case '+THEAD':
			case '+TR':
			case '-TABLE':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "tbody", "tfoot", "thead"
			 */
			case '-TBODY':
			case '-TFOOT':
			case '-THEAD':
				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
					// Parse error: ignore the token.
					return $this->step();
				}

				if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( 'TR' ) ) {
					// Ignore the token.
					return $this->step();
				}

				$this->state->stack_of_open_elements->clear_to_table_row_context();
				$this->state->stack_of_open_elements->pop();
				$this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
				return $this->step( self::REPROCESS_CURRENT_NODE );

			/*
			 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th"
			 */
			case '-BODY':
			case '-CAPTION':
			case '-COL':
			case '-COLGROUP':
			case '-HTML':
			case '-TD':
			case '-TH':
				// Parse error: ignore the token.
				return $this->step();
		}

		/*
		 * > Anything else
		 * > Process the token using the rules for the "in table" insertion mode.
		 */
		return $this->step_in_table();
	}

3878 /**
3879 * Parses next element in the 'in cell' insertion mode.
3880 *
3881 * This internal function performs the 'in cell' insertion mode
3882 * logic for the generalized WP_HTML_Processor::step() function.
3883 *
3884 * @since 6.7.0
3885 *
3886 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3887 *
3888 * @see https://html.spec.whatwg.org/#parsing-main-intd
3889 * @see WP_HTML_Processor::step
3890 *
3891 * @return bool Whether an element was found.
3892 */
3893 private function step_in_cell(): bool {
3894 $tag_name = $this->get_tag();
3895 $op_sigil = $this->is_tag_closer() ? '-' : '+';
3896 $op = "{$op_sigil}{$tag_name}";
3897
3898 switch ( $op ) {
3899 /*
3900 * > An end tag whose tag name is one of: "td", "th"
3901 */
3902 case '-TD':
3903 case '-TH':
3904 if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3905 // Parse error: ignore the token.
3906 return $this->step();
3907 }
3908
3909 $this->generate_implied_end_tags();
3910
3911 /*
3912 * @todo This needs to check if the current node is an HTML element, meaning that
3913 * when SVG and MathML support is added, this needs to differentiate between an
3914 * HTML element of the given name, such as `<center>`, and a foreign element of
3915 * the same given name.
3916 */
3917 if ( ! $this->state->stack_of_open_elements->current_node_is( $tag_name ) ) {
3918 // @todo Indicate a parse error once it's possible.
3919 }
3920
3921 $this->state->stack_of_open_elements->pop_until( $tag_name );
3922 $this->state->active_formatting_elements->clear_up_to_last_marker();
3923 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
3924 return true;
3925
3926 /*
3927 * > A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td",
3928 * > "tfoot", "th", "thead", "tr"
3929 */
3930 case '+CAPTION':
3931 case '+COL':
3932 case '+COLGROUP':
3933 case '+TBODY':
3934 case '+TD':
3935 case '+TFOOT':
3936 case '+TH':
3937 case '+THEAD':
3938 case '+TR':
3939 /*
3940 * > Assert: The stack of open elements has a td or th element in table scope.
3941 *
3942 * Nothing to do here, except to verify in tests that this never appears.
3943 */
3944
3945 $this->close_cell();
3946 return $this->step( self::REPROCESS_CURRENT_NODE );
3947
3948 /*
3949 * > An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
3950 */
3951 case '-BODY':
3952 case '-CAPTION':
3953 case '-COL':
3954 case '-COLGROUP':
3955 case '-HTML':
3956 // Parse error: ignore the token.
3957 return $this->step();
3958
3959 /*
3960 * > An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
3961 */
3962 case '-TABLE':
3963 case '-TBODY':
3964 case '-TFOOT':
3965 case '-THEAD':
3966 case '-TR':
3967 if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $tag_name ) ) {
3968 // Parse error: ignore the token.
3969 return $this->step();
3970 }
3971 $this->close_cell();
3972 return $this->step( self::REPROCESS_CURRENT_NODE );
3973 }
3974
3975 /*
3976 * > Anything else
3977 * > Process the token using the rules for the "in body" insertion mode.
3978 */
3979 return $this->step_in_body();
3980 }
3981
3982 /**
3983 * Parses next element in the 'in select' insertion mode.
3984 *
3985 * This internal function performs the 'in select' insertion mode
3986 * logic for the generalized WP_HTML_Processor::step() function.
3987 *
3988 * @since 6.7.0
3989 *
3990 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
3991 *
3992 * @see https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
3993 * @see WP_HTML_Processor::step
3994 *
3995 * @return bool Whether an element was found.
3996 */
3997 private function step_in_select(): bool {
3998 $token_name = $this->get_token_name();
3999 $token_type = $this->get_token_type();
4000 $op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
4001 $op = "{$op_sigil}{$token_name}";
4002
4003 switch ( $op ) {
4004 /*
4005 * > Any other character token
4006 */
4007 case '#text':
4008 /*
4009 * > A character token that is U+0000 NULL
4010 *
4011 * If a text node only comprises null bytes then it should be
4012 * entirely ignored and should not return to calling code.
4013 */
4014 if ( parent::TEXT_IS_NULL_SEQUENCE === $this->text_node_classification ) {
4015 // Parse error: ignore the token.
4016 return $this->step();
4017 }
4018
4019 $this->insert_html_element( $this->state->current_token );
4020 return true;
4021
4022 /*
4023 * > A comment token
4024 */
4025 case '#comment':
4026 case '#funky-comment':
4027 case '#presumptuous-tag':
4028 $this->insert_html_element( $this->state->current_token );
4029 return true;
4030
4031 /*
4032 * > A DOCTYPE token
4033 */
4034 case 'html':
4035 // Parse error: ignore the token.
4036 return $this->step();
4037
4038 /*
4039 * > A start tag whose tag name is "html"
4040 */
4041 case '+HTML':
4042 return $this->step_in_body();
4043
4044 /*
4045 * > A start tag whose tag name is "option"
4046 */
4047 case '+OPTION':
4048 if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
4049 $this->state->stack_of_open_elements->pop();
4050 }
4051 $this->insert_html_element( $this->state->current_token );
4052 return true;
4053
4054 /*
4055 * > A start tag whose tag name is "optgroup"
4056 * > A start tag whose tag name is "hr"
4057 *
4058 * These rules are identical except for the treatment of the self-closing flag and
4059 * the subsequent pop of the HR void element, all of which is handled elsewhere in the processor.
4060 */
4061 case '+OPTGROUP':
4062 case '+HR':
4063 if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
4064 $this->state->stack_of_open_elements->pop();
4065 }
4066
4067 if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
4068 $this->state->stack_of_open_elements->pop();
4069 }
4070
4071 $this->insert_html_element( $this->state->current_token );
4072 return true;
4073
4074 /*
4075 * > An end tag whose tag name is "optgroup"
4076 */
4077 case '-OPTGROUP':
4078 $current_node = $this->state->stack_of_open_elements->current_node();
4079 if ( $current_node && 'OPTION' === $current_node->node_name ) {
4080 foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
4081 break;
4082 }
4083 if ( isset( $parent ) && 'OPTGROUP' === $parent->node_name ) {
4084 $this->state->stack_of_open_elements->pop();
4085 }
4086 }
4087
4088 if ( $this->state->stack_of_open_elements->current_node_is( 'OPTGROUP' ) ) {
4089 $this->state->stack_of_open_elements->pop();
4090 return true;
4091 }
4092
4093 // Parse error: ignore the token.
4094 return $this->step();
4095
4096 /*
4097 * > An end tag whose tag name is "option"
4098 */
4099 case '-OPTION':
4100 if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) ) {
4101 $this->state->stack_of_open_elements->pop();
4102 return true;
4103 }
4104
4105 // Parse error: ignore the token.
4106 return $this->step();
4107
4108 /*
4109 * > An end tag whose tag name is "select"
4110 * > A start tag whose tag name is "select"
4111 *
4112 * > It just gets treated like an end tag.
4113 */
4114 case '-SELECT':
4115 case '+SELECT':
4116 if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
4117 // Parse error: ignore the token.
4118 return $this->step();
4119 }
4120 $this->state->stack_of_open_elements->pop_until( 'SELECT' );
4121 $this->reset_insertion_mode_appropriately();
4122 return true;
4123
4124 /*
4125 * > A start tag whose tag name is one of: "input", "keygen", "textarea"
4126 *
4127 * All three of these tags are considered a parse error when found in this insertion mode.
4128 */
4129 case '+INPUT':
4130 case '+KEYGEN':
4131 case '+TEXTAREA':
4132 if ( ! $this->state->stack_of_open_elements->has_element_in_select_scope( 'SELECT' ) ) {
4133 // Ignore the token.
4134 return $this->step();
4135 }
4136 $this->state->stack_of_open_elements->pop_until( 'SELECT' );
4137 $this->reset_insertion_mode_appropriately();
4138 return $this->step( self::REPROCESS_CURRENT_NODE );
4139
4140 /*
4141 * > A start tag whose tag name is one of: "script", "template"
4142 * > An end tag whose tag name is "template"
4143 */
4144 case '+SCRIPT':
4145 case '+TEMPLATE':
4146 case '-TEMPLATE':
4147 return $this->step_in_head();
4148 }
4149
4150 /*
4151 * > Anything else
4152 * > Parse error: ignore the token.
4153 */
4154 return $this->step();
4155 }
4156
4157 /**
4158 * Parses next element in the 'in select in table' insertion mode.
4159 *
4160 * This internal function performs the 'in select in table' insertion mode
4161 * logic for the generalized WP_HTML_Processor::step() function.
4162 *
4163 * @since 6.7.0
4164 *
4165 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4166 *
4167 * @see https://html.spec.whatwg.org/#parsing-main-inselectintable
4168 * @see WP_HTML_Processor::step
4169 *
4170 * @return bool Whether an element was found.
4171 */
4172 private function step_in_select_in_table(): bool {
4173 $token_name = $this->get_token_name();
4174 $token_type = $this->get_token_type();
4175 $op_sigil = '#tag' === $token_type ? ( parent::is_tag_closer() ? '-' : '+' ) : '';
4176 $op = "{$op_sigil}{$token_name}";
4177
4178 switch ( $op ) {
4179 /*
4180 * > A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
4181 */
4182 case '+CAPTION':
4183 case '+TABLE':
4184 case '+TBODY':
4185 case '+TFOOT':
4186 case '+THEAD':
4187 case '+TR':
4188 case '+TD':
4189 case '+TH':
4190 // @todo Indicate a parse error once it's possible.
4191 $this->state->stack_of_open_elements->pop_until( 'SELECT' );
4192 $this->reset_insertion_mode_appropriately();
4193 return $this->step( self::REPROCESS_CURRENT_NODE );
4194
4195 /*
4196 * > An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
4197 */
4198 case '-CAPTION':
4199 case '-TABLE':
4200 case '-TBODY':
4201 case '-TFOOT':
4202 case '-THEAD':
4203 case '-TR':
4204 case '-TD':
4205 case '-TH':
4206 // @todo Indicate a parse error once it's possible.
4207 if ( ! $this->state->stack_of_open_elements->has_element_in_table_scope( $token_name ) ) {
4208 return $this->step();
4209 }
4210 $this->state->stack_of_open_elements->pop_until( 'SELECT' );
4211 $this->reset_insertion_mode_appropriately();
4212 return $this->step( self::REPROCESS_CURRENT_NODE );
4213 }
4214
4215 /*
4216 * > Anything else
4217 */
4218 return $this->step_in_select();
4219 }
4220
4221 /**
4222 * Parses next element in the 'in template' insertion mode.
4223 *
4224 * This internal function performs the 'in template' insertion mode
4225 * logic for the generalized WP_HTML_Processor::step() function.
4226 *
4227 * @since 6.7.0 Stub implementation.
4228 *
4229 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4230 *
4231 * @see https://html.spec.whatwg.org/#parsing-main-intemplate
4232 * @see WP_HTML_Processor::step
4233 *
4234 * @return bool Whether an element was found.
4235 */
4236 private function step_in_template(): bool {
4237 $token_name = $this->get_token_name();
4238 $token_type = $this->get_token_type();
4239 $is_closer = $this->is_tag_closer();
4240 $op_sigil = '#tag' === $token_type ? ( $is_closer ? '-' : '+' ) : '';
4241 $op = "{$op_sigil}{$token_name}";
4242
4243 switch ( $op ) {
4244 /*
4245 * > A character token
4246 * > A comment token
4247 * > A DOCTYPE token
4248 */
4249 case '#text':
4250 case '#comment':
4251 case '#funky-comment':
4252 case '#presumptuous-tag':
4253 case 'html':
4254 return $this->step_in_body();
4255
4256 /*
4257 * > A start tag whose tag name is one of: "base", "basefont", "bgsound", "link",
4258 * > "meta", "noframes", "script", "style", "template", "title"
4259 * > An end tag whose tag name is "template"
4260 */
4261 case '+BASE':
4262 case '+BASEFONT':
4263 case '+BGSOUND':
4264 case '+LINK':
4265 case '+META':
4266 case '+NOFRAMES':
4267 case '+SCRIPT':
4268 case '+STYLE':
4269 case '+TEMPLATE':
4270 case '+TITLE':
4271 case '-TEMPLATE':
4272 return $this->step_in_head();
4273
4274 /*
4275 * > A start tag whose tag name is one of: "caption", "colgroup", "tbody", "tfoot", "thead"
4276 */
4277 case '+CAPTION':
4278 case '+COLGROUP':
4279 case '+TBODY':
4280 case '+TFOOT':
4281 case '+THEAD':
4282 array_pop( $this->state->stack_of_template_insertion_modes );
4283 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
4284 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
4285 return $this->step( self::REPROCESS_CURRENT_NODE );
4286
4287 /*
4288 * > A start tag whose tag name is "col"
4289 */
4290 case '+COL':
4291 array_pop( $this->state->stack_of_template_insertion_modes );
4292 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
4293 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
4294 return $this->step( self::REPROCESS_CURRENT_NODE );
4295
4296 /*
4297 * > A start tag whose tag name is "tr"
4298 */
4299 case '+TR':
4300 array_pop( $this->state->stack_of_template_insertion_modes );
4301 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
4302 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
4303 return $this->step( self::REPROCESS_CURRENT_NODE );
4304
4305 /*
4306 * > A start tag whose tag name is one of: "td", "th"
4307 */
4308 case '+TD':
4309 case '+TH':
4310 array_pop( $this->state->stack_of_template_insertion_modes );
4311 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
4312 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
4313 return $this->step( self::REPROCESS_CURRENT_NODE );
4314 }
4315
4316 /*
4317 * > Any other start tag
4318 */
4319 if ( ! $is_closer ) {
4320 array_pop( $this->state->stack_of_template_insertion_modes );
4321 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4322 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4323 return $this->step( self::REPROCESS_CURRENT_NODE );
4324 }
4325
4326 /*
4327 * > Any other end tag
4328 */
4329 if ( $is_closer ) {
4330 // Parse error: ignore the token.
4331 return $this->step();
4332 }
4333
4334 /*
4335 * > An end-of-file token
4336 */
4337 if ( ! $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) {
4338 // Stop parsing.
4339 return false;
4340 }
4341
4342 // @todo Indicate a parse error once it's possible.
4343 $this->state->stack_of_open_elements->pop_until( 'TEMPLATE' );
4344 $this->state->active_formatting_elements->clear_up_to_last_marker();
4345 array_pop( $this->state->stack_of_template_insertion_modes );
4346 $this->reset_insertion_mode_appropriately();
4347 return $this->step( self::REPROCESS_CURRENT_NODE );
4348 }
4349
4350 /**
4351 * Parses next element in the 'after body' insertion mode.
4352 *
4353 * This internal function performs the 'after body' insertion mode
4354 * logic for the generalized WP_HTML_Processor::step() function.
4355 *
4356 * @since 6.7.0 Stub implementation.
4357 *
4358 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4359 *
4360 * @see https://html.spec.whatwg.org/#parsing-main-afterbody
4361 * @see WP_HTML_Processor::step
4362 *
4363 * @return bool Whether an element was found.
4364 */
4365 private function step_after_body(): bool {
4366 $tag_name = $this->get_token_name();
4367 $token_type = $this->get_token_type();
4368 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4369 $op = "{$op_sigil}{$tag_name}";
4370
4371 switch ( $op ) {
4372 /*
4373 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4374 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4375 *
4376 * > Process the token using the rules for the "in body" insertion mode.
4377 */
4378 case '#text':
4379 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4380 return $this->step_in_body();
4381 }
4382 goto after_body_anything_else;
4383 break;
4384
4385 /*
4386 * > A comment token
4387 */
4388 case '#comment':
4389 case '#funky-comment':
4390 case '#presumptuous-tag':
4391 $this->bail( 'Content outside of BODY is unsupported.' );
4392 break;
4393
4394 /*
4395 * > A DOCTYPE token
4396 */
4397 case 'html':
4398 // Parse error: ignore the token.
4399 return $this->step();
4400
4401 /*
4402 * > A start tag whose tag name is "html"
4403 */
4404 case '+HTML':
4405 return $this->step_in_body();
4406
4407 /*
4408 * > An end tag whose tag name is "html"
4409 *
4410 * > If the parser was created as part of the HTML fragment parsing algorithm,
4411 * > this is a parse error; ignore the token. (fragment case)
4412 * >
4413 * > Otherwise, switch the insertion mode to "after after body".
4414 */
4415 case '-HTML':
4416 if ( isset( $this->context_node ) ) {
4417 return $this->step();
4418 }
4419
4420 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY;
4421 /*
4422 * The HTML element is not removed from the stack of open elements.
4423 * Only internal state has changed; this does not qualify as a "step"
4424 * in terms of advancing through the document to another token.
4425 * Nothing has been pushed or popped.
4426 * Proceed to parse the next item.
4427 */
4428 return $this->step();
4429 }
4430
4431 /*
4432 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
4433 */
4434 after_body_anything_else:
4435 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4436 return $this->step( self::REPROCESS_CURRENT_NODE );
4437 }
4438
4439 /**
4440 * Parses next element in the 'in frameset' insertion mode.
4441 *
4442 * This internal function performs the 'in frameset' insertion mode
4443 * logic for the generalized WP_HTML_Processor::step() function.
4444 *
4445 * @since 6.7.0 Stub implementation.
4446 *
4447 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4448 *
4449 * @see https://html.spec.whatwg.org/#parsing-main-inframeset
4450 * @see WP_HTML_Processor::step
4451 *
4452 * @return bool Whether an element was found.
4453 */
4454 private function step_in_frameset(): bool {
4455 $tag_name = $this->get_token_name();
4456 $token_type = $this->get_token_type();
4457 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4458 $op = "{$op_sigil}{$tag_name}";
4459
4460 switch ( $op ) {
4461 /*
4462 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4463 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4464 * >
4465 * > Insert the character.
4466 *
4467 * This algorithm effectively strips non-whitespace characters from text and inserts
4468 * only the remaining whitespace. This is not supported at this time.
4469 */
4470 case '#text':
4471 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4472 return $this->step_in_body();
4473 }
4474 $this->bail( 'Non-whitespace characters cannot be handled in frameset.' );
4475 break;
4476
4477 /*
4478 * > A comment token
4479 */
4480 case '#comment':
4481 case '#funky-comment':
4482 case '#presumptuous-tag':
4483 $this->insert_html_element( $this->state->current_token );
4484 return true;
4485
4486 /*
4487 * > A DOCTYPE token
4488 */
4489 case 'html':
4490 // Parse error: ignore the token.
4491 return $this->step();
4492
4493 /*
4494 * > A start tag whose tag name is "html"
4495 */
4496 case '+HTML':
4497 return $this->step_in_body();
4498
4499 /*
4500 * > A start tag whose tag name is "frameset"
4501 */
4502 case '+FRAMESET':
4503 $this->insert_html_element( $this->state->current_token );
4504 return true;
4505
4506 /*
4507 * > An end tag whose tag name is "frameset"
4508 */
4509 case '-FRAMESET':
4510 /*
4511 * > If the current node is the root html element, then this is a parse error;
4512 * > ignore the token. (fragment case)
4513 */
4514 if ( $this->state->stack_of_open_elements->current_node_is( 'HTML' ) ) {
4515 return $this->step();
4516 }
4517
4518 /*
4519 * > Otherwise, pop the current node from the stack of open elements.
4520 */
4521 $this->state->stack_of_open_elements->pop();
4522
4523 /*
4524 * > If the parser was not created as part of the HTML fragment parsing algorithm
4525 * > (fragment case), and the current node is no longer a frameset element, then
4526 * > switch the insertion mode to "after frameset".
4527 */
4528 if ( ! isset( $this->context_node ) && ! $this->state->stack_of_open_elements->current_node_is( 'FRAMESET' ) ) {
4529 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET;
4530 }
4531
4532 return true;
4533
4534 /*
4535 * > A start tag whose tag name is "frame"
4536 *
4537 * > Insert an HTML element for the token. Immediately pop the
4538 * > current node off the stack of open elements.
4539 * >
4540 * > Acknowledge the token's self-closing flag, if it is set.
4541 */
4542 case '+FRAME':
4543 $this->insert_html_element( $this->state->current_token );
4544 $this->state->stack_of_open_elements->pop();
4545 return true;
4546
4547 /*
4548 * > A start tag whose tag name is "noframes"
4549 */
4550 case '+NOFRAMES':
4551 return $this->step_in_head();
4552 }
4553
4554 // Parse error: ignore the token.
4555 return $this->step();
4556 }
4557
4558 /**
4559 * Parses next element in the 'after frameset' insertion mode.
4560 *
4561 * This internal function performs the 'after frameset' insertion mode
4562 * logic for the generalized WP_HTML_Processor::step() function.
4563 *
4564 * @since 6.7.0 Stub implementation.
4565 *
4566 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4567 *
4568 * @see https://html.spec.whatwg.org/#parsing-main-afterframeset
4569 * @see WP_HTML_Processor::step
4570 *
4571 * @return bool Whether an element was found.
4572 */
4573 private function step_after_frameset(): bool {
4574 $tag_name = $this->get_token_name();
4575 $token_type = $this->get_token_type();
4576 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4577 $op = "{$op_sigil}{$tag_name}";
4578
4579 switch ( $op ) {
4580 /*
4581 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4582 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4583 * >
4584 * > Insert the character.
4585 *
4586 * This algorithm effectively strips non-whitespace characters from text and inserts
4587 * only the remaining whitespace under HTML. This is not supported at this time.
4588 */
4589 case '#text':
4590 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4591 return $this->step_in_body();
4592 }
4593 $this->bail( 'Non-whitespace characters cannot be handled in after frameset.' );
4594 break;
4595
4596 /*
4597 * > A comment token
4598 */
4599 case '#comment':
4600 case '#funky-comment':
4601 case '#presumptuous-tag':
4602 $this->insert_html_element( $this->state->current_token );
4603 return true;
4604
4605 /*
4606 * > A DOCTYPE token
4607 */
4608 case 'html':
4609 // Parse error: ignore the token.
4610 return $this->step();
4611
4612 /*
4613 * > A start tag whose tag name is "html"
4614 */
4615 case '+HTML':
4616 return $this->step_in_body();
4617
4618 /*
4619 * > An end tag whose tag name is "html"
4620 */
4621 case '-HTML':
4622 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET;
4623 /*
4624 * The HTML element is not removed from the stack of open elements.
4625 * Only internal state has changed; this does not qualify as a "step"
4626 * in terms of advancing through the document to another token.
4627 * Nothing has been pushed or popped.
4628 * Proceed to parse the next item.
4629 */
4630 return $this->step();
4631
4632 /*
4633 * > A start tag whose tag name is "noframes"
4634 */
4635 case '+NOFRAMES':
4636 return $this->step_in_head();
4637 }
4638
4639 // Parse error: ignore the token.
4640 return $this->step();
4641 }
4642
4643 /**
4644 * Parses next element in the 'after after body' insertion mode.
4645 *
4646 * This internal function performs the 'after after body' insertion mode
4647 * logic for the generalized WP_HTML_Processor::step() function.
4648 *
4649 * @since 6.7.0 Stub implementation.
4650 *
4651 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4652 *
4653 * @see https://html.spec.whatwg.org/#the-after-after-body-insertion-mode
4654 * @see WP_HTML_Processor::step
4655 *
4656 * @return bool Whether an element was found.
4657 */
4658 private function step_after_after_body(): bool {
4659 $tag_name = $this->get_token_name();
4660 $token_type = $this->get_token_type();
4661 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4662 $op = "{$op_sigil}{$tag_name}";
4663
4664 switch ( $op ) {
4665 /*
4666 * > A comment token
4667 */
4668 case '#comment':
4669 case '#funky-comment':
4670 case '#presumptuous-tag':
4671 $this->bail( 'Content outside of HTML is unsupported.' );
4672 break;
4673
4674 /*
4675 * > A DOCTYPE token
4676 * > A start tag whose tag name is "html"
4677 *
4678 * > Process the token using the rules for the "in body" insertion mode.
4679 */
4680 case 'html':
4681 case '+HTML':
4682 return $this->step_in_body();
4683
4684 /*
4685 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4686 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4687 * >
4688 * > Process the token using the rules for the "in body" insertion mode.
4689 */
4690 case '#text':
4691 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4692 return $this->step_in_body();
4693 }
4694 goto after_after_body_anything_else;
4695 break;
4696 }
4697
4698 /*
4699 * > Parse error. Switch the insertion mode to "in body" and reprocess the token.
4700 */
4701 after_after_body_anything_else:
4702 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
4703 return $this->step( self::REPROCESS_CURRENT_NODE );
4704 }
4705
4706 /**
4707 * Parses next element in the 'after after frameset' insertion mode.
4708 *
4709 * This internal function performs the 'after after frameset' insertion mode
4710 * logic for the generalized WP_HTML_Processor::step() function.
4711 *
4712 * @since 6.7.0 Stub implementation.
4713 *
4714 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4715 *
4716 * @see https://html.spec.whatwg.org/#the-after-after-frameset-insertion-mode
4717 * @see WP_HTML_Processor::step
4718 *
4719 * @return bool Whether an element was found.
4720 */
4721 private function step_after_after_frameset(): bool {
4722 $tag_name = $this->get_token_name();
4723 $token_type = $this->get_token_type();
4724 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4725 $op = "{$op_sigil}{$tag_name}";
4726
4727 switch ( $op ) {
4728 /*
4729 * > A comment token
4730 */
4731 case '#comment':
4732 case '#funky-comment':
4733 case '#presumptuous-tag':
4734 $this->bail( 'Content outside of HTML is unsupported.' );
4735 break;
4736
4737 /*
4738 * > A DOCTYPE token
4739 * > A start tag whose tag name is "html"
4740 *
4741 * > Process the token using the rules for the "in body" insertion mode.
4742 */
4743 case 'html':
4744 case '+HTML':
4745 return $this->step_in_body();
4746
4747 /*
4748 * > A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF),
4749 * > U+000C FORM FEED (FF), U+000D CARRIAGE RETURN (CR), or U+0020 SPACE
4750 * >
4751 * > Process the token using the rules for the "in body" insertion mode.
4752 *
4753 * This algorithm effectively strips non-whitespace characters from text and processes
4754 * only the remaining whitespace. This is not supported at this time.
4755 */
4756 case '#text':
4757 if ( parent::TEXT_IS_WHITESPACE === $this->text_node_classification ) {
4758 return $this->step_in_body();
4759 }
4760 $this->bail( 'Non-whitespace characters cannot be handled in after after frameset.' );
4761 break;
4762
4763 /*
4764 * > A start tag whose tag name is "noframes"
4765 */
4766 case '+NOFRAMES':
4767 return $this->step_in_head();
4768 }
4769
4770 // Parse error: ignore the token.
4771 return $this->step();
4772 }
4773
4774 /**
4775 * Parses next element in the 'in foreign content' insertion mode.
4776 *
4777 * This internal function performs the 'in foreign content' insertion mode
4778 * logic for the generalized WP_HTML_Processor::step() function.
4779 *
4780 * @since 6.7.0 Stub implementation.
4781 *
4782 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
4783 *
4784 * @see https://html.spec.whatwg.org/#parsing-main-inforeign
4785 * @see WP_HTML_Processor::step
4786 *
4787 * @return bool Whether an element was found.
4788 */
4789 private function step_in_foreign_content(): bool {
4790 $tag_name = $this->get_token_name();
4791 $token_type = $this->get_token_type();
4792 $op_sigil = '#tag' === $token_type ? ( $this->is_tag_closer() ? '-' : '+' ) : '';
4793 $op = "{$op_sigil}{$tag_name}";
4794
4795 /*
4796 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
4797 *
4798 * This section is drawn out above the switch to more easily incorporate
4799 * the additional rules based on the presence of the attributes.
4800 */
4801 if (
4802 '+FONT' === $op &&
4803 (
4804 null !== $this->get_attribute( 'color' ) ||
4805 null !== $this->get_attribute( 'face' ) ||
4806 null !== $this->get_attribute( 'size' )
4807 )
4808 ) {
4809 $op = '+FONT with attributes';
4810 }
4811
4812 switch ( $op ) {
4813 case '#text':
4814 /*
4815 * > A character token that is U+0000 NULL
4816 *
4817 * This is handled by `get_modifiable_text()`.
4818 */
4819
4820 /*
4821 * Whitespace-only text does not affect the frameset-ok flag.
4822 * It is probably inter-element whitespace, but it may also
4823 * contain character references which decode only to whitespace.
4824 */
4825 if ( parent::TEXT_IS_GENERIC === $this->text_node_classification ) {
4826 $this->state->frameset_ok = false;
4827 }
4828
4829 $this->insert_foreign_element( $this->state->current_token, false );
4830 return true;
4831
4832 /*
4833 * CDATA sections are alternate wrappers for text content and therefore
4834 * ought to follow the same rules as text nodes.
4835 */
4836 case '#cdata-section':
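				/*
				 * NULL bytes and whitespace do not change the frameset-ok flag.
				 *
				 * The bookmark spans the full `<![CDATA[...]]>` syntax. The opener
				 * `<![CDATA[` is 9 bytes long and the closer `]]>` is 3 bytes long,
				 * so the content starts 9 bytes into the token and is 12 bytes
				 * shorter than the token itself.
				 */
				$current_token        = $this->bookmarks[ $this->state->current_token->bookmark_name ];
				$cdata_content_start  = $current_token->start + 9;
				$cdata_content_length = $current_token->length - 12;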
4843 if ( strspn( $this->html, "\0 \t\n\f\r", $cdata_content_start, $cdata_content_length ) !== $cdata_content_length ) {
4844 $this->state->frameset_ok = false;
4845 }
4846
4847 $this->insert_foreign_element( $this->state->current_token, false );
4848 return true;
4849
4850 /*
4851 * > A comment token
4852 */
4853 case '#comment':
4854 case '#funky-comment':
4855 case '#presumptuous-tag':
4856 $this->insert_foreign_element( $this->state->current_token, false );
4857 return true;
4858
4859 /*
4860 * > A DOCTYPE token
4861 */
4862 case 'html':
4863 // Parse error: ignore the token.
4864 return $this->step();
4865
4866 /*
4867 * > A start tag whose tag name is "b", "big", "blockquote", "body", "br", "center",
4868 * > "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5",
4869 * > "h6", "head", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol",
4870 * > "p", "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
4871 * > "table", "tt", "u", "ul", "var"
4872 *
4873 * > A start tag whose name is "font", if the token has any attributes named "color", "face", or "size"
4874 *
4875 * > An end tag whose tag name is "br", "p"
4876 *
4877 * Closing BR tags are always reported by the Tag Processor as opening tags.
4878 */
4879 case '+B':
4880 case '+BIG':
4881 case '+BLOCKQUOTE':
4882 case '+BODY':
4883 case '+BR':
4884 case '+CENTER':
4885 case '+CODE':
4886 case '+DD':
4887 case '+DIV':
4888 case '+DL':
4889 case '+DT':
4890 case '+EM':
4891 case '+EMBED':
4892 case '+H1':
4893 case '+H2':
4894 case '+H3':
4895 case '+H4':
4896 case '+H5':
4897 case '+H6':
4898 case '+HEAD':
4899 case '+HR':
4900 case '+I':
4901 case '+IMG':
4902 case '+LI':
4903 case '+LISTING':
4904 case '+MENU':
4905 case '+META':
4906 case '+NOBR':
4907 case '+OL':
4908 case '+P':
4909 case '+PRE':
4910 case '+RUBY':
4911 case '+S':
4912 case '+SMALL':
4913 case '+SPAN':
4914 case '+STRONG':
4915 case '+STRIKE':
4916 case '+SUB':
4917 case '+SUP':
4918 case '+TABLE':
4919 case '+TT':
4920 case '+U':
4921 case '+UL':
4922 case '+VAR':
4923 case '+FONT with attributes':
4924 case '-BR':
4925 case '-P':
4926 // @todo Indicate a parse error once it's possible.
4927 foreach ( $this->state->stack_of_open_elements->walk_up() as $current_node ) {
4928 if (
4929 'math' === $current_node->integration_node_type ||
4930 'html' === $current_node->integration_node_type ||
4931 'html' === $current_node->namespace
4932 ) {
4933 break;
4934 }
4935
4936 $this->state->stack_of_open_elements->pop();
4937 }
4938 goto in_foreign_content_process_in_current_insertion_mode;
4939 }
4940
4941 /*
4942 * > Any other start tag
4943 */
4944 if ( ! $this->is_tag_closer() ) {
4945 $this->insert_foreign_element( $this->state->current_token, false );
4946
4947 /*
4948 * > If the token has its self-closing flag set, then run
4949 * > the appropriate steps from the following list:
4950 * >
4951 * > ↪ the token's tag name is "script", and the new current node is in the SVG namespace
4952 * > Acknowledge the token's self-closing flag, and then act as
4953 * > described in the steps for a "script" end tag below.
4954 * >
4955 * > ↪ Otherwise
4956 * > Pop the current node off the stack of open elements and
4957 * > acknowledge the token's self-closing flag.
4958 *
4959 * Since the rules for SCRIPT below indicate to pop the element off of the stack of
4960 * open elements, which is the same for the Otherwise condition, there's no need to
4961 * separate these checks. The difference comes when a parser operates with the scripting
4962 * flag enabled, and executes the script, which this parser does not support.
4963 */
4964 if ( $this->state->current_token->has_self_closing_flag ) {
4965 $this->state->stack_of_open_elements->pop();
4966 }
4967 return true;
4968 }
4969
4970 /*
4971 * > An end tag whose name is "script", if the current node is an SVG script element.
4972 */
4973 if ( $this->is_tag_closer() && 'SCRIPT' === $this->state->current_token->node_name && 'svg' === $this->state->current_token->namespace ) {
4974 $this->state->stack_of_open_elements->pop();
4975 return true;
4976 }
4977
4978 /*
4979 * > Any other end tag
4980 */
4981 if ( $this->is_tag_closer() ) {
4982 $node = $this->state->stack_of_open_elements->current_node();
4983 if ( $tag_name !== $node->node_name ) {
4984 // @todo Indicate a parse error once it's possible.
4985 }
4986 in_foreign_content_end_tag_loop:
4987 if ( $node === $this->state->stack_of_open_elements->at( 1 ) ) {
4988 return true;
4989 }
4990
4991 /*
4992 * > If node's tag name, converted to ASCII lowercase, is the same as the tag name
4993 * > of the token, pop elements from the stack of open elements until node has
4994 * > been popped from the stack, and then return.
4995 */
4996 if ( 0 === strcasecmp( $node->node_name, $tag_name ) ) {
4997 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
4998 $this->state->stack_of_open_elements->pop();
4999 if ( $node === $item ) {
5000 return true;
5001 }
5002 }
5003 }
5004
5005 foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $item ) {
5006 $node = $item;
5007 break;
5008 }
5009
5010 if ( 'html' !== $node->namespace ) {
5011 goto in_foreign_content_end_tag_loop;
5012 }
5013
5014 in_foreign_content_process_in_current_insertion_mode:
5015 switch ( $this->state->insertion_mode ) {
5016 case WP_HTML_Processor_State::INSERTION_MODE_INITIAL:
5017 return $this->step_initial();
5018
5019 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HTML:
5020 return $this->step_before_html();
5021
5022 case WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD:
5023 return $this->step_before_head();
5024
5025 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD:
5026 return $this->step_in_head();
5027
5028 case WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD_NOSCRIPT:
5029 return $this->step_in_head_noscript();
5030
5031 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD:
5032 return $this->step_after_head();
5033
5034 case WP_HTML_Processor_State::INSERTION_MODE_IN_BODY:
5035 return $this->step_in_body();
5036
5037 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE:
5038 return $this->step_in_table();
5039
5040 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_TEXT:
5041 return $this->step_in_table_text();
5042
5043 case WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION:
5044 return $this->step_in_caption();
5045
5046 case WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP:
5047 return $this->step_in_column_group();
5048
5049 case WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY:
5050 return $this->step_in_table_body();
5051
5052 case WP_HTML_Processor_State::INSERTION_MODE_IN_ROW:
5053 return $this->step_in_row();
5054
5055 case WP_HTML_Processor_State::INSERTION_MODE_IN_CELL:
5056 return $this->step_in_cell();
5057
5058 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT:
5059 return $this->step_in_select();
5060
5061 case WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE:
5062 return $this->step_in_select_in_table();
5063
5064 case WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE:
5065 return $this->step_in_template();
5066
5067 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_BODY:
5068 return $this->step_after_body();
5069
5070 case WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET:
5071 return $this->step_in_frameset();
5072
5073 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_FRAMESET:
5074 return $this->step_after_frameset();
5075
5076 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_BODY:
5077 return $this->step_after_after_body();
5078
5079 case WP_HTML_Processor_State::INSERTION_MODE_AFTER_AFTER_FRAMESET:
5080 return $this->step_after_after_frameset();
5081
5082 // This should be unreachable but PHP doesn't have total type checking on switch.
5083 default:
5084 $this->bail( "Unaware of the requested parsing mode: '{$this->state->insertion_mode}'." );
5085 }
5086 }
5087
5088 $this->bail( 'Should not have been able to reach end of IN FOREIGN CONTENT processing. Check HTML API code.' );
5089 // This unnecessary return prevents tools from inaccurately reporting type errors.
5090 return false;
5091 }
5092
5093 /*
5094 * Internal helpers
5095 */
5096
5097 /**
5098 * Creates a new bookmark for the currently-matched token and returns the generated name.
5099 *
5100 * @since 6.4.0
5101 * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
5102 *
5103 * @throws Exception When unable to allocate requested bookmark.
5104 *
5105 * @return string|false Name of created bookmark, or false if unable to create.
5106 */
5107 private function bookmark_token() {
5108 if ( ! parent::set_bookmark( ++$this->bookmark_counter ) ) {
5109 $this->last_error = self::ERROR_EXCEEDED_MAX_BOOKMARKS;
5110 throw new Exception( 'could not allocate bookmark' );
5111 }
5112
5113 return "{$this->bookmark_counter}";
5114 }
5115
5116 /*
5117 * HTML semantic overrides for Tag Processor
5118 */
5119
5120 /**
5121 * Indicates the namespace of the current token, or "html" if there is none.
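	 *
	 * Example (a minimal sketch, assuming the WordPress HTML API is loaded;
	 * the input markup is illustrative only):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Text</p><svg><circle /></svg>' );
	 *
	 *     $processor->next_tag();       // Matches the P opener.
	 *     $processor->get_namespace();  // 'html'
	 *
	 *     $processor->next_tag();       // Matches the SVG opener.
	 *     $processor->get_namespace();  // 'svg'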
5122 *
5123 * @return string One of "html", "math", or "svg".
5124 */
5125 public function get_namespace(): string {
5126 if ( ! isset( $this->current_element ) ) {
5127 return parent::get_namespace();
5128 }
5129
5130 return $this->current_element->token->namespace;
5131 }
5132
5133 /**
5134 * Returns the uppercase name of the matched tag.
5135 *
5136 * The semantic rules for HTML specify that certain tags be reprocessed
5137 * with a different tag name. Because of this, the tag name presented
5138 * by the HTML Processor may differ from the one reported by the HTML
5139 * Tag Processor, which doesn't apply these semantic rules.
5140 *
5141 * Example:
5142 *
5143 * $processor = new WP_HTML_Tag_Processor( '<div class="test">Test</div>' );
5144 * $processor->next_tag() === true;
5145 * $processor->get_tag() === 'DIV';
5146 *
5147 * $processor->next_tag() === false;
5148 * $processor->get_tag() === null;
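	 *
	 * Example of a differing name (a sketch, assuming an `IMAGE` tag found in
	 * HTML content rather than inside SVG or MathML; the markup is illustrative):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<image src="cat.png">' );
	 *     $processor->next_tag();
	 *     $processor->get_tag(); // 'IMG', whereas the Tag Processor would report 'IMAGE'.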
5149 *
5150 * @since 6.4.0
5151 *
5152 * @return string|null Name of currently matched tag in input HTML, or `null` if none found.
5153 */
5154 public function get_tag(): ?string {
5155 if ( null !== $this->last_error ) {
5156 return null;
5157 }
5158
5159 if ( $this->is_virtual() ) {
5160 return $this->current_element->token->node_name;
5161 }
5162
5163 $tag_name = parent::get_tag();
5164
5165 /*
5166 * > A start tag whose tag name is "image"
5167 * > Change the token's tag name to "img" and reprocess it. (Don't ask.)
5168 */
5169 return ( 'IMAGE' === $tag_name && 'html' === $this->get_namespace() )
5170 ? 'IMG'
5171 : $tag_name;
5172 }
5173
5174 /**
5175 * Indicates if the currently matched tag contains the self-closing flag.
5176 *
5177 * No HTML elements ought to have the self-closing flag and for those, the self-closing
5178 * flag will be ignored. For void elements this is benign because they "self close"
5179 * automatically. For non-void HTML elements though problems will appear if someone
5180 * intends to use a self-closing element in place of that element with an empty body.
5181 * For HTML foreign elements and custom elements the self-closing flag determines if
5182 * they self-close or not.
5183 *
5184 * This function does not determine if a tag is self-closing,
5185 * but only if the self-closing flag is present in the syntax.
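	 *
	 * Example (a sketch; the input markup is illustrative only):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div /><br>' );
	 *
	 *     $processor->next_tag();
	 *     $processor->has_self_closing_flag(); // true: the `/` is present, yet the DIV stays open.
	 *
	 *     $processor->next_tag();
	 *     $processor->has_self_closing_flag(); // false: BR is void but carries no flag.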
5186 *
5187 * @since 6.6.0 Subclassed for the HTML Processor.
5188 *
5189 * @return bool Whether the currently matched tag contains the self-closing flag.
5190 */
5191 public function has_self_closing_flag(): bool {
5192 return $this->is_virtual() ? false : parent::has_self_closing_flag();
5193 }
5194
5195 /**
5196 * Returns the node name represented by the token.
5197 *
5198 * This matches the DOM API value `nodeName`. Some values
5199 * are static, such as `#text` for a text node, while others
5200 * are dynamically generated from the token itself.
5201 *
5202 * Dynamic names:
5203 * - Uppercase tag name for tag matches.
5204 * - `html` for DOCTYPE declarations.
5205 *
5206 * Note that if the Tag Processor is not matched on a token
5207 * then this function will return `null`, either because it
5208 * hasn't yet found a token or because it reached the end
5209 * of the document without matching a token.
5210 *
5211 * @since 6.6.0 Subclassed for the HTML Processor.
5212 *
5213 * @return string|null Name of the matched token.
5214 */
5215 public function get_token_name(): ?string {
5216 return $this->is_virtual()
5217 ? $this->current_element->token->node_name
5218 : parent::get_token_name();
5219 }
5220
5221 /**
5222 * Indicates the kind of matched token, if any.
5223 *
5224 * This differs from `get_token_name()` in that it always
5225 * returns a static string indicating the type, whereas
5226 * `get_token_name()` may return values derived from the
5227 * token itself, such as a tag name or processing
5228 * instruction tag.
5229 *
5230 * Possible values:
5231 * - `#tag` when matched on a tag.
5232 * - `#text` when matched on a text node.
5233 * - `#cdata-section` when matched on a CDATA node.
5234 * - `#comment` when matched on a comment.
5235 * - `#doctype` when matched on a DOCTYPE declaration.
5236 * - `#presumptuous-tag` when matched on an empty tag closer.
5237 * - `#funky-comment` when matched on a funky comment.
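	 *
	 * Example (a sketch, assuming a fragment parser; the markup is illustrative):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Hi<!-- note --></p>' );
	 *     while ( $processor->next_token() ) {
	 *         echo "{$processor->get_token_type()} {$processor->get_token_name()}\n";
	 *     }
	 *     // Tags report `#tag` as the type and the tag name (`P`) as the name,
	 *     // while text and comment nodes report `#text` and `#comment` for both.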
5238 *
5239 * @since 6.6.0 Subclassed for the HTML Processor.
5240 *
5241 * @return string|null What kind of token is matched, or null.
5242 */
5243 public function get_token_type(): ?string {
5244 if ( $this->is_virtual() ) {
5245 /*
5246 * This logic comes from the Tag Processor.
5247 *
5248 * @todo It would be ideal not to repeat this here, but it's not clearly
5249 * better to allow passing a token name to `get_token_type()`.
5250 */
5251 $node_name = $this->current_element->token->node_name;
5252 $starting_char = $node_name[0];
5253 if ( 'A' <= $starting_char && 'Z' >= $starting_char ) {
5254 return '#tag';
5255 }
5256
5257 if ( 'html' === $node_name ) {
5258 return '#doctype';
5259 }
5260
5261 return $node_name;
5262 }
5263
5264 return parent::get_token_type();
5265 }
5266
5267 /**
5268 * Returns the value of a requested attribute from a matched tag opener if that attribute exists.
5269 *
5270 * Example:
5271 *
5272 * $p = WP_HTML_Processor::create_fragment( '<div enabled class="test" data-test-id="14">Test</div>' );
5273 * $p->next_token() === true;
5274 * $p->get_attribute( 'data-test-id' ) === '14';
5275 * $p->get_attribute( 'enabled' ) === true;
5276 * $p->get_attribute( 'aria-label' ) === null;
5277 *
5278 * $p->next_tag() === false;
5279 * $p->get_attribute( 'class' ) === null;
5280 *
5281 * @since 6.6.0 Subclassed for HTML Processor.
5282 *
5283 * @param string $name Name of attribute whose value is requested.
5284 * @return string|true|null Value of attribute or `null` if not available. Boolean attributes return `true`.
5285 */
5286 public function get_attribute( $name ) {
5287 return $this->is_virtual() ? null : parent::get_attribute( $name );
5288 }
5289
5290 /**
5291 * Updates or creates a new attribute on the currently matched tag with the passed value.
5292 *
5293 * This function handles all necessary HTML encoding. Provide normal, unescaped string values.
5294 * The HTML API will encode the strings appropriately so that the browser will interpret them
5295 * as the intended value.
5296 *
5297 * Example:
5298 *
5299 * // Renders “Eggs & Milk” in a browser, encoded as `<abbr title="Eggs &amp; Milk">`.
5300 * $processor->set_attribute( 'title', 'Eggs & Milk' );
5301 *
5302 * // Renders “Eggs &amp; Milk” in a browser, encoded as `<abbr title="Eggs &amp;amp; Milk">`.
5303 * $processor->set_attribute( 'title', 'Eggs &amp; Milk' );
5304 *
5305 * // Renders `true` as `<abbr title>`.
5306 * $processor->set_attribute( 'title', true );
5307 *
5308 * // Renders without the attribute for `false` as `<abbr>`.
5309 * $processor->set_attribute( 'title', false );
5310 *
5311 * Special handling is provided for boolean attribute values:
5312 * - When `true` is passed as the value, then only the attribute name is added to the tag.
5313 * - When `false` is passed, the attribute gets removed if it existed before.
5314 *
5315 * @since 6.6.0 Subclassed for the HTML Processor.
5316 * @since 6.9.0 Escapes all character references instead of trying to avoid double-escaping.
5317 *
5318 * @param string $name The attribute name to target.
5319 * @param string|bool $value The new attribute value.
5320 * @return bool Whether an attribute value was set.
5321 */
5322 public function set_attribute( $name, $value ): bool {
5323 return $this->is_virtual() ? false : parent::set_attribute( $name, $value );
5324 }
5325
5326 /**
	 * Removes an attribute from the currently-matched tag.
5328 *
5329 * @since 6.6.0 Subclassed for HTML Processor.
5330 *
5331 * @param string $name The attribute name to remove.
5332 * @return bool Whether an attribute was removed.
5333 */
5334 public function remove_attribute( $name ): bool {
5335 return $this->is_virtual() ? false : parent::remove_attribute( $name );
5336 }
5337
5338 /**
5339 * Gets lowercase names of all attributes matching a given prefix in the current tag.
5340 *
5341 * Note that matching is case-insensitive. This is in accordance with the spec:
5342 *
5343 * > There must never be two or more attributes on
5344 * > the same start tag whose names are an ASCII
5345 * > case-insensitive match for each other.
5346 * - HTML 5 spec
5347 *
5348 * Example:
5349 *
5350 * $p = new WP_HTML_Tag_Processor( '<div data-ENABLED class="test" DATA-test-id="14">Test</div>' );
5351 * $p->next_tag( array( 'class_name' => 'test' ) ) === true;
5352 * $p->get_attribute_names_with_prefix( 'data-' ) === array( 'data-enabled', 'data-test-id' );
5353 *
5354 * $p->next_tag() === false;
5355 * $p->get_attribute_names_with_prefix( 'data-' ) === null;
5356 *
5357 * @since 6.6.0 Subclassed for the HTML Processor.
5358 *
5359 * @see https://html.spec.whatwg.org/multipage/syntax.html#attributes-2:ascii-case-insensitive
5360 *
5361 * @param string $prefix Prefix of requested attribute names.
5362 * @return array|null List of attribute names, or `null` when no tag opener is matched.
5363 */
5364 public function get_attribute_names_with_prefix( $prefix ): ?array {
5365 return $this->is_virtual() ? null : parent::get_attribute_names_with_prefix( $prefix );
5366 }
5367
5368 /**
5369 * Adds a new class name to the currently matched tag.
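	 *
	 * Example (a sketch; the markup and class names are illustrative):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<div class="card">Text</div>' );
	 *     if ( $processor->next_tag() ) {
	 *         $processor->add_class( 'is-active' );
	 *     }
	 *     echo $processor->get_updated_html();
	 *     // Expected: <div class="card is-active">Text</div>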
5370 *
5371 * @since 6.6.0 Subclassed for the HTML Processor.
5372 *
5373 * @param string $class_name The class name to add.
5374 * @return bool Whether the class was set to be added.
5375 */
5376 public function add_class( $class_name ): bool {
5377 return $this->is_virtual() ? false : parent::add_class( $class_name );
5378 }
5379
5380 /**
5381 * Removes a class name from the currently matched tag.
5382 *
5383 * @since 6.6.0 Subclassed for the HTML Processor.
5384 *
5385 * @param string $class_name The class name to remove.
5386 * @return bool Whether the class was set to be removed.
5387 */
5388 public function remove_class( $class_name ): bool {
5389 return $this->is_virtual() ? false : parent::remove_class( $class_name );
5390 }
5391
5392 /**
	 * Returns whether a matched tag contains the given ASCII case-insensitive class name.
5394 *
5395 * @since 6.6.0 Subclassed for the HTML Processor.
5396 *
5397 * @todo When reconstructing active formatting elements with attributes, find a way
5398 * to indicate if the virtually-reconstructed formatting elements contain the
5399 * wanted class name.
5400 *
5401 * @param string $wanted_class Look for this CSS class name, ASCII case-insensitive.
5402 * @return bool|null Whether the matched tag contains the given class name, or null if not matched.
5403 */
5404 public function has_class( $wanted_class ): ?bool {
5405 return $this->is_virtual() ? null : parent::has_class( $wanted_class );
5406 }
5407
5408 /**
5409 * Generator for a foreach loop to step through each class name for the matched tag.
5410 *
5411 * This generator function is designed to be used inside a "foreach" loop.
5412 *
5413 * Example:
5414 *
	 *     $p = WP_HTML_Processor::create_fragment( "<div class='free &lt;egg&gt;\tlang-en'>" );
5416 * $p->next_tag();
5417 * foreach ( $p->class_list() as $class_name ) {
5418 * echo "{$class_name} ";
5419 * }
5420 * // Outputs: "free <egg> lang-en "
5421 *
5422 * @since 6.6.0 Subclassed for the HTML Processor.
5423 */
5424 public function class_list() {
5425 return $this->is_virtual() ? null : parent::class_list();
5426 }
5427
5428 /**
5429 * Returns the modifiable text for a matched token, or an empty string.
5430 *
5431 * Modifiable text is text content that may be read and changed without
5432 * changing the HTML structure of the document around it. This includes
5433 * the contents of `#text` nodes in the HTML as well as the inner
5434 * contents of HTML comments, Processing Instructions, and others, even
5435 * though these nodes aren't part of a parsed DOM tree. They also contain
5436 * the contents of SCRIPT and STYLE tags, of TEXTAREA tags, and of any
5437 * other section in an HTML document which cannot contain HTML markup (DATA).
5438 *
5439 * If a token has no modifiable text then an empty string is returned to
5440 * avoid needless crashing or type errors. An empty string does not mean
5441 * that a token has modifiable text, and a token with modifiable text may
5442 * have an empty string (e.g. a comment with no contents).
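	 *
	 * Example (a sketch; the input markup is illustrative only):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<p>Eggs &amp; Milk</p>' );
	 *     while ( $processor->next_token() ) {
	 *         if ( '#text' === $processor->get_token_type() ) {
	 *             echo $processor->get_modifiable_text(); // Decoded text: Eggs & Milk
	 *         }
	 *     }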
5443 *
5444 * @since 6.6.0 Subclassed for the HTML Processor.
5445 *
5446 * @return string
5447 */
5448 public function get_modifiable_text(): string {
5449 return $this->is_virtual() ? '' : parent::get_modifiable_text();
5450 }
5451
5452 /**
5453 * Indicates what kind of comment produced the comment node.
5454 *
5455 * Because there are different kinds of HTML syntax which produce
5456 * comments, the Tag Processor tracks and exposes this as a type
5457 * for the comment. Nominally only regular HTML comments exist as
5458 * they are commonly known, but a number of unrelated syntax errors
5459 * also produce comments.
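	 *
	 * Example (a sketch; the input markup is illustrative only):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<!-- note -->' );
	 *     $processor->next_token();
	 *     $processor->get_comment_type() === WP_HTML_Processor::COMMENT_AS_HTML_COMMENT;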
5460 *
5461 * @see self::COMMENT_AS_ABRUPTLY_CLOSED_COMMENT
5462 * @see self::COMMENT_AS_CDATA_LOOKALIKE
5463 * @see self::COMMENT_AS_INVALID_HTML
5464 * @see self::COMMENT_AS_HTML_COMMENT
5465 * @see self::COMMENT_AS_PI_NODE_LOOKALIKE
5466 *
5467 * @since 6.6.0 Subclassed for the HTML Processor.
5468 *
5469 * @return string|null
5470 */
5471 public function get_comment_type(): ?string {
5472 return $this->is_virtual() ? null : parent::get_comment_type();
5473 }
5474
5475 /**
5476 * Removes a bookmark that is no longer needed.
5477 *
5478 * Releasing a bookmark frees up the small
5479 * performance overhead it requires.
5480 *
5481 * @since 6.4.0
5482 *
5483 * @param string $bookmark_name Name of the bookmark to remove.
5484 * @return bool Whether the bookmark already existed before removal.
5485 */
5486 public function release_bookmark( $bookmark_name ): bool {
5487 return parent::release_bookmark( "_{$bookmark_name}" );
5488 }
5489
5490 /**
5491 * Moves the internal cursor in the HTML Processor to a given bookmark's location.
5492 *
5493 * Be careful! Seeking backwards to a previous location resets the parser to the
5494 * start of the document and reparses the entire contents up until it finds the
5495 * sought-after bookmarked location.
5496 *
5497 * In order to prevent accidental infinite loops, there's a
5498 * maximum limit on the number of times seek() can be called.
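	 *
	 * Example (a sketch; the bookmark name and markup are illustrative):
	 *
	 *     $processor = WP_HTML_Processor::create_fragment( '<ul><li>One</li><li>Two</li></ul>' );
	 *
	 *     $processor->next_tag( array( 'tag_name' => 'LI' ) );
	 *     $processor->set_bookmark( 'first-item' );
	 *
	 *     // Scan ahead, then return to the bookmarked LI. Seeking backward
	 *     // reparses the document from its start.
	 *     $processor->next_tag( array( 'tag_name' => 'LI' ) );
	 *     if ( $processor->seek( 'first-item' ) ) {
	 *         $processor->add_class( 'first' );
	 *     }
	 *     $processor->release_bookmark( 'first-item' );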
5499 *
5500 * @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
5501 *
5502 * @since 6.4.0
5503 *
5504 * @param string $bookmark_name Jump to the place in the document identified by this bookmark name.
5505 * @return bool Whether the internal cursor was successfully moved to the bookmark's location.
5506 */
5507 public function seek( $bookmark_name ): bool {
5508 // Flush any pending updates to the document before beginning.
5509 $this->get_updated_html();
5510
5511 $actual_bookmark_name = "_{$bookmark_name}";
5512 $processor_started_at = $this->state->current_token
5513 ? $this->bookmarks[ $this->state->current_token->bookmark_name ]->start
5514 : 0;
5515 $bookmark_starts_at = $this->bookmarks[ $actual_bookmark_name ]->start;
5516 $direction = $bookmark_starts_at > $processor_started_at ? 'forward' : 'backward';
5517
5518 /*
5519 * If seeking backwards, it's possible that the sought-after bookmark exists within an element
5520 * which has been closed before the current cursor; in other words, it has already been removed
5521 * from the stack of open elements. This means that it's insufficient to simply pop off elements
5522 * from the stack of open elements which appear after the bookmarked location and then jump to
5523 * that location, as the elements which were open before won't be re-opened.
5524 *
5525 * In order to maintain consistency, the HTML Processor rewinds to the start of the document
5526 * and reparses everything until it finds the sought-after bookmark.
5527 *
5528 * There are potentially better ways to do this: cache the parser state for each bookmark and
5529 * restore it when seeking; store an immutable and idempotent register of where elements open
5530 * and close.
5531 *
5532 * If caching the parser state it will be essential to properly maintain the cached stack of
5533 * open elements and active formatting elements when modifying the document. This could be a
5534 * tedious and time-consuming process as well, and so for now will not be performed.
5535 *
5536 * It may be possible to track bookmarks for where elements open and close, and in doing so
5537 * be able to quickly recalculate breadcrumbs for any element in the document. It may even
5538 * be possible to remove the stack of open elements and compute it on the fly this way.
5539 * If doing this, the parser would need to track the opening and closing locations for all
5540 * tokens in the breadcrumb path for any and all bookmarks. By utilizing bookmarks themselves
5541 * this list could be automatically maintained while modifying the document. Finding the
5542 * breadcrumbs would then amount to traversing that list from the start until the token
5543 * being inspected. Once an element closes, if there are no bookmarks pointing to locations
5544 * within that element, then all of these locations may be forgotten to save on memory use
5545 * and computation time.
5546 */
5547 if ( 'backward' === $direction ) {
5548
5549 /*
5550 * When moving backward, stateful stacks should be cleared.
5551 */
5552 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
5553 $this->state->stack_of_open_elements->remove_node( $item );
5554 }
5555
5556 foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
5557 $this->state->active_formatting_elements->remove_node( $item );
5558 }
5559
5560 /*
5561 * **After** clearing stacks, more processor state can be reset.
5562 * This must be done after clearing the stack because those stacks generate events that
5563 * would appear on a subsequent call to `next_token()`.
5564 */
5565 $this->state->frameset_ok = true;
5566 $this->state->stack_of_template_insertion_modes = array();
5567 $this->state->head_element = null;
5568 $this->state->form_element = null;
5569 $this->state->current_token = null;
5570 $this->current_element = null;
5571 $this->element_queue = array();
5572
5573 /*
5574 * The absence of a context node indicates a full parse.
5575 * The presence of a context node indicates a fragment parser.
5576 */
5577 if ( null === $this->context_node ) {
5578 $this->change_parsing_namespace( 'html' );
5579 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_INITIAL;
5580 $this->breadcrumbs = array();
5581
5582 $this->bookmarks['initial'] = new WP_HTML_Span( 0, 0 );
5583 parent::seek( 'initial' );
5584 unset( $this->bookmarks['initial'] );
5585 } else {
5586
5587 /*
5588 * Push the root-node (HTML) back onto the stack of open elements.
5589 *
5590 * Fragment parsers require this extra bit of setup.
5591 * It's handled in full parsers by advancing the processor state.
5592 */
5593 $this->state->stack_of_open_elements->push(
5594 new WP_HTML_Token(
5595 'root-node',
5596 'HTML',
5597 false
5598 )
5599 );
5600
5601 $this->change_parsing_namespace(
5602 $this->context_node->integration_node_type
5603 ? 'html'
5604 : $this->context_node->namespace
5605 );
5606
5607 if ( 'TEMPLATE' === $this->context_node->node_name ) {
5608 $this->state->stack_of_template_insertion_modes[] = WP_HTML_Processor_State::INSERTION_MODE_IN_TEMPLATE;
5609 }
5610
5611 $this->reset_insertion_mode_appropriately();
5612 $this->breadcrumbs = array_slice( $this->breadcrumbs, 0, 2 );
5613 parent::seek( $this->context_node->bookmark_name );
5614 }
5615 }
5616
5617 /*
		 * Here, the processor moves forward through the document until it matches the bookmark.
		 * do-while is used here because the processor is expected to already be stopped on
		 * a token that may match the bookmarked location.
5621 */
5622 do {
5623 /*
			 * The processor will stop on virtual tokens, but bookmarks may not be set on them.
			 * They should not be matched when seeking a bookmark, so skip them.
5626 */
5627 if ( $this->is_virtual() ) {
5628 continue;
5629 }
5630 if ( $bookmark_starts_at === $this->bookmarks[ $this->state->current_token->bookmark_name ]->start ) {
5631 return true;
5632 }
5633 } while ( $this->next_token() );
5634
5635 return false;
5636 }
5637
5638 /**
5639 * Sets a bookmark in the HTML document.
5640 *
5641 * Bookmarks represent specific places or tokens in the HTML
5642 * document, such as a tag opener or closer. When applying
5643 * edits to a document, such as setting an attribute, the
5644 * text offsets of that token may shift; the bookmark is
5645 * kept updated with those shifts and remains stable unless
5646 * the entire span of text in which the token sits is removed.
5647 *
5648 * Release bookmarks when they are no longer needed.
5649 *
5650 * Example:
5651 *
5652 * <main><h2>Surprising fact you may not know!</h2></main>
5653 * ^ ^
5654 * \-|-- this `H2` opener bookmark tracks the token
5655 *
5656 * <main class="clickbait"><h2>Surprising fact you may no…
5657 * ^ ^
5658 * \-|-- it shifts with edits
5659 *
5660 * Bookmarks provide the ability to seek to a previously-scanned
5661 * place in the HTML document. This avoids the need to re-scan
5662 * the entire document.
5663 *
5664 * Example:
5665 *
5666 * <ul><li>One</li><li>Two</li><li>Three</li></ul>
5667 * ^^^^
5668 * want to note this last item
5669 *
5670 * $p = new WP_HTML_Tag_Processor( $html );
5671 * $in_list = false;
5672 * while ( $p->next_tag( array( 'tag_closers' => $in_list ? 'visit' : 'skip' ) ) ) {
5673 * if ( 'UL' === $p->get_tag() ) {
5674 * if ( $p->is_tag_closer() ) {
5675 * $in_list = false;
5676 * $p->set_bookmark( 'resume' );
5677 * if ( $p->seek( 'last-li' ) ) {
5678 * $p->add_class( 'last-li' );
5679 * }
5680 * $p->seek( 'resume' );
5681 * $p->release_bookmark( 'last-li' );
5682 * $p->release_bookmark( 'resume' );
5683 * } else {
5684 * $in_list = true;
5685 * }
5686 * }
5687 *
5688 * if ( 'LI' === $p->get_tag() ) {
5689 * $p->set_bookmark( 'last-li' );
5690 * }
5691 * }
5692 *
5693 * Bookmarks intentionally hide the internal string offsets
5694 * to which they refer. They are maintained internally as
5695 * updates are applied to the HTML document and therefore
5696 * retain their "position" - the location to which they
5697 * originally pointed. The inability to use bookmarks with
5698 * functions like `substr` is therefore intentional to guard
5699 * against accidentally breaking the HTML.
5700 *
5701 * Because bookmarks allocate memory and require processing
5702 * for every applied update, they are limited and require
5703 * a name. They should not be created with programmatically-made
5704 * names, such as "li_{$index}" with some loop. As a general
5705 * rule they should only be created with string-literal names
5706 * like "start-of-section" or "last-paragraph".
5707 *
5708 * Bookmarks are a powerful tool to enable complicated behavior.
5709 * Consider double-checking that you need this tool if you are
5710 * reaching for it, as inappropriate use could lead to broken
5711 * HTML structure or unwanted processing overhead.
5712 *
5713 * Bookmarks cannot be set on tokens that do not appear in the original
5714 * HTML text. For example, the HTML `<table><td>` stops at tags `TABLE`,
5715 * `TBODY`, `TR`, and `TD`. The `TBODY` and `TR` tags do not appear in
5716 * the original HTML and cannot be used as bookmarks.
5717 *
5718 * @since 6.4.0
5719 *
5720 * @param string $bookmark_name Identifies this particular bookmark.
5721 * @return bool Whether the bookmark was successfully created.
5722 */
5723 public function set_bookmark( $bookmark_name ): bool {
5724 if ( $this->is_virtual() ) {
5725 _doing_it_wrong(
5726 __METHOD__,
5727 __( 'Cannot set bookmarks on tokens that do not appear in the original HTML text.' ),
5728 '6.8.0'
5729 );
5730 return false;
5731 }
5732 return parent::set_bookmark( "_{$bookmark_name}" );
5733 }
5734
5735 /**
5736 * Checks whether a bookmark with the given name exists.
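 *
 * Example (a minimal sketch; the input HTML and the bookmark names
 * "first-p" and "missing" are illustrative assumptions):
 *
 *     $processor = WP_HTML_Processor::create_fragment( '<p>Hello</p>' );
 *     if ( $processor->next_tag( 'P' ) && $processor->set_bookmark( 'first-p' ) ) {
 *         true  === $processor->has_bookmark( 'first-p' );
 *         false === $processor->has_bookmark( 'missing' );
 *     }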
5737 *
5738 * @since 6.5.0
5739 *
5740 * @param string $bookmark_name Name to identify a bookmark that potentially exists.
5741 * @return bool Whether that bookmark exists.
5742 */
5743 public function has_bookmark( $bookmark_name ): bool {
5744 return parent::has_bookmark( "_{$bookmark_name}" );
5745 }
5746
5747 /*
5748 * HTML Parsing Algorithms
5749 */
5750
5751 /**
5752 * Closes a P element.
5753 *
5754 * @since 6.4.0
5755 *
5756 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
5757 *
5758 * @see https://html.spec.whatwg.org/#close-a-p-element
5759 */
5760 private function close_a_p_element(): void {
5761 $this->generate_implied_end_tags( 'P' );
5762 $this->state->stack_of_open_elements->pop_until( 'P' );
5763 }
5764
5765 /**
5766 * Closes elements that have implied end tags.
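 *
 * Example (an illustrative stack, not taken from a real document):
 *
 *     Stack of open elements: HTML, BODY, UL, LI, P
 *
 *     generate_implied_end_tags()       pops P, then LI, and stops at UL.
 *     generate_implied_end_tags( 'P' )  pops nothing, because the current node is P.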
5767 *
5768 * @since 6.4.0
5769 * @since 6.7.0 Full spec support.
5770 *
5771 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
5772 *
5773 * @param string|null $except_for_this_element Perform as if this element doesn't exist in the stack of open elements.
5774 */
5775 private function generate_implied_end_tags( ?string $except_for_this_element = null ): void {
5776 $elements_with_implied_end_tags = array(
5777 'DD',
5778 'DT',
5779 'LI',
5780 'OPTGROUP',
5781 'OPTION',
5782 'P',
5783 'RB',
5784 'RP',
5785 'RT',
5786 'RTC',
5787 );
5788
5789 $no_exclusions = ! isset( $except_for_this_element );
5790
5791 while (
5792 ( $no_exclusions || ! $this->state->stack_of_open_elements->current_node_is( $except_for_this_element ) ) &&
5793 in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true )
5794 ) {
5795 $this->state->stack_of_open_elements->pop();
5796 }
5797 }
5798
5799 /**
5800 * Closes elements that have implied end tags, thoroughly.
5801 *
5802 * See the HTML specification for an explanation why this is
5803 * different from generating end tags in the normal sense.
5804 *
5805 * @since 6.4.0
5806 * @since 6.7.0 Full spec support.
5807 *
5808 * @see WP_HTML_Processor::generate_implied_end_tags
5809 * @see https://html.spec.whatwg.org/#generate-implied-end-tags
5810 */
5811 private function generate_implied_end_tags_thoroughly(): void {
5812 $elements_with_implied_end_tags = array(
5813 'CAPTION',
5814 'COLGROUP',
5815 'DD',
5816 'DT',
5817 'LI',
5818 'OPTGROUP',
5819 'OPTION',
5820 'P',
5821 'RB',
5822 'RP',
5823 'RT',
5824 'RTC',
5825 'TBODY',
5826 'TD',
5827 'TFOOT',
5828 'TH',
5829 'THEAD',
5830 'TR',
5831 );
5832
5833 while ( in_array( $this->state->stack_of_open_elements->current_node()->node_name, $elements_with_implied_end_tags, true ) ) {
5834 $this->state->stack_of_open_elements->pop();
5835 }
5836 }
5837
5838 /**
5839 * Returns the adjusted current node.
5840 *
5841 * > The adjusted current node is the context element if the parser was created as
5842 * > part of the HTML fragment parsing algorithm and the stack of open elements
5843 * > has only one element in it (fragment case); otherwise, the adjusted current
5844 * > node is the current node.
5845 *
5846 * @see https://html.spec.whatwg.org/#adjusted-current-node
5847 *
5848 * @since 6.7.0
5849 *
5850 * @return WP_HTML_Token|null The adjusted current node.
5851 */
5852 private function get_adjusted_current_node(): ?WP_HTML_Token {
5853 if ( isset( $this->context_node ) && 1 === $this->state->stack_of_open_elements->count() ) {
5854 return $this->context_node;
5855 }
5856
5857 return $this->state->stack_of_open_elements->current_node();
5858 }
5859
5860 /**
5861 * Reconstructs the active formatting elements.
5862 *
5863 * > This has the effect of reopening all the formatting elements that were opened
5864 * > in the current body, cell, or caption (whichever is youngest) that haven't
5865 * > been explicitly closed.
5866 *
5867 * @since 6.4.0
5868 *
5869 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
5870 *
5871 * @see https://html.spec.whatwg.org/#reconstruct-the-active-formatting-elements
5872 *
5873 * @return bool Whether any formatting elements needed to be reconstructed.
5874 */
5875 private function reconstruct_active_formatting_elements(): bool {
5876 /*
5877 * > If there are no entries in the list of active formatting elements, then there is nothing
5878 * > to reconstruct; stop this algorithm.
5879 */
5880 if ( 0 === $this->state->active_formatting_elements->count() ) {
5881 return false;
5882 }
5883
5884 $last_entry = $this->state->active_formatting_elements->current_node();
5885 if (
5886
5887 /*
5888 * > If the last (most recently added) entry in the list of active formatting elements is a marker;
5889 * > stop this algorithm.
5890 */
5891 'marker' === $last_entry->node_name ||
5892
5893 /*
5894 * > If the last (most recently added) entry in the list of active formatting elements is an
5895 * > element that is in the stack of open elements, then there is nothing to reconstruct;
5896 * > stop this algorithm.
5897 */
5898 $this->state->stack_of_open_elements->contains_node( $last_entry )
5899 ) {
5900 return false;
5901 }
5902
5903 $this->bail( 'Cannot reconstruct active formatting elements when advancing and rewinding is required.' );
5904 }
5905
5906 /**
5907 * Runs the "reset the insertion mode appropriately" algorithm.
5908 *
5909 * @since 6.7.0
5910 *
5911 * @see https://html.spec.whatwg.org/multipage/parsing.html#reset-the-insertion-mode-appropriately
5912 */
5913 private function reset_insertion_mode_appropriately(): void {
5914 // Set the first node.
5915 $first_node = null;
5916 foreach ( $this->state->stack_of_open_elements->walk_down() as $first_node ) {
5917 break;
5918 }
5919
5920 /*
5921 * > 1. Let _last_ be false.
5922 */
5923 $last = false;
5924 foreach ( $this->state->stack_of_open_elements->walk_up() as $node ) {
5925 /*
5926 * > 2. Let _node_ be the last node in the stack of open elements.
5927 * > 3. _Loop_: If _node_ is the first node in the stack of open elements, then set _last_
5928 * > to true, and, if the parser was created as part of the HTML fragment parsing
5929 * > algorithm (fragment case), set node to the context element passed to
5930 * > that algorithm.
5931 * > …
5932 */
5933 if ( $node === $first_node ) {
5934 $last = true;
5935 if ( isset( $this->context_node ) ) {
5936 $node = $this->context_node;
5937 }
5938 }
5939
5940 // All of the following rules are for matching HTML elements.
5941 if ( 'html' !== $node->namespace ) {
5942 continue;
5943 }
5944
5945 switch ( $node->node_name ) {
5946 /*
5947 * > 4. If node is a `select` element, run these substeps:
5948 * > 1. If _last_ is true, jump to the step below labeled done.
5949 * > 2. Let _ancestor_ be _node_.
5950 * > 3. _Loop_: If _ancestor_ is the first node in the stack of open elements,
5951 * > jump to the step below labeled done.
5952 * > 4. Let ancestor be the node before ancestor in the stack of open elements.
5953 * > …
5954 * > 7. Jump back to the step labeled _loop_.
5955 * > 8. _Done_: Switch the insertion mode to "in select" and return.
5956 */
5957 case 'SELECT':
5958 if ( ! $last ) {
5959 foreach ( $this->state->stack_of_open_elements->walk_up( $node ) as $ancestor ) {
5960 if ( 'html' !== $ancestor->namespace ) {
5961 continue;
5962 }
5963
5964 switch ( $ancestor->node_name ) {
5965 /*
5966 * > 5. If _ancestor_ is a `template` node, jump to the step below
5967 * > labeled _done_.
5968 */
5969 case 'TEMPLATE':
5970 break 2;
5971
5972 /*
5973 * > 6. If _ancestor_ is a `table` node, switch the insertion mode to
5974 * > "in select in table" and return.
5975 */
5976 case 'TABLE':
5977 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT_IN_TABLE;
5978 return;
5979 }
5980 }
5981 }
5982 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_SELECT;
5983 return;
5984
5985 /*
5986 * > 5. If _node_ is a `td` or `th` element and _last_ is false, then switch the
5987 * > insertion mode to "in cell" and return.
5988 */
5989 case 'TD':
5990 case 'TH':
5991 if ( ! $last ) {
5992 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CELL;
5993 return;
5994 }
5995 break;
5996
5997 /*
5998 * > 6. If _node_ is a `tr` element, then switch the insertion mode to "in row"
5999 * > and return.
6000 */
6001 case 'TR':
6002 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
6003 return;
6004
6005 /*
6006 * > 7. If _node_ is a `tbody`, `thead`, or `tfoot` element, then switch the
6007 * > insertion mode to "in table body" and return.
6008 */
6009 case 'TBODY':
6010 case 'THEAD':
6011 case 'TFOOT':
6012 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE_BODY;
6013 return;
6014
6015 /*
6016 * > 8. If _node_ is a `caption` element, then switch the insertion mode to
6017 * > "in caption" and return.
6018 */
6019 case 'CAPTION':
6020 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_CAPTION;
6021 return;
6022
6023 /*
6024 * > 9. If _node_ is a `colgroup` element, then switch the insertion mode to
6025 * > "in column group" and return.
6026 */
6027 case 'COLGROUP':
6028 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_COLUMN_GROUP;
6029 return;
6030
6031 /*
6032 * > 10. If _node_ is a `table` element, then switch the insertion mode to
6033 * > "in table" and return.
6034 */
6035 case 'TABLE':
6036 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_TABLE;
6037 return;
6038
6039 /*
6040 * > 11. If _node_ is a `template` element, then switch the insertion mode to the
6041 * > current template insertion mode and return.
6042 */
6043 case 'TEMPLATE':
6044 $this->state->insertion_mode = end( $this->state->stack_of_template_insertion_modes );
6045 return;
6046
6047 /*
6048 * > 12. If _node_ is a `head` element and _last_ is false, then switch the
6049 * > insertion mode to "in head" and return.
6050 */
6051 case 'HEAD':
6052 if ( ! $last ) {
6053 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_HEAD;
6054 return;
6055 }
6056 break;
6057
6058 /*
6059 * > 13. If _node_ is a `body` element, then switch the insertion mode to "in body"
6060 * > and return.
6061 */
6062 case 'BODY':
6063 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
6064 return;
6065
6066 /*
6067 * > 14. If _node_ is a `frameset` element, then switch the insertion mode to
6068 * > "in frameset" and return. (fragment case)
6069 */
6070 case 'FRAMESET':
6071 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_FRAMESET;
6072 return;
6073
6074 /*
6075 * > 15. If _node_ is an `html` element, run these substeps:
6076 * > 1. If the head element pointer is null, switch the insertion mode to
6077 * > "before head" and return. (fragment case)
6078 * > 2. Otherwise, the head element pointer is not null, switch the insertion
6079 * > mode to "after head" and return.
6080 */
6081 case 'HTML':
6082 $this->state->insertion_mode = isset( $this->state->head_element )
6083 ? WP_HTML_Processor_State::INSERTION_MODE_AFTER_HEAD
6084 : WP_HTML_Processor_State::INSERTION_MODE_BEFORE_HEAD;
6085 return;
6086 }
6087 }
6088
6089 /*
6090 * > 16. If _last_ is true, then switch the insertion mode to "in body"
6091 * > and return. (fragment case)
6092 *
6093 * This is only reachable if `$last` is true, as per the fragment parsing case.
6094 */
6095 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_BODY;
6096 }
6097
6098 /**
6099 * Runs the adoption agency algorithm.
6100 *
6101 * @since 6.4.0
6102 *
6103 * @throws WP_HTML_Unsupported_Exception When encountering unsupported HTML input.
6104 *
6105 * @see https://html.spec.whatwg.org/#adoption-agency-algorithm
6106 */
6107 private function run_adoption_agency_algorithm(): void {
6108 $budget = 1000;
6109 $subject = $this->get_tag();
6110 $current_node = $this->state->stack_of_open_elements->current_node();
6111
6112 if (
6113 // > If the current node is an HTML element whose tag name is subject
6114 $current_node && $subject === $current_node->node_name &&
6115 // > the current node is not in the list of active formatting elements
6116 ! $this->state->active_formatting_elements->contains_node( $current_node )
6117 ) {
6118 $this->state->stack_of_open_elements->pop();
6119 return;
6120 }
6121
6122 $outer_loop_counter = 0;
6123 while ( $budget-- > 0 ) {
6124 if ( $outer_loop_counter++ >= 8 ) {
6125 return;
6126 }
6127
6128 /*
6129 * > Let formatting element be the last element in the list of active formatting elements that:
6130 * > - is between the end of the list and the last marker in the list,
6131 * > if any, or the start of the list otherwise,
6132 * > - and has the tag name subject.
6133 */
6134 $formatting_element = null;
6135 foreach ( $this->state->active_formatting_elements->walk_up() as $item ) {
6136 if ( 'marker' === $item->node_name ) {
6137 break;
6138 }
6139
6140 if ( $subject === $item->node_name ) {
6141 $formatting_element = $item;
6142 break;
6143 }
6144 }
6145
6146 // > If there is no such element, then return and instead act as described in the "any other end tag" entry above.
6147 if ( null === $formatting_element ) {
6148 $this->bail( 'Cannot run adoption agency when "any other end tag" is required.' );
6149 }
6150
6151 // > If formatting element is not in the stack of open elements, then this is a parse error; remove the element from the list, and return.
6152 if ( ! $this->state->stack_of_open_elements->contains_node( $formatting_element ) ) {
6153 $this->state->active_formatting_elements->remove_node( $formatting_element );
6154 return;
6155 }
6156
6157 // > If formatting element is in the stack of open elements, but the element is not in scope, then this is a parse error; return.
6158 if ( ! $this->state->stack_of_open_elements->has_element_in_scope( $formatting_element->node_name ) ) {
6159 return;
6160 }
6161
6162 /*
6163 * > Let furthest block be the topmost node in the stack of open elements that is lower in the stack
6164 * > than formatting element, and is an element in the special category. There might not be one.
6165 */
6166 $is_above_formatting_element = true;
6167 $furthest_block = null;
6168 foreach ( $this->state->stack_of_open_elements->walk_down() as $item ) {
6169 if ( $is_above_formatting_element && $formatting_element->bookmark_name !== $item->bookmark_name ) {
6170 continue;
6171 }
6172
6173 if ( $is_above_formatting_element ) {
6174 $is_above_formatting_element = false;
6175 continue;
6176 }
6177
6178 if ( self::is_special( $item ) ) {
6179 $furthest_block = $item;
6180 break;
6181 }
6182 }
6183
6184 /*
6185 * > If there is no furthest block, then the UA must first pop all the nodes from the bottom of the
6186 * > stack of open elements, from the current node up to and including formatting element, then
6187 * > remove formatting element from the list of active formatting elements, and finally return.
6188 */
6189 if ( null === $furthest_block ) {
6190 foreach ( $this->state->stack_of_open_elements->walk_up() as $item ) {
6191 $this->state->stack_of_open_elements->pop();
6192
6193 if ( $formatting_element->bookmark_name === $item->bookmark_name ) {
6194 $this->state->active_formatting_elements->remove_node( $formatting_element );
6195 return;
6196 }
6197 }
6198 }
6199
6200 $this->bail( 'Cannot extract common ancestor in adoption agency algorithm.' );
6201 }
6202
6203 $this->bail( 'Cannot run adoption agency when looping required.' );
6204 }
6205
6206 /**
6207 * Runs the "close the cell" algorithm.
6208 *
6209 * > Where the steps above say to close the cell, they mean to run the following algorithm:
6210 * > 1. Generate implied end tags.
6211 * > 2. If the current node is not now a td element or a th element, then this is a parse error.
6212 * > 3. Pop elements from the stack of open elements stack until a td element or a th element has been popped from the stack.
6213 * > 4. Clear the list of active formatting elements up to the last marker.
6214 * > 5. Switch the insertion mode to "in row".
6215 *
6216 * @see https://html.spec.whatwg.org/multipage/parsing.html#close-the-cell
6217 *
6218 * @since 6.7.0
6219 */
6220 private function close_cell(): void {
6221 $this->generate_implied_end_tags();
6222 // @todo Parse error if the current node is not a "td" or "th" element.
6223 foreach ( $this->state->stack_of_open_elements->walk_up() as $element ) {
6224 $this->state->stack_of_open_elements->pop();
6225 if ( 'TD' === $element->node_name || 'TH' === $element->node_name ) {
6226 break;
6227 }
6228 }
6229 $this->state->active_formatting_elements->clear_up_to_last_marker();
6230 $this->state->insertion_mode = WP_HTML_Processor_State::INSERTION_MODE_IN_ROW;
6231 }
6232
6233 /**
6234 * Inserts an HTML element on the stack of open elements.
6235 *
6236 * @since 6.4.0
6237 *
6238 * @see https://html.spec.whatwg.org/#insert-an-html-element
6239 *
6240 * @param WP_HTML_Token $token Token for the element to push onto the stack of open elements.
6241 */
6242 private function insert_html_element( WP_HTML_Token $token ): void {
6243 $this->state->stack_of_open_elements->push( $token );
6244 }
6245
6246 /**
6247 * Inserts a foreign element onto the stack of open elements.
6248 *
6249 * @since 6.7.0
6250 *
6251 * @see https://html.spec.whatwg.org/#insert-a-foreign-element
6252 *
6253 * @param WP_HTML_Token $token Insert this token. The token's namespace and
6254 * insertion point will be updated correctly.
6255 * @param bool $only_add_to_element_stack Whether to skip the "insert an element at the adjusted
6256 * insertion location" algorithm when adding this element.
6257 */
6258 private function insert_foreign_element( WP_HTML_Token $token, bool $only_add_to_element_stack ): void {
6259 $adjusted_current_node = $this->get_adjusted_current_node();
6260
6261 $token->namespace = $adjusted_current_node ? $adjusted_current_node->namespace : 'html';
6262
6263 if ( $this->is_mathml_integration_point() ) {
6264 $token->integration_node_type = 'math';
6265 } elseif ( $this->is_html_integration_point() ) {
6266 $token->integration_node_type = 'html';
6267 }
6268
6269 if ( false === $only_add_to_element_stack ) {
6270 /*
6271 * @todo Implement the "appropriate place for inserting a node" and the
6272 * "insert an element at the adjusted insertion location" algorithms.
6273 *
6274 * These algorithms mostly impact DOM tree construction and not the HTML API.
6275 * Here, there's no DOM node onto which the element will be appended, so the
6276 * parser will skip this step.
6277 *
6278 * @see https://html.spec.whatwg.org/#insert-an-element-at-the-adjusted-insertion-location
6279 */
6280 }
6281
6282 $this->insert_html_element( $token );
6283 }
6284
6285 /**
6286 * Inserts a virtual element on the stack of open elements.
6287 *
6288 * @since 6.7.0
6289 *
6290 * @param string $token_name Name of token to create and insert into the stack of open elements.
6291 * @param string|null $bookmark_name Optional. Name to give bookmark for created virtual node.
6292 * Defaults to auto-creating a bookmark name.
6293 * @return WP_HTML_Token Newly-created virtual token.
6294 */
6295 private function insert_virtual_node( $token_name, $bookmark_name = null ): WP_HTML_Token {
6296 $here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
6297 $name = $bookmark_name ?? $this->bookmark_token();
6298
6299 $this->bookmarks[ $name ] = new WP_HTML_Span( $here->start, 0 );
6300
6301 $token = new WP_HTML_Token( $name, $token_name, false );
6302 $this->insert_html_element( $token );
6303 return $token;
6304 }
6305
6306 /*
6307 * HTML Specification Helpers
6308 */
6309
6310 /**
6311 * Indicates if the current token is a MathML integration point.
6312 *
6313 * @since 6.7.0
6314 *
6315 * @see https://html.spec.whatwg.org/#mathml-text-integration-point
6316 *
6317 * @return bool Whether the current token is a MathML integration point.
6318 */
6319 private function is_mathml_integration_point(): bool {
6320 $current_token = $this->state->current_token;
6321 if ( ! isset( $current_token ) ) {
6322 return false;
6323 }
6324
6325 if ( 'math' !== $current_token->namespace || 'M' !== $current_token->node_name[0] ) {
6326 return false;
6327 }
6328
6329 $tag_name = $current_token->node_name;
6330
6331 return (
6332 'MI' === $tag_name ||
6333 'MO' === $tag_name ||
6334 'MN' === $tag_name ||
6335 'MS' === $tag_name ||
6336 'MTEXT' === $tag_name
6337 );
6338 }
6339
6340 /**
6341 * Indicates if the current token is an HTML integration point.
6342 *
6343 * Note that this method must be an instance method with access
6344 * to the current token, since it needs to examine the attributes
6345 * of the currently-matched tag, if it's in the MathML namespace.
6346 * Otherwise it would be required to scan the HTML and ensure that
6347 * no other accounting is overlooked.
6348 *
6349 * @since 6.7.0
6350 *
6351 * @see https://html.spec.whatwg.org/#html-integration-point
6352 *
6353 * @return bool Whether the current token is an HTML integration point.
6354 */
6355 private function is_html_integration_point(): bool {
6356 $current_token = $this->state->current_token;
6357 if ( ! isset( $current_token ) ) {
6358 return false;
6359 }
6360
6361 if ( 'html' === $current_token->namespace ) {
6362 return false;
6363 }
6364
6365 $tag_name = $current_token->node_name;
6366
6367 if ( 'svg' === $current_token->namespace ) {
6368 return (
6369 'DESC' === $tag_name ||
6370 'FOREIGNOBJECT' === $tag_name ||
6371 'TITLE' === $tag_name
6372 );
6373 }
6374
6375 if ( 'math' === $current_token->namespace ) {
6376 if ( 'ANNOTATION-XML' !== $tag_name ) {
6377 return false;
6378 }
6379
6380 $encoding = $this->get_attribute( 'encoding' );
6381
6382 return (
6383 is_string( $encoding ) &&
6384 (
6385 0 === strcasecmp( $encoding, 'application/xhtml+xml' ) ||
6386 0 === strcasecmp( $encoding, 'text/html' )
6387 )
6388 );
6389 }
6390
6391 $this->bail( 'Should not have reached end of HTML Integration Point detection: check HTML API code.' );
6392 // This unnecessary return prevents tools from inaccurately reporting type errors.
6393 return false;
6394 }
6395
6396 /**
6397 * Returns whether an element of a given name is in the HTML special category.
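 *
 * Example (a sketch of expected results, based on the names this
 * method compares against):
 *
 *     true  === WP_HTML_Processor::is_special( 'div' );
 *     true  === WP_HTML_Processor::is_special( 'LI' );
 *     false === WP_HTML_Processor::is_special( 'span' );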
6398 *
6399 * @since 6.4.0
6400 *
6401 * @see https://html.spec.whatwg.org/#special
6402 *
6403 * @param WP_HTML_Token|string $tag_name Node to check, or only its name if in the HTML namespace.
6404 * @return bool Whether the element of the given name is in the special category.
6405 */
6406 public static function is_special( $tag_name ): bool {
6407 if ( is_string( $tag_name ) ) {
6408 $tag_name = strtoupper( $tag_name );
6409 } else {
6410 $tag_name = 'html' === $tag_name->namespace
6411 ? strtoupper( $tag_name->node_name )
6412 : "{$tag_name->namespace} {$tag_name->node_name}";
6413 }
6414
6415 return (
6416 'ADDRESS' === $tag_name ||
6417 'APPLET' === $tag_name ||
6418 'AREA' === $tag_name ||
6419 'ARTICLE' === $tag_name ||
6420 'ASIDE' === $tag_name ||
6421 'BASE' === $tag_name ||
6422 'BASEFONT' === $tag_name ||
6423 'BGSOUND' === $tag_name ||
6424 'BLOCKQUOTE' === $tag_name ||
6425 'BODY' === $tag_name ||
6426 'BR' === $tag_name ||
6427 'BUTTON' === $tag_name ||
6428 'CAPTION' === $tag_name ||
6429 'CENTER' === $tag_name ||
6430 'COL' === $tag_name ||
6431 'COLGROUP' === $tag_name ||
6432 'DD' === $tag_name ||
6433 'DETAILS' === $tag_name ||
6434 'DIR' === $tag_name ||
6435 'DIV' === $tag_name ||
6436 'DL' === $tag_name ||
6437 'DT' === $tag_name ||
6438 'EMBED' === $tag_name ||
6439 'FIELDSET' === $tag_name ||
6440 'FIGCAPTION' === $tag_name ||
6441 'FIGURE' === $tag_name ||
6442 'FOOTER' === $tag_name ||
6443 'FORM' === $tag_name ||
6444 'FRAME' === $tag_name ||
6445 'FRAMESET' === $tag_name ||
6446 'H1' === $tag_name ||
6447 'H2' === $tag_name ||
6448 'H3' === $tag_name ||
6449 'H4' === $tag_name ||
6450 'H5' === $tag_name ||
6451 'H6' === $tag_name ||
6452 'HEAD' === $tag_name ||
6453 'HEADER' === $tag_name ||
6454 'HGROUP' === $tag_name ||
6455 'HR' === $tag_name ||
6456 'HTML' === $tag_name ||
6457 'IFRAME' === $tag_name ||
6458 'IMG' === $tag_name ||
6459 'INPUT' === $tag_name ||
6460 'KEYGEN' === $tag_name ||
6461 'LI' === $tag_name ||
6462 'LINK' === $tag_name ||
6463 'LISTING' === $tag_name ||
6464 'MAIN' === $tag_name ||
6465 'MARQUEE' === $tag_name ||
6466 'MENU' === $tag_name ||
6467 'META' === $tag_name ||
6468 'NAV' === $tag_name ||
6469 'NOEMBED' === $tag_name ||
6470 'NOFRAMES' === $tag_name ||
6471 'NOSCRIPT' === $tag_name ||
6472 'OBJECT' === $tag_name ||
6473 'OL' === $tag_name ||
6474 'P' === $tag_name ||
6475 'PARAM' === $tag_name ||
6476 'PLAINTEXT' === $tag_name ||
6477 'PRE' === $tag_name ||
6478 'SCRIPT' === $tag_name ||
6479 'SEARCH' === $tag_name ||
6480 'SECTION' === $tag_name ||
6481 'SELECT' === $tag_name ||
6482 'SOURCE' === $tag_name ||
6483 'STYLE' === $tag_name ||
6484 'SUMMARY' === $tag_name ||
6485 'TABLE' === $tag_name ||
6486 'TBODY' === $tag_name ||
6487 'TD' === $tag_name ||
6488 'TEMPLATE' === $tag_name ||
6489 'TEXTAREA' === $tag_name ||
6490 'TFOOT' === $tag_name ||
6491 'TH' === $tag_name ||
6492 'THEAD' === $tag_name ||
6493 'TITLE' === $tag_name ||
6494 'TR' === $tag_name ||
6495 'TRACK' === $tag_name ||
6496 'UL' === $tag_name ||
6497 'WBR' === $tag_name ||
6498 'XMP' === $tag_name ||
6499
6500 // MathML.
6501 'math MI' === $tag_name ||
6502 'math MO' === $tag_name ||
6503 'math MN' === $tag_name ||
6504 'math MS' === $tag_name ||
6505 'math MTEXT' === $tag_name ||
6506 'math ANNOTATION-XML' === $tag_name ||
6507
6508 // SVG.
6509 'svg DESC' === $tag_name ||
6510 'svg FOREIGNOBJECT' === $tag_name ||
6511 'svg TITLE' === $tag_name
6512 );
6513 }
6514
6515 /**
6516 * Returns whether a given element is an HTML Void Element.
6517 *
6518 * > area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
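 *
 * Example (a sketch; names are upper-cased before comparison, so
 * matching is case-insensitive):
 *
 *     true  === WP_HTML_Processor::is_void( 'img' );
 *     true  === WP_HTML_Processor::is_void( 'BR' );
 *     false === WP_HTML_Processor::is_void( 'div' );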
6519 *
6520 * @since 6.4.0
6521 *
6522 * @see https://html.spec.whatwg.org/#void-elements
6523 *
6524 * @param string $tag_name Name of HTML tag to check.
6525 * @return bool Whether the given tag is an HTML Void Element.
6526 */
6527 public static function is_void( $tag_name ): bool {
6528 $tag_name = strtoupper( $tag_name );
6529
6530 return (
6531 'AREA' === $tag_name ||
6532 'BASE' === $tag_name ||
6533 'BASEFONT' === $tag_name || // Obsolete but still treated as void.
6534 'BGSOUND' === $tag_name || // Obsolete but still treated as void.
6535 'BR' === $tag_name ||
6536 'COL' === $tag_name ||
6537 'EMBED' === $tag_name ||
6538 'FRAME' === $tag_name ||
6539 'HR' === $tag_name ||
6540 'IMG' === $tag_name ||
6541 'INPUT' === $tag_name ||
6542 'KEYGEN' === $tag_name || // Obsolete but still treated as void.
6543 'LINK' === $tag_name ||
6544 'META' === $tag_name ||
6545 'PARAM' === $tag_name || // Obsolete but still treated as void.
6546 'SOURCE' === $tag_name ||
6547 'TRACK' === $tag_name ||
6548 'WBR' === $tag_name
6549 );
6550 }
6551
6552 /**
6553 * Gets an encoding from a given string.
6554 *
6555 * This is an algorithm defined in the WHAT-WG specification.
6556 *
6557 * Example:
6558 *
6559 * 'UTF-8' === self::get_encoding( 'utf8' );
6560 * 'UTF-8' === self::get_encoding( " \tUTF-8 " );
6561 * null === self::get_encoding( 'UTF-7' );
6562 * null === self::get_encoding( 'utf8; charset=' );
6563 *
6564 * @see https://encoding.spec.whatwg.org/#concept-encoding-get
6565 *
6566 * @todo As this parser only supports UTF-8, only the UTF-8
6567 * encodings are detected. Add more as desired, but the
6568 * parser will bail on non-UTF-8 encodings.
6569 *
6570 * @since 6.7.0
6571 *
6572 * @param string $label A string which may specify a known encoding.
6573 * @return string|null Known encoding if matched, otherwise null.
6574 */
6575 protected static function get_encoding( string $label ): ?string {
6576 /*
6577 * > Remove any leading and trailing ASCII whitespace from label.
6578 */
6579 $label = trim( $label, " \t\f\r\n" );
6580
6581 /*
6582 * > If label is an ASCII case-insensitive match for any of the labels listed in the
		 * > table below, then return the corresponding encoding; otherwise return failure.
		 */
		switch ( strtolower( $label ) ) {
			case 'unicode-1-1-utf-8':
			case 'unicode11utf8':
			case 'unicode20utf8':
			case 'utf-8':
			case 'utf8':
			case 'x-unicode20utf8':
				return 'UTF-8';

			default:
				return null;
		}
	}

	/*
	 * Constants that would pollute the top of the class if they were found there.
	 */

	/**
	 * Indicates that the next HTML token should be parsed and processed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const PROCESS_NEXT_NODE = 'process-next-node';

	/**
	 * Indicates that the current HTML token should be reprocessed in the newly-selected insertion mode.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const REPROCESS_CURRENT_NODE = 'reprocess-current-node';

	/**
	 * Indicates that the current HTML token should be processed without advancing the parser.
	 *
	 * @since 6.5.0
	 *
	 * @var string
	 */
	const PROCESS_CURRENT_NODE = 'process-current-node';

	/**
	 * Indicates that the parser encountered unsupported markup and has bailed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const ERROR_UNSUPPORTED = 'unsupported';

	/**
	 * Indicates that the parser encountered more HTML tokens than it
	 * was able to process and has bailed.
	 *
	 * @since 6.4.0
	 *
	 * @var string
	 */
	const ERROR_EXCEEDED_MAX_BOOKMARKS = 'exceeded-max-bookmarks';

	/**
	 * Unlock code that must be passed into the constructor to create this class.
	 *
	 * This class extends the WP_HTML_Tag_Processor, which has a public class
	 * constructor. Therefore, it's not possible to have a private constructor here.
	 *
	 * This unlock code is used to ensure that anyone calling the constructor is
	 * doing so with a full understanding that it's intended to be a private API.
	 *
	 * @access private
	 */
	const CONSTRUCTOR_UNLOCK_CODE = 'Use WP_HTML_Processor::create_fragment() instead of calling the class constructor directly.';
}