HTMLPurifier_Lexer - Code Metrics - Inspection of "Merge pull request #516 from mambax7/feature/purif..." - XOOPS/XoopsCore25 - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — master ( e162f1...95ce61 )

by Richard

created 2017-07-07 19:07 UTC

HTMLPurifier_Lexer B

↳ Parent: Project

Complexity

Total Complexity	50

Size/Duplication

Total Lines	339
Duplicated Lines	0 %

Coupling/Cohesion

Components	1
Dependencies	8

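Metric	Value
dl (duplicated lines)	0
loc (lines of code)	339
rs	8.6206
c	0
b	0
f	0
wmc (weighted method count)	50
lcom (lack of cohesion of methods)	1
cbo (coupling between objects)	8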

12 Methods

Rating	Name	Size	Complexity
D	create()	80	16
A	__construct()	4	1
A	parseText()	3	1
A	parseAttr()	3	1
C	parseData()	37	8
A	tokenizeHTML()	4	1
A	escapeCDATA()	8	1
A	escapeCommentedCDATA()	8	1
A	removeIEConditional()	8	1
A	CDATACallback()	5	1
C	normalize()	55	13
B	extractBody()	15	5
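
The per-method complexity figures can be cross-checked against the class-level wmc value. The short sketch below assumes that wmc is the plain sum of the method complexities, which is consistent with the numbers shown here. It also shows how much of the total comes from the three most complex methods. The variable names are illustrative only.

```php
<?php
// Per-method complexity values copied from the "12 Methods" table.
$complexity = [
    'create' => 16, '__construct' => 1, 'parseText' => 1, 'parseAttr' => 1,
    'parseData' => 8, 'tokenizeHTML' => 1, 'escapeCDATA' => 1,
    'escapeCommentedCDATA' => 1, 'removeIEConditional' => 1,
    'CDATACallback' => 1, 'normalize' => 13, 'extractBody' => 5,
];

// Assumption: wmc is the sum of the individual method complexities.
$total = array_sum($complexity);
echo $total, "\n"; // prints 50

// Sort descending by complexity and keep the three largest contributors.
arsort($complexity);
$top = array_slice($complexity, 0, 3, true);
printf("%d of %d\n", array_sum($top), $total); // prints "37 of 50"
```

On these numbers create(), normalize() and parseData() together account for 37 of the 50 complexity points, so they are the natural candidates for refactoring.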

How to fix Complexity

<?php

/**
 * Forgivingly lexes HTML (SGML-style) markup into tokens.
 *
 * A lexer parses a string of SGML-style markup and converts them into
 * corresponding tokens.  It doesn't check for well-formedness, although its
 * internal mechanism may make this automatic (such as the case of
 * HTMLPurifier_Lexer_DOMLex).  There are several implementations to choose
 * from.
 *
 * A lexer is HTML-oriented: it might work with XML, but it's not
 * recommended, as we adhere to a subset of the specification for optimization
 * reasons. This might change in the future. Also, most tokenizers are not
 * expected to handle DTDs or PIs.
 *
 * This class should not be directly instantiated, but you may use create() to
 * retrieve a default copy of the lexer.  Being a supertype, this class
 * does not actually define any implementation, but offers commonly used
 * convenience functions for subclasses.
 *
 * @note The unit tests will instantiate this class for testing purposes, as
 *       many of the utility functions require a class to be instantiated.
 *       This means that, even though this class is not runnable, it will
 *       not be declared abstract.
 *
 * @par
 *
 * @note
 * We use tokens rather than create a DOM representation because DOM would:
 *
 * @par
 *  -# Require more processing and memory to create,
 *  -# Is not streamable, and
 *  -# Has the entire document structure (html and body not needed).
 *
 * @par
 * However, DOM is helpful in that it makes it easy to move around nodes
 * without a lot of lookaheads to see when a tag is closed. This is a
 * limitation of the token system and some workarounds would be nice.
 */
class HTMLPurifier_Lexer
{

    /**
     * Whether or not this lexer implements line-number/column-number tracking.
     * If it does, set to true.
     */
    public $tracksLineNumbers = false;

    // -- STATIC ----------------------------------------------------------

    /**
     * Retrieves or sets the default Lexer as a Prototype Factory.
     *
     * By default HTMLPurifier_Lexer_DOMLex will be returned. There are
     * a few exceptions involving special features that only DirectLex
     * implements.
     *
     * @note The behavior of this class has changed, rather than accepting
     *       a prototype object, it now accepts a configuration object.
     *       To specify your own prototype, set %Core.LexerImpl to it.
     *       This change in behavior de-singletonizes the lexer object.
     *
     * @param HTMLPurifier_Config $config
     * @return HTMLPurifier_Lexer
     * @throws HTMLPurifier_Exception
     */
    public static function create($config)
    {
        if (!($config instanceof HTMLPurifier_Config)) {
            $lexer = $config;
            trigger_error(
                "Passing a prototype to
                HTMLPurifier_Lexer::create() is deprecated, please instead
                use %Core.LexerImpl",
                E_USER_WARNING
            );
        } else {
            $lexer = $config->get('Core.LexerImpl');
        }

        $needs_tracking =
            $config->get('Core.MaintainLineNumbers') ||
            $config->get('Core.CollectErrors');

        $inst = null;
        if (is_object($lexer)) {
            $inst = $lexer;
        } else {
            if (is_null($lexer)) {
                do {
                    // auto-detection algorithm
                    if ($needs_tracking) {
                        $lexer = 'DirectLex';
                        break;
                    }

                    if (class_exists('DOMDocument', false) &&
                        method_exists('DOMDocument', 'loadHTML') &&
                        !extension_loaded('domxml')
                    ) {
                        // check for DOM support, because while it's part of the
                        // core, it can be disabled compile time. Also, the PECL
                        // domxml extension overrides the default DOM, and is evil
                        // and nasty and we shan't bother to support it
                        $lexer = 'DOMLex';
                    } else {
                        $lexer = 'DirectLex';
                    }
                } while (0);
            } // do..while so we can break

            // instantiate recognized string names
            switch ($lexer) {
                case 'DOMLex':
                    $inst = new HTMLPurifier_Lexer_DOMLex();
                    break;
                case 'DirectLex':
                    $inst = new HTMLPurifier_Lexer_DirectLex();
                    break;
                case 'PH5P':
                    $inst = new HTMLPurifier_Lexer_PH5P();
                    break;
                default:
                    throw new HTMLPurifier_Exception(
                        "Cannot instantiate unrecognized Lexer type " .
                        htmlspecialchars($lexer)
                    );
            }
        }

        if (!$inst) {
            throw new HTMLPurifier_Exception('No lexer was instantiated');
        }

        // once PHP DOM implements native line numbers, or we
        // hack out something using XSLT, remove this stipulation
        if ($needs_tracking && !$inst->tracksLineNumbers) {
            throw new HTMLPurifier_Exception(
                'Cannot use lexer that does not support line numbers with ' .
                'Core.MaintainLineNumbers or Core.CollectErrors (use DirectLex instead)'
            );
        }

        return $inst;

    }

    // -- CONVENIENCE MEMBERS ---------------------------------------------

    public function __construct()
    {
        $this->_entity_parser = new HTMLPurifier_EntityParser();
    }

    /**
     * Most common entity to raw value conversion table for special entities.
     * @type array
     */
    protected $_special_entity2str =
        array(
            '&quot;' => '"',
            '&amp;' => '&',
            '&lt;' => '<',
            '&gt;' => '>',
            '&#39;' => "'",
            '&#039;' => "'",
            '&#x27;' => "'"
        );

    public function parseText($string, $config) {
        return $this->parseData($string, false, $config);
    }

    public function parseAttr($string, $config) {
        return $this->parseData($string, true, $config);
    }

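    /**
     * Parses special entities into the proper characters.
     *
     * This method translates escaped versions of the special characters
     * into the correct ones.
     *
     * @param string $string String character data to be parsed.
     * @param bool $is_attr Whether the data comes from an attribute value
     *                      (true) or from text content (false).
     * @param HTMLPurifier_Config $config
     * @return string Parsed character data.
     */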
    public function parseData($string, $is_attr, $config)
    {
        // following functions require at least one character
        if ($string === '') {
            return '';
        }

        // subtracts amps that cannot possibly be escaped
        $num_amp = substr_count($string, '&') - substr_count($string, '& ') -
            ($string[strlen($string) - 1] === '&' ? 1 : 0);

        if (!$num_amp) {
            return $string;
        } // abort if no entities
        $num_esc_amp = substr_count($string, '&amp;');
        $string = strtr($string, $this->_special_entity2str);

        // code duplication for sake of optimization, see above
        $num_amp_2 = substr_count($string, '&') - substr_count($string, '& ') -
            ($string[strlen($string) - 1] === '&' ? 1 : 0);

        if ($num_amp_2 <= $num_esc_amp) {
            return $string;
        }

        // hmm... now we have some uncommon entities. Use the callback.
        if ($config->get('Core.LegacyEntityDecoder')) {
            $string = $this->_entity_parser->substituteSpecialEntities($string);
        } else {
            if ($is_attr) {
                $string = $this->_entity_parser->substituteAttrEntities($string);
            } else {
                $string = $this->_entity_parser->substituteTextEntities($string);
            }
        }
        return $string;
    }

    /**
     * Lexes an HTML string into tokens.
     * @param $string String HTML.
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return HTMLPurifier_Token[] array representation of HTML.
     */
    public function tokenizeHTML($string, $config, $context)
    {
        trigger_error('Call to abstract class', E_USER_ERROR);
    }

    /**
     * Translates CDATA sections into regular sections (through escaping).
     * @param string $string HTML string to process.
     * @return string HTML with CDATA sections escaped.
     */
    protected static function escapeCDATA($string)
    {
        return preg_replace_callback(
            '/<!\[CDATA\[(.+?)\]\]>/s',
            array('HTMLPurifier_Lexer', 'CDATACallback'),
            $string
        );
    }

    /**
     * Special CDATA case that is especially convoluted for <script>
     * @param string $string HTML string to process.
     * @return string HTML with CDATA sections escaped.
     */
    protected static function escapeCommentedCDATA($string)
    {
        return preg_replace_callback(
            '#<!--//--><!\[CDATA\[//><!--(.+?)//--><!\]\]>#s',
            array('HTMLPurifier_Lexer', 'CDATACallback'),
            $string
        );
    }

    /**
     * Special Internet Explorer conditional comments should be removed.
     * @param string $string HTML string to process.
     * @return string HTML with conditional comments removed.
     */
    protected static function removeIEConditional($string)
    {
        return preg_replace(
            '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', // probably should generalize for all strings
            '',
            $string
        );
    }

    /**
     * Callback function for escapeCDATA() that does the work.
     *
     * @warning This is only meant to be invoked as a
     *          preg_replace_callback() callback from within this class;
     *          calling it directly is not recommended.
     * @param array $matches PCRE matches array, with index 0 the entire match
     *                  and 1 the inside of the CDATA section.
     * @return string Escaped internals of the CDATA section.
     */
    protected static function CDATACallback($matches)
    {
        // not exactly sure why the character set is needed, but whatever
        return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
    }

    /**
     * Takes a piece of HTML and normalizes it by converting entities, fixing
     * encoding, extracting bits, and other good stuff.
     * @param string $html HTML.
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return string
     * @todo Consider making protected
     */
    public function normalize($html, $config, $context)
    {
        // normalize newlines to \n
        if ($config->get('Core.NormalizeNewlines')) {
            $html = str_replace("\r\n", "\n", $html);
            $html = str_replace("\r", "\n", $html);
        }

        if ($config->get('HTML.Trusted')) {
            // escape convoluted CDATA
            $html = $this->escapeCommentedCDATA($html);
        }

        // escape CDATA
        $html = $this->escapeCDATA($html);

        $html = $this->removeIEConditional($html);

        // extract body from document if applicable
        if ($config->get('Core.ConvertDocumentToFragment')) {
            $e = false;
            if ($config->get('Core.CollectErrors')) {
                $e =& $context->get('ErrorCollector');
            }
            $new_html = $this->extractBody($html);
            if ($e && $new_html != $html) {
                $e->send(E_WARNING, 'Lexer: Extracted body');
            }
            $html = $new_html;
        }

        // expand entities that aren't the big five
        if ($config->get('Core.LegacyEntityDecoder')) {
            $html = $this->_entity_parser->substituteNonSpecialEntities($html);
        }

        // clean into wellformed UTF-8 string for an SGML context: this has
        // to be done after entity expansion because the entities sometimes
        // represent non-SGML characters (horror, horror!)
        $html = HTMLPurifier_Encoder::cleanUTF8($html);

        // if processing instructions are to be removed, remove them now
        if ($config->get('Core.RemoveProcessingInstructions')) {
            $html = preg_replace('#<\?.+?\?>#s', '', $html);
        }

        $hidden_elements = $config->get('Core.HiddenElements');
        if ($config->get('Core.AggressivelyRemoveScript') &&
            !($config->get('HTML.Trusted') || !$config->get('Core.RemoveScriptContents')
            || empty($hidden_elements["script"]))) {
            $html = preg_replace('#<script[^>]*>.*?</script>#i', '', $html);
        }

        return $html;
    }

    /**
     * Takes a string of HTML (fragment or document) and returns the content
     * @todo Consider making protected
     */
    public function extractBody($html)
    {
        $matches = array();
        $result = preg_match('|(.*?)<body[^>]*>(.*)</body>|is', $html, $matches);
        if ($result) {
            // Make sure it's not in a comment
            $comment_start = strrpos($matches[1], '<!--');
            $comment_end   = strrpos($matches[1], '-->');
            if ($comment_start === false ||
                ($comment_end !== false && $comment_end > $comment_start)) {
                return $matches[2];
            }
        }
        return $html;
    }
}

// vim: et sw=4 sts=4
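
The auto-detection branch inside create() contributes a good share of that method's complexity (16, rated D). As a rough sketch only, and not HTML Purifier's actual code, the same decision can be written with early returns instead of the do..while(0)/break construct. The function names pickLexerName() and domIsUsable() are hypothetical and exist only for this illustration.

```php
<?php
/**
 * Hypothetical helper: mirrors the auto-detection branch of create().
 *
 * @param bool $needsTracking True if line numbers or error collection are required.
 * @param bool $domAvailable  True if PHP's DOM extension can be used.
 * @return string Short lexer name, as used in the switch statement of create().
 */
function pickLexerName(bool $needsTracking, bool $domAvailable): string
{
    // Only DirectLex tracks line numbers, so tracking forces it.
    if ($needsTracking) {
        return 'DirectLex';
    }
    return $domAvailable ? 'DOMLex' : 'DirectLex';
}

/**
 * Hypothetical helper: the same three DOM checks that create() performs.
 *
 * @return bool
 */
function domIsUsable(): bool
{
    return class_exists('DOMDocument', false)
        && method_exists('DOMDocument', 'loadHTML')
        && !extension_loaded('domxml');
}

echo pickLexerName(true, true), "\n";   // prints DirectLex
echo pickLexerName(false, true), "\n";  // prints DOMLex
echo pickLexerName(false, false), "\n"; // prints DirectLex

// The result of this call depends on the local PHP build.
echo pickLexerName(false, domIsUsable()), "\n";
```

Separating the decision from the environment probing also makes the choice testable without a particular extension installed.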

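parseData() begins by counting ampersands that could still start an entity: it subtracts every ampersand followed by a space and a trailing ampersand at the end of the string. The sketch below isolates just that counting step as a free function. The name countCandidateAmps() is hypothetical, and the empty-string guard stands in for the early return that parseData() performs before counting.

```php
<?php
/**
 * Hypothetical standalone version of the counting step in parseData().
 *
 * @param string $s Character data.
 * @return int Number of ampersands that might begin an entity.
 */
function countCandidateAmps(string $s): int
{
    // parseData() returns early on an empty string; mirror that here
    // because the last-character check needs at least one character.
    if ($s === '') {
        return 0;
    }
    return substr_count($s, '&')
        - substr_count($s, '& ')
        - ($s[strlen($s) - 1] === '&' ? 1 : 0);
}

var_dump(countCandidateAmps('Tom & Jerry')); // int(0): "& " cannot be an entity
var_dump(countCandidateAmps('AT&amp;T'));    // int(1): one candidate
var_dump(countCandidateAmps('fish &'));      // int(0): trailing ampersand
```

When the count is zero, parseData() returns the input unchanged and skips both the strtr() lookup and the entity parser.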

Issue (Unused Code, introduced 2016-03-07 05:16 UTC), at `$inst = null;` in create():
`$inst` is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

    $myVar = 'Value';
    $higher = false;

    if (rand(1, 6) > 3) {
        $higher = true;
    } else {
        $higher = false;
    }

Both the `$myVar` assignment in line 1 and the `$higher` assignment in line 2 are dead. The first because `$myVar` is never used and the second because `$higher` is always overwritten on every possible execution path.

Issue (Bug, introduced 2016-03-07 05:16 UTC), at `$this->_entity_parser = new HTMLPurifier_EntityParser();` in __construct():
The property `_entity_parser` does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:

    class MyClass { }

    $x = new MyClass();
    $x->foo = true;

Generally, it is a good practice to explicitly declare properties to avoid accidental typos and provide IDE auto-completion:

    class MyClass {
        public $foo;
    }

    $x = new MyClass();
    $x->foo = true;

Issue (Documentation, introduced 2016-03-07 05:16 UTC), at the `@return` annotation of tokenizeHTML():
Should the return type not be `HTMLPurifier_Token[]|null`?

This check compares the return type specified in the `@return` annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.

Issue (Documentation, introduced 2016-03-07 05:16 UTC), at extractBody():
The return type could not be reliably inferred; please add a `@return` annotation.

Our type inference engine is quite powerful, but sometimes the code does not provide enough clues to go by. In these cases we request you to add a `@return` annotation.

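The documentation issue raised for extractBody() asks for an explicit `@return` annotation. The sketch below copies the method body into a hypothetical free function, extractBodySketch(), and adds the missing `@param` and `@return` tags. The wording of the annotations is an assumption about intent, not text taken from HTML Purifier.

```php
<?php
/**
 * Illustrative standalone copy of extractBody() with a complete doc comment.
 *
 * @param string $html HTML fragment or full document.
 * @return string Contents of the body element if one is found outside a
 *                comment, otherwise the input unchanged.
 */
function extractBodySketch($html)
{
    $matches = array();
    $result = preg_match('|(.*?)<body[^>]*>(.*)</body>|is', $html, $matches);
    if ($result) {
        // Make sure the <body> tag is not inside an unclosed comment.
        $comment_start = strrpos($matches[1], '<!--');
        $comment_end   = strrpos($matches[1], '-->');
        if ($comment_start === false ||
            ($comment_end !== false && $comment_end > $comment_start)) {
            return $matches[2];
        }
    }
    return $html;
}

// A full document: only the body contents are returned.
echo extractBodySketch('<html><body class="x"><p>Hi</p></body></html>'), "\n"; // prints <p>Hi</p>

// A fragment without <body> is returned unchanged.
echo extractBodySketch('<p>No body tag</p>'), "\n"; // prints <p>No body tag</p>
```

Both return paths yield a string, so `@return string` describes the function completely.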