Test Failed
Branch master (137376), created by Tymoteusz at 20:39

RteHtmlParser   F

Complexity

Total Complexity 200

Size/Duplication

Total Lines 1281
Duplicated Lines 2.26 %

Importance

Changes 0
Metric Value
dl 29
loc 1281
rs 0.6314
c 0
b 0
f 0
wmc 200

25 Methods

Rating   Name   Duplication   Size   Complexity  
C setDivTags() 0 31 8
B TS_images_rte() 0 29 6
B processContentWithinParagraph() 0 30 6
C urlInfoForLinkTags() 0 56 15
B removeBrokenLinkMarkers() 0 26 6
B TS_links_db() 0 30 4
A resolveAppliedTransformationModes() 0 15 2
C getKeepTags() 0 57 10
C applyPlainImageModeSettings() 6 25 8
A streamlineLineBreaksForProcessing() 0 3 1
B TS_AtagToAbs() 0 21 5
C markBrokenLinks() 0 33 7
C TS_transform_rte() 3 49 17
D TS_images_db() 0 147 33
A HTMLcleaner_db() 0 6 1
F RTE_transform() 17 110 21
C TS_transform_db() 3 60 17
A runHtmlParserIfConfigured() 0 7 2
C divideIntoLines() 0 43 11
A sanitizeLineBreaksForContentOnly() 0 6 1
A streamlineLineBreaksAfterProcessing() 0 6 1
B transformStyledATags() 0 18 5
B TS_links_rte() 0 60 8
B getWHFromAttribs() 0 22 4
A init() 0 4 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule is to restructure code once it is duplicated in three or more places.
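As a rough sketch of what removing such duplication could look like here: RTE_transform() in the listing below reads several options that can be configured either as a comma-separated string or as a sub-array under a key with a trailing dot. The helper name resolveListOption() is hypothetical and not part of TYPO3, and plain PHP stands in for GeneralUtility::trimExplode() so that the sketch runs on its own. It also simplifies the original branching slightly (the original only honours 'allowTagsOutside.' when 'allowTagsOutside' is set as well).

```php
<?php

/**
 * Hypothetical helper: read a list-like option that may be configured either
 * as a sub-array ($arrayKey, e.g. "allowedClasses.") or as a comma-separated
 * string ($stringKey, e.g. "allowedClasses").
 *
 * Returns null when neither key is set, so the caller can keep its default.
 */
function resolveListOption(array $options, string $arrayKey, string $stringKey, bool $lowercase = false): ?array
{
    if (isset($options[$arrayKey])) {
        return (array)$options[$arrayKey];
    }
    if (isset($options[$stringKey])) {
        $list = (string)$options[$stringKey];
        if ($lowercase) {
            $list = strtolower($list);
        }
        // Stand-in for GeneralUtility::trimExplode(',', $list, true):
        // split on commas, trim each item, drop empty items.
        $items = array_map('trim', explode(',', $list));
        $items = array_filter($items, function (string $item): bool {
            return $item !== '';
        });
        return array_values($items);
    }
    return null;
}

// Example configuration (made up for illustration)
$procOptions = [
    'allowedClasses' => 'lead, intro ,,note',
    'allowAttributes.' => ['class', 'id'],
];

$allowedClasses = resolveListOption($procOptions, 'allowedClasses.', 'allowedClasses') ?? [];
$paragraphAttributes = resolveListOption($procOptions, 'allowAttributes.', 'keepPDIVattribs', true);
$tagsOutside = resolveListOption($procOptions, 'allowTagsOutside.', 'allowTagsOutside', true);
```

With one helper, each of the three flagged blocks would shrink to a single call plus a fallback to the existing default when null is returned.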

Common duplication problems, and corresponding solutions are:

Complex Class

Tip: Before tackling complexity, eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like RteHtmlParser often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
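A minimal sketch of Extract Class, assuming the three line-break methods in the table above (streamlineLineBreaksForProcessing(), streamlineLineBreaksAfterProcessing(), sanitizeLineBreaksForContentOnly()) form one cohesive component. The class name LineBreakNormalizer and its method names are hypothetical. The behaviour follows the class comment (LF internally, CRLF on output); treating a lone CR as a line break is an added assumption.

```php
<?php

/**
 * Hypothetical extracted class that owns the line-break handling described
 * in the RteHtmlParser class comment.
 */
final class LineBreakNormalizer
{
    /**
     * Convert CRLF (and, as an assumption, a lone CR) to LF before processing.
     */
    public function toInternal(string $content): string
    {
        // The search array is applied in order, so CRLF is handled before CR.
        return str_replace(["\r\n", "\r"], "\n", $content);
    }

    /**
     * Convert every line break to CRLF once all transformations are done.
     */
    public function toOutput(string $content): string
    {
        // Normalize first so an existing CRLF does not become CR CR LF.
        return str_replace("\n", "\r\n", $this->toInternal($content));
    }
}

$normalizer = new LineBreakNormalizer();
$internal = $normalizer->toInternal("one\r\ntwo\rthree\nfour");
$output = $normalizer->toOutput($internal);
```

RteHtmlParser would then hold an instance of this class and delegate to it, which removes three methods from the parser without changing its public behaviour.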

While breaking up the class, it is a good idea to analyze how other classes use RteHtmlParser, and based on these observations, apply Extract Interface, too.
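One possible shape for Extract Interface, assuming callers only need the single entry point that RTE_transform() provides. The interface, the stand-in implementation, and the saveBodytext() caller are all hypothetical names invented for this sketch; the stand-in only reproduces the empty-paragraph rule from the 'css_transform' branch so that the example can run.

```php
<?php

/**
 * Hypothetical interface covering the one operation callers depend on.
 */
interface RteTransformerInterface
{
    /**
     * @param string $value Content to transform
     * @param string $direction Either "db" or "rte"
     * @param array $config Parsed RTE configuration
     */
    public function transform(string $value, string $direction, array $config = []): string;
}

/**
 * Stand-in implementation for the sketch. A real one would delegate to
 * RteHtmlParser::RTE_transform().
 */
final class EmptyParagraphTransformer implements RteTransformerInterface
{
    public function transform(string $value, string $direction, array $config = []): string
    {
        if ($direction === 'db') {
            // Same replacement as the 'css_transform' case in the listing
            return str_replace('<p></p>', '<p>&nbsp;</p>', $value);
        }
        return $value;
    }
}

/**
 * Hypothetical caller: it is typed against the interface, not the class.
 */
function saveBodytext(RteTransformerInterface $transformer, string $rawContent): string
{
    return $transformer->transform($rawContent, 'db');
}

$stored = saveBodytext(new EmptyParagraphTransformer(), '<p>Text</p><p></p>');
```

Typing callers against the interface makes it possible to split the parser further later on without touching them again.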

<?php
namespace TYPO3\CMS\Core\Html;

/*
 * This file is part of the TYPO3 CMS project.
 *
 * It is free software; you can redistribute it and/or modify it under
 * the terms of the GNU General Public License, either version 2
 * of the License, or any later version.
 *
 * For the full copyright and license information, please read the
 * LICENSE.txt file that was distributed with this source code.
 *
 * The TYPO3 project - inspiring people to share!
 */

use Psr\Log\LoggerAwareInterface;
use Psr\Log\LoggerAwareTrait;
use TYPO3\CMS\Backend\Utility\BackendUtility;
use TYPO3\CMS\Core\LinkHandling\Exception\UnknownLinkHandlerException;
use TYPO3\CMS\Core\LinkHandling\LinkService;
use TYPO3\CMS\Core\Resource;
use TYPO3\CMS\Core\Utility\GeneralUtility;
use TYPO3\CMS\Frontend\Service\TypoLinkCodecService;

/**
 * Class for parsing HTML for the Rich Text Editor. (also called transformations)
 *
 * Concerning line breaks:
 * Regardless if LF (Unix-style) or CRLF (Windows) was put in, the HtmlParser works with LFs and migrates all
 * line breaks to LFs internally, however when all transformations are done, all LFs are transformed to CRLFs.
 * This means: RteHtmlParser always returns CRLFs to be maximum compatible with all formats.
 */
class RteHtmlParser extends HtmlParser implements LoggerAwareInterface
{
    use LoggerAwareTrait;

    /**
     * List of elements that are not wrapped into a "p" tag while doing the transformation.
     * @var string
     */
    public $blockElementList = 'DIV,TABLE,BLOCKQUOTE,PRE,UL,OL,H1,H2,H3,H4,H5,H6,ADDRESS,DL,DD,HEADER,SECTION,FOOTER,NAV,ARTICLE,ASIDE';

    /**
     * List of all tags that are allowed by default
     * @var string
     */
    protected $defaultAllowedTagsList = 'b,i,u,a,img,br,div,center,pre,font,hr,sub,sup,p,strong,em,li,ul,ol,blockquote,strike,span';

    /**
     * Set this to the pid of the record manipulated by the class.
     *
     * @var int
     */
    public $recPid = 0;

    /**
     * Element reference [table]:[field], eg. "tt_content:bodytext"
     *
     * @var string
     */
    public $elRef = '';

    /**
     * Current Page TSConfig
     *
     * @var array
     */
    public $tsConfig = [];

    /**
     * Set to the TSconfig options coming from Page TSconfig
     *
     * @var array
     */
    public $procOptions = [];

    /**
     * Run-away brake for recursive calls.
     *
     * @var int
     */
    public $TS_transform_db_safecounter = 100;

    /**
     * Data caching for processing function
     *
     * @var array
     */
    public $getKeepTags_cache = [];

    /**
     * Storage of the allowed CSS class names in the RTE
     *
     * @var array
     */
    public $allowedClasses = [];

    /**
     * A list of HTML attributes for <p> tags. Because <p> tags are wrapped currently in a special handling,
     * they have a special place for configuration via 'proc.keepPDIVattribs'
     *
     * @var array
     */
    protected $allowedAttributesForParagraphTags = [
        'class',
        'align',
        'id',
        'title',
        'dir',
        'lang',
        'xml:lang',
        'itemscope',
        'itemtype',
        'itemprop'
    ];

    /**
     * Any tags that are allowed outside of <p> sections - usually similar to the block elements
     * plus some special tags like <hr> and <img> (if images are allowed).
     * Completely overrideable via 'proc.allowTagsOutside'
     *
     * @var array
     */
    protected $allowedTagsOutsideOfParagraphs = [
        'address',
        'article',
        'aside',
        'blockquote',
        'div',
        'footer',
        'header',
        'hr',
        'nav',
        'section'
    ];

    /**
     * Initialize, setting element reference and record PID
     *
     * @param string $elRef Element reference, eg "tt_content:bodytext"
     * @param int $recPid PID of the record (page id)
     */
    public function init($elRef = '', $recPid = 0)
    {
        $this->recPid = $recPid;
        $this->elRef = $elRef;
    }

    /**********************************************
     *
     * Main function
     *
     **********************************************/
    /**
     * Transform value for RTE based on specConf in the direction specified by $direction (rte/db)
     * This is the main function called from DataHandler and transfer data classes
     *
     * @param string $value Input value
     * @param null $_ unused
     *        [Scrutinizer, Documentation Bug] Are you sure the doc-type for parameter $_ is correct as it would always require null to be passed?
     * @param string $direction Direction of the transformation. Two keywords are allowed; "db" or "rte". If "db" it means the transformation will clean up content coming from the Rich Text Editor and goes into the database. The other direction, "rte", is of course when content is coming from database and must be transformed to fit the RTE.
     * @param array $thisConfig Parsed TypoScript content configuring the RTE, probably coming from Page TSconfig.
     * @return string Output value
     */
    public function RTE_transform($value, $_ = null, $direction = 'rte', $thisConfig = [])
    // [Scrutinizer, Unused Code] The parameter $_ is not used and could be removed.
    // This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
    // If this is a false positive, the issue can be ignored via the ignore-unused annotation:
    //     public function RTE_transform($value, /** @scrutinizer ignore-unused */ $_ = null, $direction = 'rte', $thisConfig = [])
    {
        $this->tsConfig = $thisConfig;
        $this->procOptions = (array)$thisConfig['proc.'];
        // [Scrutinizer] Flagged as duplicated code
        if (isset($this->procOptions['allowedClasses.'])) {
            $this->allowedClasses = (array)$this->procOptions['allowedClasses.'];
        } else {
            $this->allowedClasses = GeneralUtility::trimExplode(',', $this->procOptions['allowedClasses'], true);
        }

        // Dynamic configuration of blockElementList
        if ($this->procOptions['blockElementList']) {
            $this->blockElementList = $this->procOptions['blockElementList'];
        }

        // Define which attributes are allowed on <p> tags
        // [Scrutinizer] Flagged as duplicated code
        if (isset($this->procOptions['allowAttributes.'])) {
            $this->allowedAttributesForParagraphTags = $this->procOptions['allowAttributes.'];
        } elseif (isset($this->procOptions['keepPDIVattribs'])) {
            $this->allowedAttributesForParagraphTags = GeneralUtility::trimExplode(',', strtolower($this->procOptions['keepPDIVattribs']), true);
        }
        // Override tags which are allowed outside of <p> tags
        // [Scrutinizer] Flagged as duplicated code
        if (isset($this->procOptions['allowTagsOutside'])) {
            if (!isset($this->procOptions['allowTagsOutside.'])) {
                $this->allowedTagsOutsideOfParagraphs = GeneralUtility::trimExplode(',', strtolower($this->procOptions['allowTagsOutside']), true);
            } else {
                $this->allowedTagsOutsideOfParagraphs = (array)$this->procOptions['allowTagsOutside.'];
            }
        }

        // Setting modes / transformations to be called
        if ((string)$this->procOptions['overruleMode'] !== '') {
            $modes = GeneralUtility::trimExplode(',', $this->procOptions['overruleMode']);
        } else {
            $modes = [$this->procOptions['mode']];
        }
        $modes = $this->resolveAppliedTransformationModes($direction, $modes);

        $value = $this->streamlineLineBreaksForProcessing($value);

        // If an entry HTML cleaner was configured, pass the content through the HTMLcleaner
        $value = $this->runHtmlParserIfConfigured($value, 'entryHTMLparser_' . $direction);

        // Traverse modes
        foreach ($modes as $cmd) {
            if ($direction === 'db') {
                // Checking for user defined transformation:
                if ($className = $GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['transformation'][$cmd]) {
                    $_procObj = GeneralUtility::makeInstance($className);
                    $_procObj->pObj = $this;
                    $_procObj->transformationKey = $cmd;
                    $value = $_procObj->transform_db($value, $this);
                } else {
                    // ... else use defaults:
                    switch ($cmd) {
                        case 'detectbrokenlinks':
                            $value = $this->removeBrokenLinkMarkers($value);
                            break;
                        case 'ts_images':
                            $value = $this->TS_images_db($value);
                            break;
                        case 'ts_links':
                            $value = $this->TS_links_db($value);
                            break;
                        case 'css_transform':
                            // Transform empty paragraphs into spacing paragraphs
                            $value = str_replace('<p></p>', '<p>&nbsp;</p>', $value);
                            // Double any trailing spacing paragraph so that it does not get removed by divideIntoLines()
                            $value = preg_replace('/<p>&nbsp;<\/p>$/', '<p>&nbsp;</p>' . '<p>&nbsp;</p>', $value);
                            $value = $this->TS_transform_db($value);
                            break;
                        default:
                            // Do nothing
                    }
                }
            } elseif ($direction === 'rte') {
                // Checking for user defined transformation:
                if ($className = $GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['transformation'][$cmd]) {
                    $_procObj = GeneralUtility::makeInstance($className);
                    $_procObj->pObj = $this;
                    $value = $_procObj->transform_rte($value, $this);
                } else {
                    // ... else use defaults:
                    switch ($cmd) {
                        case 'detectbrokenlinks':
                            $value = $this->markBrokenLinks($value);
                            break;
                        case 'ts_images':
                            $value = $this->TS_images_rte($value);
                            break;
                        case 'ts_links':
                            $value = $this->TS_links_rte($value);
                            break;
                        case 'css_transform':
                            $value = $this->TS_transform_rte($value);
                            break;
                        default:
                            // Do nothing
                    }
                }
            }
        }

        // If an exit HTML cleaner was configured, pass the content through the HTMLcleaner
        $value = $this->runHtmlParserIfConfigured($value, 'exitHTMLparser_' . $direction);

        // Final clean up of linebreaks
        $value = $this->streamlineLineBreaksAfterProcessing($value);

        return $value;
    }

    /**
     * Determines which transformation modes should be executed, and ensures that each is executed only once.
     *
     * @param string $direction
     * @param array $modes
     * @return array the resolved transformation modes
     */
    protected function resolveAppliedTransformationModes(string $direction, array $modes)
    {
        $modeList = implode(',', $modes);

        // Replace the shortcut "default" with all custom modes
        $modeList = str_replace('default', 'detectbrokenlinks,css_transform,ts_images,ts_links', $modeList);

        // Make list unique
        $modes = array_unique(GeneralUtility::trimExplode(',', $modeList, true));
        // Reverse order if direction is "rte"
        if ($direction === 'rte') {
            $modes = array_reverse($modes);
        }

        return $modes;
    }

    /**
     * Runs the HTML parser if it is configured
     * Fetches additional HTML cleaner configuration. These cleaners are applied either before or after the main
     * transformation and are thus completely independent processing options you can set up.
     *
     * This is only possible via TSconfig (procOptions) currently.
     *
     * @param string $content
     * @param string $configurationDirective used to look up in the procOptions if enabled, and then fetch the
     * @return string the processed content
     */
    protected function runHtmlParserIfConfigured($content, $configurationDirective)
    {
        if ($this->procOptions[$configurationDirective]) {
            list($keepTags, $keepNonMatchedTags, $hscMode, $additionalConfiguration) = $this->HTMLparserConfig($this->procOptions[$configurationDirective . '.']);
            $content = $this->HTMLcleaner($content, $keepTags, $keepNonMatchedTags, $hscMode, $additionalConfiguration);
        }
        return $content;
    }

    /************************************
     *
     * Specific RTE TRANSFORMATION functions
     *
     *************************************/
    /**
     * Transformation handler: 'ts_images' / direction: "db"
     * Processing images inserted in the RTE.
     * This is used when content goes from the RTE to the database.
     * Images inserted in the RTE have an absolute URL applied to the src attribute. This URL is converted to a relative URL.
     * If it turns out that the URL is from another website than the current one, the image is read from that external URL and moved to the local server.
     * Also "magic" images are processed here.
     *
     * @param string $value The content from RTE going to Database
     * @return string Processed content
     */
    public function TS_images_db($value)
    {
        // Split content by <img> tags and traverse the resulting array for processing:
        $imgSplit = $this->splitTags('img', $value);
        if (count($imgSplit) > 1) {
            $siteUrl = GeneralUtility::getIndpEnv('TYPO3_SITE_URL');
            $sitePath = str_replace(GeneralUtility::getIndpEnv('TYPO3_REQUEST_HOST'), '', $siteUrl);
            /** @var $resourceFactory Resource\ResourceFactory */
            $resourceFactory = Resource\ResourceFactory::getInstance();
            /** @var $magicImageService Resource\Service\MagicImageService */
            $magicImageService = GeneralUtility::makeInstance(Resource\Service\MagicImageService::class);
            $magicImageService->setMagicImageMaximumDimensions($this->tsConfig);
            foreach ($imgSplit as $k => $v) {
                // Image found, do processing:
                if ($k % 2) {
                    // Get attributes
                    list($attribArray) = $this->get_tag_attributes($v, true);
                    // It's always an absolute URL coming from the RTE into the Database.
                    $absoluteUrl = trim($attribArray['src']);
                    // Make path absolute if it is relative and we have a site path which is not '/'
                    // Note: pathinfo() does not return a 'scheme' key, so !$pI['scheme'] is always true here
                    $pI = pathinfo($absoluteUrl);
                    if ($sitePath && !$pI['scheme'] && GeneralUtility::isFirstPartOfStr($absoluteUrl, $sitePath)) {
                        // If site is in a subpath (eg. /~user_jim/) this path needs to be removed because it will be added with $siteUrl
                        $absoluteUrl = substr($absoluteUrl, strlen($sitePath));
                        $absoluteUrl = $siteUrl . $absoluteUrl;
                        // [Scrutinizer, Bug] Are you sure $absoluteUrl of type false|string can be used in concatenation?
                        // If this is a false positive, the issue can be ignored via the ignore-type annotation:
                        //     $absoluteUrl = $siteUrl . /** @scrutinizer ignore-type */ $absoluteUrl;
                    }
                    // Image dimensions set in the img tag, if any
                    $imgTagDimensions = $this->getWHFromAttribs($attribArray);
                    if ($imgTagDimensions[0]) {
                        $attribArray['width'] = $imgTagDimensions[0];
                    }
                    if ($imgTagDimensions[1]) {
                        $attribArray['height'] = $imgTagDimensions[1];
                    }
                    $originalImageFile = null;
                    if ($attribArray['data-htmlarea-file-uid']) {
                        // An original image file uid is available
                        try {
                            /** @var $originalImageFile Resource\File */
                            $originalImageFile = $resourceFactory->getFileObject((int)$attribArray['data-htmlarea-file-uid']);
                        } catch (Resource\Exception\FileDoesNotExistException $fileDoesNotExistException) {
                            // Log the fact the file could not be retrieved.
                            $message = sprintf('Could not find file with uid "%s"', $attribArray['data-htmlarea-file-uid']);
                            $this->logger->error($message);
                        }
                    }
                    if ($originalImageFile instanceof Resource\File) {
                        // Public url of local file is relative to the site url, absolute otherwise
                        if ($absoluteUrl == $originalImageFile->getPublicUrl() || $absoluteUrl == $siteUrl . $originalImageFile->getPublicUrl()) {
                            // This is a plain image, i.e. reference to the original image
                            if ($this->procOptions['plainImageMode']) {
                                // "plain image mode" is configured
                                // Find the dimensions of the original image
                                $imageInfo = [
                                    $originalImageFile->getProperty('width'),
                                    $originalImageFile->getProperty('height')
                                ];
                                if (!$imageInfo[0] || !$imageInfo[1]) {
                                    $filePath = $originalImageFile->getForLocalProcessing(false);
                                    $imageInfo = @getimagesize($filePath);
                                }
                                $attribArray = $this->applyPlainImageModeSettings($imageInfo, $attribArray);
                            }
                        } else {
                            // Magic image case: get a processed file with the requested configuration
                            $imageConfiguration = [
                                'width' => $imgTagDimensions[0],
                                'height' => $imgTagDimensions[1]
                            ];
                            $magicImage = $magicImageService->createMagicImage($originalImageFile, $imageConfiguration);
                            $attribArray['width'] = $magicImage->getProperty('width');
                            $attribArray['height'] = $magicImage->getProperty('height');
                            $attribArray['src'] = $magicImage->getPublicUrl();
                        }
                    } elseif (!GeneralUtility::isFirstPartOfStr($absoluteUrl, $siteUrl) && !$this->procOptions['dontFetchExtPictures'] && TYPO3_MODE === 'BE') {
                        // External image from another URL: in that case, fetch image, unless the feature is disabled or we are not in backend mode
                        // Fetch the external image
                        $externalFile = GeneralUtility::getUrl($absoluteUrl);
                        if ($externalFile) {
                            // [Scrutinizer, Bug Best Practice] The expression $externalFile of type false|string is loosely
                            // compared to true; this is ambiguous if the string can be empty. You might want to explicitly
                            // use !== false instead.
                            // In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different
                            // types might be equal. For string values, the empty string '' is a special case, in particular
                            // the following results might be unexpected:
                            //     ''   == false // true
                            //     ''   == null  // true
                            //     'ab' == false // false
                            //     'ab' == null  // false
                            // It is often better to use strict comparison:
                            //     '' === false // false
                            //     '' === null  // false
                            $pU = parse_url($absoluteUrl);
                            $pI = pathinfo($pU['path']);
                            $extension = strtolower($pI['extension']);
                            if ($extension === 'jpg' || $extension === 'jpeg' || $extension === 'gif' || $extension === 'png') {
                                $fileName = GeneralUtility::shortMD5($absoluteUrl) . '.' . $pI['extension'];
                                // We insert this image into the user default upload folder
                                list($table, $field) = explode(':', $this->elRef);
                                /** @var Resource\Folder $folder */
                                $folder = $GLOBALS['BE_USER']->getDefaultUploadFolder($this->recPid, $table, $field);
                                /** @var Resource\File $fileObject */
                                $fileObject = $folder->createFile($fileName)->setContents($externalFile);
                                $imageConfiguration = [
                                    'width' => $attribArray['width'],
                                    'height' => $attribArray['height']
                                ];
                                $magicImage = $magicImageService->createMagicImage($fileObject, $imageConfiguration);
                                $attribArray['width'] = $magicImage->getProperty('width');
                                $attribArray['height'] = $magicImage->getProperty('height');
                                $attribArray['data-htmlarea-file-uid'] = $fileObject->getUid();
                                $attribArray['src'] = $magicImage->getPublicUrl();
                            }
                        }
                    } elseif (GeneralUtility::isFirstPartOfStr($absoluteUrl, $siteUrl)) {
                        // Finally, check image as local file (siteURL equals the one of the image)
                        // Image has no data-htmlarea-file-uid attribute
                        // Relative path, rawurldecoded for special characters.
                        $path = rawurldecode(substr($absoluteUrl, strlen($siteUrl)));
                        // [Scrutinizer, Bug] It seems like substr($absoluteUrl, strlen($siteUrl)) can also be of type false;
                        // however, parameter $str of rawurldecode() does only seem to accept string, maybe add an additional type check?
                        // If this is a false positive, the issue can be ignored via the ignore-type annotation:
                        //     $path = rawurldecode(/** @scrutinizer ignore-type */ substr($absoluteUrl, strlen($siteUrl)));
                        // Absolute filepath, locked to relative path of this project
                        $filepath = GeneralUtility::getFileAbsFileName($path);
                        // Check file existence (in relative directory to this installation!)
                        if ($filepath && @is_file($filepath)) {
                            // Treat it as a plain image
                            if ($this->procOptions['plainImageMode']) {
                                // If "plain image mode" has been configured
                                // Find the original dimensions of the image
                                $imageInfo = @getimagesize($filepath);
                                $attribArray = $this->applyPlainImageModeSettings($imageInfo, $attribArray);
                            }
                            // Let's try to find a file uid for this image
                            try {
                                $fileOrFolderObject = $resourceFactory->retrieveFileOrFolderObject($path);
                                if ($fileOrFolderObject instanceof Resource\FileInterface) {
                                    $fileIdentifier = $fileOrFolderObject->getIdentifier();
                                    /** @var Resource\AbstractFile $fileObject */
                                    $fileObject = $fileOrFolderObject->getStorage()->getFile($fileIdentifier);
                                    // @todo if the retrieved file is a processed file, get the original file...
                                    $attribArray['data-htmlarea-file-uid'] = $fileObject->getUid();
                                }
                            } catch (Resource\Exception\ResourceDoesNotExistException $resourceDoesNotExistException) {
                                // Nothing to be done if file/folder not found
                            }
                        }
                    }
                    // Remove width and height from style attribute
                    $attribArray['style'] = preg_replace('/(?:^|[^-])(\\s*(?:width|height)\\s*:[^;]*(?:$|;))/si', '', $attribArray['style']);
                    // Must have alt attribute
                    if (!isset($attribArray['alt'])) {
                        $attribArray['alt'] = '';
                    }
                    // Convert absolute to relative url
                    if (GeneralUtility::isFirstPartOfStr($attribArray['src'], $siteUrl)) {
                        $attribArray['src'] = substr($attribArray['src'], strlen($siteUrl));
                    }
                    $imgSplit[$k] = '<img ' . GeneralUtility::implodeAttributes($attribArray, true, true) . ' />';
                }
            }
        }
        return implode('', $imgSplit);
    }

    /**
     * Transformation handler: 'ts_images' / direction: "rte"
     * Processing images from database content going into the RTE.
     * Processing includes converting the src attribute to an absolute URL.
     *
     * @param string $value Content input
     * @return string Content output
     */
494
    public function TS_images_rte($value)
495
    {
496
        // Split content by <img> tags and traverse the resulting array for processing:
497
        $imgSplit = $this->splitTags('img', $value);
498
        if (count($imgSplit) > 1) {
499
            $siteUrl = GeneralUtility::getIndpEnv('TYPO3_SITE_URL');
500
            $sitePath = str_replace(GeneralUtility::getIndpEnv('TYPO3_REQUEST_HOST'), '', $siteUrl);
501
            foreach ($imgSplit as $k => $v) {
502
                // Image found
503
                if ($k % 2) {
504
                    // Get the attributes of the img tag
505
                    list($attribArray) = $this->get_tag_attributes($v, true);
506
                    $absoluteUrl = trim($attribArray['src']);
507
                    // Transform the src attribute into an absolute url, if it not already
508
                    if (strtolower(substr($absoluteUrl, 0, 4)) !== 'http') {
0 ignored issues
show
Bug introduced by
It seems like substr($absoluteUrl, 0, 4) can also be of type false; however, parameter $str of strtolower() does only seem to accept string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

508
                    if (strtolower(/** @scrutinizer ignore-type */ substr($absoluteUrl, 0, 4)) !== 'http') {
Loading history...
509
                        // If site is in a subpath (eg. /~user_jim/) this path needs to be removed because it will be added with $siteUrl
510
                        $attribArray['src'] = preg_replace('#^' . preg_quote($sitePath, '#') . '#', '', $attribArray['src']);
511
                        $attribArray['src'] = $siteUrl . $attribArray['src'];
512
                    }
513
                    // Must have alt attribute
514
                    if (!isset($attribArray['alt'])) {
515
                        $attribArray['alt'] = '';
516
                    }
517
                    $imgSplit[$k] = '<img ' . GeneralUtility::implodeAttributes($attribArray, true, true) . ' />';
518
                }
519
            }
520
        }
521
        // Return processed content:
522
        return implode('', $imgSplit);
523
    }
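
The src handling in TS_images_rte() can be tried in isolation. The following is a minimal sketch, not the parser's own code: makeSrcAbsolute() is a hypothetical helper, and the site URL and sub-path are assumed example values standing in for what GeneralUtility::getIndpEnv() would return.

```php
<?php
// Hypothetical helper (not part of RteHtmlParser): mirrors the src handling in TS_images_rte().
function makeSrcAbsolute(string $src, string $siteUrl, string $sitePath): string
{
    $src = trim($src);
    // Anything that already starts with "http" (case-insensitive) is left untouched
    if (strtolower(substr($src, 0, 4)) === 'http') {
        return $src;
    }
    // Strip a leading site sub-path, because $siteUrl already ends with it
    $src = preg_replace('#^' . preg_quote($sitePath, '#') . '#', '', $src);
    return $siteUrl . $src;
}

// Assumed example values: the site lives in the sub-path /~user_jim/
$siteUrl = 'https://example.org/~user_jim/';
$sitePath = '/~user_jim/';
echo makeSrcAbsolute('/~user_jim/fileadmin/a.png', $siteUrl, $sitePath), PHP_EOL;
echo makeSrcAbsolute('fileadmin/a.png', $siteUrl, $sitePath), PHP_EOL;
echo makeSrcAbsolute('HTTP://cdn.example.org/a.png', $siteUrl, $sitePath), PHP_EOL;
```

The first two calls yield the same absolute URL, which is why the sub-path has to be stripped before $siteUrl is prepended.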
524
525
    /**
526
     * Transformation handler: 'ts_links' / direction: "db"
527
     * Processes anchor tags and resolves them again via the LinkService syntax
528
     *
529
     * Splits content into <a> tag blocks and processes each tag, and allows hooks to actually render
530
     * the result.
531
     *
532
     * @param string $value Content input
533
     * @return string Content output
534
     * @see TS_links_rte()
535
     */
536
    public function TS_links_db($value)
537
    {
538
        $blockSplit = $this->splitIntoBlock('A', $value);
539
        foreach ($blockSplit as $k => $v) {
540
            if ($k % 2) {
541
                list($tagAttributes) = $this->get_tag_attributes($this->getFirstTag($v), true);
542
                $linkService = GeneralUtility::makeInstance(LinkService::class);
543
                $linkInformation = $linkService->resolve($tagAttributes['href'] ?? '');
544
545
                $tagAttributes['href'] = $linkService->asString($linkInformation);
546
                $blockSplit[$k] = '<a ' . GeneralUtility::implodeAttributes($tagAttributes, true) . '>'
547
                    . $this->TS_links_db($this->removeFirstAndLastTag($blockSplit[$k])) . '</a>';
548
                $parameters = [
549
                    'currentBlock' => $v,
550
                    'linkInformation' => $linkInformation,
551
                    'url' => $linkInformation['href'],
552
                    'attributes' => $tagAttributes
553
                ];
554
                foreach ($GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['modifyParams_LinksDb_PostProc'] ?? [] as $className) {
555
                    $processor = GeneralUtility::makeInstance($className);
556
                    $blockSplit[$k] = $processor->modifyParamsLinksDb($parameters, $this);
557
                }
558
559
                // Finally, store the link as an <a> tag (the TYPO3 default), using the new link service syntax
560
                $tagAttributes['href'] = $linkService->asString($linkInformation);
561
                $blockSplit[$k] = '<a ' . GeneralUtility::implodeAttributes($tagAttributes, true) . '>'
562
                    . $this->TS_links_db($this->removeFirstAndLastTag($blockSplit[$k])) . '</a>';
563
            }
564
        }
565
        return implode('', $blockSplit);
566
    }
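
Several handlers in this class rely on the same convention: after splitting, odd indexes hold the matched blocks and even indexes hold the content between them. The sketch below illustrates that convention only. It swaps in preg_split() for splitIntoBlock(), so transformBlocks() is a hypothetical stand-in and, unlike the real parser method, makes no attempt to handle nested tags.

```php
<?php
// Hypothetical stand-in for the splitIntoBlock() traversal pattern.
function transformBlocks(string $html, callable $transform): string
{
    // With a capture group and PREG_SPLIT_DELIM_CAPTURE, the matched
    // <a>...</a> blocks are kept in the result and land on the odd indexes.
    $parts = preg_split('#(<a\b[^>]*>.*?</a>)#is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $k => $v) {
        if ($k % 2) {
            // Odd index: a complete <a> block
            $parts[$k] = $transform($v);
        }
    }
    // Even indexes (content outside the blocks) are passed through unchanged
    return implode('', $parts);
}

echo transformBlocks('See <a href="#top">top</a> or <a href="#end">end</a>.', 'strtoupper'), PHP_EOL;
```

Only the anchor blocks are touched by the callback; the surrounding text is rejoined as it was.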
567
568
    /**
569
     * Transformation handler: 'ts_links' / direction: "rte"
570
     * Converting TYPO3-specific <link> tags to <a> tags
571
     *
572
     * This functionality is only used to convert legacy <link> tags to the new linking syntax using <a> tags;
573
     * the resulting <a> tags are not converted back to <link> tags anymore.
574
     *
575
     * @param string $value Content input
576
     * @return string Content output
577
     */
578
    public function TS_links_rte($value)
579
    {
580
        $value = $this->TS_AtagToAbs($value);
581
        // Split content by the TYPO3 pseudo tag "<link>"
582
        $blockSplit = $this->splitIntoBlock('link', $value, true);
583
        foreach ($blockSplit as $k => $v) {
584
            // Block
585
            if ($k % 2) {
586
                // Split away the first "<link " part
587
                $typoLinkData = explode(' ', substr($this->getFirstTag($v), 0, -1), 2)[1];
                // Scrutinizer issue (line 587): substr($this->getFirstTag($v), 0, -1) can also return false, but explode() only accepts a string; consider an additional type check. A false positive can be suppressed by placing /** @scrutinizer ignore-type */ before the substr() call.
588
                $tagCode = GeneralUtility::makeInstance(TypoLinkCodecService::class)->decode($typoLinkData);
589
590
                // Parsing the TypoLink data. This parsing is done like in \TYPO3\CMS\Frontend\ContentObject->typoLink()
591
                $linkService = GeneralUtility::makeInstance(LinkService::class);
592
                $linkInformation = $linkService->resolve($tagCode['url']);
593
594
                try {
595
                    $href = $linkService->asString($linkInformation);
596
                } catch (UnknownLinkHandlerException $e) {
597
                    $href = '';
598
                }
599
600
                // Modify parameters by a hook
601
                if (is_array($GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['modifyParams_LinksRte_PostProc'] ?? false)) {
602
                    // backwards-compatibility: show an error message if the page is not found
603
                    $error = '';
604
                    if ($linkInformation['type'] === LinkService::TYPE_PAGE) {
605
                        $pageRecord = BackendUtility::getRecord('pages', $linkInformation['pageuid']);
606
                        // Page does not exist
607
                        if (!is_array($pageRecord)) {
608
                            $error = 'Page with ID ' . $linkInformation['pageuid'] . ' not found';
609
                        }
610
                    }
611
                    $parameters = [
612
                        'currentBlock' => $v,
613
                        'url' => $href,
614
                        'tagCode' => $tagCode,
615
                        'external' => $linkInformation['type'] === LinkService::TYPE_URL,
616
                        'error' => $error
617
                    ];
618
                    foreach ($GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['t3lib/class.t3lib_parsehtml_proc.php']['modifyParams_LinksRte_PostProc'] as $className) {
619
                        $processor = GeneralUtility::makeInstance($className);
620
                        $blockSplit[$k] = $processor->modifyParamsLinksRte($parameters, $this);
621
                    }
622
                } else {
623
                    $anchorAttributes = [
624
                        'href'   => $href,
625
                        'target' => $tagCode['target'],
626
                        'class'  => $tagCode['class'],
627
                        'title'  => $tagCode['title']
628
                    ];
629
630
                    // Setting the <a> tag
631
                    $blockSplit[$k] = '<a ' . GeneralUtility::implodeAttributes($anchorAttributes, true) . '>'
632
                        . $this->TS_links_rte($this->removeFirstAndLastTag($blockSplit[$k]))
633
                        . '</a>';
634
                }
635
            }
636
        }
637
        return implode('', $blockSplit);
638
    }
639
640
    /**
641
     * Transformation handler: 'css_transform' / direction: "db"
642
     * Cleaning (->db) for standard content elements (ts)
643
     *
644
     * @param string $value Content input
645
     * @return string Content output
646
     * @see TS_transform_rte()
647
     */
648
    public function TS_transform_db($value)
649
    {
650
        // Safety counter, so that endless recursion is avoided (it should not occur, but an error could cause it)
651
        $this->TS_transform_db_safecounter--;
652
        if ($this->TS_transform_db_safecounter < 0) {
653
            return $value;
654
        }
655
        // Split the content from RTE by the occurrence of these blocks:
656
        $blockSplit = $this->splitIntoBlock($this->blockElementList, $value);
657
658
        // Avoid superfluous linebreaks by transform_db after ending headListTag
659
        while (count($blockSplit) > 0 && trim(end($blockSplit)) === '') {
660
            array_pop($blockSplit);
661
        }
662
663
        // Traverse the blocks
664
        foreach ($blockSplit as $k => $v) {
665
            if ($k % 2) {
666
                // Inside block:
667
                // Init:
668
                $tag = $this->getFirstTag($v);
669
                $tagName = strtolower($this->getFirstTagName($v));
670
                // Process based on the tag:
671
                switch ($tagName) {
672
                    case 'blockquote':
673
                    case 'dd':
674
                    case 'div':
675
                    case 'header':
676
                    case 'section':
677
                    case 'footer':
678
                    case 'nav':
679
                    case 'article':
680
                    case 'aside':
681
                        $blockSplit[$k] = $tag . $this->TS_transform_db($this->removeFirstAndLastTag($blockSplit[$k])) . '</' . $tagName . '>';
682
                        break;
683
                    case 'pre':
684
                        break;
685
                    default:
686
                        // usually <hx> tags and <table> tags where no other block elements are within the tags
687
                        // Eliminate true linebreaks inside block element tags
688
                        $blockSplit[$k] = preg_replace(('/[' . LF . ']+/'), ' ', $blockSplit[$k]);
689
                }
690
            } else {
691
                // NON-block:
692
                if (trim($blockSplit[$k]) !== '') {
693
                    $blockSplit[$k] = str_replace('<hr/>', '<hr />', $blockSplit[$k]);
694
                    // Remove linebreaks preceding hr tags
695
                    $blockSplit[$k] = preg_replace('/[' . LF . ']+<(hr)(\\s[^>\\/]*)?[[:space:]]*\\/?>/', '<$1$2/>', $blockSplit[$k]);
696
                    // Remove linebreaks following hr tags
697
                    $blockSplit[$k] = preg_replace('/<(hr)(\\s[^>\\/]*)?[[:space:]]*\\/?>[' . LF . ']+/', '<$1$2/>', $blockSplit[$k]);
698
                    // Replace other linebreaks with space
699
                    $blockSplit[$k] = preg_replace('/[' . LF . ']+/', ' ', $blockSplit[$k]);
700
                    $blockSplit[$k] = $this->divideIntoLines($blockSplit[$k]);
701
                } else {
702
                    unset($blockSplit[$k]);
703
                }
704
            }
705
        }
706
        $this->TS_transform_db_safecounter++;
707
        return implode(LF, $blockSplit);
708
    }
709
710
    /**
711
     * Wraps a-tags that contain a style attribute with a span-tag
712
     * This is not in use anymore, but was necessary before because <a> tags are transformed into <link> tags
713
     * in the database, but <link> tags cannot handle style attributes. However, this is considered a
714
     * bad approach as it leaves an ugly <span> tag in the database, if allowedTags=span with style attributes are
715
     * allowed.
716
     *
717
     * @param string $value Content input
718
     * @return string Content output
719
     */
720
    public function transformStyledATags($value)
721
    {
722
        $blockSplit = $this->splitIntoBlock('A', $value);
723
        foreach ($blockSplit as $k => $v) {
724
            // If an A-tag was found
725
            if ($k % 2) {
726
                list($attribArray) = $this->get_tag_attributes($this->getFirstTag($v), true);
727
                // If "style" attribute is set and rteerror is not set!
728
                if (!empty($attribArray['style']) && empty($attribArray['rteerror'])) {
729
                    $attribArray_copy = ['style' => $attribArray['style']];
730
                    unset($attribArray['style']);
731
                    $bTag = '<span ' . GeneralUtility::implodeAttributes($attribArray_copy, true) . '><a ' . GeneralUtility::implodeAttributes($attribArray, true) . '>';
732
                    $eTag = '</a></span>';
733
                    $blockSplit[$k] = $bTag . $this->removeFirstAndLastTag($blockSplit[$k]) . $eTag;
734
                }
735
            }
736
        }
737
        return implode('', $blockSplit);
738
    }
739
740
    /**
741
     * Transformation handler: css_transform / direction: "rte"
742
     * Set (->rte) for standard content elements (ts)
743
     *
744
     * @param string $value Content input
745
     * @return string Content output
746
     * @see TS_transform_db()
747
     */
748
    public function TS_transform_rte($value)
749
    {
750
        // Split the content from database by the occurrence of the block elements
751
        $blockSplit = $this->splitIntoBlock($this->blockElementList, $value);
752
        // Traverse the blocks
753
        foreach ($blockSplit as $k => $v) {
754
            if ($k % 2) {
755
                // Inside one of the blocks:
756
                // Init:
757
                $tag = $this->getFirstTag($v);
758
                $tagName = strtolower($this->getFirstTagName($v));
759
                // Based on tagname, we do transformations:
760
                switch ($tagName) {
761
                    case 'blockquote':
762
                    case 'dd':
763
                    case 'div':
764
                    case 'header':
765
                    case 'section':
766
                    case 'footer':
767
                    case 'nav':
768
                    case 'article':
769
                    case 'aside':
770
                        $blockSplit[$k] = $tag . $this->TS_transform_rte($this->removeFirstAndLastTag($blockSplit[$k])) . '</' . $tagName . '>';
771
                        break;
772
                }
773
                $blockSplit[$k + 1] = preg_replace('/^[ ]*' . LF . '/', '', $blockSplit[$k + 1]);
774
            } else {
775
                // NON-block:
776
                $nextFTN = $this->getFirstTagName($blockSplit[$k + 1] ?? '');
777
                $onlyLineBreaks = (preg_match('/^[ ]*' . LF . '+[ ]*$/', $blockSplit[$k]) == 1);
778
                // If the line is followed by a block or is the last line:
779
                if (GeneralUtility::inList($this->blockElementList, $nextFTN) || !isset($blockSplit[$k + 1])) {
780
                    // If the line contains more than just linebreaks, reduce the number of trailing linebreaks by 1
781
                    if (!$onlyLineBreaks) {
782
                        $blockSplit[$k] = preg_replace('/(' . LF . '*)' . LF . '[ ]*$/', '$1', $blockSplit[$k]);
783
                    } else {
784
                        // If the line contains only linebreaks, remove the leading linebreak
785
                        $blockSplit[$k] = preg_replace('/^[ ]*' . LF . '/', '', $blockSplit[$k]);
786
                    }
787
                }
788
                // If $blockSplit[$k] is blank then unset the line, unless the line only contained linebreaks
789
                if ((string)$blockSplit[$k] === '' && !$onlyLineBreaks) {
790
                    unset($blockSplit[$k]);
791
                } else {
792
                    $blockSplit[$k] = $this->setDivTags($blockSplit[$k]);
793
                }
794
            }
795
        }
796
        return implode(LF, $blockSplit);
797
    }
798
799
    /***************************************************************
800
     *
801
     * Generic RTE transformation, analysis and helper functions
802
     *
803
     **************************************************************/
804
805
    /**
806
     * Function for cleaning content going into the database.
807
     * Content is cleaned, e.g. by removing disallowed HTML and ds-HSC content
808
     * It is basically calling HTMLcleaner from the parent class with some preset configuration specifically set up for cleaning content going from the RTE into the db
809
     *
810
     * @param string $content Content to clean up
811
     * @return string Clean content
812
     * @see getKeepTags()
813
     */
814
    public function HTMLcleaner_db($content)
815
    {
816
        $keepTags = $this->getKeepTags('db');
817
        // Default: remove unknown tags.
818
        $keepUnknownTags = (bool)$this->procOptions['dontRemoveUnknownTags_db'];
819
        return $this->HTMLcleaner($content, $keepTags, $keepUnknownTags);
820
    }
821
822
    /**
823
     * Creates an array of configuration for the HTMLcleaner function based on whether content
824
     * goes TO or FROM the Rich Text Editor ($direction)
825
     *
826
     * @param string $direction The direction of the content being processed by the output configuration; "db" (content going into the database FROM the rte) or "rte" (content going into the form)
827
     * @return array Configuration array
828
     * @see HTMLcleaner_db()
829
     */
830
    public function getKeepTags($direction = 'rte')
831
    {
832
        if (!is_array($this->getKeepTags_cache[$direction] ?? null)) {
833
            // Setting up allowed tags:
834
            // Default is to get allowed/denied tags from internal array of processing options:
835
            // Construct default list of tags to keep:
836
            if (is_array($this->procOptions['allowTags.'])) {
837
                $keepTags = implode(',', $this->procOptions['allowTags.']);
838
            } else {
839
                $keepTags = $this->procOptions['allowTags'];
840
            }
841
            $keepTags = array_flip(GeneralUtility::trimExplode(',', $this->defaultAllowedTagsList . ',' . strtolower($keepTags), true));
842
            // For tags to deny, remove them from $keepTags array:
843
            $denyTags = GeneralUtility::trimExplode(',', $this->procOptions['denyTags'], true);
844
            foreach ($denyTags as $dKe) {
845
                unset($keepTags[$dKe]);
846
            }
847
            // Based on the direction of content, set further options:
848
            switch ($direction) {
849
                case 'rte':
850
                    // Transforming keepTags array so it can be understood by the HTMLcleaner function.
851
                    // This basically converts the format of the array from TypoScript (having dots) to plain multi-dimensional array.
852
                    list($keepTags) = $this->HTMLparserConfig($this->procOptions['HTMLparser_rte.'], $keepTags);
853
                    break;
854
                case 'db':
855
                    // Setting up span tags if they are allowed:
856
                    if (isset($keepTags['span'])) {
857
                        $keepTags['span'] = [
858
                            'allowedAttribs' => 'id,class,style,title,lang,xml:lang,dir,itemscope,itemtype,itemprop',
859
                            'fixAttrib' => [
860
                                'class' => [
861
                                    'removeIfFalse' => 1
862
                                ]
863
                            ],
864
                            'rmTagIfNoAttrib' => 1
865
                        ];
866
                        if (!empty($this->allowedClasses)) {
867
                            $keepTags['span']['fixAttrib']['class']['list'] = $this->allowedClasses;
868
                        }
869
                    }
870
                    // Setting further options, getting them from the processing options
871
                    $TSc = $this->procOptions['HTMLparser_db.'];
872
                    if (!$TSc['globalNesting']) {
873
                        $TSc['globalNesting'] = 'b,i,u,a,center,font,sub,sup,strong,em,strike,span';
874
                    }
875
                    if (!$TSc['noAttrib']) {
876
                        $TSc['noAttrib'] = 'b,i,u,br,center,hr,sub,sup,strong,em,li,ul,ol,blockquote,strike';
877
                    }
878
                    // Transforming the array from TypoScript to regular array:
879
                    list($keepTags) = $this->HTMLparserConfig($TSc, $keepTags);
880
                    break;
881
            }
882
            // Caching (internally, in object memory) the result
883
            $this->getKeepTags_cache[$direction] = $keepTags;
884
        }
885
        // Return result:
886
        return $this->getKeepTags_cache[$direction];
887
    }
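
The first half of getKeepTags() is plain list arithmetic: merge the default and configured allow lists, then drop every denied tag. A minimal sketch of just that step follows. buildKeepTags() and its inline splitter are hypothetical; the splitter only approximates GeneralUtility::trimExplode() with the remove-empty flag set, and the HTMLparserConfig() conversion that follows in the real method is left out.

```php
<?php
// Hypothetical helper: the allow/deny step of getKeepTags(), without any TYPO3 dependency.
function buildKeepTags(string $defaultAllowed, string $allowTags, string $denyTags): array
{
    // Approximation of trimExplode(',', $list, true): split, trim, drop empty entries
    $split = static function (string $list): array {
        return array_values(array_filter(array_map('trim', explode(',', $list)), 'strlen'));
    };
    // Tag names become array keys, so a denied tag can simply be unset
    $keep = array_flip($split($defaultAllowed . ',' . strtolower($allowTags)));
    foreach ($split($denyTags) as $tag) {
        unset($keep[$tag]);
    }
    return array_keys($keep);
}

print_r(buildKeepTags('b,i,a,p', 'SPAN, Em', 'i,font'));
```

Denying a tag that was never allowed (font in the call above) is a no-op, which matches the unset() loop in the original.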
888
889
    /**
890
     * This resolves the $value into parts based on <p>-sections. These are returned as lines separated by LF.
891
     * The point is to resolve the HTML code returned from the RTE into ordinary lines so it is 'human-readable'.
892
     * The function ->setDivTags does the opposite.
893
     * This function processes content to go into the database.
894
     *
895
     * @param string $value Value to process.
896
     * @param int $count Recursion brake. Decremented on each recursion down to zero. Default is 5 (which equals the allowed nesting levels of p tags).
897
     * @param bool $returnArray If TRUE, an array with the lines is returned, otherwise a string of the processed input value.
898
     * @return string|array Processed input value, or an array of lines if $returnArray is TRUE and <p> sections were found.
899
     * @see setDivTags()
900
     */
901
    public function divideIntoLines($value, $count = 5, $returnArray = false)
902
    {
903
        // Setting the third param will eliminate false end-tags. Maybe this is a good thing to do...?
904
        $paragraphBlocks = $this->splitIntoBlock('p', $value, true);
905
        // Return the content as-is if there were no p sections in it (or the recursion limit was reached)
906
        if (count($paragraphBlocks) <= 1 || $count <= 0) {
907
            return $this->sanitizeLineBreaksForContentOnly($value);
908
        }
909
910
        // Traverse the split sections
911
        foreach ($paragraphBlocks as $k => $v) {
912
            if ($k % 2) {
913
                // Inside a <p> section
914
                $v = $this->removeFirstAndLastTag($v);
915
                // Fetching 'sub-lines' - which will explode any further p nesting recursively
916
                $subLines = $this->divideIntoLines($v, $count - 1, true);
917
                // So, if there happened to be sub-nesting of p, this is written directly as the new content of THIS section. (This would be considered 'an error')
918
                if (is_array($subLines)) {
919
                    $paragraphBlocks[$k] = implode(LF, $subLines);
920
                } else {
921
                    //... but if NO subsection was found, we process it as a TRUE line without erroneous content:
922
                    $paragraphBlocks[$k] = $this->processContentWithinParagraph($subLines, $paragraphBlocks[$k]);
923
                }
924
                // If it turns out the line is just blank (containing a &nbsp; possibly) then just make it pure blank.
925
                // But, prevent filtering of lines that are blank in sense above, but whose tags contain attributes.
926
                // Those attributes should have been filtered before; if they are still there they must be considered as possible content.
927
                if (trim(strip_tags($paragraphBlocks[$k])) === '&nbsp;' && !preg_match('/\\<(img)(\\s[^>]*)?\\/?>/si', $paragraphBlocks[$k]) && !preg_match('/\\<([^>]*)?( align| class| style| id| title| dir| lang| xml:lang)([^>]*)?>/si', trim($paragraphBlocks[$k]))) {
928
                    $paragraphBlocks[$k] = '';
929
                }
930
            } else {
931
                // Outside a paragraph, if there is still something in there, just add a <p> tag
932
                // Remove positions which are outside <p> tags and without content
933
                $paragraphBlocks[$k] = trim(strip_tags($paragraphBlocks[$k], '<' . implode('><', $this->allowedTagsOutsideOfParagraphs) . '>'));
934
                $paragraphBlocks[$k] = $this->sanitizeLineBreaksForContentOnly($paragraphBlocks[$k]);
935
                if ((string)$paragraphBlocks[$k] === '') {
936
                    unset($paragraphBlocks[$k]);
937
                } else {
938
                    // add <p> tags around the content
939
                    $paragraphBlocks[$k] = str_replace(strip_tags($paragraphBlocks[$k]), '<p>' . strip_tags($paragraphBlocks[$k]) . '</p>', $paragraphBlocks[$k]);
940
                }
941
            }
942
        }
943
        return $returnArray ? $paragraphBlocks : implode(LF, $paragraphBlocks);
944
    }
945
946
    /**
947
     * Converts all lines into <p></p>-sections (unless the line has a p - tag already)
948
     * For processing of content going FROM database TO RTE.
949
     *
950
     * @param string $value Value to convert
951
     * @return string Processed value.
952
     * @see divideIntoLines()
953
     */
954
    public function setDivTags($value)
955
    {
956
        // First, setting configuration for the HTMLcleaner function. This will process each line between the <div>/<p> section on their way to the RTE
957
        $keepTags = $this->getKeepTags('rte');
958
        // Divide the content into lines
959
        $parts = explode(LF, $value);
960
        foreach ($parts as $k => $v) {
961
            // Processing of line content:
962
            // If the line is blank, set it to &nbsp;
963
            if (trim($parts[$k]) === '') {
964
                $parts[$k] = '&nbsp;';
965
            } else {
966
                // Clean the line content, keeping unknown tags (as they can be removed in the entryHTMLparser)
967
                $parts[$k] = $this->HTMLcleaner($parts[$k], $keepTags, 'protect');
968
                // convert double-encoded &nbsp; into regular &nbsp; however this could also be reversed via the exitHTMLparser
969
                // This was previously an option to disable called "dontConvAmpInNBSP_rte"
970
                $parts[$k] = str_replace('&amp;nbsp;', '&nbsp;', $parts[$k]);
971
            }
972
            // Wrapping the line in <p> tags if not already wrapped and does not contain an hr tag
973
            if (!preg_match('/<(hr)(\\s[^>\\/]*)?[[:space:]]*\\/?>/i', $parts[$k])) {
974
                $testStr = strtolower(trim($parts[$k]));
975
                if (substr($testStr, 0, 4) !== '<div' || substr($testStr, -6) !== '</div>') {
976
                    if (substr($testStr, 0, 2) !== '<p' || substr($testStr, -4) !== '</p>') {
977
                        // Only set p-tags if there is not already div or p tags:
978
                        $parts[$k] = '<p>' . $parts[$k] . '</p>';
979
                    }
980
                }
981
            }
982
        }
983
        // Implode result:
984
        return implode(LF, $parts);
985
    }
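
The wrapping rule in setDivTags() is easier to follow without the cleaning step. The sketch below is a simplified, hypothetical re-implementation: wrapLinesInParagraphs() skips the HTMLcleaner() call and the &amp;nbsp; conversion, and it assumes that TYPO3's LF constant is a single "\n".

```php
<?php
// Hypothetical, simplified version of the wrapping logic in setDivTags().
function wrapLinesInParagraphs(string $value): string
{
    $lines = explode("\n", $value);
    foreach ($lines as $k => $line) {
        // Blank lines become a non-breaking space so the paragraph survives
        if (trim($line) === '') {
            $line = '&nbsp;';
        }
        $test = strtolower(trim($line));
        $hasHr = preg_match('/<hr(\s[^>\/]*)?\s*\/?>/i', $line) === 1;
        $isDiv = substr($test, 0, 4) === '<div' && substr($test, -6) === '</div>';
        $isP = substr($test, 0, 2) === '<p' && substr($test, -4) === '</p>';
        // Only wrap lines that are not already a <div>/<p> block and contain no <hr>
        if (!$hasHr && !$isDiv && !$isP) {
            $line = '<p>' . $line . '</p>';
        }
        $lines[$k] = $line;
    }
    return implode("\n", $lines);
}

echo wrapLinesInParagraphs("Hello\n\n<p class=\"x\">Kept</p>\n<hr />"), PHP_EOL;
```

Lines that already form a complete paragraph keep their attributes, and a line containing an hr tag is never wrapped.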
986
987
    /**
988
     * Used for transformation from RTE to DB
989
     *
990
     * Works on a single line within a <p> tag when storing into the database
991
     * This always adds <p> tags and validates the arguments,
992
     * additionally the content is cleaned up via the HTMLcleaner.
993
     *
994
     * @param string $content the content within the <p> tag
995
     * @param string $fullContentWithTag the whole <p> tag surrounded as well
996
     *
997
     * @return string the full <p> tag with cleaned content
998
     */
999
    protected function processContentWithinParagraph(string $content, string $fullContentWithTag)
1000
    {
1001
        // clean up the content
1002
        $content = $this->HTMLcleaner_db($content);
1003
        // Get the <p> tag, and validate the attributes
1004
        $fTag = $this->getFirstTag($fullContentWithTag);
1005
        // Check which attributes of the <p> tag to keep
1006
        if (!empty($this->allowedAttributesForParagraphTags)) {
1007
            list($tagAttributes) = $this->get_tag_attributes($fTag);
1008
            // Make sure the tag attributes only contain the ones that are defined to be allowed
1009
            $tagAttributes = array_intersect_key($tagAttributes, array_flip($this->allowedAttributesForParagraphTags));
1010
1011
            // Only allow classes that are whitelisted in $this->allowedClasses
1012
            if (trim($tagAttributes['class'] ?? '') !== '' && !empty($this->allowedClasses) && !in_array($tagAttributes['class'], $this->allowedClasses, true)) {
1013
                $classes = GeneralUtility::trimExplode(' ', $tagAttributes['class'], true);
1014
                $classes = array_intersect($classes, $this->allowedClasses);
1015
                if (!empty($classes)) {
1016
                    $tagAttributes['class'] = implode(' ', $classes);
1017
                } else {
1018
                    unset($tagAttributes['class']);
1019
                }
1020
            }
1021
        } else {
1022
            $tagAttributes = [];
1023
        }
1024
        // Remove any line break
1025
        $content = str_replace(LF, '', $content);
1026
        // Compile the surrounding <p> tag
1027
        $content = '<' . rtrim('p ' . $this->compileTagAttribs($tagAttributes)) . '>' . $content . '</p>';
1028
        return $content;
1029
    }
1030
1031
    /**
1032
     * Wrap <hr> tags with LFs, and also remove double LFs, used when transforming from RTE to DB
1033
     *
1034
     * @param string $content
1035
     * @return string the modified content
1036
     */
1037
    protected function sanitizeLineBreaksForContentOnly(string $content)
1038
    {
1039
        $content = preg_replace('/<(hr)(\\s[^>\\/]*)?[[:space:]]*\\/?>/i', LF . '<$1$2/>' . LF, $content);
1040
        $content = str_replace(LF . LF, LF, $content);
1041
        $content = preg_replace('/(^' . LF . ')|(' . LF . '$)/i', '', $content);
1042
        return $content;
1043
    }
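
To see what the three replacements in sanitizeLineBreaksForContentOnly() do to concrete input, the method can be copied into a free function. This is a sketch under one assumption: TYPO3's LF constant is taken to be "\n". The function name sanitizeLineBreaks() is hypothetical.

```php
<?php
// Stand-alone copy of the method body; $lf stands in for TYPO3's LF constant (assumed "\n").
function sanitizeLineBreaks(string $content): string
{
    $lf = "\n";
    // Put every <hr> on its own line and normalise it to the self-closing form
    $content = preg_replace('/<(hr)(\\s[^>\\/]*)?[[:space:]]*\\/?>/i', $lf . '<$1$2/>' . $lf, $content);
    // Collapse the double line feeds this may have produced
    $content = str_replace($lf . $lf, $lf, $content);
    // Drop one leading and one trailing line feed
    return preg_replace('/(^' . $lf . ')|(' . $lf . '$)/i', '', $content);
}

echo sanitizeLineBreaks('Intro<hr class="x" />Outro'), PHP_EOL;
echo sanitizeLineBreaks('<hr>Text'), PHP_EOL;
```

Attributes on the hr tag are carried over through the second capture group, while a bare hr tag is rewritten to the self-closing form.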
1044
1045
    /**
1046
     * Finds width and height from attrib-array
1047
     * If the width and height are found in the style attribute, those values are used.
1048
     *
1049
     * @param array $attribArray Array of attributes from tag in which to search. More specifically the content of the key "style" is used to extract "width:xxx / height:xxx" information
1050
     * @return array Integer w/h in key 0/1. Zero is returned if not found.
1051
     */
1052
    public function getWHFromAttribs($attribArray)
1053
    {
1054
        $style = trim($attribArray['style'] ?? '');
1055
        $w = 0;
1056
        $h = 0;
1057
        if ($style) {
1058
            $regex = '[[:space:]]*:[[:space:]]*([0-9]*)[[:space:]]*px';
1059
            // Width
1060
            $reg = [];
1061
            preg_match('/width' . $regex . '/i', $style, $reg);
1062
            $w = (int)($reg[1] ?? 0);
1063
            // Height
1064
            preg_match('/height' . $regex . '/i', $style, $reg);
1065
            $h = (int)($reg[1] ?? 0);
1066
        }
1067
        if (!$w) {
1068
            $w = $attribArray['width'] ?? 0;
1069
        }
1070
        if (!$h) {
1071
            $h = $attribArray['height'] ?? 0;
1072
        }
1073
        return [(int)$w, (int)$h];
1074
    }
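
The precedence rule in getWHFromAttribs() (pixel values from the style attribute win, the plain width/height attributes are the fallback) can be checked with a free-standing sketch. widthHeightFromAttributes() is a hypothetical name, and the attribute arrays are made-up examples.

```php
<?php
// Hypothetical free-standing version of getWHFromAttribs().
function widthHeightFromAttributes(array $attributes): array
{
    $style = trim($attributes['style'] ?? '');
    $w = 0;
    $h = 0;
    if ($style !== '') {
        // Only pixel values are recognised, e.g. "width: 120px"
        $regex = '[[:space:]]*:[[:space:]]*([0-9]*)[[:space:]]*px';
        if (preg_match('/width' . $regex . '/i', $style, $m)) {
            $w = (int)$m[1];
        }
        if (preg_match('/height' . $regex . '/i', $style, $m)) {
            $h = (int)$m[1];
        }
    }
    // Fall back to the width/height attributes when the style gave nothing usable
    return [
        $w ?: (int)($attributes['width'] ?? 0),
        $h ?: (int)($attributes['height'] ?? 0),
    ];
}

print_r(widthHeightFromAttributes(['style' => 'width: 120px; height:80px', 'width' => '300']));
print_r(widthHeightFromAttributes(['style' => 'width:50%', 'width' => '40']));
```

A percentage width in the style attribute is not matched by the pattern, so the second call falls back to the width attribute and reports a height of zero.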
1075
1076
    /**
1077
     * Parse <A>-tag href and return status of email,external,file or page
1078
     * This functionality is not in use anymore
1079
     *
1080
     * @param string $url URL to analyse.
1081
     * @return array Information in an array about the URL
1082
     */
1083
    public function urlInfoForLinkTags($url)
1084
    {
1085
        $info = [];
1086
        $url = trim($url);
1087
        if (substr(strtolower($url), 0, 7) === 'mailto:') {
1088
            $info['url'] = trim(substr($url, 7));
            // Scrutinizer issue (line 1088): substr($url, 7) can also return false, but trim() only accepts a string; consider an additional type check. A false positive can be suppressed by placing /** @scrutinizer ignore-type */ before the substr() call.
1089
            $info['type'] = 'email';
1090
        } elseif (strpos($url, '?file:') !== false) {
1091
            $info['type'] = 'file';
1092
            $info['url'] = rawurldecode(substr($url, strpos($url, '?file:') + 1));
            // Scrutinizer issue (line 1092): substr($url, strpos($url, '?file:') + 1) can also return false, but rawurldecode() only accepts a string; consider an additional type check. A false positive can be suppressed by placing /** @scrutinizer ignore-type */ before the substr() call.
        } else {
            $curURL = GeneralUtility::getIndpEnv('TYPO3_SITE_URL');
            $urlLength = strlen($url);
            $a = 0;
            for (; $a < $urlLength; $a++) {
                if ($url[$a] != $curURL[$a]) {
                    break;
                }
            }
            $info['relScriptPath'] = substr($curURL, $a);
            $info['relUrl'] = substr($url, $a);
            $info['url'] = $url;
            $info['type'] = 'ext';
            $siteUrl_parts = parse_url($url);
            $curUrl_parts = parse_url($curURL);
            // Hosts should match
            if ($siteUrl_parts['host'] == $curUrl_parts['host'] && (!$info['relScriptPath'] || defined('TYPO3_mainDir') && substr($info['relScriptPath'], 0, strlen(TYPO3_mainDir)) == TYPO3_mainDir)) {
                // If the script path seems to match or is empty (FE-EDIT)
                // New processing order 100502
                $uP = parse_url($info['relUrl']);
Bug: It seems like $info['relUrl'] can also be of type false; however, parameter $url of parse_url() only seems to accept string. Maybe add an additional type check?

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $uP = parse_url(/** @scrutinizer ignore-type */ $info['relUrl']);
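Beyond the input type, `parse_url()` can itself return false for seriously malformed URLs, and it omits keys for components that are absent. The sketch below normalizes both cases; the helper name `parseUrlParts` and the chosen set of keys are assumptions made for illustration, not code from the class.

```php
<?php
// Hypothetical helper: always returns an array containing the keys the
// caller reads, even when parse_url() fails or leaves components out.
function parseUrlParts($url): array
{
    $defaults = ['scheme' => '', 'host' => '', 'path' => '', 'query' => '', 'fragment' => ''];
    if (!is_string($url)) {
        // Covers the "false" case reported by the checker.
        return $defaults;
    }
    $parts = parse_url($url);
    if (!is_array($parts)) {
        return $defaults;
    }
    // Keep only the keys listed in $defaults and fill the gaps with ''.
    return array_intersect_key($parts, $defaults) + $defaults;
}

print_r(parseUrlParts('index.php?id=12&L=1#c5'));
```

With a result shaped like this, reads such as `$uP['path']` or `$uP['fragment']` no longer depend on the key being present.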
                if ($info['relUrl'] === '#' . $siteUrl_parts['fragment']) {
                    $info['url'] = $info['relUrl'];
                    $info['type'] = 'anchor';
                } elseif (!trim($uP['path']) || $uP['path'] === 'index.php') {
                    // URL is a page (id parameter)
                    $pp = preg_split('/^id=/', $uP['query']);
                    $pp[1] = preg_replace('/&id=[^&]*/', '', $pp[1]);
                    $parameters = explode('&', $pp[1]);
                    $id = array_shift($parameters);
                    if ($id) {
                        $info['pageid'] = $id;
                        $info['cElement'] = $uP['fragment'];
                        $info['url'] = $id . ($info['cElement'] ? '#' . $info['cElement'] : '');
                        $info['type'] = 'page';
                        $info['query'] = $parameters[0] ? '&' . implode('&', $parameters) : '';
                    }
                } else {
                    $info['url'] = $info['relUrl'];
                    $info['type'] = 'file';
                }
            } else {
                unset($info['relScriptPath']);
                unset($info['relUrl']);
            }
        }
        return $info;
    }

    /**
     * Converting <A>-tags to absolute URLs (+ setting rtekeep attribute)
     *
     * @param string $value Content input
     * @param bool $dontSetRTEKEEP If TRUE, then the "rtekeep" attribute will not be set. (not in use anymore)
     * @return string Content output
     */
    public function TS_AtagToAbs($value, $dontSetRTEKEEP = false)
Unused Code: The parameter $dontSetRTEKEEP is not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.

If this is a false positive, you can also ignore this issue in your code via the ignore-unused annotation:

    public function TS_AtagToAbs($value, /** @scrutinizer ignore-unused */ $dontSetRTEKEEP = false)
    {
        $blockSplit = $this->splitIntoBlock('A', $value);
        foreach ($blockSplit as $k => $v) {
            // Block
            if ($k % 2) {
                list($attribArray) = $this->get_tag_attributes($this->getFirstTag($v), true);
                // Checking if there is a scheme, and if not, prepend the current url.
                // ONLY do this if href has content - the <a> tag COULD be an anchor and if so, it should be preserved...
                if ($attribArray['href'] !== '') {
                    $uP = parse_url(strtolower($attribArray['href']));
                    if (!$uP['scheme']) {
                        $attribArray['href'] = GeneralUtility::getIndpEnv('TYPO3_SITE_URL') . $attribArray['href'];
                    }
                }
                $bTag = '<a ' . GeneralUtility::implodeAttributes($attribArray, true) . '>';
                $eTag = '</a>';
                $blockSplit[$k] = $bTag . $this->TS_AtagToAbs($this->removeFirstAndLastTag($blockSplit[$k])) . $eTag;
            }
        }
        return implode('', $blockSplit);
    }

    /**
     * Apply plain image settings to the dimensions of the image
     *
     * @param array $imageInfo: info array of the image
     * @param array $attribArray: array of attributes of an image tag
     *
     * @return array a modified attributes array
     */
    protected function applyPlainImageModeSettings($imageInfo, $attribArray)
    {
        if ($this->procOptions['plainImageMode']) {
            // Perform corrections to aspect ratio based on configuration
            switch ((string)$this->procOptions['plainImageMode']) {
                case 'lockDimensions':
                    $attribArray['width'] = $imageInfo[0];
                    $attribArray['height'] = $imageInfo[1];
                    break;
                case 'lockRatioWhenSmaller':
                    if ($attribArray['width'] > $imageInfo[0]) {
                        $attribArray['width'] = $imageInfo[0];
                    }
Duplicated Code: the if-block below is flagged as duplicated.
                    if ($imageInfo[0] > 0) {
                        $attribArray['height'] = round($attribArray['width'] * ($imageInfo[1] / $imageInfo[0]));
                    }
                    break;
                case 'lockRatio':
Duplicated Code: the if-block below is flagged as duplicated.
                    if ($imageInfo[0] > 0) {
                        $attribArray['height'] = round($attribArray['width'] * ($imageInfo[1] / $imageInfo[0]));
                    }
                    break;
            }
        }
        return $attribArray;
    }

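The two blocks flagged as duplicated in applyPlainImageModeSettings() compute the height in exactly the same way. A possible Extract Method step is sketched here as a standalone function; the name `heightForWidth` and its fallback parameter are hypothetical and do not exist in the class.

```php
<?php
// Hypothetical helper (not in the original class): height that keeps the
// aspect ratio of the original image for a given target width.
// $imageInfo[0] is the original width, $imageInfo[1] the original height.
function heightForWidth(array $imageInfo, $width, $fallbackHeight)
{
    if ($imageInfo[0] > 0) {
        return round($width * ($imageInfo[1] / $imageInfo[0]));
    }
    // Unknown original width: hand back the height that was passed in.
    return $fallbackHeight;
}

// Both 'lockRatioWhenSmaller' and 'lockRatio' could then end with a call like:
// $attribArray['height'] = heightForWidth($imageInfo, $attribArray['width'], $attribArray['height']);
echo heightForWidth([800, 600], 400, 0), PHP_EOL;
```

One difference to keep in mind: the original code leaves `$attribArray['height']` untouched when the original width is 0, so a caller of this sketch would need to pass the current height as the fallback to get the same result.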
    /**
     * Called before any processing / transformation is made
     * Removing any CRs (char 13) and only deal with LFs (char 10) internally.
     * CR has a very disturbing effect, so just remove all CR and rely on LF
     *
     * Historical note: Previously it was possible to disable this functionality via disableUnifyLineBreaks.
     *
     * @param string $content the content to process
     * @return string the modified content
     */
    protected function streamlineLineBreaksForProcessing(string $content)
    {
        return str_replace(CR, '', $content);
    }

    /**
     * Called after any processing / transformation was made.
     * Just before the content is returned by the RTE parser, all line breaks
     * are unified to "CRLF" again.
     *
     * Historical note: Previously it was possible to disable this functionality via disableUnifyLineBreaks.
     *
     * @param string $content the content to process
     * @return string the modified content
     */
    protected function streamlineLineBreaksAfterProcessing(string $content)
    {
        // Make sure no \r\n sequences have entered in the meantime
        $content = $this->streamlineLineBreaksForProcessing($content);
        // ... and then change all \n into \r\n
        return str_replace(LF, CRLF, $content);
    }

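The two line-break methods above form a round trip: strip every CR before processing, then expand every LF to CRLF on the way out. The standalone sketch below illustrates this; the function names are made up, and the constants are defined locally only so the snippet runs without the surrounding framework.

```php
<?php
// Standalone sketch. The constants mirror the CR, LF and CRLF names used
// in the listing and are defined here only so the snippet runs on its own.
const CR = "\r";
const LF = "\n";
const CRLF = "\r\n";

function toInternalLineBreaks(string $content): string
{
    // Drop every CR so only LF remains during processing.
    return str_replace(CR, '', $content);
}

function toStoredLineBreaks(string $content): string
{
    // Normalize first; otherwise an existing CRLF would become CR CR LF.
    return str_replace(LF, CRLF, toInternalLineBreaks($content));
}

var_dump(toStoredLineBreaks("a\r\nb\nc") === "a\r\nb\r\nc");
```

Note that a lone CR is removed rather than converted, so text that uses CR alone as its line separator ends up with its lines joined.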
    /**
     * Content Transformation from DB to RTE
     * Checks every <a> tag that references a t3://page link and verifies that the page exists.
     * If it does not, conspicuous error styling is added.
     *
     * @param string $content
     * @return string the modified content
     */
    protected function markBrokenLinks(string $content): string
    {
        $blocks = $this->splitIntoBlock('A', $content);
        $linkService = GeneralUtility::makeInstance(LinkService::class);
        foreach ($blocks as $position => $value) {
            if ($position % 2 === 0) {
                continue;
            }
            list($attributes) = $this->get_tag_attributes($this->getFirstTag($value), true);
            if (empty($attributes['href'])) {
                continue;
            }
            $hrefInformation = $linkService->resolve($attributes['href']);
            if ($hrefInformation['type'] === LinkService::TYPE_PAGE) {
                $pageRecord = BackendUtility::getRecord('pages', $hrefInformation['pageuid']);
                if (!is_array($pageRecord)) {
                    // Page does not exist
                    $attributes['data-rte-error'] = 'Page with ID ' . $hrefInformation['pageuid'] . ' not found';
                    $styling = 'background-color: yellow; border:2px red solid; color: black;';
                    if (empty($attributes['style'])) {
                        $attributes['style'] = $styling;
                    } else {
                        $attributes['style'] .= ' ' . $styling;
                    }
                }
            }
            // Always rewrite the block so nested <a> tags are processed even if the page is found
            $blocks[$position] =
                '<a ' . GeneralUtility::implodeAttributes($attributes, true, true) . '>'
                . $this->markBrokenLinks($this->removeFirstAndLastTag($blocks[$position]))
                . '</a>';
        }
        return implode('', $blocks);
    }

    /**
     * Content Transformation from RTE to DB
     * Removes link information error attributes from <a> tags that are added to broken links
     *
     * @param string $content the content to process
     * @return string the modified content
     */
    protected function removeBrokenLinkMarkers(string $content): string
    {
        $blocks = $this->splitIntoBlock('A', $content);
        foreach ($blocks as $position => $value) {
            if ($position % 2 === 0) {
                continue;
            }
            list($attributes) = $this->get_tag_attributes($this->getFirstTag($value), true);
            if (empty($attributes['href'])) {
                continue;
            }
            // Always remove the styling again (regardless of whether the page was found)
            // so the database does not contain ugly stuff
            unset($attributes['data-rte-error']);
            if (isset($attributes['style'])) {
                $attributes['style'] = trim(str_replace('background-color: yellow; border:2px red solid; color: black;', '', $attributes['style']));
                if (empty($attributes['style'])) {
                    unset($attributes['style']);
                }
            }
            $blocks[$position] =
                '<a ' . GeneralUtility::implodeAttributes($attributes, true, true) . '>'
                . $this->removeBrokenLinkMarkers($this->removeFirstAndLastTag($blocks[$position]))
                . '</a>';
        }
        return implode('', $blocks);
    }
}
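markBrokenLinks() and removeBrokenLinkMarkers() only undo each other as long as both use the same style string, which the listing repeats as a literal in each method. As a rough sketch of that round trip, the snippet below shares the string through one constant; the constant and function names are hypothetical, and only the style string itself is taken from the listing.

```php
<?php
// Hypothetical stand-alone versions of the style handling. The marker
// string is copied from the listing; the names are made up.
const BROKEN_LINK_STYLE = 'background-color: yellow; border:2px red solid; color: black;';

function addBrokenLinkStyle(array $attributes): array
{
    // Append the marker to an existing style, or set it as the only style.
    $attributes['style'] = empty($attributes['style'])
        ? BROKEN_LINK_STYLE
        : $attributes['style'] . ' ' . BROKEN_LINK_STYLE;
    return $attributes;
}

function removeBrokenLinkStyle(array $attributes): array
{
    if (isset($attributes['style'])) {
        $attributes['style'] = trim(str_replace(BROKEN_LINK_STYLE, '', $attributes['style']));
        // Drop the attribute entirely if nothing but the marker was in it.
        if ($attributes['style'] === '') {
            unset($attributes['style']);
        }
    }
    return $attributes;
}

$original = ['href' => 't3://page?uid=1', 'style' => 'color: red;'];
var_dump(removeBrokenLinkStyle(addBrokenLinkStyle($original)) === $original);
```

Keeping the marker in a single place means a later change to the styling cannot leave stale fragments behind in stored content.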