Passed
Push — master ( 9e5bc2...a1c165 )
by Konrad
08:00
created

PDFObject::cleanContent()   B

Complexity

Conditions 11
Paths 64

Size

Total Lines 57
Code Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 11.0044

Importance

Changes 0
Metric Value
cc 11
eloc 31
c 0
b 0
f 0
nc 64
nop 2
dl 0
loc 57
ccs 29
cts 30
cp 0.9667
crap 11.0044
rs 7.3166
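
The CRAP value in the table can be roughly reproduced from the other metrics. The sketch below assumes the commonly cited formula, complexity² × (1 − coverage)³ + complexity, and assumes that `ccs` and `cts` are the covered and total statement counts (29/30 is consistent with the reported `cp` of 0.9667). The function name `crapScore` is made up for this illustration.

```php
<?php

declare(strict_types=1);

/**
 * CRAP score for one method, using the commonly cited formula
 * complexity^2 * (1 - coverage)^3 + complexity.
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Assumption: ccs = covered statements, cts = total statements.
$coverage = 29 / 30;
$score = crapScore(11, $coverage);

printf("coverage = %.4f, crap = %.4f\n", $coverage, $score);
```

The result agrees with the reported 11.0044 to within rounding in the last digit. Because coverage is already high, the score is dominated by the complexity term, which is why the advice below focuses on splitting the method rather than on adding tests.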

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
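
As a minimal sketch of Extract Method applied to `cleanContent()`: the method repeats the same "find matches, overwrite them with a filler character" loop several times, and each repetition already carries a comment that could seed a method name. The helper `maskMatches()` below is hypothetical and not part of PdfParser; it only illustrates the shape such an extraction might take.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfParser): overwrite every match of
 * capture group $group with $char, keeping the string length unchanged.
 */
function maskMatches(string $pattern, string $content, string $char, int $group = 1): string
{
    if (!preg_match_all($pattern, $content, $matches, \PREG_OFFSET_CAPTURE)) {
        return $content;
    }

    // With PREG_OFFSET_CAPTURE each entry is a [text, byte offset] pair.
    foreach ($matches[$group] as [$text, $offset]) {
        $length = \strlen($text);
        $content = substr_replace($content, str_repeat($char, $length), $offset, $length);
    }

    return $content;
}

// Mask the text inside round brackets, as the "(.....)" step does.
$masked = maskMatches('/\((.*?)\)/s', 'BT (Hello) Tj ET', '_');
echo $masked, "\n"; // BT (_____) Tj ET
```

With a helper of this kind, each "Clean ..." step in the method could shrink to a single call, which would lower both the line count and the number of conditions that Scrutinizer reports.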

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
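
This report appears to be a false positive of the kind the check itself warns about: PHP's built-in `preg_match_all()` takes the flags as its fourth parameter (followed by an optional offset). A small standalone sketch, using a made-up subject string, shows the four-argument call and the shape of the result:

```php
<?php

// preg_match_all(pattern, subject, &matches, flags, offset): the fourth
// argument is the documented flags parameter.
$count = preg_match_all('/\[(.*?)\]/s', 'a [12] b [345]', $matches, \PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each entry is a [text, byte offset] pair.
foreach ($matches[1] as [$text, $offset]) {
    printf("%s at offset %d\n", $text, $offset);
}
```

The byte offsets are what `cleanContent()` feeds into `substr_replace()`, so the flag is needed here and the annotation shown above is the appropriate way to silence the report.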
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code by placing the ignore-type annotation directly before the argument, i.e. preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE).
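
The array type presumably comes from `substr_replace()`, which is declared to return either a string or an array depending on its input. One way to add the suggested type check, sketched here with a hypothetical wrapper `replaceSpan()` that is not part of PdfParser, is to narrow the result once so that everything downstream sees a plain string:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wrapper: substr_replace() is declared to return string|array,
 * so narrowing the result here gives callers a plain string.
 */
function replaceSpan(string $content, string $replacement, int $offset, int $length): string
{
    $result = substr_replace($content, $replacement, $offset, $length);

    if (!\is_string($result)) {
        throw new \UnexpectedValueException('Expected substr_replace() to return a string.');
    }

    return $result;
}

$content = replaceSpan('<</Type/Page>> BT', 'XXXX', 3, 4);
$parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE);
```

Since the input is always a string in `cleanContent()`, the exception branch should never run; it exists only to make the type explicit for static analysis.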
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code by placing the ignore-call annotation directly before the call, i.e. /** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE).

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code by placing the ignore-type annotation directly before the argument, i.e. mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding()).
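
The `true` in this report most likely reflects the declared return type of `mb_internal_encoding()`: called without an argument it returns the current encoding name, and it returns a boolean only when used as a setter. A minimal sketch of the suggested type check follows; the `'UTF-8'` fallback is an assumption made for this illustration, not something the library does.

```php
<?php

declare(strict_types=1);

// Called without an argument, mb_internal_encoding() returns the current
// encoding name; the boolean return value applies to the setter form.
$encoding = mb_internal_encoding();
if (!\is_string($encoding)) {
    $encoding = 'UTF-8'; // assumed fallback for this sketch
}

// Reverse a string character by character, as getText() does for
// sections marked as ReversedChars.
$chars = mb_str_split('abc', 1, $encoding);
$reversed = implode('', array_reverse($chars));

echo $reversed, "\n"; // cba
```

In practice the guard never changes the result for the getter call, so either this check or the ignore-type annotation would address the report.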
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Bug
The method getRetainImageContent() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
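
In this case the cause is probably neither a typo nor a rename: `factory()` declares `?Config $config = null`, so the call can genuinely reach `null`. A minimal sketch of a null-safe variant is shown below. The `Config` class here is a stand-in with a single getter, `imageContent()` is a hypothetical helper, and treating a missing config as "retain the content" is an assumption made for this illustration.

```php
<?php

declare(strict_types=1);

// Stand-in for the library's Config class; only the getter used here.
final class Config
{
    private bool $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decide which content to hand to an Image object
 * when $config may be null.
 */
function imageContent(?string $content, ?Config $config): ?string
{
    // Assumption for this sketch: a missing config means "retain".
    $retain = null === $config || $config->getRetainImageContent();

    return $retain ? $content : null;
}

var_dump(imageContent('binary', null));              // string(6) "binary"
var_dump(imageContent('binary', new Config(false))); // NULL
```

An alternative would be to make the parameter non-nullable or to substitute a default `Config` at the top of `factory()`; which option fits best depends on how callers construct objects.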

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}