Passed
Push — fix/invalid-characters ( f93ec3...37d7f1 )
by Jeremy
02:18
created

PDFObject::getText()   F

Complexity

Conditions 54
Paths 235

Size

Total Lines 203
Code Lines 128

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 116
CRAP Score 56.9865

Importance

Changes 1
Bugs 1 Features 0
Metric  Value
cc      54
eloc    128
c       1
b       1
f       0
nc      235
nop     1
dl      0
loc     203
ccs     116
cts     129
cp      0.8992
crap    56.9865
rs      2.3166
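The CRAP Score can be reproduced from the other metrics with the usual CRAP formula, CRAP = cc^2 * (1 - cp)^3 + cc. This sketch assumes Scrutinizer applies that formula to cc = 54 and to the coverage rounded to cp = 0.8992. The four-decimal match supports that assumption; the unrounded ratio 116/129 gives a slightly lower value. `crapScore` is a hypothetical helper, not part of Scrutinizer or PdfParser.

```php
<?php

/**
 * Hypothetical helper: CRAP = complexity^2 * (1 - coverage)^3 + complexity,
 * with $coverage given as a ratio between 0 and 1.
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

printf("%.4f\n", 116 / 129);                // 0.8992 (ccs / cts)
printf("%.4f\n", crapScore(54, 0.8992));    // 56.9865 (reported cp)
printf("%.4f\n", crapScore(54, 116 / 129)); // 56.9843 (unrounded cp)
```

At full coverage the cubic term vanishes and the score equals the complexity. The 54 conditions therefore put a floor under this method's score that tests alone cannot remove.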

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
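For example, the `'Td'` (move text position) branch of `PDFObject::getText()` in the listing could be extracted into its own function. This is a sketch only. `textForTdMove` and its parameters are hypothetical, and the body reproduces the branch's existing logic without changing it.

```php
<?php

/**
 * Hypothetical helper extracted from the 'Td' case of getText(): picks the
 * separator to emit for a text-position move and returns the new position.
 *
 * @param string $operands         the Td operands, e.g. "12.5 -14" (tx ty)
 * @param array  $current          previous position ['x' => ..., 'y' => ...],
 *                                 where false means "not set yet"
 * @param string $horizontalOffset separator used for a move to the right
 *
 * @return array [separator, new position]
 */
function textForTdMove(string $operands, array $current, string $horizontalOffset): array
{
    $args = preg_split('/\s/s', $operands);
    $y = array_pop($args);
    $x = array_pop($args);

    $separator = '';
    if ((float) $x <= 0
        || (false !== $current['y'] && (float) $y < (float) $current['y'])
    ) {
        // Vertical offset: start a new line.
        $separator = "\n";
    } elseif (false !== $current['x'] && (float) $x > (float) $current['x']) {
        // Moved to the right on the same line.
        $separator = $horizontalOffset;
    }

    return [$separator, ['x' => $x, 'y' => $y]];
}

// In getText(), the 'Td' case would then reduce to:
//   [$separator, $current_position_td] = textForTdMove(
//       $command[self::COMMAND],
//       $current_position_td,
//       $this->config->getHorizontalOffset()
//   );
//   $text .= $separator;
//   break;
```

The comment that labelled the branch ("move text current point") becomes the function's docblock, which is the naming heuristic described above.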

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
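In PHP, `preg_match_all()` takes `$flags` as its fourth parameter, so passing `\PREG_OFFSET_CAPTURE` there is valid and this report appears to be a false positive. The sketch below uses the same call for what `cleanContent()` does with round brackets. It overwrites each match in place with a filler character of the same length, so byte offsets into the cleaned string still line up with the original. `maskParenthesized` is a hypothetical name.

```php
<?php

/**
 * Hypothetical helper: replace the body of every (...) group with $char,
 * keeping the string length unchanged so byte offsets stay valid.
 */
function maskParenthesized(string $content, string $char = 'X'): string
{
    // Flag in 4th position: each capture becomes [matched text, byte offset].
    preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);

    foreach ($matches[1] as [$text, $offset]) {
        $content = substr_replace($content, str_repeat($char, \strlen($text)), $offset, \strlen($text));
    }

    return $content;
}

echo maskParenthesized('BT (Hello) Tj (World) Tj ET'), "\n"; // BT (XXXXX) Tj (XXXXX) Tj ET
```

Because each replacement has the same length as the text it replaces, the offsets collected before the loop stay correct while the loop rewrites `$content`. The listing relies on this property when it maps positions found in the cleaned text back onto the original content.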

        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug introduced by
It seems like $content can also be of type array; however, parameter $subject of preg_split() does only seem to accept string, maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug introduced by
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() does only seem to accept null|string, maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
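Called without arguments, `mb_internal_encoding()` returns the current encoding as a string. It returns `true` only when it is used to set an encoding, so this report also looks like a false positive. In addition, `mb_str_split()` already falls back to the internal encoding when its `$encoding` argument is `null`. Another approach sidesteps the encoding argument entirely: split the text into UTF-8 code points with `preg_split()` and the `u` modifier. This is a substitute technique, not what the listing does, and `reverseUtf8` is a hypothetical name. Like the original, it reverses code points rather than grapheme clusters, so combining marks would end up on the wrong base character.

```php
<?php

/**
 * Hypothetical helper: reverse a UTF-8 string code point by code point,
 * e.g. for right-to-left text marked with ReversedChars.
 */
function reverseUtf8(string $text): string
{
    // '//u' splits between every UTF-8 code point; NO_EMPTY drops the
    // empty strings at both ends.
    $chars = preg_split('//u', $text, -1, \PREG_SPLIT_NO_EMPTY);
    if (false === $chars) {
        // preg_split() returns false on invalid UTF-8; leave such input as-is.
        return $text;
    }

    return implode('', array_reverse($chars));
}

echo reverseUtf8('שלום'), "\n"; // םולש
```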

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Bug introduced by
The method getRetainImageContent() does not exist on null. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
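Unlike the `preg_match_all()` reports, this one points at a real failure mode. `factory()` declares `?Config $config = null`, so for an Image XObject without a Config, `$config->getRetainImageContent()` is called on `null`, and PHP throws an `Error`. Below is a minimal null-safe sketch. It uses a hypothetical stand-in class, `StubConfig`, instead of PdfParser's real `Config`. It also assumes that a missing config should mean "retain image content"; that default is an assumption to check against the real `Config`.

```php
<?php

/**
 * Hypothetical stand-in for Smalot\PdfParser\Config, reduced to the single
 * flag that factory() reads for Image XObjects.
 */
final class StubConfig
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Null-safe version of the flagged expression. Assumption: a missing config
 * means "retain image content"; adjust to match the real Config default.
 */
function imageContentToKeep(?StubConfig $config, ?string $content): ?string
{
    $retain = null === $config || $config->getRetainImageContent();

    return $retain ? $content : null;
}

var_dump(imageContentToKeep(null, 'raw'));                  // string(3) "raw"
var_dump(imageContentToKeep(new StubConfig(false), 'raw')); // NULL
```

A variant in the same spirit substitutes a default configuration at the top of `factory()` when `$config` is null. Every branch would then receive a non-null config, not only the Image branch.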

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
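Much of `getText()`'s reported complexity (Conditions 54) comes from `switch` labels whose only statement is `break;`. Most complexity metrics count each `case` label as a decision point. One possible refactoring is to test for those ignored operators once, before the `switch`. The sketch uses hypothetical names (`IGNORED_TEXT_OPERATORS`, `isIgnoredTextOperator`), and the operator list is copied from the no-op cases of `getText()` in the listing.

```php
<?php

// Hypothetical list: operators that getText() matches only to ignore them
// (each of these cases is just `break;` in the listing).
const IGNORED_TEXT_OPERATORS = [
    'Tc', 'Ts', 'Tw', 'Da',
    'rg', 'RG', 're', 'co',
    'cs', 'gs', 'en', 'sc',
    'SC', 'g', 'G', 'V',
    'vo', 'Vo',
];

function isIgnoredTextOperator(string $operator): bool
{
    // Strict comparison, so 'g' and 'G' stay distinct operators.
    return \in_array($operator, IGNORED_TEXT_OPERATORS, true);
}
```

Inside the loop over `$commands`, a guard such as `if (isIgnoredTextOperator($command[self::OPERATOR])) { continue; }` before the `switch` would let those eighteen labels be deleted without changing the output. None of them modifies `$text`, the current or clipped font, or the tracked positions.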
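The `'('` branch of `getCommandsText()` scans a PDF literal string by counting unescaped parentheses and skipping the byte after each backslash. The sketch below performs the same scan as a standalone function so it can be tested in isolation; `readLiteralString` is a hypothetical name. Like the listing, it returns the raw bytes between the delimiters without decoding escape sequences. One deliberate difference: when the closing parenthesis is missing, the listing's `substr(..., $strpos - $offset - 1)` drops the last byte, while this sketch keeps it.

```php
<?php

/**
 * Hypothetical helper mirroring the '(' case of getCommandsText().
 * $offset must point just past the opening '('. Returns the raw string body
 * (escapes left undecoded) and advances $offset past the matching ')'.
 */
function readLiteralString(string $data, int &$offset): string
{
    $start = $offset;
    $pos = $offset;
    $depth = 1;

    while ($depth > 0 && isset($data[$pos])) {
        switch ($data[$pos]) {
            case '\\': // REVERSE SOLIDUS: skip the escaped byte
                ++$pos;
                break;
            case '(':
                ++$depth;
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    $offset = $pos;

    // If the closing ')' was found, $pos is one past it, so exclude it.
    $length = $pos - $start - (0 === $depth ? 1 : 0);

    return substr($data, $start, $length);
}

$offset = 1;
echo readLiteralString('(Hello (nested) \) world) Tj', $offset), "\n"; // Hello (nested) \) world
echo $offset, "\n";                                                     // 25
```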