Test Failed
Push — master ( 5c8274...ce434c )
by Konrad
01:59
created

PDFObject::factory()   B

Complexity

Conditions 10
Paths 9

Size

Total Lines 39
Code Lines 22

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 110

Importance

Changes 0
Metric  Value
cc      10
eloc    22
c       0
b       0
f       0
nc      9
nop     4
dl      0
loc     39
ccs     0
cts     0
cp      0
crap    110
rs      7.6666
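
The reported CRAP score is consistent with the commonly used CRAP formula, complexity squared times the cube of the uncovered fraction, plus complexity. The sketch below assumes that formula; the function name `crapScore` is hypothetical and is not part of Scrutinizer or PdfParser. With cc = 10 and 0 % coverage it yields the 110 shown in the metrics.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: computes a CRAP score from cyclomatic complexity
 * and line coverage (in percent), assuming the standard formula
 * comp^2 * (1 - cov/100)^3 + comp.
 */
function crapScore(int $complexity, float $coveragePercent): float
{
    $uncovered = 1.0 - $coveragePercent / 100.0;

    return $complexity ** 2 * $uncovered ** 3 + $complexity;
}

echo crapScore(10, 0.0), "\n";   // 110
echo crapScore(10, 100.0), "\n"; // 10
```

Under this formula, full coverage brings the score down to the complexity itself, so adding tests and splitting the method both reduce it.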

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
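
As a minimal sketch of this advice, the commented step "Clean content in round brackets (.....)" from `cleanContent()` can be pulled into its own function, with the comment supplying the name. The function `maskRoundBrackets` is hypothetical and is not part of PdfParser; it only illustrates the refactoring.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extracted method: replaces the text inside round brackets
 * with a filler character, keeping the string length (and therefore all
 * later offsets) unchanged.
 */
function maskRoundBrackets(string $content, string $char = 'X'): string
{
    preg_match_all('/\((.*?)\)/s', $content, $matches, PREG_OFFSET_CAPTURE);

    // Each entry of $matches[1] is a [matched text, byte offset] pair.
    foreach ($matches[1] as [$text, $offset]) {
        $content = substr_replace($content, str_repeat($char, strlen($text)), $offset, strlen($text));
    }

    return $content;
}

echo maskRoundBrackets('BT (Hello) Tj ET', '_'), "\n"; // BT (_____) Tj ET
```

The calling method then reads as a sequence of named steps, and each step can be tested on its own.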

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 *
 * @date    2017-01-03
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    public const TYPE = 't';

    public const OPERATOR = 'o';

    public const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document;

    /**
     * @var Header
     */
    protected $header;

    /**
     * @var string
     */
    protected $content;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        Header $header = null,
        string $content = null,
        Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code:
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug:
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code:
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @param array<int,array<string,string|bool>> $command
     */
    private function getTJUsingFontFallback(Font $font, array $command, Page $page = null): string
    {
        $orig_text = $font->decodeText($command);
        $text = $orig_text;

        // If we make this a Config option, we can add a check if it's
        // enabled here.
        if (null !== $page) {
            $font_ids = array_keys($page->getFonts());

            // If the decoded text contains UTF-8 control characters
            // then the font page being used is probably the wrong one.
            // Loop through the rest of the fonts to see if we can get
            // a good decode.
            while (preg_match('/[\x00-\x1f\x7f]/u', $text) || false !== strpos(bin2hex($text), '00')) {
                // If we're out of font IDs, then give up and use the
                // original string
                if (0 == \count($font_ids)) {
                    return $orig_text;
                }

                // Try the next font ID
                $font = $page->getFont(array_shift($font_ids));
                $text = $font->decodeText($command);
            }
        }

        return $text;
    }

    /**
     * @throws \Exception
     */
    public function getText(Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                        // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0)
                            || (false !== $current_position_td['y'] && (float) $y < (float) $current_position_td['y'])
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float)
                            $current_position_td['x']
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                        // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text .= $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) $current_position_tm['x']);
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) $current_position_tm['y']);
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug:
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        break;

                        // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text[] = $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        // $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        // $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match(
                            '/\G\s*[A-Z]{1,2}\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, $strpos - $offset);
                        $offset = $strpos + 1;
                    }

                    if (preg_match(
                        '/\G\s*[A-Z]{1,2}\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
709 17
                case ')':
710 17
                    ++$offset;
711 17
                    $type = $char;
712 17
                    $strpos = $offset;
713
                    if ('(' == $char) {
714
                        $open_bracket = 1;
715
                        while ($open_bracket > 0) {
716 29
                            if (!isset($text_part[$strpos])) {
717 29
                                break;
718 29
                            }
719 29
                            $ch = $text_part[$strpos];
720 29
                            switch ($ch) {
721
                                case '\\':
722
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
723 25
                                    // skip next character
724
                                    ++$strpos;
725
                                    break;
726
727 29
                                case '(':
728
                                    // LEFT PARENHESIS (28h)
729
                                    ++$open_bracket;
730 42
                                    break;
731
732
                                case ')':
733
                                    // RIGHT PARENTHESIS (29h)
734
                                    --$open_bracket;
735
                                    break;
736 42
                            }
737 42
                            ++$strpos;
738 8
                        }
739 8
                        $command = substr($text_part, $offset, $strpos - $offset - 1);
740 3
                        $offset = $strpos;
741
742 6
                        if (preg_match(
743 6
                            '/\G\s*([A-Z\']{1,2})\s*/si',
744
                            $text_part,
745
                            $matches,
746
                            0,
747
                            $offset
748 42
                        )
749 41
                        ) {
750
                            $operator = $matches[1];
751 42
                            $offset += \strlen($matches[0]);
752 41
                        }
753
                    }
754 42
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/\G\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([0-9\.\-]+\s*?)+\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([A-Z\*]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug: The method getRetainImageContent() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

Here the receiver is $config, which factory() declares with a default value of null, so the call raises an error whenever no Config instance is passed.
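One way to address the reported call on a possibly null $config is to guard it with PHP 8's nullsafe operator. The following is a minimal sketch, not the library's actual fix: `StubConfig` and `pickImageContent()` are hypothetical names introduced for illustration, and falling back to "retain the content" when no config is given is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's Config class; only the one
// getter used by the Image branch is modelled here.
final class StubConfig
{
    private bool $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

// Hypothetical helper: decides which content an image object would receive.
// The nullsafe operator (PHP 8.0+) evaluates to null when $config is null,
// and "?? true" then falls back to retaining the content (assumed default).
function pickImageContent(?string $content, ?StubConfig $config = null): ?string
{
    $retain = $config?->getRetainImageContent() ?? true;

    return $retain ? $content : null;
}
```

On PHP versions before 8.0, an explicit `null !== $config` check would serve the same purpose. Declaring the parameter as `?Config $config = null` also makes the nullability explicit in the signature.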

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
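The literal-string scan in the '(' branch of the command parser above can be read in isolation as a balanced-parenthesis walk that skips escaped characters. The sketch below restates that loop as a standalone function to make its behavior easier to follow; `readLiteralString()` is a hypothetical name and is not part of PdfParser.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of PdfParser): extracts the body of a PDF
 * literal string. $offset must point just past the opening "(".
 *
 * Returns [string body, int offset just past the closing ")"].
 */
function readLiteralString(string $text, int $offset): array
{
    $depth = 1;
    $pos = $offset;

    while ($depth > 0 && isset($text[$pos])) {
        switch ($text[$pos]) {
            case '\\':
                // An escaped character never changes the nesting depth: skip it.
                ++$pos;
                break;
            case '(':
                ++$depth;
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    // For a terminated string, $pos sits one past the closing ")",
    // so the final character is excluded from the body.
    return [substr($text, $offset, $pos - $offset - 1), $pos];
}

[$body, $next] = readLiteralString('(Hello \(nested\) (world)) Tj', 1);
// $body is 'Hello \(nested\) (world)' and $next is 26
```

As in the loop it mirrors, an unterminated string ends the scan early and the length calculation then drops the last character that was read; a caller that cares about malformed input would need to handle that case separately.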