Test Failed
Push — master ( 5c8274...ce434c )
by Konrad
01:59
created

PDFObject::getCommandsText()   F

Complexity

Conditions 27
Paths 65

Size

Total Lines 190
Code Lines 128

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 93
CRAP Score 27.0967

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 27
eloc 128
c 1
b 0
f 0
nc 65
nop 2
dl 0
loc 190
ccs 93
cts 98
cp 0.949
crap 27.0967
rs 3.3333

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
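As a rough illustration of the advice above, here is a minimal sketch of an extracted method. The function names `skipWhitespace` and `readToken` are hypothetical and do not exist in PdfParser; the idea is only that a step which would otherwise carry a comment such as "skip leading whitespace" becomes a small helper whose name says the same thing.

```php
<?php

/**
 * Hypothetical helper: what used to be a "// skip leading whitespace"
 * comment inside a long method is now a named function.
 * Returns the offset of the first non-whitespace byte at or after $offset.
 */
function skipWhitespace(string $text, int $offset): int
{
    return $offset + strspn($text, "\x00\x09\x0a\x0c\x0d\x20", $offset);
}

/**
 * Hypothetical helper: reads one run of non-whitespace bytes starting at
 * $offset and advances $offset past it.
 */
function readToken(string $text, int &$offset): string
{
    $offset = skipWhitespace($text, $offset);
    $length = strcspn($text, "\x00\x09\x0a\x0c\x0d\x20", $offset);
    $token = substr($text, $offset, $length);
    $offset += $length;

    return $token;
}

$input = "  BT /F1 12 Tf";
$offset = 0;
$first = readToken($input, $offset);
$second = readToken($input, $offset);
```

The calling code now reads as a sequence of named steps, and each helper can be tested on its own.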

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 *
 * @date    2017-01-03
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    public const TYPE = 't';

    public const OPERATOR = 'o';

    public const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document;

    /**
     * @var Header
     */
    protected $header;

    /**
     * @var string
     */
    protected $content;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        Header $header = null,
        string $content = null,
        Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug: It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @param array<int,array<string,string|bool>> $command
     */
    private function getTJUsingFontFallback(Font $font, array $command, Page $page = null): string
    {
        $orig_text = $font->decodeText($command);
        $text = $orig_text;

        // If we make this a Config option, we can add a check if it's
        // enabled here.
        if (null !== $page) {
            $font_ids = array_keys($page->getFonts());

            // If the decoded text contains UTF-8 control characters
            // then the font page being used is probably the wrong one.
            // Loop through the rest of the fonts to see if we can get
            // a good decode.
            while (preg_match('/[\x00-\x1f\x7f]/u', $text) || false !== strpos(bin2hex($text), '00')) {
                // If we're out of font IDs, then give up and use the
                // original string
                if (0 == \count($font_ids)) {
                    return $orig_text;
                }

                // Try the next font ID
                $font = $page->getFont(array_shift($font_ids));
                $text = $font->decodeText($command);
            }
        }

        return $text;
    }

    /**
     * @throws \Exception
     */
    public function getText(Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                        // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0)
                            || (false !== $current_position_td['y'] && (float) $y < (float) $current_position_td['y'])
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float)
                            $current_position_td['x']
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                        // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text .= $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) $current_position_tm['x']);
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) $current_position_tm['y']);
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug: It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        break;

                        // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text[] = $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        // $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        // $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match(
                            '/\G\s*[A-Z]{1,2}\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, $strpos - $offset);
                        $offset = $strpos + 1;
                    }

                    if (preg_match(
                        '/\G\s*[A-Z]{1,2}\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
710 17
                    ++$offset;
711 17
                    $type = $char;
712 17
                    $strpos = $offset;
713
                    if ('(' == $char) {
714
                        $open_bracket = 1;
715
                        while ($open_bracket > 0) {
716 29
                            if (!isset($text_part[$strpos])) {
717 29
                                break;
718 29
                            }
719 29
                            $ch = $text_part[$strpos];
720 29
                            switch ($ch) {
721
                                case '\\':
722
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
723 25
                                    // skip next character
724
                                    ++$strpos;
725
                                    break;
726
727 29
                                case '(':
728
                                    // LEFT PARENHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, $strpos - $offset - 1);
                        $offset = $strpos;

                        if (preg_match(
                            '/\G\s*([A-Z\']{1,2})\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/\G\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([0-9\.\-]+\s*?)+\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([A-Z\*]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }
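
The parenthesis-matching loop in getCommandsText() is one candidate for the "extract method" refactoring suggested for long methods. The following is only a sketch of how that loop could look as a standalone helper; the name `findStringEnd()` is hypothetical and not part of PdfParser.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): returns the offset just past
 * the ")" that closes a PDF literal string whose content starts at $start.
 * If the string is never closed, the offset past the end of $text is returned.
 */
function findStringEnd(string $text, int $start): int
{
    $depth = 1;
    $pos = $start;

    while ($depth > 0 && isset($text[$pos])) {
        switch ($text[$pos]) {
            case '\\':
                // A backslash escapes the next character, so skip it.
                ++$pos;
                break;

            case '(':
                ++$depth;
                break;

            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    return $pos;
}

// Usage: extract the string content between the outer parentheses.
$text = '(a \( b (nested)) Tj';
$end = findStringEnd($text, 1);
echo substr($text, 1, $end - 1 - 1), "\n";
```

With such a helper, the caller would only need the `substr()` call and the operator match that follow the loop in the original method.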

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug: The method getRetainImageContent() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed. In this case, the $config parameter of factory() defaults to null, so the call fails whenever no Config is passed.
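
One possible way to address the flagged call is to fall back to a default configuration when none is passed. The sketch below assumes that a missing config should behave like a default-constructed one; that is an assumption, not the library's documented behavior. `DemoConfig` and `imageContentFor()` are hypothetical stand-ins, not PdfParser APIs.

```php
<?php

// Hypothetical stand-in for the library's Config class.
final class DemoConfig
{
    /** @var bool */
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decides which content an image object keeps.
 * Assumption: a null config falls back to a default DemoConfig.
 */
function imageContentFor(?string $content, ?DemoConfig $config = null): ?string
{
    // Guard against null before calling a method on $config.
    $config = $config ?? new DemoConfig();

    return $config->getRetainImageContent() ? $content : null;
}

var_dump(imageContentFor('raw bytes'));
var_dump(imageContentFor('raw bytes', new DemoConfig(false)));
```

The guard keeps the method call off a null receiver, which is the condition the check reports.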


                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }
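
The 'Font' branch of factory() builds a class name from the subtype and falls back to the generic class when no specific one exists. A minimal sketch of that lookup in isolation, using hypothetical stand-in classes (`DemoFont`, `DemoFontType1`) and a hypothetical `resolveFontClass()` helper instead of the library's own classes:

```php
<?php

// Hypothetical stand-ins for a generic font class and one subtype.
class DemoFont
{
}

class DemoFontType1 extends DemoFont
{
}

/**
 * Hypothetical resolver: returns the subtype-specific class name when such
 * a class is defined, otherwise the generic base class name.
 */
function resolveFontClass(string $subtype): string
{
    $classname = 'DemoFont'.$subtype;

    return class_exists($classname) ? $classname : 'DemoFont';
}

$known = resolveFontClass('Type1');      // a dedicated class exists
$unknown = resolveFontClass('TrueType'); // no dedicated class in this sketch

echo get_class(new $known()), "\n";
echo get_class(new $unknown()), "\n";
```

Supporting a new subtype then only requires defining a class with the matching name; the lookup itself does not change.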

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
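
The default branch of getCommandsText() passes an offset to preg_match() and starts each pattern with `\G`, so a match must begin exactly at the current offset and cannot skip ahead. A small sketch of that technique in isolation, reusing the first pattern from the method; the helper name `readOperator()` and its return shape are hypothetical:

```php
<?php

/**
 * Hypothetical helper: reads "<numbers> <operator>" starting exactly at $offset.
 * Returns [operator, operands, new offset], or null when nothing matches there.
 */
function readOperator(string $text, int $offset): ?array
{
    // \G anchors the match at $offset, so preg_match() cannot skip ahead.
    $pattern = '/\G\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si';

    if (!preg_match($pattern, $text, $matches, 0, $offset)) {
        return null;
    }

    return [
        trim($matches['id']),
        trim($matches['data']),
        $offset + strlen($matches[0]),
    ];
}

$text = '/F1 12 Tf 10 20 Td';

// No match at offset 0: "/F1" is not numeric and \G forbids skipping it.
var_dump(readOperator($text, 0));

// Starting at offset 4, the operands "12" and the operator "Tf" are read.
print_r(readOperator($text, 4));
```

Without `\G`, the first call would have matched further into the string, and the caller's offset bookkeeping would no longer line up with the text actually consumed.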