Passed
Pull Request — master (#615)
by Jeffrey
02:30
created

PDFObject::getTextArray()   D

Complexity

Conditions 35
Paths 85

Size

Total Lines 118
Code Lines 73

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 56
CRAP Score 52.6208

Importance

Changes 0
Metric Value
cc 35
eloc 73
c 0
b 0
f 0
nc 85
nop 1
dl 0
loc 118
ccs 56
cts 74
cp 0.7568
crap 52.6208
rs 4.1666
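
The CRAP score listed above appears consistent with the commonly cited formula CRAP = cc² × (1 − coverage)³ + cc. The sketch below recomputes it from the table, assuming `cc` is the cyclomatic complexity and `cp` the covered proportion (`ccs / cts`). These readings of the abbreviations are inferred from the values rather than stated on the page, and `crapScore` is a hypothetical helper, not part of any tool.

```php
<?php

/**
 * Hypothetical helper: CRAP score from cyclomatic complexity and
 * coverage expressed as a fraction between 0 and 1.
 */
function crapScore(int $complexity, float $coverage): float
{
    // complexity^2 * (1 - coverage)^3 + complexity
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values taken from the metric table: cc = 35, cp = 0.7568.
echo round(crapScore(35, 0.7568), 4), "\n";
```

Under this formula, raising coverage shrinks only the first term; the score can never drop below the complexity itself, so reducing the number of conditions is the more effective lever here.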

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
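
A minimal sketch of the idea described above, using made-up function names and a made-up tax example (none of this comes from the inspected code): the commented step inside a longer function is moved into its own function, and the comment supplies the name.

```php
<?php

// Before: one function with a comment marking a distinct step.
function describeOrderBefore(array $prices): string
{
    // compute the total including 20% tax
    $total = 0.0;
    foreach ($prices as $price) {
        $total += $price * 1.2;
    }

    return sprintf('Total: %.2f', $total);
}

// After: the commented step becomes its own function, named after the comment.
function totalIncludingTax(array $prices, float $taxRate = 0.2): float
{
    $total = 0.0;
    foreach ($prices as $price) {
        $total += $price * (1.0 + $taxRate);
    }

    return $total;
}

function describeOrderAfter(array $prices): string
{
    return sprintf('Total: %.2f', totalIncludingTax($prices));
}
```

Both versions produce the same string; the second one simply gives the calculation a name that can be tested and reused on its own.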

Commonly applied refactorings include Extract Method.

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 *
 * @date    2017-01-03
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    public const TYPE = 't';

    public const OPERATOR = 'o';

    public const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document;

    /**
     * @var Header
     */
    protected $header;

    /**
     * @var string
     */
    protected $content;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        Header $header = null,
        string $content = null,
        Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

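
For context on what the flagged argument does: the fourth argument of `preg_match_all()` is its documented `$flags` parameter, and with `PREG_OFFSET_CAPTURE` every match comes back as a `[text, byte offset]` pair. That offset is what lets `cleanContent()` overwrite matches in place with `substr_replace()`. A minimal sketch of the masking pattern, where `maskParenthesized` is a hypothetical name and not part of the library:

```php
<?php

// Hypothetical helper mirroring the masking loops in cleanContent():
// overwrite the text inside round brackets with a filler character.
function maskParenthesized(string $content, string $char = 'X'): string
{
    preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);

    foreach ($matches[1] as $part) {
        // $part[0] is the captured text, $part[1] its byte offset in $content.
        // The filler has the same length, so later offsets stay valid.
        $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
    }

    return $content;
}

echo maskParenthesized('BT (Hello) Tj ET'), "\n";
```

Keeping the string length unchanged is the design choice that matters here: offsets found in the masked copy can be reused against the original content, as `getSectionsText()` does.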
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug: It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

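
The array type presumably enters through `substr_replace()`, which is declared as returning `string|array`. One possible shape for the suggested type check is sketched below; `splitOnAngleBrackets` is a hypothetical helper, and returning an empty list for non-string input is an assumption, not the library's behaviour.

```php
<?php

// Hypothetical helper: split on angle brackets only when given a string.
function splitOnAngleBrackets($content): array
{
    if (!\is_string($content)) {
        // Assumption: non-string input is treated as having no parts.
        return [];
    }

    $parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE);

    // preg_split() returns false on failure; normalise that to an empty list.
    return false === $parts ? [] : $parts;
}
```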
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                        // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0)
                            || (false !== $current_position_td['y'] && (float) $y < (float) $current_position_td['y'])
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float)
                            $current_position_td['x']
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                        // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                        // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) $current_position_tm['x']);
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) $current_position_tm['y']);
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug: It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

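
As far as the PHP manual describes it, `mb_internal_encoding()` returns the encoding name when called without arguments and a boolean only when used as a setter, which is presumably why the analyzer lists `true` as a possible type. One way to sidestep the question is to pass the encoding explicitly. A sketch, where `reverseText` is a hypothetical helper and the UTF-8 default is an assumption:

```php
<?php

// Hypothetical helper: reverse a string character by character
// (not byte by byte), using an explicit encoding.
function reverseText(string $text, string $encoding = 'UTF-8'): string
{
    // mb_str_split() needs PHP 7.4+ with the mbstring extension.
    $chars = mb_str_split($text, 1, $encoding);

    return implode('', array_reverse($chars));
}
```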
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        break;

                        // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                        // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        // $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        // $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/\G\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match(
                            '/\G\s*[A-Z]{1,2}\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, $strpos - $offset);
                        $offset = $strpos + 1;
                    }

                    if (preg_match(
                        '/\G\s*[A-Z]{1,2}\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, $strpos - $offset - 1);
                        $offset = $strpos;

                        if (preg_match(
                            '/\G\s*([A-Z\']{1,2})\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/\G\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([0-9\.\-]+\s*?)+\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([A-Z\*]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Bug: The method getRetainImageContent() does not exist on null. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

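
Because `factory()` declares `Config $config = null`, the flagged call can be reached with a null config. A minimal sketch of one possible guard follows. `ConfigStub` and `imageContent` are hypothetical stand-ins written for this example, and keeping the content when no config is supplied is an assumption rather than documented library behaviour.

```php
<?php

// Hypothetical stand-in for the library's Config class, reduced to one getter.
final class ConfigStub
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

// Hypothetical helper: decide which content would be handed to the Image constructor.
// Assumption: without a config, the content is kept.
function imageContent(?string $content, ?ConfigStub $config): ?string
{
    if (null !== $config && !$config->getRetainImageContent()) {
        return null;
    }

    return $content;
}
```

The null check happens before the method call, so a missing config no longer leads to a call on null.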
                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
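
Most of the conditions counted for `getTextArray()` come from `case` branches that only `break`. The sketch below shows, under simplifying assumptions, how listing only the operators that produce output could shrink the switch. `collectShownText` and the `$decode` callable are hypothetical (the callable stands in for `Font::decodeText()`), and font switching (`Tf`) and XObjects (`Do`) are deliberately left out, so this is an illustration of the idea rather than a drop-in replacement.

```php
<?php

/**
 * Hypothetical, simplified variant of the getTextArray() loop: only the
 * text-showing operators are handled, every other operator is skipped.
 *
 * @param array[]  $commands parsed commands with 'o' (operator) and 'c' (command) keys
 * @param callable $decode   stand-in for Font::decodeText()
 */
function collectShownText(array $commands, callable $decode): array
{
    $text = [];

    foreach ($commands as $command) {
        switch ($command['o']) {
            case "'":
            case 'Tj':
                // Wrap the single command, as the original does before decoding.
                $text[] = $decode([$command]);
                break;

            case 'TJ':
                $text[] = $decode($command['c']);
                break;
        }
    }

    return $text;
}
```

A possible usage with a trivial decoder that just concatenates the command strings:

```php
$decode = function (array $parts): string {
    return implode('', array_column($parts, 'c'));
};

$commands = [
    ['t' => '', 'o' => 'Tc', 'c' => '0'],
    ['t' => '(', 'o' => 'Tj', 'c' => 'Hello'],
];

print_r(collectShownText($commands, $decode));
```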