Test Failed
Pull Request — master (#500) · created by unknown · 01:54

PDFObject::getTextArray()   D

Complexity

Conditions 35
Paths 85

Size

Total Lines 118
Code Lines 73

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 55
CRAP Score 55.7453

Importance

Changes 0
Metric  Value
cc      35
eloc    73
c       0
b       0
f       0
nc      85
nop     1
dl      0
loc     118
ccs     55
cts     74
cp      0.7432
crap    55.7453
rs      4.1666
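
The CRAP score in the table can be reproduced from the complexity (cc) and coverage (cp) figures. The page does not state the formula, so the sketch below assumes the commonly used definition, CRAP = cc² · (1 − coverage)³ + cc. The helper name `crapScore` is hypothetical and not part of any Scrutinizer API.

```php
<?php

/**
 * Change Risk Anti-Patterns score under the commonly used definition
 * (an assumption here; the report itself does not spell it out).
 *
 * @param int   $complexity cyclomatic complexity ("cc" in the table)
 * @param float $coverage   covered fraction in [0, 1] ("cp" in the table)
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Figures taken from the metrics above: cc = 35, cp = 0.7432.
printf("%.4f\n", crapScore(35, 0.7432));
```

With these inputs the result agrees with the reported 55.7453 to within 0.001. Under this definition, full coverage would reduce the score to the complexity itself (35), so the remaining lever is the complexity of the method.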

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
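
Extract Method is one refactoring that is commonly applied to long methods. The following is a minimal sketch of the idea, using command arrays shaped like the ones `PDFObject` builds (`'t'`, `'o'`, `'c'` keys). The names `isTextShowingOperator` and `collectShownText` are hypothetical, the decoder is passed in as a callable so the sketch runs on its own, and the `Tj` argument wrapping done by the real method is left out for brevity.

```php
<?php

// Hypothetical helper: the would-be comment "text-showing operator"
// becomes the name of the extracted function.
function isTextShowingOperator(string $operator): bool
{
    return in_array($operator, ["'", 'Tj', 'TJ'], true);
}

// Hypothetical, simplified stand-in for the loop inside getTextArray().
// $decode replaces Font::decodeText() so no library classes are needed.
function collectShownText(array $commands, callable $decode): array
{
    $text = [];
    foreach ($commands as $command) {
        if (isTextShowingOperator($command['o'])) {
            $text[] = $decode($command['c']);
        }
    }

    return $text;
}

$commands = [
    ['t' => '/', 'o' => 'Tf', 'c' => 'F1 12'],
    ['t' => '(', 'o' => 'Tj', 'c' => 'Hello'],
    ['t' => '',  'o' => 'T*', 'c' => ''],
    ['t' => '(', 'o' => "'",  'c' => 'World'],
];

print_r(collectShownText($commands, 'strtoupper'));
```

In this shape the operators that produce no text need no `case ...: break;` branches of their own, which is where most of the reported conditions come from. Whether such a change would alter the rating is not something the sketch can show.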

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        // Scrutinizer issue (Unused Code): The call to preg_match_all() has too many
        // arguments starting with PREG_OFFSET_CAPTURE. The check compares a call with
        // the function's definition and may report a false positive when it picks up
        // the wrong definition; if so, prefix the call with /** @scrutinizer ignore-call */.
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        // Scrutinizer issue (Bug): $content can also be of type array, but parameter
        // $subject of preg_split() only seems to accept string; consider an additional
        // type check. If this is a false positive, annotate the argument with
        // /** @scrutinizer ignore-type */.
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        // Scrutinizer issue (Unused Code): The call to preg_match_all() has too many
        // arguments starting with PREG_OFFSET_CAPTURE. The check compares a call with
        // the function's definition and may report a false positive when it picks up
        // the wrong definition; if so, prefix the call with /** @scrutinizer ignore-call */.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            // horizontal offset
                            $text .= ' ';
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                // Scrutinizer issue (Bug): mb_internal_encoding() can also be of type true,
                // but parameter $encoding of mb_str_split() only seems to accept null|string;
                // consider an additional type check. If this is a false positive, annotate
                // the argument with /** @scrutinizer ignore-type */.
                $chars = mb_str_split($text, 1, mb_internal_encoding());
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        // Scrutinizer issue (Bug): The method getRetainImageContent() does not
                        // exist on null. The check looks for the method on the type itself,
                        // inherited classes and implemented interfaces. If this is a false
                        // positive, annotate the call with /** @scrutinizer ignore-call */.
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
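
The issue flagged in `factory()` (a call to `getRetainImageContent()` on a `$config` that may be null) could be avoided with an explicit null check. The sketch below is one possible shape, not the library's fix. `StubConfig` and `imageContentFor` are hypothetical stand-ins rather than PdfParser classes, and dropping the image content when no config is given is an assumption about the desired behaviour.

```php
<?php

// Hypothetical stand-in for Smalot\PdfParser\Config, reduced to one getter.
final class StubConfig
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

// Returns the content that would be handed to the Image constructor.
// Assumption: without a config, image content is not retained.
function imageContentFor(?StubConfig $config, ?string $content): ?string
{
    if (null !== $config && $config->getRetainImageContent()) {
        return $content;
    }

    return null;
}

var_dump(imageContentFor(null, 'raw bytes'));
var_dump(imageContentFor(new StubConfig(true), 'raw bytes'));
```

If the opposite default is wanted (retain content when no config is supplied), only the null branch changes. Either way the method call no longer runs against a null value.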