Test Failed
Pull Request — master (#510)
by Jeremy
04:34 queued 02:12
created

PDFObject::factory()   B

Complexity
    Conditions    10
    Paths         9

Size
    Total Lines   39
    Code Lines    22

Duplication
    Lines         0
    Ratio         0 %

Code Coverage
    Tests         20
    CRAP Score    10.0751

Importance
    Changes       0

Metric   Value
cc       10
eloc     22
c        0
b        0
f        0
nc       9
nop      4
dl       0
loc      39
ccs      20
cts      22
cp       0.9091
crap     10.0751
rs       7.6666
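The CRAP score above can be reproduced from the other figures. This is a minimal sketch. It assumes the usual CRAP definition: complexity squared, times the cube of the uncovered fraction, plus complexity. It takes the covered fraction as ccs / cts = 20 / 22, which matches the cp row. The function name is hypothetical.

```php
<?php

/**
 * CRAP (Change Risk Anti-Patterns) score, assuming the usual definition:
 *   crap = complexity^2 * (1 - coverage)^3 + complexity
 * where $coverage is the covered fraction in [0, 1].
 */
function crapScore(int $complexity, float $coverage): float
{
    if ($coverage < 0.0 || $coverage > 1.0) {
        throw new InvalidArgumentException('Coverage must be a fraction in [0, 1].');
    }

    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values from the metric table: cc = 10, ccs = 20 covered, cts = 22 total.
$coverage = 20 / 22;                          // ~0.9091, the "cp" row
printf("%.4f\n", crapScore(10, $coverage));   // prints 10.0751
```

With full coverage the first term vanishes and the score equals the cyclomatic complexity (10). For this method, the remaining 0.0751 comes from the two statements that the listing shows without hit counts.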

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include: Extract Method.
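Applied to factory(), an Extract Method refactoring could move the "which class handles this header?" decision into small helpers. The standalone sketch below works under that assumption and is not part of PDFObject.php. All function names are hypothetical. Class names are returned as strings, so the sketch runs without PdfParser.

```php
<?php

// Hypothetical Extract Method sketch for PDFObject::factory().
// Returns class names as strings so it can run without PdfParser installed.

const FALLBACK_CLASS = 'Smalot\PdfParser\PDFObject';

function resolveXObjectClass(string $subtype): string
{
    $classes = [
        'Image' => 'Smalot\PdfParser\XObject\Image',
        'Form' => 'Smalot\PdfParser\XObject\Form',
    ];

    return $classes[$subtype] ?? FALLBACK_CLASS;
}

function resolveFontClass(string $subtype, callable $classExists): string
{
    // Mirrors factory(): prefer Font<Subtype> when such a class exists.
    $candidate = 'Smalot\PdfParser\Font\Font'.$subtype;

    return $classExists($candidate) ? $candidate : 'Smalot\PdfParser\Font';
}

function resolveObjectClass(string $type, string $subtype, callable $classExists): string
{
    $simpleTypes = [
        'Pages' => 'Smalot\PdfParser\Pages',
        'Page' => 'Smalot\PdfParser\Page',
        'Encoding' => 'Smalot\PdfParser\Encoding',
    ];

    switch ($type) {
        case 'XObject':
            return resolveXObjectClass($subtype);
        case 'Font':
            return resolveFontClass($subtype, $classExists);
        default:
            return $simpleTypes[$type] ?? FALLBACK_CLASS;
    }
}

// Usage: pretend that FontType1 is the only Font<Subtype> class available.
$exists = function (string $class): bool {
    return 'Smalot\PdfParser\Font\FontType1' === $class;
};

echo resolveObjectClass('Font', 'Type1', $exists), "\n";  // prints Smalot\PdfParser\Font\FontType1
echo resolveObjectClass('XObject', 'PS', $exists), "\n";  // prints Smalot\PdfParser\PDFObject
```

With helpers like these, factory() would need roughly one `new $class(...)` call plus the Image-specific content rule. Each decision could also be unit-tested without building a Document.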

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Issue (Unused Code): The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
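The flags value is a documented fourth parameter of preg_match_all(): `preg_match_all($pattern, $subject, &$matches, $flags, $offset)`. That makes this report look like a false positive. The small standalone sketch below is not part of PDFObject.php. It shows what PREG_OFFSET_CAPTURE returns and masks the match the same way cleanContent() does, using a made-up content string.

```php
<?php

// With PREG_OFFSET_CAPTURE every captured group becomes a [text, byte offset] pair.
$content = 'BT [(Hello) -250 (World)] TJ ET';
preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, PREG_OFFSET_CAPTURE);

[$text, $offset] = $matches[1][0];
echo $offset, "\n"; // prints 4

// Overwrite each captured span with filler of the same length, as cleanContent()
// does, so byte offsets in the cleaned text still match the original content.
foreach ($matches[1] as [$part, $start]) {
    $content = substr_replace($content, str_repeat('X', strlen($part)), $start, strlen($part));
}

echo $content, "\n";
```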
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Issue (Bug): It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string; maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }
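The "Clean structure" step of cleanContent() can be read in isolation. The standalone sketch below is not part of PDFObject.php, and its function names are hypothetical. It reproduces that step on a plain string: every byte from an opening < to its matching >, brackets included, is overwritten with the filler character, so offsets in the cleaned text still line up with the original.

The sketch also wraps substr_replace() in a helper with a `string` return type. This is one possible way to let a static analyser see that the value never becomes an array, which is the concern behind the preg_split() report above.

```php
<?php

/**
 * substr_replace() returns string|array in general; with string arguments it
 * returns a string, and this typed wrapper makes that explicit.
 */
function maskSpan(string $content, int $offset, int $length, string $char): string
{
    return substr_replace($content, str_repeat($char, $length), $offset, $length);
}

/**
 * Mask everything inside <...> (nested pairs and the brackets themselves)
 * with $char, keeping the overall length unchanged.
 */
function maskAngleBrackets(string $content, string $char = 'X'): string
{
    $parts = preg_split('/(<|>)/s', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    if (false === $parts) {
        return $content; // preg_split() failed; leave the input untouched
    }

    $result = '';
    $level = 0;
    foreach ($parts as $part) {
        if ('<' === $part) {
            ++$level; // the '<' itself is already inside, so it gets masked
        }
        $result .= 0 === $level ? $part : str_repeat($char, strlen($part));
        if ('>' === $part) {
            --$level; // decrement after masking, so the '>' is masked too
        }
    }

    return $result;
}

echo maskAngleBrackets('/F1 12 Tf <</MCID 0>> BDC'), "\n";
```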

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Issue (Unused Code): The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }
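Text-object extraction in getSectionsText() hinges on a single regular expression. The reduced standalone sketch below is not part of PDFObject.php, and its function name is hypothetical. It applies the same pattern to raw content and returns group 2, the body between BT and ET. It leaves out the original's cleaning, marked-content and Q/q handling.

```php
<?php

/**
 * Return the body of every BT ... ET text object, using the pattern from
 * PDFObject::getSectionsText() on raw (uncleaned) content.
 */
function extractTextObjects(string $content): array
{
    $content = ' '.$content.' ';
    $sections = [];

    if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $content, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[2] as [$text, $offset]) {
            if ('' !== $text) {
                $sections[] = $text; // $offset would locate it in the original stream
            }
        }
    }

    return $sections;
}

print_r(extractTextObjects('BT /F1 12 Tf (Hi) Tj ET q 1 0 0 1 0 0 cm Q BT (Bye) Tj ET'));
```

In this pattern the character class `[\s|\(|\[]` also matches a literal `|`. Because the class is greedy, it consumes an opening `(` or `[` that directly follows BT, so the second body above comes back as `Bye) Tj`. Whether the library's own path loses the delimiter the same way on cleaned text is worth confirming with a test rather than taking from this sketch. If it does, a narrower class such as `\s+` would keep the delimiter.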

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Issue (Bug): It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string; maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }
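Called without arguments, mb_internal_encoding() returns the current encoding as a string. It returns a bool only when it is used to set an encoding, so the report above is most likely a false positive in practice. If the union type should disappear anyway, one option is to pass an explicit encoding such as 'UTF-8' to mb_str_split().

The standalone sketch below takes another route and is not part of PDFObject.php; its function name is hypothetical. It splits on UTF-8 code points with preg_split() and the `u` modifier, assuming the decoded text is UTF-8.

```php
<?php

/**
 * Reverse text code point by code point, as getText() does for runs marked
 * /ReversedChars. ASSUMPTION: the decoded text is UTF-8. Like the original,
 * this reverses code points, not grapheme clusters.
 */
function reverseUtf8(string $text): string
{
    // The u modifier splits between UTF-8 code points and makes preg_split()
    // return false on invalid UTF-8; in that case the input is returned as-is.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (false === $chars) {
        return $text;
    }

    return implode('', array_reverse($chars));
}

echo reverseUtf8('naïve'), "\n"; // prints evïan
```

A byte-wise strrev() would split the two-byte "ï" and produce invalid UTF-8, which is why the original reaches for mb_str_split() in the first place.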

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }
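The `(` branch of getCommandsText() scans a PDF literal string. The PDF specification lets such strings contain balanced, unescaped parentheses and uses a backslash to escape the next byte.

The standalone sketch below is not part of PDFObject.php, and its function name is hypothetical. It isolates that scan, under the same rules: nesting depth is counted, and a backslash skips one byte. Escape sequences are not decoded. Unlike the original, an unterminated string keeps its last byte.

```php
<?php

/**
 * Read a PDF literal string whose '(' is at $offset.
 * Returns [raw body without the outer parentheses, offset just past the ')'].
 */
function readLiteralString(string $data, int $offset): array
{
    if (!isset($data[$offset]) || '(' !== $data[$offset]) {
        throw new InvalidArgumentException('Offset must point at an opening parenthesis.');
    }

    $start = $offset + 1;
    $pos = $start;
    $depth = 1;
    $length = strlen($data);

    while ($depth > 0 && $pos < $length) {
        switch ($data[$pos]) {
            case '\\':
                ++$pos; // skip the escaped byte, e.g. \( or \)
                break;
            case '(':
                ++$depth; // unescaped nested '(' must be balanced
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    // When balanced, $pos is one past the closing ')'; otherwise take the rest.
    $end = 0 === $depth ? $pos - 1 : $length;

    return [substr($data, $start, $end - $start), min($pos, $length)];
}

[$body, $next] = readLiteralString('(a(b)c) Tj', 0);
echo $body, ' ', $next, "\n"; // prints a(b)c 7
```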

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Issue (Bug): The method getRetainImageContent() does not exist on null. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }
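Unlike the tool's generic hint, the cause here is not a typo. factory() declares `?Config $config = null`, so calling it without a Config for an Image XObject header would end in "Call to a member function getRetainImageContent() on null".

The standalone sketch below shows one possible guard and is not part of PDFObject.php. `ImageConfig` is a hypothetical stand-in for Config, reduced to the one getter used here. The sketch assumes that a missing config should retain image content; that assumption should be checked against Config's actual default.

```php
<?php

// Hypothetical stand-in for Smalot\PdfParser\Config, reduced to one getter.
final class ImageConfig
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Content to hand to the Image constructor, without calling a method on null.
 * ASSUMPTION: a missing config behaves like one that retains image content.
 */
function imageContentFor(?string $content, ?ImageConfig $config): ?string
{
    $retain = null === $config || $config->getRetainImageContent();

    return $retain ? $content : null;
}

var_dump(imageContentFor('raw', null));                    // string(3) "raw"
var_dump(imageContentFor('raw', new ImageConfig(false)));  // NULL
```

Inside factory() itself, a similar effect could come from defaulting the parameter up front, for example `$config = $config ?? new Config();`. This assumes Config can be constructed without arguments, and it would also cover the other `$this->config` calls in this class.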

    /**
     * Returns a unique ID identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
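getText() pushes getUniqueId() onto the static $recursionStack before it renders nested XObjects through Do. In the code shown here, nothing pops that ID again. Unless another class resets the stack, an XObject that has been rendered once would therefore be skipped when a later getText() call reaches it through another page.

The standalone sketch below is not part of PDFObject.php, and its class and method names are hypothetical. It shows a guard that still detects circular references but releases each ID when rendering finishes. Like getUniqueId(), it uses spl_object_hash().

```php
<?php

/**
 * Hypothetical recursion guard: tracks the objects currently on the render
 * path, so A -> B -> A is cut off, but releases each one when it finishes.
 */
final class RenderGuard
{
    /** @var array<string, true> IDs of objects currently being rendered */
    private static $active = [];

    public static function run(object $object, callable $render): string
    {
        $id = spl_object_hash($object);
        if (isset(self::$active[$id])) {
            return ''; // circular reference: stop descending
        }

        self::$active[$id] = true;
        try {
            return $render();
        } finally {
            unset(self::$active[$id]); // released even if $render throws
        }
    }
}

// Usage: a node that (circularly) draws itself.
$node = new stdClass();
$draw = function () use ($node, &$draw): string {
    return 'x'.RenderGuard::run($node, $draw);
};

echo RenderGuard::run($node, $draw), "\n"; // prints x
echo RenderGuard::run($node, $draw), "\n"; // prints x again: the ID was released
```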