Passed
Pull Request — master (#544)
by
unknown
08:13
created

PDFObject::factory()   B

Complexity

Conditions 10
Paths 9

Size

Total Lines 39
Code Lines 22

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 20
CRAP Score 10.0751

Importance

Changes 0
Metric Value
cc 10
eloc 22
c 0
b 0
f 0
nc 9
nop 4
dl 0
loc 39
ccs 20
cts 22
cp 0.9091
crap 10.0751
rs 7.6666

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
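
The extraction described above can be sketched as follows. This is a hypothetical illustration, not code from PdfParser: the function names (invoiceTotalBefore, sumNetPrices, invoiceTotalAfter) and the invoice scenario are assumptions chosen only to show how a comment inside a method body can become the name of a new, smaller method.

```php
<?php

// Hypothetical example, not taken from PdfParser.

/** Before: one function, with a comment marking a distinct step. */
function invoiceTotalBefore(array $prices, float $taxRate): float
{
    // sum the net prices
    $net = 0.0;
    foreach ($prices as $price) {
        $net += $price;
    }

    return round($net * (1 + $taxRate), 2);
}

/** After: the comment "sum the net prices" became the function name. */
function sumNetPrices(array $prices): float
{
    return (float) array_sum($prices);
}

function invoiceTotalAfter(array $prices, float $taxRate): float
{
    return round(sumNetPrices($prices) * (1 + $taxRate), 2);
}
```

Both versions should return the same total for the same input; the refactored one simply gives the commented step a name of its own.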

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
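
The flagged call looks like the kind of false positive described above: PHP's preg_match_all() accepts an optional flags argument in the fourth position. The sketch below shows, in isolation, the masking pattern the method relies on. The helper name maskRoundBrackets is hypothetical and not part of PdfParser; it only mirrors the "round brackets" step of cleanContent().

```php
<?php

// Illustrative helper (hypothetical name, not part of PdfParser): mask the
// text inside every "(...)" group with a filler character, keeping length.
function maskRoundBrackets(string $content, string $char = 'X'): string
{
    // The fourth argument is the documented $flags parameter. With
    // PREG_OFFSET_CAPTURE each match becomes [matched text, byte offset].
    preg_match_all('/\((.*?)\)/s', $content, $matches, PREG_OFFSET_CAPTURE);

    foreach ($matches[1] as $part) {
        // Replace the capture with filler of the same length, so the
        // offsets recorded for later matches stay valid.
        $content = substr_replace($content, str_repeat($char, strlen($part[0])), $part[1], strlen($part[0]));
    }

    return $content;
}

echo maskRoundBrackets('a (bc) d (e)'), "\n"; // prints "a (XX) d (X)"
```

Because every replacement has the same length as the text it covers, the offsets captured before the loop remain usable throughout it.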

        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug: It seems like $content can also be of type array; however, parameter $subject of preg_split() seems to accept only string. Consider adding an additional type check. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(.*?)\s+BT[\s|\(|\[]+(.*?)\s*ET(?=\s|$)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(.*?)\s+BT[\s|\(|\[]+(.*?)\s*ET(?=\s|$)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = trim(preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' '));

                // Add Q & q flags and Tf commands that precede the text block.
                // @see: https://github.com/smalot/pdfparser/issues/387
                // @see: https://github.com/smalot/pdfparser/issues/542
                if (!empty($matches[1][$pos][0])) {
                    if (preg_match_all('/(?:\s|^)([Qq])(?:\s|$)/', $matches[1][$pos][0], $qMatches, \PREG_OFFSET_CAPTURE)) {
                        $len = \strlen($matches[1][$pos][0]);
                        for ($i = \count($qMatches[0]) - 1; $i >= 0; --$i) {
                            $str = substr($matches[1][$pos][0], $qMatches[0][$i][1] + 3, $len - ($qMatches[0][$i][1] + 3));
                            $len = $qMatches[0][$i][1];
                            if (preg_match('/\sTf(\s|$)/', $str)) {
                                $section = trim($str)."\n".$section;
                            }
                            if ('Q' == $qMatches[1][$i][0]) {
                                $section = "Q\n".$section;
                            } elseif ('q' == $qMatches[1][$i][0]) {
                                $section = "q\n".$section;
                            }
                        }
                        $str = substr($matches[1][$pos][0], 0, $qMatches[0][0][1]);
                        if (preg_match('/\sTf(\s|$)/', $str)) {
                            $section = trim($str)."\n".$section;
                        }
                    } elseif (preg_match('/\sTf(\s|$)/', $matches[1][$pos][0])) {
                        $section = trim($matches[1][$pos][0])."\n".$section;
                    }
                }

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug: It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() seems to accept only null|string. Consider adding an additional type check. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // hexadecimal string object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Bug: The method getRetainImageContent() does not exist on null. (Ignorable by Annotation)

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.


                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
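
The report rates PDFObject::factory() at 10 conditions. One possible way to lower that count, sketched here under stated assumptions, is to move the simple type-to-class cases out of the switch and into a lookup table. The classes and functions below (SketchObject, SketchPage, SketchPages, SketchEncoding, resolveSketchClass, sketchFactory) are hypothetical stand-ins so that the example runs on its own; they are not the real PdfParser classes, and the XObject and Font branches, which need extra logic, are deliberately left out.

```php
<?php

// Stand-in classes so the sketch is self-contained; NOT the real
// Smalot\PdfParser classes.
class SketchObject
{
    public $type;

    public function __construct(string $type)
    {
        $this->type = $type;
    }
}

class SketchPage extends SketchObject {}
class SketchPages extends SketchObject {}
class SketchEncoding extends SketchObject {}

/** Resolve a class name from a lookup table instead of one case per type. */
function resolveSketchClass(string $type): string
{
    $map = [
        'Pages' => SketchPages::class,
        'Page' => SketchPage::class,
        'Encoding' => SketchEncoding::class,
    ];

    // Unknown types fall back to the base class, like the default branch.
    return $map[$type] ?? SketchObject::class;
}

function sketchFactory(string $type): SketchObject
{
    $class = resolveSketchClass($type);

    return new $class($type);
}
```

With this shape, adding a new simple type means adding one table entry rather than one more branch, so the factory's condition count stays flat.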