Passed
Pull Request — master (#544)
by unknown, created 02:16

PDFObject::factory() (rating: B)

Complexity

Conditions 10
Paths 9

Size

Total Lines 39
Code Lines 22

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 20
CRAP Score 10.0751

Importance

Changes 0
Metric Value
cc 10
eloc 22
c 0
b 0
f 0
nc 9
nop 4
dl 0
loc 39
ccs 20
cts 22
cp 0.9091
crap 10.0751
rs 7.6666
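The CRAP score in the table can be reproduced from the standard CRAP formula. This reads `cc` as cyclomatic complexity and `cp` as the statement-coverage ratio `ccs / cts` = 20/22. That reading of the abbreviations is an interpretation; the page does not state it.

```latex
\mathrm{CRAP}(\texttt{factory}) = cc^{2}\,(1 - cp)^{3} + cc
  = 10^{2}\left(1 - \tfrac{20}{22}\right)^{3} + 10
  = \frac{100}{1331} + 10 \approx 10.0751
```

The two uncovered lines add only about 0.075. Almost all of the score comes from the complexity term, which is why the suggested fix targets complexity rather than coverage.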

How to fix: Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
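Applied to `factory()`, Extract Method could move the nested `XObject` switch into its own method. The sketch below is hypothetical. It maps type and subtype names to class names as strings so it runs without the library, and the function names `objectClassFor()` and `xObjectClassFor()` are invented for illustration.

```php
<?php
// Hypothetical sketch of Extract Method on factory(): the nested XObject switch
// moves into its own function. Class names are returned as strings so the
// example runs without PdfParser; the real method instantiates the objects.

function xObjectClassFor(?string $subtype): string
{
    switch ($subtype) {
        case 'Image':
            return 'Image';
        case 'Form':
            return 'Form';
        default:
            return 'PDFObject';
    }
}

function objectClassFor(string $type, ?string $subtype = null): string
{
    switch ($type) {
        case 'XObject':
            // A single call replaces the inner switch.
            return xObjectClassFor($subtype);
        case 'Pages':
        case 'Page':
        case 'Encoding':
            return $type;
        case 'Font':
            // The real code checks class_exists() and falls back to Font.
            return 'Font'.$subtype;
        default:
            return 'PDFObject';
    }
}
```

Each function now holds one flat switch, so neither nests a switch inside another. A real refactoring would keep the `class_exists()` fallback for font subclasses.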

Commonly applied refactorings include:

1
<?php
2
3
/**
4
 * @file
5
 *          This file is part of the PdfParser library.
6
 *
7
 * @author  Sébastien MALOT <[email protected]>
8
 * @date    2017-01-03
9
 *
10
 * @license LGPLv3
11
 * @url     <https://github.com/smalot/pdfparser>
12
 *
13
 *  PdfParser is a pdf library written in PHP, extraction oriented.
14
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
15
 *
16
 *  This program is free software: you can redistribute it and/or modify
17
 *  it under the terms of the GNU Lesser General Public License as published by
18
 *  the Free Software Foundation, either version 3 of the License, or
19
 *  (at your option) any later version.
20
 *
21
 *  This program is distributed in the hope that it will be useful,
22
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
23
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
24
 *  GNU Lesser General Public License for more details.
25
 *
26
 *  You should have received a copy of the GNU Lesser General Public License
27
 *  along with this program.
28
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
29
 */
30
31
namespace Smalot\PdfParser;
32
33
use Smalot\PdfParser\XObject\Form;
34
use Smalot\PdfParser\XObject\Image;
35
36
/**
37
 * Class PDFObject
38
 */
39
class PDFObject
40
{
41
    const TYPE = 't';
42
43
    const OPERATOR = 'o';
44
45
    const COMMAND = 'c';
46
47
    /**
48
     * The recursion stack.
49
     *
50
     * @var array
51
     */
52
    public static $recursionStack = [];
53
54
    /**
55
     * @var Document
56
     */
57
    protected $document = null;
58
59
    /**
60
     * @var Header
61
     */
62
    protected $header = null;
63
64
    /**
65
     * @var string
66
     */
67
    protected $content = null;
68
69
    /**
70
     * @var Config
71
     */
72
    protected $config;
73
74 59
    public function __construct(
75
        Document $document,
76
        ?Header $header = null,
77
        ?string $content = null,
78
        ?Config $config = null
79
    ) {
80 59
        $this->document = $document;
81 59
        $this->header = $header ?? new Header();
82 59
        $this->content = $content;
83 59
        $this->config = $config;
84 59
    }
85
86 46
    public function init()
87
    {
88 46
    }
89
90 3
    public function getDocument(): Document
91
    {
92 3
        return $this->document;
93
    }
94
95 46
    public function getHeader(): ?Header
96
    {
97 46
        return $this->header;
98
    }
99
100 3
    public function getConfig(): ?Config
101
    {
102 3
        return $this->config;
103
    }
104
105
    /**
106
     * @return Element|PDFObject|Header
107
     */
108 47
    public function get(string $name)
109
    {
110 47
        return $this->header->get($name);
111
    }
112
113 44
    public function has(string $name): bool
114
    {
115 44
        return $this->header->has($name);
116
    }
117
118 3
    public function getDetails(bool $deep = true): array
119
    {
120 3
        return $this->header->getDetails($deep);
121
    }
122
123 36
    public function getContent(): ?string
124
    {
125 36
        return $this->content;
126
    }
127
128 29
    public function cleanContent(string $content, string $char = 'X')
129
    {
130 29
        $char = $char[0];
131 29
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);
132
133
        // Remove image blocks with binary content
134 29
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
135 29
        foreach ($matches[0] as $part) {
136
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
137
        }
138
139
        // Clean content in square brackets [.....]
140 29
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation

140
        /** @scrutinizer ignore-call */ 
141
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this is known to happen is WordPress. Please note the @ignore annotation hint above.
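`preg_match_all()` takes an optional flags argument in the fourth position, so this report looks like a false positive. The sketch below runs the square-bracket pattern from `cleanContent()` with `PREG_OFFSET_CAPTURE` on a made-up content-stream fragment. It shows that each capture comes back as a `[text, byte offset]` pair.

```php
<?php
// Sketch: the square-bracket pattern from cleanContent(), called with four
// arguments. The input string is an invented TJ array for illustration.

$content = 'BT [(Hello) -250 (World)] TJ ET';
$count = preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, every capture is a [text, byte offset] pair.
foreach ($matches[1] as [$text, $offset]) {
    printf("%d: %s\n", $offset, $text); // prints "4: (Hello) -250 (World)"
}
```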

141 29
        foreach ($matches[1] as $part) {
142 20
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
143
        }
144
145
        // Clean content in round brackets (.....)
146 29
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
147 29
        foreach ($matches[1] as $part) {
148 18
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
149
        }
150
151
        // Clean structure
152 29
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug introduced by
It seems like $content can also be of type array; however, parameter $subject of preg_split() does only seem to accept string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation

152
        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
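The warning arises because `substr_replace()` is declared to return `string|array`. An analyser therefore cannot tell that `$content` is still a string after the masking loops. A small typed helper could address both that and the repeated loop body in `cleanContent()`. `maskMatches()` is a hypothetical name, and this is a sketch rather than the library's code.

```php
<?php
// Hypothetical helper, not part of PdfParser: one copy of the masking loop that
// cleanContent() repeats. The string return type lets callers (and static
// analysers) rely on the result being a string.

function maskMatches(string $content, array $parts, string $char): string
{
    foreach ($parts as [$text, $offset]) {
        $length = \strlen($text);
        // Overwrite the matched bytes in place. The total length is unchanged,
        // so the offsets of later matches stay valid.
        $content = substr_replace($content, str_repeat($char, $length), $offset, $length);
    }

    return $content;
}

$content = ' (Hello) Tj ';
preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
$masked = maskMatches($content, $matches[1], 'X'); // ' (XXXXX) Tj '
```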
153 29
            $content = '';
154 29
            $level = 0;
155 29
            foreach ($parts as $part) {
156 29
                if ('<' == $part) {
157 16
                    ++$level;
158
                }
159
160 29
                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));
161
162 29
                if ('>' == $part) {
163 16
                    --$level;
164
                }
165
            }
166
        }
167
168
        // Clean BDC and EMC markup
169 29
        preg_match_all(
170 29
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
171
            $content,
172
            $matches,
173 29
            \PREG_OFFSET_CAPTURE
174
        );
175 29
        foreach ($matches[1] as $part) {
176 5
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
177
        }
178
179 29
        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
180 29
        foreach ($matches[1] as $part) {
181 9
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
182
        }
183
184 29
        return $content;
185
    }
186
187 28
    public function getSectionsText(?string $content): array
188
    {
189 28
        $sections = [];
190 28
        $content = ' '.$content.' ';
191 28
        $textCleaned = $this->cleanContent($content, '_');
192
193
        // Extract text blocks.
194 28
        if (preg_match_all('/(\sQ)?\s+(.*?)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation

194
        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+(.*?)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this is known to happen is WordPress. Please note the @ignore annotation hint above.
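`getSectionsText()` uses the offsets returned by this pattern to cut sections out of the original content. The sketch below shows what the pattern captures. It uses a made-up fragment in place of real `cleanContent()` output.

```php
<?php
// Sketch: the text-block pattern from getSectionsText() applied to an invented,
// already-cleaned fragment. Group 3 is the body between BT and ET.

$cleaned = ' BT /F1 12 Tf 72 712 Td ET ';
$pattern = '/(\sQ)?\s+(.*?)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s';

$count = preg_match_all($pattern, $cleaned, $matches, \PREG_OFFSET_CAPTURE);

// The offset is the position where getSectionsText() starts its substr().
[$body, $offset] = $matches[3][0]; // '/F1 12 Tf 72 712 Td' at offset 4
```

Groups 1 and 4 record a `Q` just before or a `q` just after the block. Under the conditions in the code, `getSectionsText()` re-attaches these to the section.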

195 26
            foreach ($matches[3] as $pos => $part) {
196 26
                $text = $part[0];
197 26
                if ('' === $text) {
198
                    continue;
199
                }
200 26
                $offset = $part[1];
201 26
                $section = substr($content, $offset, \strlen($text));
202
203
                // Removes BDC and EMC markup.
204 26
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');
205
206
                // Add Tx commands that appear before BT.
207
                // @see: https://github.com/smalot/pdfparser/issues/542
208 26
                if (!empty($matches[2][$pos][0]) && preg_match('/\sTf\s/', $matches[2][$pos][0])) {
209 1
                    $section = trim($matches[2][$pos][0].$section);
210
                }
211
212
                // Add Q and q flags if detected around BT/ET.
213
                // @see: https://github.com/smalot/pdfparser/issues/387
214 26
                if (empty($matches[2][$pos][0]) || !preg_match('/\sq\s/', $matches[2][$pos][0])) {
215 19
                    $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[4][$pos][0]) ? "\nq" : '');
216
                }
217
218 26
                $sections[] = $section;
219
            }
220
        }
221
222
        // Extract 'do' commands.
223 28
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
224 4
            foreach ($matches[1] as $part) {
225 4
                $text = $part[0];
226 4
                $offset = $part[1];
227 4
                $section = substr($content, $offset, \strlen($text));
228
229 4
                $sections[] = $section;
230
            }
231
        }
232
233 28
        return $sections;
234
    }
235
236 17
    private function getDefaultFont(Page $page = null): Font
237
    {
238 17
        $fonts = [];
239 17
        if (null !== $page) {
240 16
            $fonts = $page->getFonts();
241
        }
242
243 17
        $firstFont = $this->document->getFirstFont();
244 17
        if (null !== $firstFont) {
245 15
            $fonts[] = $firstFont;
246
        }
247
248 17
        if (\count($fonts) > 0) {
249 15
            return reset($fonts);
250
        }
251
252 2
        return new Font($this->document, null, null, $this->config);
253
    }
254
255
    /**
256
     * @throws \Exception
257
     */
258 17
    public function getText(?Page $page = null): string
259
    {
260 17
        $result = '';
261 17
        $sections = $this->getSectionsText($this->content);
262 17
        $current_font = $this->getDefaultFont($page);
263 17
        $clipped_font = $current_font;
264
265 17
        $current_position_td = ['x' => false, 'y' => false];
266 17
        $current_position_tm = ['x' => false, 'y' => false];
267
268 17
        self::$recursionStack[] = $this->getUniqueId();
269
270 17
        foreach ($sections as $section) {
271 15
            $commands = $this->getCommandsText($section);
272 15
            $reverse_text = false;
273 15
            $text = '';
274
275 15
            foreach ($commands as $command) {
276 15
                switch ($command[self::OPERATOR]) {
277 15
                    case 'BMC':
278 1
                        if ('ReversedChars' == $command[self::COMMAND]) {
279 1
                            $reverse_text = true;
280
                        }
281 1
                        break;
282
283
                    // set character spacing
284 15
                    case 'Tc':
285 3
                        break;
286
287
                    // move text current point
288 15
                    case 'Td':
289 12
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
290 12
                        $y = array_pop($args);
291 12
                        $x = array_pop($args);
292 12
                        if (((float) $x <= 0) ||
293 12
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
294
                        ) {
295
                            // vertical offset
296 8
                            $text .= "\n";
297 12
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
298 12
                                $current_position_td['x']
299
                            )
300
                        ) {
301 9
                            $text .= $this->config->getHorizontalOffset();
302
                        }
303 12
                        $current_position_td = ['x' => $x, 'y' => $y];
304 12
                        break;
305
306
                    // move text current point and set leading
307 15
                    case 'TD':
308 1
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
309 1
                        $y = array_pop($args);
310 1
                        $x = array_pop($args);
311 1
                        if ((float) $y < 0) {
312 1
                            $text .= "\n";
313
                        } elseif ((float) $x <= 0) {
314
                            $text .= ' ';
315
                        }
316 1
                        break;
317
318 15
                    case 'Tf':
319 15
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
320 15
                        $id = trim($id, '/');
321 15
                        if (null !== $page) {
322 15
                            $new_font = $page->getFont($id);
323
                            // If an invalid font ID is given, do not update the font.
324
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
325
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
326
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
327
                            // But we want to make sure that malformed PDFs do not simply crash.
328 15
                            if (null !== $new_font) {
329 14
                                $current_font = $new_font;
330
                            }
331
                        }
332 15
                        break;
333
334 15
                    case 'Q':
335
                        // Use clip: restore font.
336 4
                        $current_font = $clipped_font;
337 4
                        break;
338
339 15
                    case 'q':
340
                        // Use clip: save font.
341 4
                        $clipped_font = $current_font;
342 4
                        break;
343
344 15
                    case "'":
345 15
                    case 'Tj':
346 10
                        $command[self::COMMAND] = [$command];
347
                        // no break
348 14
                    case 'TJ':
349 15
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
350 15
                        $text .= $sub_text;
351 15
                        break;
352
353
                    // set leading
354 12
                    case 'TL':
355 1
                        $text .= ' ';
356 1
                        break;
357
358 12
                    case 'Tm':
359 12
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
360 12
                        $y = array_pop($args);
361 12
                        $x = array_pop($args);
362 12
                        if (false !== $current_position_tm['x']) {
363 12
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
364 12
                            if ($delta > 10) {
365 10
                                $text .= "\t";
366
                            }
367
                        }
368 12
                        if (false !== $current_position_tm['y']) {
369 12
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
370 12
                            if ($delta > 10) {
371 8
                                $text .= "\n";
372
                            }
373
                        }
374 12
                        $current_position_tm = ['x' => $x, 'y' => $y];
375 12
                        break;
376
377
                    // set super/subscripting text rise
378 9
                    case 'Ts':
379
                        break;
380
381
                    // set word spacing
382 9
                    case 'Tw':
383 2
                        break;
384
385
                    // set horizontal scaling
386 9
                    case 'Tz':
387
                        $text .= "\n";
388
                        break;
389
390
                    // move to start of next line
391 9
                    case 'T*':
392 2
                        $text .= "\n";
393 2
                        break;
394
395 8
                    case 'Da':
396
                        break;
397
398 8
                    case 'Do':
399 4
                        if (null !== $page) {
400 4
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
401 4
                            $id = trim(array_pop($args), '/ ');
402 4
                            $xobject = $page->getXObject($id);
403
404
                            // @todo $xobject could be an ElementXRef object, which would then throw an error
405 4
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
406
                                // Not a circular reference.
407 4
                                $text .= $xobject->getText($page);
408
                            }
409
                        }
410 4
                        break;
411
412 6
                    case 'rg':
413 6
                    case 'RG':
414 1
                        break;
415
416 6
                    case 're':
417 1
                        break;
418
419 6
                    case 'co':
420
                        break;
421
422 6
                    case 'cs':
423 1
                        break;
424
425 6
                    case 'gs':
426 3
                        break;
427
428 6
                    case 'en':
429
                        break;
430
431 6
                    case 'sc':
432 6
                    case 'SC':
433
                        break;
434
435 6
                    case 'g':
436 6
                    case 'G':
437 2
                        break;
438
439 5
                    case 'V':
440
                        break;
441
442 5
                    case 'vo':
443 5
                    case 'Vo':
444
                        break;
445
446
                    default:
447
                }
448
            }
449
450
            // Fix Hebrew and other reverse text oriented languages.
451
            // @see: https://github.com/smalot/pdfparser/issues/398
452 15
            if ($reverse_text) {
453 1
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug introduced by
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() does only seem to accept null|string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation

453
                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
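Called without arguments, `mb_internal_encoding()` returns the current encoding name as a string. The `true` case only applies when an encoding is being set, so this report is mostly a typing artifact. An alternative avoids the encoding argument altogether by splitting on code points with the `/u` modifier. This assumes the decoded text is UTF-8, and `reverseUtf8()` is a hypothetical helper, not part of PdfParser.

```php
<?php
// Sketch, assuming UTF-8 input: preg_split() with the /u modifier yields one
// element per code point, so no encoding argument is needed. Like the original
// code, combining marks are reversed along with everything else.

function reverseUtf8(string $text): string
{
    $chars = preg_split('//u', $text, -1, \PREG_SPLIT_NO_EMPTY);
    if (false === $chars) {
        return $text; // invalid UTF-8: leave the text untouched
    }

    return implode('', array_reverse($chars));
}

$reversed = reverseUtf8('שלום');
```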
454 1
                $text = implode('', array_reverse($chars));
455
            }
456
457 15
            $result .= $text;
458
        }
459
460 17
        return $result.' ';
461
    }
462
463
    /**
464
     * @throws \Exception
465
     */
466 6
    public function getTextArray(?Page $page = null): array
467
    {
468 6
        $text = [];
469 6
        $sections = $this->getSectionsText($this->content);
470 6
        $current_font = new Font($this->document, null, null, $this->config);
471
472 6
        foreach ($sections as $section) {
473 6
            $commands = $this->getCommandsText($section);
474
475 6
            foreach ($commands as $command) {
476 6
                switch ($command[self::OPERATOR]) {
477
                    // set character spacing
478 6
                    case 'Tc':
479 3
                        break;
480
481
                    // move text current point
482 6
                    case 'Td':
483 6
                        break;
484
485
                    // move text current point and set leading
486 6
                    case 'TD':
487
                        break;
488
489 6
                    case 'Tf':
490 6
                        if (null !== $page) {
491 6
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
492 6
                            $id = trim($id, '/');
493 6
                            $current_font = $page->getFont($id);
494
                        }
495 6
                        break;
496
497 6
                    case "'":
498 6
                    case 'Tj':
499 5
                        $command[self::COMMAND] = [$command];
500
                        // no break
501 6
                    case 'TJ':
502 6
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
503 6
                        $text[] = $sub_text;
504 6
                        break;
505
506
                    // set leading
507 5
                    case 'TL':
508 4
                        break;
509
510 5
                    case 'Tm':
511 4
                        break;
512
513
                    // set super/subscripting text rise
514 5
                    case 'Ts':
515
                        break;
516
517
                    // set word spacing
518 5
                    case 'Tw':
519 2
                        break;
520
521
                    // set horizontal scaling
522 5
                    case 'Tz':
523
                        //$text .= "\n";
524
                        break;
525
526
                    // move to start of next line
527 5
                    case 'T*':
528
                        //$text .= "\n";
529 4
                        break;
530
531 4
                    case 'Da':
532
                        break;
533
534 4
                    case 'Do':
535
                        if (null !== $page) {
536
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
537
                            $id = trim(array_pop($args), '/ ');
538
                            if ($xobject = $page->getXObject($id)) {
539
                                $text[] = $xobject->getText($page);
540
                            }
541
                        }
542
                        break;
543
544 4
                    case 'rg':
545 4
                    case 'RG':
546 2
                        break;
547
548 4
                    case 're':
549
                        break;
550
551 4
                    case 'co':
552
                        break;
553
554 4
                    case 'cs':
555
                        break;
556
557 4
                    case 'gs':
558 1
                        break;
559
560 4
                    case 'en':
561
                        break;
562
563 4
                    case 'sc':
564 4
                    case 'SC':
565
                        break;
566
567 4
                    case 'g':
568 4
                    case 'G':
569 2
                        break;
570
571 2
                    case 'V':
572
                        break;
573
574 2
                    case 'vo':
575 2
                    case 'Vo':
576
                        break;
577
578
                    default:
579
                }
580
            }
581
        }
582
583 6
        return $text;
584
    }
585
586 26
    public function getCommandsText(string $text_part, int &$offset = 0): array
587
    {
588 26
        $commands = $matches = [];
589
590 26
        while ($offset < \strlen($text_part)) {
591 26
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
592 26
            $char = $text_part[$offset];
593
594 26
            $operator = '';
595 26
            $type = '';
596 26
            $command = false;
597
598 26
            switch ($char) {
599 26
                case '/':
600 26
                    $type = $char;
601 26
                    if (preg_match(
602 26
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
603 26
                        substr($text_part, $offset),
604
                        $matches
605
                    )
606
                    ) {
607 26
                        $operator = $matches[2];
608 26
                        $command = $matches[1];
609 26
                        $offset += \strlen($matches[0]);
610 9
                    } elseif (preg_match(
611 9
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
612 9
                        substr($text_part, $offset),
613
                        $matches
614
                    )
615
                    ) {
616 9
                        $operator = $matches[2];
617 9
                        $command = $matches[1];
618 9
                        $offset += \strlen($matches[0]);
619
                    }
620 26
                    break;
621
622 26
                case '[':
623 26
                case ']':
624
                    // array object
625 23
                    $type = $char;
626 23
                    if ('[' == $char) {
627 23
                        ++$offset;
628
                        // get elements
629 23
                        $command = $this->getCommandsText($text_part, $offset);
630
631 23
                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
632 23
                            $operator = trim($matches[0]);
633 23
                            $offset += \strlen($matches[0]);
634
                        }
635
                    } else {
636 23
                        ++$offset;
637 23
                        break;
638
                    }
639 23
                    break;
640
641 26
                case '<':
642 26
                case '>':
643
                    // hexadecimal string
644 12
                    $type = $char;
645 12
                    ++$offset;
646 12
                    if ('<' == $char) {
647 12
                        $strpos = strpos($text_part, '>', $offset);
648 12
                        $command = substr($text_part, $offset, ($strpos - $offset));
649 12
                        $offset = $strpos + 1;
650
                    }
651
652 12
                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
653 9
                        $operator = trim($matches[0]);
654 9
                        $offset += \strlen($matches[0]);
655
                    }
656 12
                    break;
657
658 26
                case '(':
659 26
                case ')':
660 19
                    ++$offset;
661 19
                    $type = $char;
662 19
                    $strpos = $offset;
663 19
                    if ('(' == $char) {
664 19
                        $open_bracket = 1;
665 19
                        while ($open_bracket > 0) {
666 19
                            if (!isset($text_part[$strpos])) {
667
                                break;
668
                            }
669 19
                            $ch = $text_part[$strpos];
670 19
                            switch ($ch) {
671 19
                                case '\\':
672
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
673
                                    // skip next character
674 13
                                    ++$strpos;
675 13
                                    break;
676
677 19
                                case '(':
678
                                    // LEFT PARENTHESIS (28h)
679
                                    ++$open_bracket;
680
                                    break;
681
682 19
                                case ')':
683
                                    // RIGHT PARENTHESIS (29h)
684 19
                                    --$open_bracket;
685 19
                                    break;
686
                            }
687 19
                            ++$strpos;
688
                        }
689 19
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
690 19
                        $offset = $strpos;
691
692 19
                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
693 15
                            $operator = $matches[1];
694 15
                            $offset += \strlen($matches[0]);
695
                        }
696
                    }
697 19
                    break;
698
699
                default:
700 26
                    if ('ET' == substr($text_part, $offset, 2)) {
701 1
                        break;
702 26
                    } elseif (preg_match(
703 26
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
704 26
                        substr($text_part, $offset),
705
                        $matches
706
                    )
707
                    ) {
708 26
                        $operator = trim($matches['id']);
709 26
                        $command = trim($matches['data']);
710 26
                        $offset += \strlen($matches[0]);
711 21
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
712 20
                        $type = 'n';
713 20
                        $command = trim($matches[0]);
714 20
                        $offset += \strlen($matches[0]);
715 15
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
716 15
                        $type = '';
717 15
                        $operator = $matches[1];
718 15
                        $command = '';
719 15
                        $offset += \strlen($matches[0]);
720
                    }
721
            }
722
723 26
            if (false !== $command) {
724 26
                $commands[] = [
725 26
                    self::TYPE => $type,
726 26
                    self::OPERATOR => $operator,
727 26
                    self::COMMAND => $command,
728
                ];
729
            } else {
730 23
                break;
731
            }
732
        }
733
734 26
        return $commands;
735
    }
736
737 39
    public static function factory(
738
        Document $document,
739
        Header $header,
740
        ?string $content,
741
        ?Config $config = null
742
    ): self {
743 39
        switch ($header->get('Type')->getContent()) {
744 39
            case 'XObject':
745 8
                switch ($header->get('Subtype')->getContent()) {
746 8
                    case 'Image':
747 3
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug introduced by
The method getRetainImageContent() does not exist on null. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation

747
                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
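This warning points at a real edge case. `factory()` declares `?Config $config = null`, so a call without a config that reaches the `Image` branch would call a method on `null`. The sketch below shows a null-safe variant that falls back to a default config. `ImageConfig` is a hypothetical stand-in for `Smalot\PdfParser\Config`, and its default of retaining image content is an assumption.

```php
<?php
// Sketch, not the library's code. ImageConfig is a hypothetical stand-in for
// Config; it is assumed to retain image content unless told otherwise.

final class ImageConfig
{
    private $retainImageContent = true;

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }

    public function setRetainImageContent(bool $retain): void
    {
        $this->retainImageContent = $retain;
    }
}

// Mirrors the Image branch of factory(): use a default config instead of
// calling getRetainImageContent() on null.
function imageContent(?string $content, ?ImageConfig $config = null): ?string
{
    $config = $config ?? new ImageConfig();

    return $config->getRetainImageContent() ? $content : null;
}
```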

748
749 6
                    case 'Form':
750 6
                        return new Form($document, $header, $content, $config);
751
                }
752
753
                return new self($document, $header, $content, $config);
754
755 39
            case 'Pages':
756 38
                return new Pages($document, $header, $content, $config);
757
758 39
            case 'Page':
759 38
                return new Page($document, $header, $content, $config);
760
761 39
            case 'Encoding':
762 5
                return new Encoding($document, $header, $content, $config);
763
764 39
            case 'Font':
765 38
                $subtype = $header->get('Subtype')->getContent();
766 38
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;
767
768 38
                if (class_exists($classname)) {
769 38
                    return new $classname($document, $header, $content, $config);
770
                }
771
772
                return new Font($document, $header, $content, $config);
773
774
            default:
775 39
                return new self($document, $header, $content, $config);
776
        }
777
    }
778
779
    /**
780
     * Returns unique id identifying the object.
781
     */
782 17
    protected function getUniqueId(): string
783
    {
784 17
        return spl_object_hash($this);
785
    }
786
}
787
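The `case '('` branch of `getCommandsText()` scans a PDF literal string. It counts nested parentheses, which the PDF format allows unescaped when balanced, and skips any character that follows a backslash. Below is a simplified standalone sketch of that scan. `readLiteralString()` is a hypothetical name, and unlike the original it does not go on to read the operator that follows the string.

```php
<?php
// Simplified sketch of the literal-string scan in getCommandsText().
// $offset points just after the opening parenthesis. Returns the raw body and
// the offset just past the closing parenthesis.

function readLiteralString(string $data, int $offset): array
{
    $depth = 1;
    $pos = $offset;
    $length = \strlen($data);

    while ($depth > 0 && $pos < $length) {
        $ch = $data[$pos];
        if ('\\' === $ch) {
            $pos += 2; // skip the backslash and the escaped character
            continue;
        }
        if ('(' === $ch) {
            ++$depth;
        } elseif (')' === $ch) {
            --$depth;
        }
        ++$pos;
    }

    // As in the original, an unbalanced string loses its last byte here.
    return [substr($data, $offset, $pos - $offset - 1), $pos];
}

[$body, $next] = readLiteralString('(a (b) \) c) Tj', 1);
```

For this input, the body is `a (b) \) c` and `$next` is 12, just past the closing parenthesis.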