Passed
Pull Request — master (#457)
Created by unknown, 02:33

PDFObject::factory()   B

Complexity

Conditions 10
Paths 9

Size

Total Lines 39
Code Lines 22

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 20
CRAP Score 10.0751

Importance

Changes 0
Metric Value
cc 10
eloc 22
c 0
b 0
f 0
nc 9
nop 4
dl 0
loc 39
ccs 20
cts 22
cp 0.9091
crap 10.0751
rs 7.6666

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
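A minimal sketch of the Extract Method refactoring, assuming a nested `switch` similar in shape to the one in `factory()`. The functions `describeType()` and `describeXObject()` and the strings they return are hypothetical and are not part of PdfParser:

```php
<?php

declare(strict_types=1);

/**
 * Extracted method (hypothetical): resolves the XObject subtype on its own,
 * so the outer switch keeps a single level of nesting.
 */
function describeXObject(string $subtype): string
{
    switch ($subtype) {
        case 'Image':
            return 'image';
        case 'Form':
            return 'form';
        default:
            return 'generic';
    }
}

/**
 * Hypothetical dispatcher: before the refactoring, the body of
 * describeXObject() would sit inline in the 'XObject' case.
 */
function describeType(string $type, string $subtype = ''): string
{
    switch ($type) {
        case 'XObject':
            return describeXObject($subtype);
        case 'Pages':
            return 'pages';
        case 'Page':
            return 'page';
        default:
            return 'generic';
    }
}

echo describeType('XObject', 'Form'), "\n";
```

Each function now has fewer conditions of its own, and the extracted name documents what the inner branch decides.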

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Issue: Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
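This report looks like a false positive: PHP's built-in `preg_match_all()` accepts up to five parameters (`$pattern`, `$subject`, `&$matches`, `$flags`, `$offset`), so passing `PREG_OFFSET_CAPTURE` in fourth position is valid. A small sketch with a made-up content fragment shows what the flag changes in `$matches`:

```php
<?php

declare(strict_types=1);

// Made-up content stream fragment, for illustration only.
$content = 'BT [(Hello) -20 (World)] TJ ET';

// With PREG_OFFSET_CAPTURE each match becomes [matchedText, byteOffset].
preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);

foreach ($matches[1] as [$text, $offset]) {
    printf("%s at byte %d\n", $text, $offset);
}
```

The listing relies on exactly these offsets when it overwrites matched spans with `substr_replace()`.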
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Issue: Bug
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
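The array type most likely comes from `substr_replace()`, which is documented as returning `string|array`. One possible way to make the string guarantee explicit is a small wrapper; `maskSpan()` is a hypothetical helper and is not part of PdfParser:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: overwrite $length bytes at $offset with $char and
 * guarantee a string result, so later preg_* calls receive a string subject.
 */
function maskSpan(string $content, int $offset, int $length, string $char): string
{
    $masked = substr_replace($content, str_repeat($char, $length), $offset, $length);

    // With a string subject substr_replace() returns a string; this check
    // only makes that explicit for readers and static analysers.
    if (!\is_string($masked)) {
        throw new \UnexpectedValueException('Expected a string from substr_replace().');
    }

    return $masked;
}

$parts = preg_split(
    '/(<|>)/s',
    maskSpan('a <b> c', 3, 1, 'X'),
    -1,
    \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE
);
print_r($parts);
```

The declared `string` return type gives the analyser a single type to work with at the `preg_split()` call.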
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Issue: Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
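The method around this call matches against `$textCleaned` but slices `$content` with the captured offsets. That only works because the masking step keeps the string length unchanged. A simplified sketch of the idea follows; `maskStrings()` is a hypothetical stand-in and far less thorough than `cleanContent()`:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified stand-in for cleanContent(): replaces the inside
 * of every (...) literal with underscores of the same length, so byte offsets
 * in the masked copy still point at the same bytes in the original.
 */
function maskStrings(string $content): string
{
    return preg_replace_callback(
        '/\((.*?)\)/s',
        static function (array $m): string {
            return '('.str_repeat('_', \strlen($m[1])).')';
        },
        $content
    );
}

// Made-up content: the string literal happens to contain the operator name "ET".
$content = ' BT (an ET inside) Tj ET ';
$masked = maskStrings($content);

// Match on the masked copy, where the "ET" inside the literal is hidden...
preg_match('/\sBT\s+(.*?)\s*ET/s', $masked, $m, \PREG_OFFSET_CAPTURE);

// ...then cut the same byte range out of the original content.
$section = substr($content, $m[1][1], \strlen($m[1][0]));

echo $section, "\n";
```

Without the masking step, the lazy pattern would stop at the `ET` inside the string literal and truncate the section.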
            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $fonts[] = $this->document->getFirstFont();

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            // horizontal offset
                            $text .= ' ';
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Issue: Bug
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
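Called without arguments, `mb_internal_encoding()` returns the current encoding name; the boolean return applies when it is used as a setter. One way to sidestep the ambiguity is to pass the encoding explicitly. The sketch below uses a hypothetical helper `reverseText()`, and the `'UTF-8'` default is an assumption:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reverse a string character by character (not byte by
 * byte). The encoding is passed explicitly; 'UTF-8' is an assumption here.
 * Requires PHP 7.4+ with the mbstring extension for mb_str_split().
 */
function reverseText(string $text, string $encoding = 'UTF-8'): string
{
    $chars = mb_str_split($text, 1, $encoding);

    return implode('', array_reverse($chars));
}

echo reverseText('héllo'), "\n";
```

Splitting by character rather than by byte keeps multibyte characters intact when the order is reversed.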
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Issue: Bug
The method getRetainImageContent() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
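Here the report appears to stem from the signature rather than from a typo: `$config` is declared as `?Config` with a default of `null`, so the call can fail when no config is passed. One possible guard is sketched below with a stub `Config` class. The stub, the helper `imageContent()`, and the choice to fall back to a default instance that retains content are all assumptions for illustration, not PdfParser's actual behaviour:

```php
<?php

declare(strict_types=1);

// Stub for illustration only; it mirrors just the one method used in the listing.
final class Config
{
    /** @var bool */
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decide which content an image object should keep,
 * tolerating a missing config by falling back to a default instance.
 */
function imageContent(?string $content, ?Config $config = null): ?string
{
    $config = $config ?? new Config();

    return $config->getRetainImageContent() ? $content : null;
}

var_dump(imageContent('raw-bytes'));
var_dump(imageContent('raw-bytes', new Config(false)));
```

Resolving the fallback once, before the branch, keeps the nullable type out of the rest of the method.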

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}