Test Failed
Pull Request — master (#510)
by Jeremy
04:34 queued 02:12
created

PDFObject::has()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 3
Code Lines 1

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 2
CRAP Score 1

Importance

Changes 0
Metric Value
eloc 1
dl 0
loc 3
rs 10
c 0
b 0
f 0
ccs 2
cts 2
cp 1
cc 1
nc 1
nop 1
crap 1
<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    /**
     * Masks the parts of a content stream that must not be scanned for operators.
     *
     * Every masked span is overwritten with the same number of $char characters,
     * so byte offsets in the returned string still line up with $content.
     */
    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove inline image blocks (BI ... ID ... EI) with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure: mask everything nested inside <...>
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks. Matches are located in the masked copy, then
        // sliced out of the original content at the same offsets.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) ($current_position_td['x'])) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $new_font = $page->getFont($id);
                            // As in getText(): keep the current font if the font ID is unknown,
                            // so that malformed PDFs do not crash in decodeText().
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            if ($offset >= \strlen($text_part)) {
                // Only trailing whitespace was left; avoid reading past the end.
                break;
            }
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // hexadecimal string <...>
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        // Scan to the matching closing parenthesis, honoring nesting and escapes.
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        // $config is nullable, so guard the call. Keeping the content
                        // when no Config is given is an assumption about the intended default.
                        $retainContent = null === $config || $config->getRetainImageContent();

                        return new Image($document, $header, $retainContent ? $content : null, $config);

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
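
The listing above relies on one idea in cleanContent() and getSectionsText(): mask string contents with a filler of the same length, search the masked copy, then slice the original at the offsets found. The following is a minimal standalone sketch of that idea, not code from the library. The helper name maskGroup() and the sample content stream are hypothetical and chosen only for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): overwrite every match of the
 * pattern's first capture group with a filler character of equal length.
 */
function maskGroup(string $pattern, string $subject, string $filler = '_'): string
{
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[1] as [$text, $offset]) {
        // A same-length replacement leaves all later byte offsets unchanged.
        $subject = substr_replace($subject, str_repeat($filler, strlen($text)), $offset, strlen($text));
    }

    return $subject;
}

// A literal string that happens to contain the operator name "ET".
$original = 'BT (a ET b) Tj ET';

// Mask the contents of round brackets, as cleanContent() does.
$masked = maskGroup('/\((.*?)\)/s', $original);

// Locate the text block in the masked copy, then cut it from the original.
preg_match('/BT\s+(.*?)\s+ET/s', $masked, $m, PREG_OFFSET_CAPTURE);
[$found, $offset] = $m[1];
$section = substr($original, $offset, strlen($found));

echo $masked, "\n";  // BT (______) Tj ET
echo $section, "\n"; // (a ET b) Tj
```

Running the same BT ... ET pattern directly on the unmasked string would stop at the "ET" inside the parentheses, which is why the search is done on the masked copy.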