Test Failed
Pull Request: master (#614), created by unknown (02:18)

PDFObject::cleanContent()   B

Complexity

Conditions 11
Paths 64

Size

Total Lines 57
Code Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 11.0044

Importance

Changes 0
Metric  Value
cc      11
eloc    31
c       0
b       0
f       0
nc      64
nop     2
dl      0
loc     57
ccs     29
cts     30
cp      0.9667
crap    11.0044
rs      7.3166

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
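
As a rough sketch of that advice applied to cleanContent(): the method repeats the same "match a pattern, then overwrite the match with a filler character" loop under several comments, so each commented step is a candidate for extraction. The helper name maskMatches() and its signature are hypothetical and not part of PdfParser. The sketch assumes $char is a single byte, as cleanContent() itself enforces.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): overwrite every occurrence of
 * capture group $group with $char. Each replacement has the same byte length
 * as the text it replaces, so the offsets captured up front stay valid.
 */
function maskMatches(string $content, string $pattern, int $group, string $char): string
{
    // PREG_OFFSET_CAPTURE turns each match into a [text, byteOffset] pair.
    if (!preg_match_all($pattern, $content, $matches, \PREG_OFFSET_CAPTURE)) {
        return $content;
    }

    foreach ($matches[$group] as [$text, $offset]) {
        if ($offset < 0) {
            continue; // this group did not take part in the match
        }
        $length = \strlen($text);
        $content = substr_replace($content, str_repeat($char, $length), $offset, $length);
    }

    return $content;
}

// The "Clean content in round brackets" step, expressed through the helper.
$masked = maskMatches('BT (Hello) Tj ET', '/\((.*?)\)/s', 1, 'X');
echo $masked, "\n"; // prints "BT (XXXXX) Tj ET"
```

With a helper along these lines, the comment above each loop ("Clean content in round brackets", "Clean BDC and EMC markup", and so on) could become the name of a one-line method that delegates to it.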

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 *
 * @date    2017-01-03
 *
 * @license LGPLv3
 *
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    public const TYPE = 't';

    public const OPERATOR = 'o';

    public const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document;

    /**
     * @var Header
     */
    protected $header;

    /**
     * @var string
     */
    protected $content;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        Header $header = null,
        string $content = null,
        Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug: It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    private function getTJUsingFontFallback(Font $font, array $command, Page $page = null): string
    {
        $orig_text = $font->decodeText($command);
        $text = $orig_text;

        // If we make this a Config option, we can add a check if it's
        // enabled here.
        if (null !== $page) {
            $font_ids = array_keys($page->getFonts());

            // If the decoded text contains UTF-8 control characters
            // then the font page being used is probably the wrong one.
            // Loop through the rest of the fonts to see if we can get
            // a good decode.
            while (preg_match('/[\x00-\x1f\x7f]/u', $text)) {
                // If we're out of font IDs, then give up and use the
                // original string
                if (0 == \count($font_ids)) {
                    return $orig_text;
                }

                // Try the next font ID
                $font = $page->getFont(array_shift($font_ids));
                $text = $font->decodeText($command);
            }
        }

        return $text;
    }

    /**
     * @throws \Exception
     */
    public function getText(Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                        // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0)
                            || (false !== $current_position_td['y'] && (float) $y < (float) $current_position_td['y'])
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float)
                            $current_position_td['x']
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                        // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text .= $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) $current_position_tm['x']);
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) $current_position_tm['y']);
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug: It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                        // move text current point
                    case 'Td':
                        break;

                        // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $text[] = $this->getTJUsingFontFallback(
                            $current_font,
                            $command[self::COMMAND],
                            $page
                        );
                        break;

                        // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                        // set super/subscripting text rise
                    case 'Ts':
                        break;

                        // set word spacing
                    case 'Tw':
                        break;

                        // set horizontal scaling
                    case 'Tz':
                        // $text .= "\n";
                        break;

                        // move to start of next line
                    case 'T*':
                        // $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\/([A-Z0-9\._,\+-]+)\s+([A-Z]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match(
                            '/\G\s*[A-Z]{1,2}\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, $strpos - $offset);
                        $offset = $strpos + 1;
                    }

                    if (preg_match(
                        '/\G\s*[A-Z]{1,2}\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
718 29
                                case '\\':
719 29
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
720 29
                                    // skip next character
721
                                    ++$strpos;
722
                                    break;
723 25
724
                                case '(':
725
                                    // LEFT PARENHESIS (28h)
726
                                    ++$open_bracket;
727 29
                                    break;
728
729
                                case ')':
730 42
                                    // RIGHT PARENTHESIS (29h)
731
                                    --$open_bracket;
732
                                    break;
733
                            }
734
                            ++$strpos;
735
                        }
                        $command = substr($text_part, $offset, $strpos - $offset - 1);
                        $offset = $strpos;

                        if (preg_match(
                            '/\G\s*([A-Z\']{1,2})\s*/si',
                            $text_part,
                            $matches,
                            0,
                            $offset
                        )
                        ) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/\G\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([0-9\.\-]+\s*?)+\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/\G\s*([A-Z\*]+)\s*/si',
                        $text_part,
                        $matches,
                        0,
                        $offset
                    )
                    ) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug: The method getRetainImageContent() does not exist on null. (Ignorable by annotation.) $config defaults to null in the factory() signature, so this call fails whenever the Image branch is reached without a Config.

If this is a false positive, you can ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces. Such a finding is often a typographical error or a renamed method; here the receiver itself may be null.
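One way to address the null-receiver report is to guard the call before dereferencing `$config`. The following is only a minimal sketch, not the library's code: `SketchConfig` and `imageContentFor()` are hypothetical names introduced for illustration, and treating a missing config as "retain the image content" is an assumption that should be checked against the real `Config` defaults.

```php
<?php

// Hypothetical stand-in for the library's Config class, reduced to the one
// getter that matters here. The real class is not reproduced.
final class SketchConfig
{
    /** @var bool */
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Decide which content to pass on to an Image object.
 *
 * Assumption: when no config is supplied, the content is retained.
 */
function imageContentFor(?string $content, ?SketchConfig $config = null): ?string
{
    // Only call the getter when a config object actually exists.
    $retain = null === $config || $config->getRetainImageContent();

    return $retain ? $content : null;
}
```

With a guard of this shape, the `Image` branch could pass the guarded value instead of calling the getter on a possibly null `$config`. Declaring the parameter as `?Config` would also make the nullability explicit in the signature.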
                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
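
The parenthesis-matching loop in the `'('` branch of the listing above can be tried in isolation. The sketch below mirrors that loop under stated assumptions: `readPdfStringLiteral()` is a hypothetical helper name, and `$start` is assumed to point just past the opening parenthesis, as `$offset` does in the listing. It is an illustration of the technique, not a replacement for the method.

```php
<?php

/**
 * Scan a parenthesized string literal, honoring nesting and backslash escapes.
 *
 * @return array{0: string, 1: int} the literal's raw content and the offset
 *                                  just past the closing parenthesis
 */
function readPdfStringLiteral(string $text, int $start): array
{
    $depth = 1;          // one '(' has already been consumed by the caller
    $pos = $start;
    $length = strlen($text);

    while ($depth > 0 && $pos < $length) {
        switch ($text[$pos]) {
            case '\\':
                // Backslash escapes the next character, so skip it.
                ++$pos;
                break;

            case '(':
                ++$depth;
                break;

            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    // For balanced input $pos sits one past the closing ')',
    // so the content excludes that final character.
    return [substr($text, $start, $pos - $start - 1), $pos];
}

// Usage: the escaped ')' and the nested pair stay inside the literal.
[$literal, $next] = readPdfStringLiteral('(a\)b(c)) Tj', 1);
```

Tracking a depth counter rather than searching for the first `)` is what lets nested, unescaped parentheses remain part of the string.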