Passed
Pull Request — master (#500)
by Konrad
05:56 queued 03:54
created

PDFObject::cleanContent()   B

Complexity

Conditions 11
Paths 64

Size

Total Lines 57
Code Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 11.0044

Importance

Changes 0
Metric Value
cc 11
eloc 31
c 0
b 0
f 0
nc 64
nop 2
dl 0
loc 57
ccs 29
cts 30
cp 0.9667
crap 11.0044
rs 7.3166

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:
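Extract Method is one refactoring that fits the advice above. In cleanContent(), the same masking step follows almost every preg_match_all() call, so it is a natural candidate. The following is a minimal sketch only: the helper name maskMatches() and its signature are hypothetical and are not part of the PdfParser library.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): overwrite every match of
 * capture group $group with $char, keeping the string length unchanged.
 */
function maskMatches(string $pattern, string $content, string $char, int $group = 1): string
{
    if (!preg_match_all($pattern, $content, $matches, \PREG_OFFSET_CAPTURE)) {
        return $content;
    }

    // With PREG_OFFSET_CAPTURE each entry is a [matched text, byte offset] pair.
    foreach ($matches[$group] as [$text, $offset]) {
        $length = \strlen($text);
        $content = substr_replace($content, str_repeat($char, $length), $offset, $length);
    }

    return $content;
}

// Mask the text inside round brackets, as the "(.....)" step of cleanContent() does.
echo maskMatches('/\((.*?)\)/s', 'BT (Hello) Tj ET', 'X'), "\n";
```

With a helper of this kind, each "Clean ..." comment in cleanContent() could become a single call, and the comment text could serve as the starting point for a more specific method name.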

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
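For context on why this report looks like a false positive: PHP's documented signature for preg_match_all() takes a flags argument in fourth position, and PREG_OFFSET_CAPTURE is one of its documented values. The sketch below shows what the flag changes in $matches; the sample string is invented for illustration and does not come from the library's tests.

```php
<?php

// Sample content-stream fragment, invented for this illustration.
$content = '[(AB) -120 (C)] TJ';

// Fourth argument: flags. PREG_OFFSET_CAPTURE turns each match into
// a [text, byte offset] pair instead of a plain string.
$count = preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);

foreach ($matches[1] as [$text, $offset]) {
    printf("%s at offset %d\n", $text, $offset);
}
```

cleanContent() relies on exactly these offsets: each captured span is overwritten in place, so the cleaned string keeps the same length as the original.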
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug
The method getRetainImageContent() does not exist on null. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

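The report above follows from factory() declaring `?Config $config = null` and then calling a method on that value. One possible guard, sketched here with a stand-in class rather than the library's real Config, is to resolve the flag only when a config object is present. The names ConfigStub and imageContentFor() are hypothetical, and treating a missing config as "do not retain" is an assumption of this sketch, not something the source specifies.

```php
<?php

// Stand-in for the library's Config class, reduced to the one getter used here.
final class ConfigStub
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decide which content to hand to the Image constructor.
 * A null config is treated as "do not retain" (an assumption of this sketch).
 */
function imageContentFor(?ConfigStub $config, ?string $content): ?string
{
    if (null !== $config && $config->getRetainImageContent()) {
        return $content;
    }

    return null;
}

var_dump(imageContentFor(null, 'binary'));                 // no config: content dropped
var_dump(imageContentFor(new ConfigStub(true), 'binary')); // flag set: content kept
```

An alternative with the same effect on the analysis would be to fall back to a default Config instance when none is passed; which default is appropriate depends on the library's intended behaviour.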
                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}