Passed
Pull Request — master (#481)
by Konrad
02:38
created

PDFObject::cleanContent()   B

Complexity

Conditions 11
Paths 64

Size

Total Lines 57
Code Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 11.0044

Importance

Changes 0
Metric  Value
cc      11
eloc    31
c       0
b       0
f       0
nc      64
nop     2
dl      0
loc     57
ccs     29
cts     30
cp      0.9667
crap    11.0044
rs      7.3166

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
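For this method, Extract Method could look like the following minimal sketch. cleanContent() repeats the same "match, then overwrite with a filler character" loop several times, so that loop is a natural candidate for its own method. The name maskMatches() and its signature are hypothetical and are not part of the PdfParser library.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): overwrite every match of
 * capture group $group with $char, so the string length and all byte
 * offsets stay unchanged.
 */
function maskMatches(string $content, string $pattern, int $group, string $char): string
{
    preg_match_all($pattern, $content, $matches, \PREG_OFFSET_CAPTURE);

    foreach ($matches[$group] as $part) {
        // $part[0] is the matched text, $part[1] its byte offset.
        $length = \strlen($part[0]);
        $content = substr_replace($content, str_repeat($char, $length), $part[1], $length);
    }

    return $content;
}

// Mask the text inside round brackets, as the "(.....)" step does.
$masked = maskMatches(' (Hello) Tj ', '/\((.*?)\)/s', 1, '_');
echo $masked, "\n";
```

Each "Clean ..." step in cleanContent() could then become a single call to such a helper, which would also reduce the reported path count.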

1
<?php
2
3
/**
4
 * @file
5
 *          This file is part of the PdfParser library.
6
 *
7
 * @author  Sébastien MALOT <[email protected]>
8
 * @date    2017-01-03
9
 *
10
 * @license LGPLv3
11
 * @url     <https://github.com/smalot/pdfparser>
12
 *
13
 *  PdfParser is a pdf library written in PHP, extraction oriented.
14
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
15
 *
16
 *  This program is free software: you can redistribute it and/or modify
17
 *  it under the terms of the GNU Lesser General Public License as published by
18
 *  the Free Software Foundation, either version 3 of the License, or
19
 *  (at your option) any later version.
20
 *
21
 *  This program is distributed in the hope that it will be useful,
22
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
23
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
24
 *  GNU Lesser General Public License for more details.
25
 *
26
 *  You should have received a copy of the GNU Lesser General Public License
27
 *  along with this program.
28
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
29
 */
30
31
namespace Smalot\PdfParser;
32
33
use Smalot\PdfParser\XObject\Form;
34
use Smalot\PdfParser\XObject\Image;
35
36
/**
37
 * Class PDFObject
38
 */
39
class PDFObject
40
{
41
    const TYPE = 't';
42
43
    const OPERATOR = 'o';
44
45
    const COMMAND = 'c';
46
47
    /**
48
     * The recursion stack.
49
     *
50
     * @var array
51
     */
52
    public static $recursionStack = [];
53
54
    /**
55
     * @var Document
56
     */
57
    protected $document = null;
58
59
    /**
60
     * @var Header
61
     */
62
    protected $header = null;
63
64
    /**
65
     * @var string
66
     */
67
    protected $content = null;
68
69
    /**
70
     * @var Config
71
     */
72
    protected $config;
73
74 54
    public function __construct(
75
        Document $document,
76
        ?Header $header = null,
77
        ?string $content = null,
78
        ?Config $config = null
79
    ) {
80 54
        $this->document = $document;
81 54
        $this->header = null !== $header ? $header : new Header();
82 54
        $this->content = $content;
83 54
        $this->config = $config;
84 54
    }
85
86 41
    public function init()
87
    {
88 41
    }
89
90 41
    public function getHeader(): ?Header
91
    {
92 41
        return $this->header;
93
    }
94
95
    /**
96
     * @return Element|PDFObject|Header
97
     */
98 42
    public function get(string $name)
99
    {
100 42
        return $this->header->get($name);
101
    }
102
103 39
    public function has(string $name): bool
104
    {
105 39
        return $this->header->has($name);
106
    }
107
108 2
    public function getDetails(bool $deep = true): array
109
    {
110 2
        return $this->header->getDetails($deep);
111
    }
112
113 32
    public function getContent(): ?string
114
    {
115 32
        return $this->content;
116
    }
117
118 26
    public function cleanContent(string $content, string $char = 'X')
119
    {
120 26
        $char = $char[0];
121 26
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);
122
123
        // Remove image block with binary content
124 26
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
125 26
        foreach ($matches[0] as $part) {
126
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
127
        }
128
129
        // Clean content in square brackets [.....]
130 26
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

130
        /** @scrutinizer ignore-call */ 
131
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
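The report on line 130 looks like such a false positive: PHP's built-in preg_match_all() takes flags as its fourth parameter, and PREG_OFFSET_CAPTURE is one of the documented flags. A small sketch, using a made-up subject string, of what the flag changes in the result:

```php
<?php

// preg_match_all(pattern, subject, &matches, flags, offset): flags is the fourth parameter.
$count = preg_match_all('/\((.*?)\)/s', 'a (bc) d (e)', $matches, \PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE every entry is a [text, byte offset] pair
// instead of a plain string.
foreach ($matches[1] as [$text, $offset]) {
    printf("%s at %d\n", $text, $offset);
}
```

cleanContent() relies on exactly these pairs: $part[0] gives the length to overwrite and $part[1] the position.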

131 26
        foreach ($matches[1] as $part) {
132 18
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
133
        }
134
135
        // Clean content in round brackets (.....)
136 26
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
137 26
        foreach ($matches[1] as $part) {
138 15
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
139
        }
140
141
        // Clean structure
142 26
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug: It seems that $content can also be of type array; however, parameter $subject of preg_split() seems to accept only string. Consider adding a type check. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

142
        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
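The array type presumably comes from substr_replace(), which is declared to return string|array but returns an array only when its subject is an array. As an alternative to the annotation, the "additional type check" could be sketched as follows; the sample string and the exception type are assumptions made for illustration.

```php
<?php

// Same kind of call as in cleanContent(): a string subject yields a string.
$content = substr_replace('abc <def> ghi', 'XXX', 5, 3);

// Explicit guard: makes the string type visible to static analysis
// and fails loudly if an array ever slips through.
if (!\is_string($content)) {
    throw new \UnexpectedValueException('Expected string content.');
}

$parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
```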
143 26
            $content = '';
144 26
            $level = 0;
145 26
            foreach ($parts as $part) {
146 26
                if ('<' == $part) {
147 14
                    ++$level;
148
                }
149
150 26
                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));
151
152 26
                if ('>' == $part) {
153 14
                    --$level;
154
                }
155
            }
156
        }
157
158
        // Clean BDC and EMC markup
159 26
        preg_match_all(
160 26
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
161
            $content,
162
            $matches,
163 26
            \PREG_OFFSET_CAPTURE
164
        );
165 26
        foreach ($matches[1] as $part) {
166 3
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
167
        }
168
169 26
        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
170 26
        foreach ($matches[1] as $part) {
171 7
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
172
        }
173
174 26
        return $content;
175
    }
176
177 25
    public function getSectionsText(?string $content): array
178
    {
179 25
        $sections = [];
180 25
        $content = ' '.$content.' ';
181 25
        $textCleaned = $this->cleanContent($content, '_');
182
183
        // Extract text blocks.
184 25
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

184
        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

185 23
            foreach ($matches[2] as $pos => $part) {
186 23
                $text = $part[0];
187 23
                if ('' === $text) {
188
                    continue;
189
                }
190 23
                $offset = $part[1];
191 23
                $section = substr($content, $offset, \strlen($text));
192
193
                // Removes BDC and EMC markup.
194 23
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');
195
196
                // Add Q and q flags if detected around BT/ET.
197
                // @see: https://github.com/smalot/pdfparser/issues/387
198 23
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');
199
200 23
                $sections[] = $section;
201
            }
202
        }
203
204
        // Extract 'Do' commands.
205 25
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
206 4
            foreach ($matches[1] as $part) {
207 4
                $text = $part[0];
208 4
                $offset = $part[1];
209 4
                $section = substr($content, $offset, \strlen($text));
210
211 4
                $sections[] = $section;
212
            }
213
        }
214
215 25
        return $sections;
216
    }
217
218 15
    private function getDefaultFont(Page $page = null): Font
219
    {
220 15
        $fonts = [];
221 15
        if (null !== $page) {
222 14
            $fonts = $page->getFonts();
223
        }
224
225 15
        $firstFont = $this->document->getFirstFont();
226 15
        if (null !== $firstFont) {
227 13
            $fonts[] = $firstFont;
228
        }
229
230 15
        if (\count($fonts) > 0) {
231 13
            return reset($fonts);
232
        }
233
234 2
        return new Font($this->document, null, null, $this->config);
235
    }
236
237
    /**
238
     * @throws \Exception
239
     */
240 15
    public function getText(?Page $page = null): string
241
    {
242 15
        $result = '';
243 15
        $sections = $this->getSectionsText($this->content);
244 15
        $current_font = $this->getDefaultFont($page);
245 15
        $clipped_font = $current_font;
246
247 15
        $current_position_td = ['x' => false, 'y' => false];
248 15
        $current_position_tm = ['x' => false, 'y' => false];
249
250 15
        self::$recursionStack[] = $this->getUniqueId();
251
252 15
        foreach ($sections as $section) {
253 13
            $commands = $this->getCommandsText($section);
254 13
            $reverse_text = false;
255 13
            $text = '';
256
257 13
            foreach ($commands as $command) {
258 13
                switch ($command[self::OPERATOR]) {
259 13
                    case 'BMC':
260 1
                        if ('ReversedChars' == $command[self::COMMAND]) {
261 1
                            $reverse_text = true;
262
                        }
263 1
                        break;
264
265
                    // set character spacing
266 13
                    case 'Tc':
267 2
                        break;
268
269
                    // move text current point
270 13
                    case 'Td':
271 10
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
272 10
                        $y = array_pop($args);
273 10
                        $x = array_pop($args);
274 10
                        if (((float) $x <= 0) ||
275 10
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
276
                        ) {
277
                            // vertical offset
278 6
                            $text .= "\n";
279 10
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
280 10
                                $current_position_td['x']
281
                            )
282
                        ) {
283
                            // horizontal offset
284 7
                            $text .= ' ';
285
                        }
286 10
                        $current_position_td = ['x' => $x, 'y' => $y];
287 10
                        break;
288
289
                    // move text current point and set leading
290 13
                    case 'TD':
291 1
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
292 1
                        $y = array_pop($args);
293 1
                        $x = array_pop($args);
294 1
                        if ((float) $y < 0) {
295 1
                            $text .= "\n";
296
                        } elseif ((float) $x <= 0) {
297
                            $text .= ' ';
298
                        }
299 1
                        break;
300
301 13
                    case 'Tf':
302 13
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
303 13
                        $id = trim($id, '/');
304 13
                        if (null !== $page) {
305 13
                            $new_font = $page->getFont($id);
306
                            // If an invalid font ID is given, do not update the font.
307
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
308
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
309
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
310
                            // But we want to make sure that malformed PDFs do not simply crash.
311 13
                            if (null !== $new_font) {
312 12
                                $current_font = $new_font;
313
                            }
314
                        }
315 13
                        break;
316
317 13
                    case 'Q':
318
                        // Use clip: restore font.
319 3
                        $current_font = $clipped_font;
320 3
                        break;
321
322 13
                    case 'q':
323
                        // Use clip: save font.
324 3
                        $clipped_font = $current_font;
325 3
                        break;
326
327 13
                    case "'":
328 13
                    case 'Tj':
329 8
                        $command[self::COMMAND] = [$command];
330
                        // no break
331 13
                    case 'TJ':
332 13
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
333 13
                        $text .= $sub_text;
334 13
                        break;
335
336
                    // set leading
337 11
                    case 'TL':
338 1
                        $text .= ' ';
339 1
                        break;
340
341 11
                    case 'Tm':
342 11
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
343 11
                        $y = array_pop($args);
344 11
                        $x = array_pop($args);
345 11
                        if (false !== $current_position_tm['x']) {
346 11
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
347 11
                            if ($delta > 10) {
348 9
                                $text .= "\t";
349
                            }
350
                        }
351 11
                        if (false !== $current_position_tm['y']) {
352 11
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
353 11
                            if ($delta > 10) {
354 7
                                $text .= "\n";
355
                            }
356
                        }
357 11
                        $current_position_tm = ['x' => $x, 'y' => $y];
358 11
                        break;
359
360
                    // set super/subscripting text rise
361 8
                    case 'Ts':
362
                        break;
363
364
                    // set word spacing
365 8
                    case 'Tw':
366 1
                        break;
367
368
                    // set horizontal scaling
369 8
                    case 'Tz':
370
                        $text .= "\n";
371
                        break;
372
373
                    // move to start of next line
374 8
                    case 'T*':
375 2
                        $text .= "\n";
376 2
                        break;
377
378 7
                    case 'Da':
379
                        break;
380
381 7
                    case 'Do':
382 4
                        if (null !== $page) {
383 4
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
384 4
                            $id = trim(array_pop($args), '/ ');
385 4
                            $xobject = $page->getXObject($id);
386
387
                            // @todo $xobject could be an ElementXRef object, which would then throw an error
388 4
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
389
                                // Not a circular reference.
390 4
                                $text .= $xobject->getText($page);
391
                            }
392
                        }
393 4
                        break;
394
395 5
                    case 'rg':
396 5
                    case 'RG':
397 1
                        break;
398
399 5
                    case 're':
400
                        break;
401
402 5
                    case 'co':
403
                        break;
404
405 5
                    case 'cs':
406
                        break;
407
408 5
                    case 'gs':
409 3
                        break;
410
411 4
                    case 'en':
412
                        break;
413
414 4
                    case 'sc':
415 4
                    case 'SC':
416
                        break;
417
418 4
                    case 'g':
419 4
                    case 'G':
420 1
                        break;
421
422 3
                    case 'V':
423
                        break;
424
425 3
                    case 'vo':
426 3
                    case 'Vo':
427
                        break;
428
429
                    default:
430
                }
431
            }
432
433
            // Fix Hebrew and other reverse text oriented languages.
434
            // @see: https://github.com/smalot/pdfparser/issues/398
435 13
            if ($reverse_text) {
436 1
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug: It seems that mb_internal_encoding() can also return true; however, parameter $encoding of mb_str_split() seems to accept only null|string. Consider adding a type check. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

436
                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
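Called without arguments, mb_internal_encoding() returns the current encoding name as a string; the boolean return only applies when an encoding is being set, so this report looks like a false positive. One way to sidestep the question is to pass the encoding explicitly. The sketch below assumes UTF-8 input and the mbstring extension, and uses a made-up sample string:

```php
<?php

$text = 'héllo';

// Split into characters, not bytes; the encoding is passed as a plain string.
$chars = mb_str_split($text, 1, 'UTF-8');
$reversed = implode('', array_reverse($chars));

echo $reversed, "\n";
```

Splitting by character matters here: a byte-wise reversal would break multibyte characters such as "é".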
437 1
                $text = implode('', array_reverse($chars));
438
            }
439
440 13
            $result .= $text;
441
        }
442
443 15
        return $result.' ';
444
    }
445
446
    /**
447
     * @throws \Exception
448
     */
449 5
    public function getTextArray(?Page $page = null): array
450
    {
451 5
        $text = [];
452 5
        $sections = $this->getSectionsText($this->content);
453 5
        $current_font = new Font($this->document, null, null, $this->config);
454
455 5
        foreach ($sections as $section) {
456 5
            $commands = $this->getCommandsText($section);
457
458 5
            foreach ($commands as $command) {
459 5
                switch ($command[self::OPERATOR]) {
460
                    // set character spacing
461 5
                    case 'Tc':
462 2
                        break;
463
464
                    // move text current point
465 5
                    case 'Td':
466 5
                        break;
467
468
                    // move text current point and set leading
469 5
                    case 'TD':
470
                        break;
471
472 5
                    case 'Tf':
473 5
                        if (null !== $page) {
474 5
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
475 5
                            $id = trim($id, '/');
476 5
                            $current_font = $page->getFont($id);
477
                        }
478 5
                        break;
479
480 5
                    case "'":
481 5
                    case 'Tj':
482 4
                        $command[self::COMMAND] = [$command];
483
                        // no break
484 5
                    case 'TJ':
485 5
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
486 5
                        $text[] = $sub_text;
487 5
                        break;
488
489
                    // set leading
490 4
                    case 'TL':
491 3
                        break;
492
493 4
                    case 'Tm':
494 3
                        break;
495
496
                    // set super/subscripting text rise
497 4
                    case 'Ts':
498
                        break;
499
500
                    // set word spacing
501 4
                    case 'Tw':
502 1
                        break;
503
504
                    // set horizontal scaling
505 4
                    case 'Tz':
506
                        //$text .= "\n";
507
                        break;
508
509
                    // move to start of next line
510 4
                    case 'T*':
511
                        //$text .= "\n";
512 3
                        break;
513
514 3
                    case 'Da':
515
                        break;
516
517 3
                    case 'Do':
518
                        if (null !== $page) {
519
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
520
                            $id = trim(array_pop($args), '/ ');
521
                            if ($xobject = $page->getXObject($id)) {
522
                                $text[] = $xobject->getText($page);
523
                            }
524
                        }
525
                        break;
526
527 3
                    case 'rg':
528 3
                    case 'RG':
529 2
                        break;
530
531 3
                    case 're':
532
                        break;
533
534 3
                    case 'co':
535
                        break;
536
537 3
                    case 'cs':
538
                        break;
539
540 3
                    case 'gs':
541
                        break;
542
543 3
                    case 'en':
544
                        break;
545
546 3
                    case 'sc':
547 3
                    case 'SC':
548
                        break;
549
550 3
                    case 'g':
551 3
                    case 'G':
552 2
                        break;
553
554 1
                    case 'V':
555
                        break;
556
557 1
                    case 'vo':
558 1
                    case 'Vo':
559
                        break;
560
561
                    default:
562
                }
563
            }
564
        }
565
566 5
        return $text;
567
    }
568
569 23
    public function getCommandsText(string $text_part, int &$offset = 0): array
570
    {
571 23
        $commands = $matches = [];
572
573 23
        while ($offset < \strlen($text_part)) {
574 23
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
575 23
            $char = $text_part[$offset];
576
577 23
            $operator = '';
578 23
            $type = '';
579 23
            $command = false;
580
581 23
            switch ($char) {
582 23
                case '/':
583 23
                    $type = $char;
584 23
                    if (preg_match(
585 23
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
586 23
                        substr($text_part, $offset),
587
                        $matches
588
                    )
589
                    ) {
590 23
                        $operator = $matches[2];
591 23
                        $command = $matches[1];
592 23
                        $offset += \strlen($matches[0]);
593 7
                    } elseif (preg_match(
594 7
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
595 7
                        substr($text_part, $offset),
596
                        $matches
597
                    )
598
                    ) {
599 7
                        $operator = $matches[2];
600 7
                        $command = $matches[1];
601 7
                        $offset += \strlen($matches[0]);
602
                    }
603 23
                    break;
604
605 23
                case '[':
606 23
                case ']':
607
                    // array object
608 21
                    $type = $char;
609 21
                    if ('[' == $char) {
610 21
                        ++$offset;
611
                        // get elements
612 21
                        $command = $this->getCommandsText($text_part, $offset);
613
614 21
                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
615 21
                            $operator = trim($matches[0]);
616 21
                            $offset += \strlen($matches[0]);
617
                        }
618
                    } else {
619 21
                        ++$offset;
620 21
                        break;
621
                    }
622 21
                    break;
623
624 23
                case '<':
625 23
                case '>':
626
                    // hexadecimal string delimiters
627 10
                    $type = $char;
628 10
                    ++$offset;
629 10
                    if ('<' == $char) {
630 10
                        $strpos = strpos($text_part, '>', $offset);
631 10
                        $command = substr($text_part, $offset, ($strpos - $offset));
632 10
                        $offset = $strpos + 1;
633
                    }
634
635 10
                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
636 7
                        $operator = trim($matches[0]);
637 7
                        $offset += \strlen($matches[0]);
638
                    }
639 10
                    break;
640
641 23
                case '(':
642 23
                case ')':
643 16
                    ++$offset;
644 16
                    $type = $char;
645 16
                    $strpos = $offset;
646 16
                    if ('(' == $char) {
647 16
                        $open_bracket = 1;
648 16
                        while ($open_bracket > 0) {
649 16
                            if (!isset($text_part[$strpos])) {
650
                                break;
651
                            }
652 16
                            $ch = $text_part[$strpos];
653 16
                            switch ($ch) {
654 16
                                case '\\':
655
                                 // REVERSE SOLIDUS (5Ch) (Backslash)
656
                                    // skip next character
657 11
                                    ++$strpos;
658 11
                                    break;
659
660 16
                                case '(':
661
                                 // LEFT PARENTHESIS (28h)
662
                                    ++$open_bracket;
663
                                    break;
664
665 16
                                case ')':
666
                                 // RIGHT PARENTHESIS (29h)
667 16
                                    --$open_bracket;
668 16
                                    break;
669
                            }
670 16
                            ++$strpos;
671
                        }
672 16
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
673 16
                        $offset = $strpos;
674
675 16
                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
676 12
                            $operator = $matches[1];
677 12
                            $offset += \strlen($matches[0]);
678
                        }
679
                    }
680 16
                    break;
681
682
                default:
683 23
                    if ('ET' == substr($text_part, $offset, 2)) {
684 1
                        break;
685 23
                    } elseif (preg_match(
686 23
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
687 23
                        substr($text_part, $offset),
688
                        $matches
689
                    )
690
                    ) {
691 23
                        $operator = trim($matches['id']);
692 23
                        $command = trim($matches['data']);
693 23
                        $offset += \strlen($matches[0]);
694 19
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
695 18
                        $type = 'n';
696 18
                        $command = trim($matches[0]);
697 18
                        $offset += \strlen($matches[0]);
698 12
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
699 12
                        $type = '';
700 12
                        $operator = $matches[1];
701 12
                        $command = '';
702 12
                        $offset += \strlen($matches[0]);
703
                    }
704
            }
705
706 23
            if (false !== $command) {
707 23
                $commands[] = [
708 23
                    self::TYPE => $type,
709 23
                    self::OPERATOR => $operator,
710 23
                    self::COMMAND => $command,
711
                ];
712
            } else {
713 21
                break;
714
            }
715
        }
716
717 23
        return $commands;
718
    }
719
720 34
    public static function factory(
721
        Document $document,
722
        Header $header,
723
        ?string $content,
724
        ?Config $config = null
725
    ): self {
726 34
        switch ($header->get('Type')->getContent()) {
727 34
            case 'XObject':
728 8
                switch ($header->get('Subtype')->getContent()) {
729 8
                    case 'Image':
730 3
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug: The method getRetainImageContent() does not exist on null. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

730
                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
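In this case the cause appears to be the nullable parameter (?Config $config = null) rather than a renamed method: when no config is passed, line 730 calls a method on null. A minimal sketch of a null guard follows. ConfigStub and imageContent() are hypothetical stand-ins written for this example, and treating a missing config as "retain the content" is an assumption, not confirmed library behaviour.

```php
<?php

// Hypothetical stand-in for the library's Config class, reduced to one getter.
final class ConfigStub
{
    private $retain;

    public function __construct(bool $retain)
    {
        $this->retain = $retain;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retain;
    }
}

// Hypothetical helper: decide which content an Image object should receive.
// Assumption: a missing config means the content is retained.
function imageContent(?string $content, ?ConfigStub $config): ?string
{
    if (null === $config || $config->getRetainImageContent()) {
        return $content;
    }

    return null;
}

var_dump(imageContent('raw bytes', null));
var_dump(imageContent('raw bytes', new ConfigStub(false)));
```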

731
732 6
                    case 'Form':
733 6
                        return new Form($document, $header, $content, $config);
734
                }
735
736
                return new self($document, $header, $content, $config);
737
738 34
            case 'Pages':
739 33
                return new Pages($document, $header, $content, $config);
740
741 34
            case 'Page':
742 33
                return new Page($document, $header, $content, $config);
743
744 34
            case 'Encoding':
745 5
                return new Encoding($document, $header, $content, $config);
746
747 34
            case 'Font':
748 33
                $subtype = $header->get('Subtype')->getContent();
749 33
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;
750
751 33
                if (class_exists($classname)) {
752 33
                    return new $classname($document, $header, $content, $config);
753
                }
754
755
                return new Font($document, $header, $content, $config);
756
757
            default:
758 34
                return new self($document, $header, $content, $config);
759
        }
760
    }
761
762
    /**
763
     * Returns unique id identifying the object.
764
     */
765 15
    protected function getUniqueId(): string
766
    {
767 15
        return spl_object_hash($this);
768
    }
769
}
770