Test Failed
Pull Request: master (#543), by unknown, created 07:36

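PDFObject::cleanContent()   Rating B

Complexity:     Conditions 11, Paths 64
Size:           Total Lines 57, Code Lines 31
Duplication:    Lines 0, Ratio 0 %
Code Coverage:  Tests 29, CRAP Score 11.0044
Importance:     Changes 0

Metric  Value    Notes
cc      11       cyclomatic complexity (Conditions above)
eloc    31       code lines
c       0
b       0
f       0
nc      64       NPath complexity (Paths above)
nop     2        number of parameters ($content, $char)
dl      0        duplicated lines
loc     57       total lines
ccs     29
cts     30
cp      0.9667   ccs / cts = 29 / 30
crap    11.0044  CRAP score
rs      7.3166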
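The CRAP score combines complexity and coverage. A minimal sketch that reproduces the reported value from the table, assuming Scrutinizer uses the standard CRAP formula, CRAP = cc² · (1 − coverage)³ + cc, with coverage = ccs / cts. The function name crapScore() is hypothetical, and the truncation to four decimals is an inference from the reported 11.0044 (rounding would give 11.0045):

```php
<?php

declare(strict_types=1);

/**
 * Standard CRAP formula: complexity^2 * (1 - coverage)^3 + complexity,
 * where coverage is a fraction between 0 and 1.
 * crapScore() is a hypothetical helper for illustration only.
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// cc = 11, coverage = ccs / cts = 29 / 30 (values from the table above).
$crap = crapScore(11, 29 / 30);

// Assumption: the report truncates to four decimals; this prints 11.0044.
printf("%.4f\n", floor($crap * 10000) / 10000);
```

With full coverage the score collapses to the cyclomatic complexity itself, which is why 11.0044 sits only marginally above cc = 11.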

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
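In cleanContent(), five of the commented steps (inline images, square brackets, round brackets, BDC markup, EMC markup) share one shape: run preg_match_all() with \PREG_OFFSET_CAPTURE, then overwrite each captured span with substr_replace(). That repeated shape is a natural Extract Method candidate. A minimal sketch, assuming a free function with the hypothetical name maskMatches() (not part of PdfParser):

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper extracted from the match-and-mask loops in
 * PDFObject::cleanContent(): every substring captured by group $group of
 * $pattern is overwritten with $char, so the string length and all
 * offsets stay unchanged.
 */
function maskMatches(string $content, string $pattern, int $group, string $char): string
{
    if (!preg_match_all($pattern, $content, $matches, PREG_OFFSET_CAPTURE)) {
        return $content;
    }

    foreach ($matches[$group] as [$text, $offset]) {
        // Unmatched optional groups report offset -1; nothing to mask there.
        if ($offset < 0) {
            continue;
        }
        $length = strlen($text);
        $content = substr_replace($content, str_repeat($char, $length), $offset, $length);
    }

    return $content;
}

// The round-bracket step then becomes a single, self-describing call.
$masked = maskMatches('BT (Hello) Tj (World) Tj ET', '/\((.*?)\)/s', 1, 'X');
echo $masked, PHP_EOL; // BT (XXXXX) Tj (XXXXX) Tj ET
```

Because each replacement has the same length as the text it replaces, later offsets from the same preg_match_all() call remain valid, which is what makes the sequential substr_replace() calls safe.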

Commonly applied refactorings include Extract Method.
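The "Clean structure" step of cleanContent() is another self-contained unit that could live in its own method. A minimal sketch, assuming a hypothetical free function maskAngleBrackets() that mirrors the original loop, including its behavior of masking the angle brackets themselves along with everything nested inside them:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extraction of the "Clean structure" loop in cleanContent().
 * Text inside <...> (nesting allowed, e.g. << ... >>) is replaced by $char;
 * the '<' and '>' delimiters are masked too, as in the original loop.
 */
function maskAngleBrackets(string $content, string $char): string
{
    $parts = preg_split('/(<|>)/s', $content, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    if (false === $parts) {
        return $content;
    }

    $result = '';
    $level = 0;
    foreach ($parts as $part) {
        // Entering a bracket raises the level before the '<' is emitted.
        if ('<' === $part) {
            ++$level;
        }
        // Only text at level 0 (outside all brackets) is kept verbatim.
        $result .= 0 === $level ? $part : str_repeat($char, strlen($part));
        // Leaving a bracket lowers the level after the '>' is emitted.
        if ('>' === $part) {
            --$level;
        }
    }

    return $result;
}

echo maskAngleBrackets('a <b> c', 'X'), PHP_EOL; // a XXX c
```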

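Several Scrutinizer findings in the listing report that preg_match_all() is called with too many arguments when \PREG_OFFSET_CAPTURE is passed. preg_match_all() does take an optional fourth $flags argument, so those calls are valid; with PREG_OFFSET_CAPTURE each capture becomes a [text, byte offset] pair, which is what the masking loops rely on. A small standalone check (the sample string is illustrative):

```php
<?php

declare(strict_types=1);

// Signature (PHP 8): preg_match_all(string $pattern, string $subject,
//     array &$matches = null, int $flags = 0, int $offset = 0): int|false
$content = ' BT (Hi) Tj ET ';
$count = preg_match_all('/\((.*?)\)/s', $content, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] holds whole matches, $matches[1] the first capture group;
// each entry is a pair [matched text, byte offset into $content].
var_dump($count);         // int(1)
var_dump($matches[1][0]); // the pair 'Hi' at offset 5
```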
<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Mask inline image blocks (BI ... ID ... EI), whose data is binary
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        // Scrutinizer issue (Unused Code): "The call to preg_match_all() has too many arguments starting
        // with PREG_OFFSET_CAPTURE." The check compares a call with the callee's definition; here it is a
        // false positive, because preg_match_all() accepts an optional $flags argument. It can be silenced
        // with a /** @scrutinizer ignore-call */ annotation.
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        // Scrutinizer issue (Bug): "$content can also be of type array; however, parameter $subject of
        // preg_split() does only seem to accept string." False positive: substr_replace() returns an array
        // only when its first argument is an array, and $content is always a string here. It can be
        // silenced with a /** @scrutinizer ignore-type */ annotation on the argument.
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        // Scrutinizer issue (Unused Code): "The call to preg_match_all() has too many arguments starting
        // with PREG_OFFSET_CAPTURE." False positive: preg_match_all() accepts an optional $flags argument.
        // It can be silenced with a /** @scrutinizer ignore-call */ annotation.
        if (preg_match_all('/(\sQ)?\s+((?:[^\n]*\sT[a-z]\s+)*)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[3] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Tx commands that appear before BT.
                // @see: https://github.com/smalot/pdfparser/issues/542
                $section = trim((!empty($matches[2][$pos][0]) ? $matches[2][$pos][0] : '').$section);

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                // Note: $matches[3] is the BT...ET body, which is never empty here (empty bodies are
                // skipped above), so "\nq" is always appended; the optional trailing (\sq) is group 4.
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other right-to-left languages (marked content tagged ReversedChars).
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                // Scrutinizer issue (Bug): "mb_internal_encoding() can also be of type true; however,
                // parameter $encoding of mb_str_split() does only seem to accept null|string."
                // False positive: called without arguments, mb_internal_encoding() returns the current
                // encoding name as a string; it returns a bool only when used to set the encoding.
                $chars = mb_str_split($text, 1, mb_internal_encoding());
                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $new_font = $page->getFont($id);
                            // As in getText(): keep the current font when the ID is unknown,
                            // so a malformed PDF does not lead to decodeText() being called on null.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        // Scrutinizer issue (Bug): "The method getRetainImageContent() does not exist on null."
                        // $config defaults to null, so calling $config->getRetainImageContent() directly fails
                        // when no Config is passed; fall back to a default Config in that case.
                        $retainImageContent = ($config ?? new Config())->getRetainImageContent();

                        return new Image($document, $header, $retainImageContent ? $content : null, $config);

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
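The '(' branch of getCommandsText() is the easiest part of the tokenizer to misread: a PDF literal string may contain balanced parentheses and backslash escapes, so the closing ')' cannot be found with a simple strpos(). A minimal standalone sketch of the same scan, using a hypothetical function name readLiteralString() that is not part of PdfParser:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone version of the '(' branch of getCommandsText().
 * $offset points just after an opening '('. The scan counts nested
 * unescaped parentheses and skips the byte after each backslash.
 * Returns the raw string body and the offset just past the closing ')'.
 *
 * @return array{0: string, 1: int}
 */
function readLiteralString(string $text, int $offset): array
{
    $pos = $offset;
    $depth = 1;
    while ($depth > 0 && isset($text[$pos])) {
        switch ($text[$pos]) {
            case '\\':
                ++$pos; // skip the escaped byte, e.g. \( or \)
                break;
            case '(':
                ++$depth;
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    // $pos is one past the closing ')', so the body excludes that byte.
    return [substr($text, $offset, $pos - $offset - 1), $pos];
}

[$body, $next] = readLiteralString('(a\\)b(c)) Tj', 1);
echo $body, ' ', $next, PHP_EOL; // a\)b(c) 9
```

Like the original loop, this only locates the end of the string; it does not decode escape sequences in the body.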
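getText() turns positioning operators into whitespace with simple heuristics. For Td it compares the new operands with those of the previous Td. A minimal sketch of just that decision, assuming a hypothetical pure function tdSeparator(); $horizontalOffset stands in for Config::getHorizontalOffset(), and its ' ' default here is for illustration only:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical extraction of the Td branch of PDFObject::getText().
 * $previous holds the operands of the last Td, or null before the first one.
 *
 * @param array{x: float, y: float}|null $previous
 */
function tdSeparator(?array $previous, float $x, float $y, string $horizontalOffset = ' '): string
{
    // A non-positive x operand, or a y operand smaller than the previous
    // Td's y operand, is treated as a line break.
    if ($x <= 0 || (null !== $previous && $y < $previous['y'])) {
        return "\n";
    }
    // A larger x operand than the previous Td's inserts the horizontal offset.
    if (null !== $previous && $x > $previous['x']) {
        return $horizontalOffset;
    }

    return '';
}

var_dump(tdSeparator(['x' => 100.0, 'y' => 700.0], 150.0, 680.0)); // string(1) "\n" (a newline)
```

The heuristic compares raw Td operands rather than absolute positions, so it is a best-effort guess; the other positioning operators (TD, Tm, T*, TL) each use their own rule in getText().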