Test Failed
Pull Request — master (#543) by unknown, created 07:36

PDFObject::getCommandsText()   F

Complexity

Conditions 27
Paths 65

Size

Total Lines 149
Code Lines 105

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 97
CRAP Score 27.0196

Importance

Changes 0
Metric Value
cc 27
eloc 105
nc 65
nop 2
dl 0
loc 149
rs 3.3333
c 0
b 0
f 0
ccs 97
cts 100
cp 0.97
crap 27.0196

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
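As a hedged illustration of that advice applied to the method under review: the loop inside getCommandsText() that walks a PDF literal string up to its matching closing parenthesis already carries explanatory comments, which makes it a natural candidate for extraction. The sketch below is only one way to do it; the function name findLiteralStringEnd() is hypothetical and not part of PdfParser.

```php
<?php

/**
 * Returns the offset just past the ")" that closes a literal string whose
 * opening "(" sits immediately before $start. Hypothetical helper, shown
 * only to illustrate Extract Method; it is not part of the PdfParser API.
 */
function findLiteralStringEnd(string $text, int $start): int
{
    $depth = 1;
    $pos = $start;

    while ($depth > 0 && isset($text[$pos])) {
        switch ($text[$pos]) {
            case '\\':
                // Backslash: skip the escaped character.
                ++$pos;
                break;
            case '(':
                ++$depth;
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    return $pos;
}

// The calling code then shrinks to two lines.
$text = '(a \\(b\\) c) Tj';
$end = findLiteralStringEnd($text, 1);
$literal = substr($text, 1, $end - 1 - 1);
```

With the scanning logic behind a descriptive name, the `(` branch of the original switch would read as "find the end, cut out the literal, read the operator", and the helper could be unit-tested separately.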

Commonly applied refactorings include Extract Method.

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = $header ?? new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

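For context, PHP's built-in preg_match_all() does accept a flags argument in fourth position, so this report looks like a false positive of the kind described above. With PREG_OFFSET_CAPTURE each match becomes a [text, byte offset] pair, which is what the masking loops in cleanContent() rely on. A minimal sketch, using a simplified pattern and made-up sample input (both are assumptions for illustration):

```php
<?php

$content = 'BT (Hello) Tj (World) Tj ET';

// Simplified pattern (an assumption for illustration): capture the text
// inside each pair of round brackets.
preg_match_all('/\((.*?)\)/s', $content, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE every match is a [text, byte offset] pair.
foreach ($matches[1] as [$text, $offset]) {
    // Overwrite the captured text with a placeholder of equal length, so
    // the offsets of the remaining matches stay valid.
    $content = substr_replace($content, str_repeat('_', strlen($text)), $offset, strlen($text));
}

echo $content, PHP_EOL; // prints "BT (_____) Tj (_____) Tj ET"
```

Because each replacement has the same length as the text it overwrites, the offsets captured before the loop remain usable throughout it.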
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+((?:[^\n]*\sT[a-z]\s+)*)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+((?:[^\n]*\sT[a-z]\s+)*)BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[3] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Tx commands which appear before BT.
                // @see: https://github.com/smalot/pdfparser/issues/542
                $section = trim((!empty($matches[2][$pos][0]) ? $matches[2][$pos][0] : '').$section);

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[4][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            $text .= $this->config->getHorizontalOffset();
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be a ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other reverse text oriented languages.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // array object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug
The method getRetainImageContent() does not exist on null. (Ignorable by annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

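The underlying risk in this report is that $config is declared nullable (?Config) in factory(), so calling a method on it fails when no configuration is passed. One possible guard is sketched below. StubConfig and shouldRetainImageContent() are hypothetical stand-ins so the example runs on its own, and dropping image content when no config is given is an assumption, not the library's documented behaviour.

```php
<?php

// Hypothetical stand-in for the library's Config class.
final class StubConfig
{
    private $retainImageContent;

    public function __construct(bool $retainImageContent)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

// Hypothetical helper: decide whether to keep image bytes without
// dereferencing a null config. Defaulting to false is an assumption.
function shouldRetainImageContent(?StubConfig $config): bool
{
    return null !== $config && $config->getRetainImageContent();
}

$content = 'binary image data';
$kept = shouldRetainImageContent(new StubConfig(true)) ? $content : null;
$dropped = shouldRetainImageContent(null) ? $content : null;
```

An alternative with the same effect would be to substitute a default configuration object at the top of factory(), which keeps the call site unchanged.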

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}