Test Failed
Pull Request — master (#500)
created by unknown, 08:29

PDFObject::getTextArray()   D

Complexity

Conditions 35
Paths 85

Size

Total Lines 118
Code Lines 73

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 55
CRAP Score 55.7453

Importance

Changes 0
Metric  Value
cc      35
eloc    73
c       0
b       0
f       0
nc      85
nop     1
dl      0
loc     118
ccs     55
cts     74
cp      0.7432
crap    55.7453
rs      4.1666

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
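
The idea of turning a comment into a method name can be sketched as follows. This is a minimal, hypothetical example: the class `ContentCleaner` and its methods are not part of PdfParser and only illustrate the shape of the refactoring.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical example, not taken from PdfParser. Each private method used
 * to be an inline block introduced by a comment; the comment became the name.
 */
final class ContentCleaner
{
    public function clean(string $content): string
    {
        $content = $this->maskStringLiterals($content);

        return $this->collapseWhitespace($content);
    }

    // Formerly: "// mask text in round brackets (.....)"
    private function maskStringLiterals(string $content): string
    {
        $masked = preg_replace_callback(
            '/\((.*?)\)/s',
            static function (array $m): string {
                return '('.str_repeat('X', strlen($m[1])).')';
            },
            $content
        );

        // preg_replace_callback() returns null on a regex error; keep the input then.
        return $masked ?? $content;
    }

    // Formerly: "// collapse runs of whitespace"
    private function collapseWhitespace(string $content): string
    {
        return trim(preg_replace('/\s+/', ' ', $content) ?? $content);
    }
}
```

The public method now reads as a list of steps, and each step can be tested on its own.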

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\XObject\Form;
use Smalot\PdfParser\XObject\Image;

/**
 * Class PDFObject
 */
class PDFObject
{
    const TYPE = 't';

    const OPERATOR = 'o';

    const COMMAND = 'c';

    /**
     * The recursion stack.
     *
     * @var array
     */
    public static $recursionStack = [];

    /**
     * @var Document
     */
    protected $document = null;

    /**
     * @var Header
     */
    protected $header = null;

    /**
     * @var string
     */
    protected $content = null;

    /**
     * @var Config
     */
    protected $config;

    public function __construct(
        Document $document,
        ?Header $header = null,
        ?string $content = null,
        ?Config $config = null
    ) {
        $this->document = $document;
        $this->header = null !== $header ? $header : new Header();
        $this->content = $content;
        $this->config = $config;
    }

    public function init()
    {
    }

    public function getDocument(): Document
    {
        return $this->document;
    }

    public function getHeader(): ?Header
    {
        return $this->header;
    }

    public function getConfig(): ?Config
    {
        return $this->config;
    }

    /**
     * @return Element|PDFObject|Header
     */
    public function get(string $name)
    {
        return $this->header->get($name);
    }

    public function has(string $name): bool
    {
        return $this->header->has($name);
    }

    public function getDetails(bool $deep = true): array
    {
        return $this->header->getDetails($deep);
    }

    public function getContent(): ?string
    {
        return $this->content;
    }

    public function cleanContent(string $content, string $char = 'X')
    {
        $char = $char[0];
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);

        // Remove image block with binary content
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[0] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in square brackets [.....]
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        /** @scrutinizer ignore-call */
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is Wordpress. Please note the @ignore annotation hint above.
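
That this report is a false positive can be checked directly: the documented signature of `preg_match_all()` has an optional flags parameter in fourth position. The short sketch below uses a made-up content string (an assumption, not taken from a real PDF) to show what `PREG_OFFSET_CAPTURE` changes in the result.

```php
<?php

declare(strict_types=1);

// Made-up content stream fragment, used only for illustration.
$content = ' [(Hello) -20 (World)] TJ ';

// The fourth argument is the documented $flags parameter of preg_match_all().
$count = preg_match_all('/\((.*?)\)/s', $content, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE every match is a [text, byteOffset] pair,
// which is what cleanContent() relies on when it calls substr_replace().
foreach ($matches[1] as [$text, $offset]) {
    printf("%s at byte %d\n", $text, $offset);
}
```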

        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean content in round brackets (.....)
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        // Clean structure
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {

Bug: It seems that $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
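
The array type most likely comes from `substr_replace()`, whose documented return type is `string|array`. One possible way to add the suggested type check is a small helper that guarantees a string. `maskSpan()` is a hypothetical name introduced for this sketch and does not exist in PdfParser.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: overwrite $length bytes at $offset with $char.
 * substr_replace() is documented as returning string|array, so the
 * result is checked before it is returned as a string.
 */
function maskSpan(string $content, int $offset, int $length, string $char = 'X'): string
{
    $masked = substr_replace($content, str_repeat($char, $length), $offset, $length);

    if (!is_string($masked)) {
        throw new UnexpectedValueException('Expected a string from substr_replace().');
    }

    return $masked;
}
```

With a helper like this, the value later passed to `preg_split()` is provably a string, so no ignore annotation would be needed.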

            $content = '';
            $level = 0;
            foreach ($parts as $part) {
                if ('<' == $part) {
                    ++$level;
                }

                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));

                if ('>' == $part) {
                    --$level;
                }
            }
        }

        // Clean BDC and EMC markup
        preg_match_all(
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
            $content,
            $matches,
            \PREG_OFFSET_CAPTURE
        );
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
        foreach ($matches[1] as $part) {
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
        }

        return $content;
    }

    public function getSectionsText(?string $content): array
    {
        $sections = [];
        $content = ' '.$content.' ';
        $textCleaned = $this->cleanContent($content, '_');

        // Extract text blocks.
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

Unused Code: The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

            foreach ($matches[2] as $pos => $part) {
                $text = $part[0];
                if ('' === $text) {
                    continue;
                }
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                // Removes BDC and EMC markup.
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');

                // Add Q and q flags if detected around BT/ET.
                // @see: https://github.com/smalot/pdfparser/issues/387
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');

                $sections[] = $section;
            }
        }

        // Extract 'Do' commands.
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
            foreach ($matches[1] as $part) {
                $text = $part[0];
                $offset = $part[1];
                $section = substr($content, $offset, \strlen($text));

                $sections[] = $section;
            }
        }

        return $sections;
    }

    private function getDefaultFont(?Page $page = null): Font
    {
        $fonts = [];
        if (null !== $page) {
            $fonts = $page->getFonts();
        }

        $firstFont = $this->document->getFirstFont();
        if (null !== $firstFont) {
            $fonts[] = $firstFont;
        }

        if (\count($fonts) > 0) {
            return reset($fonts);
        }

        return new Font($this->document, null, null, $this->config);
    }

    /**
     * @throws \Exception
     */
    public function getText(?Page $page = null): string
    {
        $result = '';
        $sections = $this->getSectionsText($this->content);
        $current_font = $this->getDefaultFont($page);
        $clipped_font = $current_font;

        $current_position_td = ['x' => false, 'y' => false];
        $current_position_tm = ['x' => false, 'y' => false];

        self::$recursionStack[] = $this->getUniqueId();

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);
            $reverse_text = false;
            $text = '';

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    case 'BMC':
                        if ('ReversedChars' == $command[self::COMMAND]) {
                            $reverse_text = true;
                        }
                        break;

                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (((float) $x <= 0) ||
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
                        ) {
                            // vertical offset
                            $text .= "\n";
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
                                $current_position_td['x']
                            )
                        ) {
                            // horizontal offset
                            $text .= ' ';
                        }
                        $current_position_td = ['x' => $x, 'y' => $y];
                        break;

                    // move text current point and set leading
                    case 'TD':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if ((float) $y < 0) {
                            $text .= "\n";
                        } elseif ((float) $x <= 0) {
                            $text .= ' ';
                        }
                        break;

                    case 'Tf':
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                        $id = trim($id, '/');
                        if (null !== $page) {
                            $new_font = $page->getFont($id);
                            // If an invalid font ID is given, do not update the font.
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
                            // But we want to make sure that malformed PDFs do not simply crash.
                            if (null !== $new_font) {
                                $current_font = $new_font;
                            }
                        }
                        break;

                    case 'Q':
                        // Use clip: restore font.
                        $current_font = $clipped_font;
                        break;

                    case 'q':
                        // Use clip: save font.
                        $clipped_font = $current_font;
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text .= $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        $text .= ' ';
                        break;

                    case 'Tm':
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
                        $y = array_pop($args);
                        $x = array_pop($args);
                        if (false !== $current_position_tm['x']) {
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
                            if ($delta > 10) {
                                $text .= "\t";
                            }
                        }
                        if (false !== $current_position_tm['y']) {
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
                            if ($delta > 10) {
                                $text .= "\n";
                            }
                        }
                        $current_position_tm = ['x' => $x, 'y' => $y];
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        $text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        $text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            $xobject = $page->getXObject($id);

                            // @todo $xobject could be an ElementXRef object, which would then throw an error
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
                                // Not a circular reference.
                                $text .= $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }

            // Fix Hebrew and other languages written in reverse (right-to-left) order.
            // @see: https://github.com/smalot/pdfparser/issues/398
            if ($reverse_text) {
                $chars = mb_str_split($text, 1, mb_internal_encoding());

Bug: It seems that mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-type annotation:

                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
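
One way to satisfy the analyzer without an annotation is to resolve the encoding to a string before passing it on. The sketch below assumes the mbstring extension and PHP 7.4 or later (for `mb_str_split()`). `reverseText()` is a hypothetical helper, and the `'UTF-8'` fallback is an assumption of this example rather than something the library does.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reverse a string character by character.
 */
function reverseText(string $text, ?string $encoding = null): string
{
    if (null === $encoding) {
        $current = mb_internal_encoding();
        // Called without arguments the function returns the encoding name,
        // but its declared return type also allows bool, hence the check.
        $encoding = is_string($current) ? $current : 'UTF-8';
    }

    return implode('', array_reverse(mb_str_split($text, 1, $encoding)));
}
```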

                $text = implode('', array_reverse($chars));
            }

            $result .= $text;
        }

        return $result.' ';
    }

    /**
     * @throws \Exception
     */
    public function getTextArray(?Page $page = null): array
    {
        $text = [];
        $sections = $this->getSectionsText($this->content);
        $current_font = new Font($this->document, null, null, $this->config);

        foreach ($sections as $section) {
            $commands = $this->getCommandsText($section);

            foreach ($commands as $command) {
                switch ($command[self::OPERATOR]) {
                    // set character spacing
                    case 'Tc':
                        break;

                    // move text current point
                    case 'Td':
                        break;

                    // move text current point and set leading
                    case 'TD':
                        break;

                    case 'Tf':
                        if (null !== $page) {
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim($id, '/');
                            $current_font = $page->getFont($id);
                        }
                        break;

                    case "'":
                    case 'Tj':
                        $command[self::COMMAND] = [$command];
                        // no break
                    case 'TJ':
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
                        $text[] = $sub_text;
                        break;

                    // set leading
                    case 'TL':
                        break;

                    case 'Tm':
                        break;

                    // set super/subscripting text rise
                    case 'Ts':
                        break;

                    // set word spacing
                    case 'Tw':
                        break;

                    // set horizontal scaling
                    case 'Tz':
                        //$text .= "\n";
                        break;

                    // move to start of next line
                    case 'T*':
                        //$text .= "\n";
                        break;

                    case 'Da':
                        break;

                    case 'Do':
                        if (null !== $page) {
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
                            $id = trim(array_pop($args), '/ ');
                            if ($xobject = $page->getXObject($id)) {
                                $text[] = $xobject->getText($page);
                            }
                        }
                        break;

                    case 'rg':
                    case 'RG':
                        break;

                    case 're':
                        break;

                    case 'co':
                        break;

                    case 'cs':
                        break;

                    case 'gs':
                        break;

                    case 'en':
                        break;

                    case 'sc':
                    case 'SC':
                        break;

                    case 'g':
                    case 'G':
                        break;

                    case 'V':
                        break;

                    case 'vo':
                    case 'Vo':
                        break;

                    default:
                }
            }
        }

        return $text;
    }

    public function getCommandsText(string $text_part, int &$offset = 0): array
    {
        $commands = $matches = [];

        while ($offset < \strlen($text_part)) {
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
            $char = $text_part[$offset];

            $operator = '';
            $type = '';
            $command = false;

            switch ($char) {
                case '/':
                    $type = $char;
                    if (preg_match(
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match(
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = $matches[2];
                        $command = $matches[1];
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '[':
                case ']':
                    // array object
                    $type = $char;
                    if ('[' == $char) {
                        ++$offset;
                        // get elements
                        $command = $this->getCommandsText($text_part, $offset);

                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = trim($matches[0]);
                            $offset += \strlen($matches[0]);
                        }
                    } else {
                        ++$offset;
                        break;
                    }
                    break;

                case '<':
                case '>':
                    // hexadecimal string object
                    $type = $char;
                    ++$offset;
                    if ('<' == $char) {
                        $strpos = strpos($text_part, '>', $offset);
                        $command = substr($text_part, $offset, ($strpos - $offset));
                        $offset = $strpos + 1;
                    }

                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
                        $operator = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    }
                    break;

                case '(':
                case ')':
                    ++$offset;
                    $type = $char;
                    $strpos = $offset;
                    if ('(' == $char) {
                        $open_bracket = 1;
                        while ($open_bracket > 0) {
                            if (!isset($text_part[$strpos])) {
                                break;
                            }
                            $ch = $text_part[$strpos];
                            switch ($ch) {
                                case '\\':
                                    // REVERSE SOLIDUS (5Ch) (Backslash)
                                    // skip next character
                                    ++$strpos;
                                    break;

                                case '(':
                                    // LEFT PARENTHESIS (28h)
                                    ++$open_bracket;
                                    break;

                                case ')':
                                    // RIGHT PARENTHESIS (29h)
                                    --$open_bracket;
                                    break;
                            }
                            ++$strpos;
                        }
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
                        $offset = $strpos;

                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
                            $operator = $matches[1];
                            $offset += \strlen($matches[0]);
                        }
                    }
                    break;

                default:
                    if ('ET' == substr($text_part, $offset, 2)) {
                        break;
                    } elseif (preg_match(
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
                        substr($text_part, $offset),
                        $matches
                    )
                    ) {
                        $operator = trim($matches['id']);
                        $command = trim($matches['data']);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
                        $type = 'n';
                        $command = trim($matches[0]);
                        $offset += \strlen($matches[0]);
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
                        $type = '';
                        $operator = $matches[1];
                        $command = '';
                        $offset += \strlen($matches[0]);
                    }
            }

            if (false !== $command) {
                $commands[] = [
                    self::TYPE => $type,
                    self::OPERATOR => $operator,
                    self::COMMAND => $command,
                ];
            } else {
                break;
            }
        }

        return $commands;
    }

    public static function factory(
        Document $document,
        Header $header,
        ?string $content,
        ?Config $config = null
    ): self {
        switch ($header->get('Type')->getContent()) {
            case 'XObject':
                switch ($header->get('Subtype')->getContent()) {
                    case 'Image':
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);

Bug: The method getRetainImageContent() does not exist on null. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
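
In this particular case the cause appears to be the nullable `$config` parameter of `factory()` rather than a renamed method. A null check before the call could look like the sketch below. The `Config` class here is a reduced stand-in for `Smalot\PdfParser\Config`, `imageContent()` is a hypothetical helper, and treating a missing config as "retain the content" is an assumption of this example.

```php
<?php

declare(strict_types=1);

// Stand-in for Smalot\PdfParser\Config, reduced to the one getter used here.
final class Config
{
    /** @var bool */
    private $retainImageContent;

    public function __construct(bool $retainImageContent = true)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decide which content an Image object should keep.
 * A null config is treated as "retain" (an assumption of this sketch).
 */
function imageContent(?string $content, ?Config $config): ?string
{
    $retain = null === $config || $config->getRetainImageContent();

    return $retain ? $content : null;
}
```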

                    case 'Form':
                        return new Form($document, $header, $content, $config);
                }

                return new self($document, $header, $content, $config);

            case 'Pages':
                return new Pages($document, $header, $content, $config);

            case 'Page':
                return new Page($document, $header, $content, $config);

            case 'Encoding':
                return new Encoding($document, $header, $content, $config);

            case 'Font':
                $subtype = $header->get('Subtype')->getContent();
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;

                if (class_exists($classname)) {
                    return new $classname($document, $header, $content, $config);
                }

                return new Font($document, $header, $content, $config);

            default:
                return new self($document, $header, $content, $config);
        }
    }

    /**
     * Returns unique id identifying the object.
     */
    protected function getUniqueId(): string
    {
        return spl_object_hash($this);
    }
}
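
Most branches of the switch in `getTextArray()` do nothing, which is where much of the reported complexity (35 conditions) comes from. One possible direction, sketched here under simplifying assumptions, is to list the text-showing operators once and ignore everything else. `TextArrayCollector` is a hypothetical class, the decoder is an arbitrary callable standing in for `Font::decodeText()`, and the `Tf` and `Do` handling of the original method is deliberately left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: collect text from parsed commands with a lookup
 * instead of one large switch. Commands use the same 'o' (operator) and
 * 'c' (command) keys as PDFObject::OPERATOR and PDFObject::COMMAND.
 */
final class TextArrayCollector
{
    /** Operators that show text; every other operator is ignored. */
    private const TEXT_OPERATORS = ["'", 'Tj', 'TJ'];

    /** @var callable */
    private $decoder;

    public function __construct(callable $decoder)
    {
        $this->decoder = $decoder;
    }

    /**
     * @param array<int, array{o: string, c: string}> $commands
     *
     * @return string[]
     */
    public function collect(array $commands): array
    {
        $text = [];
        foreach ($commands as $command) {
            if ($this->showsText($command['o'])) {
                $text[] = ($this->decoder)($command['c']);
            }
        }

        return $text;
    }

    private function showsText(string $operator): bool
    {
        return in_array($operator, self::TEXT_OPERATORS, true);
    }
}
```

A real refactoring would still need font switching and XObject recursion, each of which could become its own small method in the same way.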