Passed
Pull Request — master (#457)
by
unknown
02:33
created

PDFObject::getCommandsText()   F

Complexity

Conditions 27
Paths 65

Size

Total Lines 149
Code Lines 105

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 97
CRAP Score 27.0196

Importance

Changes 0
Metric Value
cc 27
eloc 105
nc 65
nop 2
dl 0
loc 149
rs 3.3333
c 0
b 0
f 0
ccs 97
cts 100
cp 0.97
crap 27.0196

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
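
As a hedged illustration of that advice applied to the method rated here: in `getCommandsText()`, the `case '('` branch contains a commented inner loop that scans for the closing parenthesis of a literal string. A minimal sketch of pulling that loop into its own function follows. The name `findLiteralStringEnd` is hypothetical and not part of the PdfParser library, and the behaviour for unterminated strings is an assumption of this sketch.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): given the offset just after an
 * opening "(", return the offset just past the matching ")".
 * Backslash escapes are skipped and nested parentheses are counted.
 * The result never exceeds the length of the input, so an unterminated
 * string yields strlen($text) and the caller decides how to handle it.
 */
function findLiteralStringEnd(string $text, int $start): int
{
    $depth = 1;
    $pos = $start;
    $length = \strlen($text);

    while ($depth > 0 && $pos < $length) {
        switch ($text[$pos]) {
            case '\\':
                // Escape sequence: skip the escaped character as well.
                ++$pos;
                break;
            case '(':
                ++$depth;
                break;
            case ')':
                --$depth;
                break;
        }
        ++$pos;
    }

    return min($pos, $length);
}

$input = '(Hello \(PDF\) (nested)) Tj';
$start = 1; // offset just after the opening "("
$end = findLiteralStringEnd($input, $start);

// Same slice the original branch takes: everything between the outer parentheses.
echo substr($input, $start, $end - $start - 1), "\n";
```

With a helper like this, the `case '('` branch would shrink to a call, a `substr()`, and the operator match, and the comments about backslashes and parentheses would move into the helper's docblock.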

Commonly applied refactorings include:

1
<?php
2
3
/**
4
 * @file
5
 *          This file is part of the PdfParser library.
6
 *
7
 * @author  Sébastien MALOT <[email protected]>
8
 * @date    2017-01-03
9
 *
10
 * @license LGPLv3
11
 * @url     <https://github.com/smalot/pdfparser>
12
 *
13
 *  PdfParser is a pdf library written in PHP, extraction oriented.
14
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
15
 *
16
 *  This program is free software: you can redistribute it and/or modify
17
 *  it under the terms of the GNU Lesser General Public License as published by
18
 *  the Free Software Foundation, either version 3 of the License, or
19
 *  (at your option) any later version.
20
 *
21
 *  This program is distributed in the hope that it will be useful,
22
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
23
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
24
 *  GNU Lesser General Public License for more details.
25
 *
26
 *  You should have received a copy of the GNU Lesser General Public License
27
 *  along with this program.
28
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
29
 */
30
31
namespace Smalot\PdfParser;
32
33
use Smalot\PdfParser\XObject\Form;
34
use Smalot\PdfParser\XObject\Image;
35
36
/**
37
 * Class PDFObject
38
 */
39
class PDFObject
40
{
41
    const TYPE = 't';
42
43
    const OPERATOR = 'o';
44
45
    const COMMAND = 'c';
46
47
    /**
48
     * The recursion stack.
49
     *
50
     * @var array
51
     */
52
    public static $recursionStack = [];
53
54
    /**
55
     * @var Document
56
     */
57
    protected $document = null;
58
59
    /**
60
     * @var Header
61
     */
62
    protected $header = null;
63
64
    /**
65
     * @var string
66
     */
67
    protected $content = null;
68
69
    /**
70
     * @var Config
71
     */
72
    protected $config;
73
74 48
    public function __construct(
75
        Document $document,
76
        ?Header $header = null,
77
        ?string $content = null,
78
        ?Config $config = null
79
    ) {
80 48
        $this->document = $document;
81 48
        $this->header = null !== $header ? $header : new Header();
82 48
        $this->content = $content;
83 48
        $this->config = $config;
84 48
    }
85
86 38
    public function init()
87
    {
88 38
    }
89
90 38
    public function getHeader(): ?Header
91
    {
92 38
        return $this->header;
93
    }
94
95
    /**
96
     * @return Element|PDFObject|Header
97
     */
98 38
    public function get(string $name)
99
    {
100 38
        return $this->header->get($name);
101
    }
102
103 36
    public function has(string $name): bool
104
    {
105 36
        return $this->header->has($name);
106
    }
107
108 2
    public function getDetails(bool $deep = true): array
109
    {
110 2
        return $this->header->getDetails($deep);
111
    }
112
113 29
    public function getContent(): ?string
114
    {
115 29
        return $this->content;
116
    }
117
118 23
    public function cleanContent(string $content, string $char = 'X')
119
    {
120 23
        $char = $char[0];
121 23
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);
122
123
        // Remove image block with binary content
124 23
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
125 23
        foreach ($matches[0] as $part) {
126
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
127
        }
128
129
        // Clean content in square brackets [.....]
130 23
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation

130
        /** @scrutinizer ignore-call */ 
131
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
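
For context, PHP's built-in `preg_match_all()` accepts optional `$flags` and `$offset` parameters after `$matches`, so the four-argument call appears to be valid and this report looks like the kind of false positive described above. The sketch below shows what `PREG_OFFSET_CAPTURE` returns and how those offsets can drive the same-length masking used in `cleanContent()`. The sample string is made up for illustration.

```php
<?php

$content = 'BT (Hi) Tj (PDF) Tj ET';

// With PREG_OFFSET_CAPTURE each match is a [text, byteOffset] pair.
preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);

$masked = $content;
foreach ($matches[1] as [$text, $offset]) {
    // Overwrite the captured text with filler of the same length so that
    // every other offset in the string stays valid.
    $masked = substr_replace($masked, str_repeat('_', \strlen($text)), $offset, \strlen($text));
}

echo $masked, "\n"; // prints: BT (__) Tj (___) Tj ET
```

Because the filler has the same length as the text it replaces, offsets found in the masked string can later be applied to the original string, which is what `getSectionsText()` relies on.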

131 23
        foreach ($matches[1] as $part) {
132 17
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
133
        }
134
135
        // Clean content in round brackets (.....)
136 23
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
137 23
        foreach ($matches[1] as $part) {
138 14
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
139
        }
140
141
        // Clean structure
142 23
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Bug introduced by
It seems like $content can also be of type array; however, parameter $subject of preg_split() only seems to accept string. Maybe add an additional type check? ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation

142
        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
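
The `array` type probably comes from `substr_replace()`, which is declared as returning `string|array`. As an alternative to the annotation, the type can be narrowed explicitly before splitting. A minimal sketch follows; the function name `splitOnAngleBrackets` is hypothetical, and throwing on a non-string is an assumption of this sketch rather than the library's behaviour.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): split on "<" and ">" while
 * keeping the delimiters, after checking that the subject really is a string.
 *
 * @param mixed $content
 *
 * @return string[]
 */
function splitOnAngleBrackets($content): array
{
    if (!\is_string($content)) {
        throw new \InvalidArgumentException('Expected string content, got '.\gettype($content));
    }

    $parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE);

    // preg_split() returns false on failure; normalise that to an empty list.
    return false === $parts ? [] : $parts;
}

print_r(splitOnAngleBrackets('a<b>c'));
```

The explicit check satisfies the analyzer without an annotation and turns an unexpected array into an early, descriptive failure.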
143 23
            $content = '';
144 23
            $level = 0;
145 23
            foreach ($parts as $part) {
146 23
                if ('<' == $part) {
147 14
                    ++$level;
148
                }
149
150 23
                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));
151
152 23
                if ('>' == $part) {
153 14
                    --$level;
154
                }
155
            }
156
        }
157
158
        // Clean BDC and EMC markup
159 23
        preg_match_all(
160 23
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
161
            $content,
162
            $matches,
163 23
            \PREG_OFFSET_CAPTURE
164
        );
165 23
        foreach ($matches[1] as $part) {
166 3
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
167
        }
168
169 23
        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
170 23
        foreach ($matches[1] as $part) {
171 7
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
172
        }
173
174 23
        return $content;
175
    }
176
177 22
    public function getSectionsText(?string $content): array
178
    {
179 22
        $sections = [];
180 22
        $content = ' '.$content.' ';
181 22
        $textCleaned = $this->cleanContent($content, '_');
182
183
        // Extract text blocks.
184 22
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation

184
        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

185 22
            foreach ($matches[2] as $pos => $part) {
186 22
                $text = $part[0];
187 22
                if ('' === $text) {
188
                    continue;
189
                }
190 22
                $offset = $part[1];
191 22
                $section = substr($content, $offset, \strlen($text));
192
193
                // Removes BDC and EMC markup.
194 22
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');
195
196
                // Add Q and q flags if detected around BT/ET.
197
                // @see: https://github.com/smalot/pdfparser/issues/387
198 22
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');
199
200 22
                $sections[] = $section;
201
            }
202
        }
203
204
        // Extract 'Do' commands.
205 22
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
206 4
            foreach ($matches[1] as $part) {
207 4
                $text = $part[0];
208 4
                $offset = $part[1];
209 4
                $section = substr($content, $offset, \strlen($text));
210
211 4
                $sections[] = $section;
212
            }
213
        }
214
215 22
        return $sections;
216
    }
217
218 13
    private function getDefaultFont(Page $page = null): Font
219
    {
220 13
        $fonts = [];
221 13
        if (null !== $page) {
222 13
            $fonts = $page->getFonts();
223
        }
224
225 13
        $fonts[] = $this->document->getFirstFont();
226
227 13
        if (\count($fonts) > 0) {
228 13
            return reset($fonts);
229
        }
230
231
        return new Font($this->document, null, null, $this->config);
232
    }
233
234
    /**
235
     * @throws \Exception
236
     */
237 13
    public function getText(?Page $page = null): string
238
    {
239 13
        $result = '';
240 13
        $sections = $this->getSectionsText($this->content);
241 13
        $current_font = $this->getDefaultFont($page);
242 13
        $clipped_font = $current_font;
243
244 13
        $current_position_td = ['x' => false, 'y' => false];
245 13
        $current_position_tm = ['x' => false, 'y' => false];
246
247 13
        self::$recursionStack[] = $this->getUniqueId();
248
249 13
        foreach ($sections as $section) {
250 13
            $commands = $this->getCommandsText($section);
251 13
            $reverse_text = false;
252 13
            $text = '';
253
254 13
            foreach ($commands as $command) {
255 13
                switch ($command[self::OPERATOR]) {
256 13
                    case 'BMC':
257 1
                        if ('ReversedChars' == $command[self::COMMAND]) {
258 1
                            $reverse_text = true;
259
                        }
260 1
                        break;
261
262
                    // set character spacing
263 13
                    case 'Tc':
264 2
                        break;
265
266
                    // move text current point
267 13
                    case 'Td':
268 10
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
269 10
                        $y = array_pop($args);
270 10
                        $x = array_pop($args);
271 10
                        if (((float) $x <= 0) ||
272 10
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
273
                        ) {
274
                            // vertical offset
275 6
                            $text .= "\n";
276 10
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
277 10
                                $current_position_td['x']
278
                            )
279
                        ) {
280
                            // horizontal offset
281 7
                            $text .= ' ';
282
                        }
283 10
                        $current_position_td = ['x' => $x, 'y' => $y];
284 10
                        break;
285
286
                    // move text current point and set leading
287 13
                    case 'TD':
288 1
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
289 1
                        $y = array_pop($args);
290 1
                        $x = array_pop($args);
291 1
                        if ((float) $y < 0) {
292 1
                            $text .= "\n";
293
                        } elseif ((float) $x <= 0) {
294
                            $text .= ' ';
295
                        }
296 1
                        break;
297
298 13
                    case 'Tf':
299 13
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
300 13
                        $id = trim($id, '/');
301 13
                        if (null !== $page) {
302 13
                            $new_font = $page->getFont($id);
303
                            // If an invalid font ID is given, do not update the font.
304
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
305
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
306
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
307
                            // But we want to make sure that malformed PDFs do not simply crash.
308 13
                            if (null !== $new_font) {
309 12
                                $current_font = $new_font;
310
                            }
311
                        }
312 13
                        break;
313
314 13
                    case 'Q':
315
                        // Use clip: restore font.
316 3
                        $current_font = $clipped_font;
317 3
                        break;
318
319 13
                    case 'q':
320
                        // Use clip: save font.
321 3
                        $clipped_font = $current_font;
322 3
                        break;
323
324 13
                    case "'":
325 13
                    case 'Tj':
326 8
                        $command[self::COMMAND] = [$command];
327
                        // no break
328 13
                    case 'TJ':
329 13
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
330 13
                        $text .= $sub_text;
331 13
                        break;
332
333
                    // set leading
334 11
                    case 'TL':
335 1
                        $text .= ' ';
336 1
                        break;
337
338 11
                    case 'Tm':
339 11
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
340 11
                        $y = array_pop($args);
341 11
                        $x = array_pop($args);
342 11
                        if (false !== $current_position_tm['x']) {
343 11
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
344 11
                            if ($delta > 10) {
345 9
                                $text .= "\t";
346
                            }
347
                        }
348 11
                        if (false !== $current_position_tm['y']) {
349 11
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
350 11
                            if ($delta > 10) {
351 7
                                $text .= "\n";
352
                            }
353
                        }
354 11
                        $current_position_tm = ['x' => $x, 'y' => $y];
355 11
                        break;
356
357
                    // set super/subscripting text rise
358 8
                    case 'Ts':
359
                        break;
360
361
                    // set word spacing
362 8
                    case 'Tw':
363 1
                        break;
364
365
                    // set horizontal scaling
366 8
                    case 'Tz':
367
                        $text .= "\n";
368
                        break;
369
370
                    // move to start of next line
371 8
                    case 'T*':
372 2
                        $text .= "\n";
373 2
                        break;
374
375 7
                    case 'Da':
376
                        break;
377
378 7
                    case 'Do':
379 4
                        if (null !== $page) {
380 4
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
381 4
                            $id = trim(array_pop($args), '/ ');
382 4
                            $xobject = $page->getXObject($id);
383
384
                            // @todo $xobject could be an ElementXRef object, which would then throw an error
385 4
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
386
                                // Not a circular reference.
387 4
                                $text .= $xobject->getText($page);
388
                            }
389
                        }
390 4
                        break;
391
392 5
                    case 'rg':
393 5
                    case 'RG':
394 1
                        break;
395
396 5
                    case 're':
397
                        break;
398
399 5
                    case 'co':
400
                        break;
401
402 5
                    case 'cs':
403
                        break;
404
405 5
                    case 'gs':
406 3
                        break;
407
408 4
                    case 'en':
409
                        break;
410
411 4
                    case 'sc':
412 4
                    case 'SC':
413
                        break;
414
415 4
                    case 'g':
416 4
                    case 'G':
417 1
                        break;
418
419 3
                    case 'V':
420
                        break;
421
422 3
                    case 'vo':
423 3
                    case 'Vo':
424
                        break;
425
426
                    default:
427
                }
428
            }
429
430
            // Fix Hebrew and other right-to-left languages.
431
            // @see: https://github.com/smalot/pdfparser/issues/398
432 13
            if ($reverse_text) {
433 1
                $chars = mb_str_split($text, 1, mb_internal_encoding());
Bug introduced by
It seems like mb_internal_encoding() can also be of type true; however, parameter $encoding of mb_str_split() only seems to accept null|string. Maybe add an additional type check? ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type annotation

433
                $chars = mb_str_split($text, 1, /** @scrutinizer ignore-type */ mb_internal_encoding());
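
The `true` in this report comes from the declared return type of `mb_internal_encoding()`: called without arguments it returns the current encoding as a string, but called with an argument it returns a boolean. One way to sidestep the ambiguity is to pass the encoding explicitly. A minimal sketch, assuming the mbstring extension and PHP 7.4 or later (for `mb_str_split()`); the function name `reverseText` and the `UTF-8` default are assumptions of this sketch.

```php
<?php

/**
 * Hypothetical helper (not part of PdfParser): reverse a string character by
 * character, passing the encoding explicitly as a string.
 */
function reverseText(string $text, string $encoding = 'UTF-8'): string
{
    // Split into single characters (not bytes), reverse, and join again.
    return implode('', array_reverse(mb_str_split($text, 1, $encoding)));
}

echo reverseText("h\u{e9}llo"), "\n"; // prints: olléh
```

Splitting with a multibyte-aware function matters here: a byte-wise `strrev()` would break multi-byte characters such as Hebrew letters encoded in UTF-8.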
434 1
                $text = implode('', array_reverse($chars));
435
            }
436
437 13
            $result .= $text;
438
        }
439
440 13
        return $result.' ';
441
    }
442
443
    /**
444
     * @throws \Exception
445
     */
446 4
    public function getTextArray(?Page $page = null): array
447
    {
448 4
        $text = [];
449 4
        $sections = $this->getSectionsText($this->content);
450 4
        $current_font = new Font($this->document, null, null, $this->config);
451
452 4
        foreach ($sections as $section) {
453 4
            $commands = $this->getCommandsText($section);
454
455 4
            foreach ($commands as $command) {
456 4
                switch ($command[self::OPERATOR]) {
457
                    // set character spacing
458 4
                    case 'Tc':
459 2
                        break;
460
461
                    // move text current point
462 4
                    case 'Td':
463 4
                        break;
464
465
                    // move text current point and set leading
466 4
                    case 'TD':
467
                        break;
468
469 4
                    case 'Tf':
470 4
                        if (null !== $page) {
471 4
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
472 4
                            $id = trim($id, '/');
473 4
                            $current_font = $page->getFont($id);
474
                        }
475 4
                        break;
476
477 4
                    case "'":
478 4
                    case 'Tj':
479 3
                        $command[self::COMMAND] = [$command];
480
                        // no break
481 4
                    case 'TJ':
482 4
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
483 4
                        $text[] = $sub_text;
484 4
                        break;
485
486
                    // set leading
487 3
                    case 'TL':
488 2
                        break;
489
490 3
                    case 'Tm':
491 2
                        break;
492
493
                    // set super/subscripting text rise
494 3
                    case 'Ts':
495
                        break;
496
497
                    // set word spacing
498 3
                    case 'Tw':
499 1
                        break;
500
501
                    // set horizontal scaling
502 3
                    case 'Tz':
503
                        //$text .= "\n";
504
                        break;
505
506
                    // move to start of next line
507 3
                    case 'T*':
508
                        //$text .= "\n";
509 2
                        break;
510
511 3
                    case 'Da':
512
                        break;
513
514 3
                    case 'Do':
515
                        if (null !== $page) {
516
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
517
                            $id = trim(array_pop($args), '/ ');
518
                            if ($xobject = $page->getXObject($id)) {
519
                                $text[] = $xobject->getText($page);
520
                            }
521
                        }
522
                        break;
523
524 3
                    case 'rg':
525 3
                    case 'RG':
526 2
                        break;
527
528 3
                    case 're':
529
                        break;
530
531 3
                    case 'co':
532
                        break;
533
534 3
                    case 'cs':
535
                        break;
536
537 3
                    case 'gs':
538
                        break;
539
540 3
                    case 'en':
541
                        break;
542
543 3
                    case 'sc':
544 3
                    case 'SC':
545
                        break;
546
547 3
                    case 'g':
548 3
                    case 'G':
549 2
                        break;
550
551 1
                    case 'V':
552
                        break;
553
554 1
                    case 'vo':
555 1
                    case 'Vo':
556
                        break;
557
558
                    default:
559
                }
560
            }
561
        }
562
563 4
        return $text;
564
    }
565
566 22
    public function getCommandsText(string $text_part, int &$offset = 0): array
567
    {
568 22
        $commands = $matches = [];
569
570 22
        while ($offset < \strlen($text_part)) {
571 22
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
572 22
            $char = $text_part[$offset];
573
574 22
            $operator = '';
575 22
            $type = '';
576 22
            $command = false;
577
578 22
            switch ($char) {
579 22
                case '/':
580 22
                    $type = $char;
581 22
                    if (preg_match(
582 22
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
583 22
                        substr($text_part, $offset),
584
                        $matches
585
                    )
586
                    ) {
587 22
                        $operator = $matches[2];
588 22
                        $command = $matches[1];
589 22
                        $offset += \strlen($matches[0]);
590 7
                    } elseif (preg_match(
591 7
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
592 7
                        substr($text_part, $offset),
593
                        $matches
594
                    )
595
                    ) {
596 7
                        $operator = $matches[2];
597 7
                        $command = $matches[1];
598 7
                        $offset += \strlen($matches[0]);
599
                    }
600 22
                    break;
601
602 22
                case '[':
603 22
                case ']':
604
                    // array object
605 20
                    $type = $char;
606 20
                    if ('[' == $char) {
607 20
                        ++$offset;
608
                        // get elements
609 20
                        $command = $this->getCommandsText($text_part, $offset);
610
611 20
                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
612 20
                            $operator = trim($matches[0]);
613 20
                            $offset += \strlen($matches[0]);
614
                        }
615
                    } else {
616 20
                        ++$offset;
617 20
                        break;
618
                    }
619 20
                    break;
620
621 22
                case '<':
622 22
                case '>':
623
                    // hexadecimal string object
624 10
                    $type = $char;
625 10
                    ++$offset;
626 10
                    if ('<' == $char) {
627 10
                        $strpos = strpos($text_part, '>', $offset);
628 10
                        $command = substr($text_part, $offset, ($strpos - $offset));
629 10
                        $offset = $strpos + 1;
630
                    }
631
632 10
                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
633 7
                        $operator = trim($matches[0]);
634 7
                        $offset += \strlen($matches[0]);
635
                    }
636 10
                    break;
637
638 22
                case '(':
639 22
                case ')':
640 15
                    ++$offset;
641 15
                    $type = $char;
642 15
                    $strpos = $offset;
643 15
                    if ('(' == $char) {
644 15
                        $open_bracket = 1;
645 15
                        while ($open_bracket > 0) {
646 15
                            if (!isset($text_part[$strpos])) {
647
                                break;
648
                            }
649 15
                            $ch = $text_part[$strpos];
650 15
                            switch ($ch) {
651 15
                                case '\\':
652
                                 // REVERSE SOLIDUS (5Ch) (Backslash)
653
                                    // skip next character
654 10
                                    ++$strpos;
655 10
                                    break;
656
657 15
                                case '(':
658
                                    // LEFT PARENTHESIS (28h)
659
                                    ++$open_bracket;
660
                                    break;
661
662 15
                                case ')':
663
                                 // RIGHT PARENTHESIS (29h)
664 15
                                    --$open_bracket;
665 15
                                    break;
666
                            }
667 15
                            ++$strpos;
668
                        }
669 15
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
670 15
                        $offset = $strpos;
671
672 15
                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
673 11
                            $operator = $matches[1];
674 11
                            $offset += \strlen($matches[0]);
675
                        }
676
                    }
677 15
                    break;
678
679
                default:
680 22
                    if ('ET' == substr($text_part, $offset, 2)) {
681 1
                        break;
682 22
                    } elseif (preg_match(
683 22
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
684 22
                        substr($text_part, $offset),
685
                        $matches
686
                    )
687
                    ) {
688 22
                        $operator = trim($matches['id']);
689 22
                        $command = trim($matches['data']);
690 22
                        $offset += \strlen($matches[0]);
691 18
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
692 17
                        $type = 'n';
693 17
                        $command = trim($matches[0]);
694 17
                        $offset += \strlen($matches[0]);
695 11
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
696 11
                        $type = '';
697 11
                        $operator = $matches[1];
698 11
                        $command = '';
699 11
                        $offset += \strlen($matches[0]);
700
                    }
701
            }
702
703 22
            if (false !== $command) {
704 22
                $commands[] = [
705 22
                    self::TYPE => $type,
706 22
                    self::OPERATOR => $operator,
707 22
                    self::COMMAND => $command,
708
                ];
709
            } else {
710 20
                break;
711
            }
712
        }
713
714 22
        return $commands;
715
    }
716
717 31
    public static function factory(
718
        Document $document,
719
        Header $header,
720
        ?string $content,
721
        ?Config $config = null
722
    ): self {
723 31
        switch ($header->get('Type')->getContent()) {
724 31
            case 'XObject':
725 5
                switch ($header->get('Subtype')->getContent()) {
726 5
                    case 'Image':
727 3
                        return new Image($document, $header, $config->getRetainImageContent() ? $content : null, $config);
Bug introduced by
The method getRetainImageContent() does not exist on null. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation

727
                        return new Image($document, $header, $config->/** @scrutinizer ignore-call */ getRetainImageContent() ? $content : null, $config);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
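
In this particular case the cause is more likely the nullable parameter: `factory()` declares `?Config $config = null`, so `$config` can legitimately be `null` when the `Image` branch is reached. A minimal sketch of a null guard follows. `StubConfig` is a stand-in written for this example, `imageContent` is a hypothetical helper, and retaining the content when no config is given is an assumption, not confirmed library behaviour.

```php
<?php

// Stand-in for the library's Config class, reduced to the one method used in
// factory(). Everything except the method name is hypothetical.
final class StubConfig
{
    /** @var bool */
    private $retainImageContent;

    public function __construct(bool $retainImageContent)
    {
        $this->retainImageContent = $retainImageContent;
    }

    public function getRetainImageContent(): bool
    {
        return $this->retainImageContent;
    }
}

/**
 * Hypothetical helper: decide which content to hand to the Image constructor.
 */
function imageContent(?StubConfig $config, ?string $content): ?string
{
    if (null === $config) {
        // Assumption of this sketch: without a config, keep the content.
        return $content;
    }

    return $config->getRetainImageContent() ? $content : null;
}

var_dump(imageContent(null, 'binary'));
var_dump(imageContent(new StubConfig(false), 'binary'));
```

An alternative with the same effect would be to substitute a default config object at the top of `factory()` so that later code never sees `null`.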

728
729 3
                    case 'Form':
730 3
                        return new Form($document, $header, $content, $config);
731
                }
732
733
                return new self($document, $header, $content, $config);
734
735 31
            case 'Pages':
736 30
                return new Pages($document, $header, $content, $config);
737
738 31
            case 'Page':
739 30
                return new Page($document, $header, $content, $config);
740
741 31
            case 'Encoding':
742 5
                return new Encoding($document, $header, $content, $config);
743
744 31
            case 'Font':
745 30
                $subtype = $header->get('Subtype')->getContent();
746 30
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;
747
748 30
                if (class_exists($classname)) {
749 30
                    return new $classname($document, $header, $content, $config);
750
                }
751
752
                return new Font($document, $header, $content, $config);
753
754
            default:
755 31
                return new self($document, $header, $content, $config);
756
        }
757
    }
758
759
    /**
760
     * Returns unique id identifying the object.
761
     */
762 13
    protected function getUniqueId(): string
763
    {
764 13
        return spl_object_hash($this);
765
    }
766
}
767