Passed
Push — master ( 7d3c67...74a074 )
by Konrad
02:29
created

PDFObject::cleanContent()   B

Complexity

Conditions 11
Paths 64

Size

Total Lines 57
Code Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 11.0044

Importance

Changes 0
Metric Value
cc 11
eloc 31
c 0
b 0
f 0
nc 64
nop 2
dl 0
loc 57
ccs 29
cts 30
cp 0.9667
crap 11.0044
rs 7.3166

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

1
<?php
2
3
/**
4
 * @file
5
 *          This file is part of the PdfParser library.
6
 *
7
 * @author  Sébastien MALOT <[email protected]>
8
 * @date    2017-01-03
9
 *
10
 * @license LGPLv3
11
 * @url     <https://github.com/smalot/pdfparser>
12
 *
13
 *  PdfParser is a pdf library written in PHP, extraction oriented.
14
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
15
 *
16
 *  This program is free software: you can redistribute it and/or modify
17
 *  it under the terms of the GNU Lesser General Public License as published by
18
 *  the Free Software Foundation, either version 3 of the License, or
19
 *  (at your option) any later version.
20
 *
21
 *  This program is distributed in the hope that it will be useful,
22
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
23
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
24
 *  GNU Lesser General Public License for more details.
25
 *
26
 *  You should have received a copy of the GNU Lesser General Public License
27
 *  along with this program.
28
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
29
 */
30
31
namespace Smalot\PdfParser;
32
33
use Smalot\PdfParser\XObject\Form;
34
use Smalot\PdfParser\XObject\Image;
35
36
/**
37
 * Class PDFObject
38
 */
39
class PDFObject
40
{
41
    const TYPE = 't';
42
43
    const OPERATOR = 'o';
44
45
    const COMMAND = 'c';
46
47
    /**
48
     * The recursion stack.
49
     *
50
     * @var array
51
     */
52
    public static $recursionStack = [];
53
54
    /**
55
     * @var Document
56
     */
57
    protected $document = null;
58
59
    /**
60
     * @var Header
61
     */
62
    protected $header = null;
63
64
    /**
65
     * @var string
66
     */
67
    protected $content = null;
68
69
    /**
70
     * @var Config
71
     */
72
    protected $config;
73
74
    /**
75
     * @param Header $header
76
     * @param string $content
77
     * @param Config $config
78
     */
79 43
    public function __construct(
80
        Document $document,
81
        Header $header = null,
82
        $content = null,
83
        Config $config = null
84
    ) {
85 43
        $this->document = $document;
86 43
        $this->header = null !== $header ? $header : new Header();
87 43
        $this->content = $content;
88 43
        $this->config = $config;
89 43
    }
90
91 35
    public function init()
92
    {
93 35
    }
94
95
    /**
96
     * @return Header|null
97
     */
98 35
    public function getHeader()
99
    {
100 35
        return $this->header;
101
    }
102
103
    /**
104
     * @param string $name
105
     *
106
     * @return Element|PDFObject
107
     */
108 33
    public function get($name)
109
    {
110 33
        return $this->header->get($name);
111
    }
112
113
    /**
114
     * @param string $name
115
     *
116
     * @return bool
117
     */
118 32
    public function has($name)
119
    {
120 32
        return $this->header->has($name);
121
    }
122
123
    /**
124
     * @param bool $deep
125
     *
126
     * @return array
127
     */
128 3
    public function getDetails($deep = true)
129
    {
130 3
        return $this->header->getDetails($deep);
131
    }
132
133
    /**
134
     * @return string|null
135
     */
136 26
    public function getContent()
137
    {
138 26
        return $this->content;
139
    }
140
141
    /**
142
     * @param string $content
143
     */
144 20
    public function cleanContent($content, $char = 'X')
145
    {
146 20
        $char = $char[0];
147 20
        $content = str_replace(['\\\\', '\\)', '\\('], $char.$char, $content);
148
149
        // Remove image bloc with binary content
150 20
        preg_match_all('/\s(BI\s.*?(\sID\s).*?(\sEI))\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
151 20
        foreach ($matches[0] as $part) {
152
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
153
        }
154
155
        // Clean content in square brackets [.....]
156 20
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);
0 ignored issues
show
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

156
        /** @scrutinizer ignore-call */ 
157
        preg_match_all('/\[((\(.*?\)|[0-9\.\-\s]*)*)\]/s', $content, $matches, \PREG_OFFSET_CAPTURE);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is Wordpress. Please note the @ignore annotation hint above.

Loading history...
157 20
        foreach ($matches[1] as $part) {
158 16
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
159
        }
160
161
        // Clean content in round brackets (.....)
162 20
        preg_match_all('/\((.*?)\)/s', $content, $matches, \PREG_OFFSET_CAPTURE);
163 20
        foreach ($matches[1] as $part) {
164 15
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
165
        }
166
167
        // Clean structure
168 20
        if ($parts = preg_split('/(<|>)/s', $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
0 ignored issues
show
Bug introduced by
It seems like $content can also be of type array; however, parameter $subject of preg_split() does only seem to accept string, maybe add an additional type check? ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-type  annotation

168
        if ($parts = preg_split('/(<|>)/s', /** @scrutinizer ignore-type */ $content, -1, \PREG_SPLIT_NO_EMPTY | \PREG_SPLIT_DELIM_CAPTURE)) {
Loading history...
169 20
            $content = '';
170 20
            $level = 0;
171 20
            foreach ($parts as $part) {
172 20
                if ('<' == $part) {
173 13
                    ++$level;
174
                }
175
176 20
                $content .= (0 == $level ? $part : str_repeat($char, \strlen($part)));
177
178 20
                if ('>' == $part) {
179 13
                    --$level;
180
                }
181
            }
182
        }
183
184
        // Clean BDC and EMC markup
185 20
        preg_match_all(
186 20
            '/(\/[A-Za-z0-9\_]*\s*'.preg_quote($char).'*BDC)/s',
187
            $content,
188
            $matches,
189 20
            \PREG_OFFSET_CAPTURE
190
        );
191 20
        foreach ($matches[1] as $part) {
192 4
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
193
        }
194
195 20
        preg_match_all('/\s(EMC)\s/s', $content, $matches, \PREG_OFFSET_CAPTURE);
196 20
        foreach ($matches[1] as $part) {
197 7
            $content = substr_replace($content, str_repeat($char, \strlen($part[0])), $part[1], \strlen($part[0]));
198
        }
199
200 20
        return $content;
201
    }
202
203
    /**
204
     * @param string $content
205
     *
206
     * @return array
207
     */
208 19
    public function getSectionsText($content)
209
    {
210 19
        $sections = [];
211 19
        $content = ' '.$content.' ';
212 19
        $textCleaned = $this->cleanContent($content, '_');
213
214
        // Extract text blocks.
215 19
        if (preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
0 ignored issues
show
Unused Code introduced by
The call to preg_match_all() has too many arguments starting with PREG_OFFSET_CAPTURE. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

215
        if (/** @scrutinizer ignore-call */ preg_match_all('/(\sQ)?\s+BT[\s|\(|\[]+(.*?)\s*ET(\sq)?/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is Wordpress. Please note the @ignore annotation hint above.

Loading history...
216 19
            foreach ($matches[2] as $pos => $part) {
217 19
                $text = $part[0];
218 19
                if ('' === $text) {
219
                    continue;
220
                }
221 19
                $offset = $part[1];
222 19
                $section = substr($content, $offset, \strlen($text));
223
224
                // Removes BDC and EMC markup.
225 19
                $section = preg_replace('/(\/[A-Za-z0-9]+\s*<<.*?)(>>\s*BDC)(.*?)(EMC\s+)/s', '${3}', $section.' ');
226
227
                // Add Q and q flags if detected around BT/ET.
228
                // @see: https://github.com/smalot/pdfparser/issues/387
229 19
                $section = trim((!empty($matches[1][$pos][0]) ? "Q\n" : '').$section).(!empty($matches[3][$pos][0]) ? "\nq" : '');
230
231 19
                $sections[] = $section;
232
            }
233
        }
234
235
        // Extract 'do' commands.
236 19
        if (preg_match_all('/(\/[A-Za-z0-9\.\-_]+\s+Do)\s/s', $textCleaned, $matches, \PREG_OFFSET_CAPTURE)) {
237 5
            foreach ($matches[1] as $part) {
238 5
                $text = $part[0];
239 5
                $offset = $part[1];
240 5
                $section = substr($content, $offset, \strlen($text));
241
242 5
                $sections[] = $section;
243
            }
244
        }
245
246 19
        return $sections;
247
    }
248
249 12
    private function getDefaultFont(Page $page = null)
250
    {
251 12
        $fonts = [];
252 12
        if (null !== $page) {
253 12
            $fonts = $page->getFonts();
254
        }
255
256 12
        $fonts = array_merge($fonts, array_values($this->document->getFonts()));
257
258 12
        if (\count($fonts) > 0) {
259 12
            return reset($fonts);
260
        }
261
262
        return new Font($this->document);
263
    }
264
265
    /**
266
     * @param Page $page
267
     *
268
     * @return string
269
     *
270
     * @throws \Exception
271
     */
272 12
    public function getText(Page $page = null)
273
    {
274 12
        $text = '';
275 12
        $sections = $this->getSectionsText($this->content);
276 12
        $current_font = $this->getDefaultFont($page);
277 12
        $clipped_font = $current_font;
278
279 12
        $current_position_td = ['x' => false, 'y' => false];
280 12
        $current_position_tm = ['x' => false, 'y' => false];
281
282 12
        self::$recursionStack[] = $this->getUniqueId();
283
284 12
        foreach ($sections as $section) {
285 12
            $commands = $this->getCommandsText($section);
286
287 12
            foreach ($commands as $command) {
288 12
                switch ($command[self::OPERATOR]) {
289
                    // set character spacing
290 12
                    case 'Tc':
291 3
                        break;
292
293
                    // move text current point
294 12
                    case 'Td':
295 9
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
296 9
                        $y = array_pop($args);
297 9
                        $x = array_pop($args);
298 9
                        if (((float) $x <= 0) ||
299 9
                            (false !== $current_position_td['y'] && (float) $y < (float) ($current_position_td['y']))
300
                        ) {
301
                            // vertical offset
302 6
                            $text .= "\n";
303 9
                        } elseif (false !== $current_position_td['x'] && (float) $x > (float) (
304 9
                                $current_position_td['x']
305
                            )
306
                        ) {
307
                            // horizontal offset
308 7
                            $text .= ' ';
309
                        }
310 9
                        $current_position_td = ['x' => $x, 'y' => $y];
311 9
                        break;
312
313
                    // move text current point and set leading
314 12
                    case 'TD':
315 2
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
316 2
                        $y = array_pop($args);
317 2
                        $x = array_pop($args);
318 2
                        if ((float) $y < 0) {
319 2
                            $text .= "\n";
320
                        } elseif ((float) $x <= 0) {
321
                            $text .= ' ';
322
                        }
323 2
                        break;
324
325 12
                    case 'Tf':
326 12
                        list($id) = preg_split('/\s/s', $command[self::COMMAND]);
327 12
                        $id = trim($id, '/');
328 12
                        if (null !== $page) {
329 12
                            $new_font = $page->getFont($id);
330
                            // If an invalid font ID is given, do not update the font.
331
                            // This should theoretically never happen, as the PDF spec states for the Tf operator:
332
                            // "The specified font value shall match a resource name in the Font entry of the default resource dictionary"
333
                            // (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 435)
334
                            // But we want to make sure that malformed PDFs do not simply crash.
335 12
                            if (null !== $new_font) {
336 11
                                $current_font = $new_font;
337
                            }
338
                        }
339 12
                        break;
340
341 12
                    case 'Q':
342
                        // Use clip: restore font.
343 4
                        $current_font = $clipped_font;
344 4
                        break;
345
346 12
                    case 'q':
347
                        // Use clip: save font.
348 4
                        $clipped_font = $current_font;
349 4
                        break;
350
351 12
                    case "'":
352 12
                    case 'Tj':
353 8
                        $command[self::COMMAND] = [$command];
354
                        // no break
355 12
                    case 'TJ':
356 12
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
357 12
                        $text .= $sub_text;
358 12
                        break;
359
360
                    // set leading
361 11
                    case 'TL':
362 1
                        $text .= ' ';
363 1
                        break;
364
365 11
                    case 'Tm':
366 11
                        $args = preg_split('/\s/s', $command[self::COMMAND]);
367 11
                        $y = array_pop($args);
368 11
                        $x = array_pop($args);
369 11
                        if (false !== $current_position_tm['x']) {
370 11
                            $delta = abs((float) $x - (float) ($current_position_tm['x']));
371 11
                            if ($delta > 10) {
372 9
                                $text .= "\t";
373
                            }
374
                        }
375 11
                        if (false !== $current_position_tm['y']) {
376 11
                            $delta = abs((float) $y - (float) ($current_position_tm['y']));
377 11
                            if ($delta > 10) {
378 7
                                $text .= "\n";
379
                            }
380
                        }
381 11
                        $current_position_tm = ['x' => $x, 'y' => $y];
382 11
                        break;
383
384
                    // set super/subscripting text rise
385 8
                    case 'Ts':
386
                        break;
387
388
                    // set word spacing
389 8
                    case 'Tw':
390 2
                        break;
391
392
                    // set horizontal scaling
393 8
                    case 'Tz':
394
                        $text .= "\n";
395
                        break;
396
397
                    // move to start of next line
398 8
                    case 'T*':
399 3
                        $text .= "\n";
400 3
                        break;
401
402 7
                    case 'Da':
403
                        break;
404
405 7
                    case 'Do':
406 5
                        if (null !== $page) {
407 5
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
408 5
                            $id = trim(array_pop($args), '/ ');
409 5
                            $xobject = $page->getXObject($id);
410
411
                            // @todo $xobject could be a ElementXRef object, which would then throw an error
412 5
                            if (\is_object($xobject) && $xobject instanceof self && !\in_array($xobject->getUniqueId(), self::$recursionStack)) {
413
                                // Not a circular reference.
414 5
                                $text .= $xobject->getText($page);
415
                            }
416
                        }
417 5
                        break;
418
419 5
                    case 'rg':
420 5
                    case 'RG':
421 2
                        break;
422
423 5
                    case 're':
424
                        break;
425
426 5
                    case 'co':
427
                        break;
428
429 5
                    case 'cs':
430 1
                        break;
431
432 5
                    case 'gs':
433 4
                        break;
434
435 4
                    case 'en':
436
                        break;
437
438 4
                    case 'sc':
439 4
                    case 'SC':
440
                        break;
441
442 4
                    case 'g':
443 4
                    case 'G':
444 2
                        break;
445
446 3
                    case 'V':
447
                        break;
448
449 3
                    case 'vo':
450 3
                    case 'Vo':
451
                        break;
452
453
                    default:
454
                }
455
            }
456
        }
457
458 12
        array_pop(self::$recursionStack);
459
460 12
        return $text.' ';
461
    }
462
463
    /**
464
     * @param Page $page
465
     *
466
     * @return array
467
     *
468
     * @throws \Exception
469
     */
470 3
    public function getTextArray(Page $page = null)
471
    {
472 3
        $text = [];
473 3
        $sections = $this->getSectionsText($this->content);
474 3
        $current_font = new Font($this->document);
475
476 3
        foreach ($sections as $section) {
477 3
            $commands = $this->getCommandsText($section);
478
479 3
            foreach ($commands as $command) {
480 3
                switch ($command[self::OPERATOR]) {
481
                    // set character spacing
482 3
                    case 'Tc':
483 2
                        break;
484
485
                    // move text current point
486 3
                    case 'Td':
487 3
                        break;
488
489
                    // move text current point and set leading
490 3
                    case 'TD':
491
                        break;
492
493 3
                    case 'Tf':
494 3
                        if (null !== $page) {
495 3
                            list($id) = preg_split('/\s/s', $command[self::COMMAND]);
496 3
                            $id = trim($id, '/');
497 3
                            $current_font = $page->getFont($id);
498
                        }
499 3
                        break;
500
501 3
                    case "'":
502 3
                    case 'Tj':
503 3
                        $command[self::COMMAND] = [$command];
504
                        // no break
505 3
                    case 'TJ':
506 3
                        $sub_text = $current_font->decodeText($command[self::COMMAND]);
507 3
                        $text[] = $sub_text;
508 3
                        break;
509
510
                    // set leading
511 3
                    case 'TL':
512 2
                        break;
513
514 3
                    case 'Tm':
515 2
                        break;
516
517
                    // set super/subscripting text rise
518 3
                    case 'Ts':
519
                        break;
520
521
                    // set word spacing
522 3
                    case 'Tw':
523 1
                        break;
524
525
                    // set horizontal scaling
526 3
                    case 'Tz':
527
                        //$text .= "\n";
528
                        break;
529
530
                    // move to start of next line
531 3
                    case 'T*':
532
                        //$text .= "\n";
533 2
                        break;
534
535 3
                    case 'Da':
536
                        break;
537
538 3
                    case 'Do':
539
                        if (null !== $page) {
540
                            $args = preg_split('/\s/s', $command[self::COMMAND]);
541
                            $id = trim(array_pop($args), '/ ');
542
                            if ($xobject = $page->getXObject($id)) {
543
                                $text[] = $xobject->getText($page);
544
                            }
545
                        }
546
                        break;
547
548 3
                    case 'rg':
549 3
                    case 'RG':
550 2
                        break;
551
552 3
                    case 're':
553
                        break;
554
555 3
                    case 'co':
556
                        break;
557
558 3
                    case 'cs':
559
                        break;
560
561 3
                    case 'gs':
562
                        break;
563
564 3
                    case 'en':
565
                        break;
566
567 3
                    case 'sc':
568 3
                    case 'SC':
569
                        break;
570
571 3
                    case 'g':
572 3
                    case 'G':
573 2
                        break;
574
575 1
                    case 'V':
576
                        break;
577
578 1
                    case 'vo':
579 1
                    case 'Vo':
580
                        break;
581
582
                    default:
583
                }
584
            }
585
        }
586
587 3
        return $text;
588
    }
589
590
    /**
591
     * @param string $text_part
592
     * @param int    $offset
593
     *
594
     * @return array
595
     */
596 19
    public function getCommandsText($text_part, &$offset = 0)
597
    {
598 19
        $commands = $matches = [];
599
600 19
        while ($offset < \strlen($text_part)) {
601 19
            $offset += strspn($text_part, "\x00\x09\x0a\x0c\x0d\x20", $offset);
602 19
            $char = $text_part[$offset];
603
604 19
            $operator = '';
605 19
            $type = '';
606 19
            $command = false;
607
608 19
            switch ($char) {
609 19
                case '/':
610 19
                    $type = $char;
611 19
                    if (preg_match(
612 19
                        '/^\/([A-Z0-9\._,\+]+\s+[0-9.\-]+)\s+([A-Z]+)\s*/si',
613 19
                        substr($text_part, $offset),
614
                        $matches
615
                    )
616
                    ) {
617 19
                        $operator = $matches[2];
618 19
                        $command = $matches[1];
619 19
                        $offset += \strlen($matches[0]);
620 7
                    } elseif (preg_match(
621 7
                        '/^\/([A-Z0-9\._,\+]+)\s+([A-Z]+)\s*/si',
622 7
                        substr($text_part, $offset),
623
                        $matches
624
                    )
625
                    ) {
626 7
                        $operator = $matches[2];
627 7
                        $command = $matches[1];
628 7
                        $offset += \strlen($matches[0]);
629
                    }
630 19
                    break;
631
632 19
                case '[':
633 19
                case ']':
634
                    // array object
635 18
                    $type = $char;
636 18
                    if ('[' == $char) {
637 18
                        ++$offset;
638
                        // get elements
639 18
                        $command = $this->getCommandsText($text_part, $offset);
640
641 18
                        if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
642 18
                            $operator = trim($matches[0]);
643 18
                            $offset += \strlen($matches[0]);
644
                        }
645
                    } else {
646 18
                        ++$offset;
647 18
                        break;
648
                    }
649 18
                    break;
650
651 19
                case '<':
652 19
                case '>':
653
                    // array object
654 9
                    $type = $char;
655 9
                    ++$offset;
656 9
                    if ('<' == $char) {
657 9
                        $strpos = strpos($text_part, '>', $offset);
658 9
                        $command = substr($text_part, $offset, ($strpos - $offset));
659 9
                        $offset = $strpos + 1;
660
                    }
661
662 9
                    if (preg_match('/^\s*[A-Z]{1,2}\s*/si', substr($text_part, $offset), $matches)) {
663 7
                        $operator = trim($matches[0]);
664 7
                        $offset += \strlen($matches[0]);
665
                    }
666 9
                    break;
667
668 19
                case '(':
669 19
                case ')':
670 14
                    ++$offset;
671 14
                    $type = $char;
672 14
                    $strpos = $offset;
673 14
                    if ('(' == $char) {
674 14
                        $open_bracket = 1;
675 14
                        while ($open_bracket > 0) {
676 14
                            if (!isset($text_part[$strpos])) {
677
                                break;
678
                            }
679 14
                            $ch = $text_part[$strpos];
680 14
                            switch ($ch) {
681 14
                                case '\\':
682
                                 // REVERSE SOLIDUS (5Ch) (Backslash)
683
                                    // skip next character
684 11
                                    ++$strpos;
685 11
                                    break;
686
687 14
                                case '(':
688
                                 // LEFT PARENHESIS (28h)
689
                                    ++$open_bracket;
690
                                    break;
691
692 14
                                case ')':
693
                                 // RIGHT PARENTHESIS (29h)
694 14
                                    --$open_bracket;
695 14
                                    break;
696
                            }
697 14
                            ++$strpos;
698
                        }
699 14
                        $command = substr($text_part, $offset, ($strpos - $offset - 1));
700 14
                        $offset = $strpos;
701
702 14
                        if (preg_match('/^\s*([A-Z\']{1,2})\s*/si', substr($text_part, $offset), $matches)) {
703 12
                            $operator = $matches[1];
704 12
                            $offset += \strlen($matches[0]);
705
                        }
706
                    }
707 14
                    break;
708
709
                default:
710 19
                    if ('ET' == substr($text_part, $offset, 2)) {
711 1
                        break;
712 19
                    } elseif (preg_match(
713 19
                        '/^\s*(?P<data>([0-9\.\-]+\s*?)+)\s+(?P<id>[A-Z]{1,3})\s*/si',
714 19
                        substr($text_part, $offset),
715
                        $matches
716
                    )
717
                    ) {
718 19
                        $operator = trim($matches['id']);
719 19
                        $command = trim($matches['data']);
720 19
                        $offset += \strlen($matches[0]);
721 17
                    } elseif (preg_match('/^\s*([0-9\.\-]+\s*?)+\s*/si', substr($text_part, $offset), $matches)) {
722 17
                        $type = 'n';
723 17
                        $command = trim($matches[0]);
724 17
                        $offset += \strlen($matches[0]);
725 11
                    } elseif (preg_match('/^\s*([A-Z\*]+)\s*/si', substr($text_part, $offset), $matches)) {
726 11
                        $type = '';
727 11
                        $operator = $matches[1];
728 11
                        $command = '';
729 11
                        $offset += \strlen($matches[0]);
730
                    }
731
            }
732
733 19
            if (false !== $command) {
734 19
                $commands[] = [
735 19
                    self::TYPE => $type,
736 19
                    self::OPERATOR => $operator,
737 19
                    self::COMMAND => $command,
738
                ];
739
            } else {
740 18
                break;
741
            }
742
        }
743
744 19
        return $commands;
745
    }
746
747
    /**
748
     * @param string $content
749
     *
750
     * @return PDFObject
751
     */
752 28
    public static function factory(
753
        Document $document,
754
        Header $header,
755
        $content,
756
        Config $config = null
757
    ) {
758 28
        switch ($header->get('Type')->getContent()) {
759 28
            case 'XObject':
760 6
                switch ($header->get('Subtype')->getContent()) {
761 6
                    case 'Image':
762 4
                        return new Image($document, $header, $content, $config);
763
764 4
                    case 'Form':
765 4
                        return new Form($document, $header, $content, $config);
766
                }
767
768
                return new self($document, $header, $content, $config);
769
770 28
            case 'Pages':
771 27
                return new Pages($document, $header, $content, $config);
772
773 28
            case 'Page':
774 27
                return new Page($document, $header, $content, $config);
775
776 28
            case 'Encoding':
777 6
                return new Encoding($document, $header, $content, $config);
778
779 28
            case 'Font':
780 27
                $subtype = $header->get('Subtype')->getContent();
781 27
                $classname = '\Smalot\PdfParser\Font\Font'.$subtype;
782
783 27
                if (class_exists($classname)) {
784 27
                    return new $classname($document, $header, $content, $config);
785
                }
786
787
                return new Font($document, $header, $content, $config);
788
789
            default:
790 28
                return new self($document, $header, $content, $config);
791
        }
792
    }
793
794
    /**
795
     * Returns unique id identifying the object.
796
     *
797
     * @return string
798
     */
799 12
    protected function getUniqueId()
800
    {
801 12
        return spl_object_hash($this);
802
    }
803
}
804