Passed
Push — fix/replace-pr-403-getFontSpac... ( 933b8a...86462a )
by Konrad
02:41
created

Page   F

Complexity

Total Complexity 115

Size/Duplication

Total Lines 765
Duplicated Lines 0 %

Test Coverage

Coverage 75.08%

Importance

Changes 13
Bugs 2 Features 2
Metric Value
eloc 314
c 13
b 2
f 2
dl 0
loc 765
ccs 235
cts 313
cp 0.7508
rs 2
wmc 115

11 Methods

Rating   Name   Duplication   Size   Complexity  
B getFonts() 0 37 9
B getText() 0 40 10
B getTextArray() 0 42 10
A getXObject() 0 9 2
B extractRawData() 0 33 8
B getXObjects() 0 31 7
A getFont() 0 22 4
C getDataCommands() 0 130 15
C extractDecodedRawData() 0 63 17
C getDataTm() 0 178 15
D getTextXY() 0 51 18

How to fix   Complexity   

Complex Class

Complex classes like Page often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use Page, and based on these observations, apply Extract Interface, too.
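
As a rough illustration of Extract Class here: getFonts()/getFont() and getXObjects()/getXObject() share a prefix and the same "raw id, then cleaned id" lookup, so that lookup could live in one small component. The sketch below is an assumption, not part of PdfParser; `ResourceTable` and its methods are hypothetical names, and only the regular expression is taken from the listing.

```php
<?php

// Hypothetical helper, not part of PdfParser: holds one kind of named
// resource (fonts, XObjects, ...) and resolves ids the way Page does.
final class ResourceTable
{
    /** @var array */
    private $items = [];

    public function __construct(array $elements)
    {
        foreach ($elements as $id => $element) {
            $this->items[(string) $id] = $element;

            // Also index by the cleaned id (digits, '.', '-', '_' only).
            $cleanId = self::cleanId((string) $id);
            if ('' !== $cleanId) {
                $this->items[$cleanId] = $element;
            }
        }
    }

    /** Looks up the raw id first, then falls back to the cleaned id. */
    public function get(string $id)
    {
        if (isset($this->items[$id])) {
            return $this->items[$id];
        }
        $cleanId = self::cleanId($id);

        return $this->items[$cleanId] ?? null;
    }

    private static function cleanId(string $id): string
    {
        return (string) preg_replace('/[^0-9\.\-_]/', '', $id);
    }
}
```

With such a component, Page would only need to build one table per resource kind and delegate the lookups, which would likely lower the complexity of the four methods above.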

1
<?php
2
3
/**
4
 * @file
5
 *          This file is part of the PdfParser library.
6
 *
7
 * @author  Sébastien MALOT <[email protected]>
8
 * @date    2017-01-03
9
 *
10
 * @license LGPLv3
11
 * @url     <https://github.com/smalot/pdfparser>
12
 *
13
 *  PdfParser is a pdf library written in PHP, extraction oriented.
14
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
15
 *
16
 *  This program is free software: you can redistribute it and/or modify
17
 *  it under the terms of the GNU Lesser General Public License as published by
18
 *  the Free Software Foundation, either version 3 of the License, or
19
 *  (at your option) any later version.
20
 *
21
 *  This program is distributed in the hope that it will be useful,
22
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
23
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
24
 *  GNU Lesser General Public License for more details.
25
 *
26
 *  You should have received a copy of the GNU Lesser General Public License
27
 *  along with this program.
28
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
29
 */
30
31
namespace Smalot\PdfParser;
32
33
use Smalot\PdfParser\Element\ElementArray;
34
use Smalot\PdfParser\Element\ElementMissing;
35
use Smalot\PdfParser\Element\ElementNull;
36
use Smalot\PdfParser\Element\ElementXRef;
37
38
class Page extends PDFObject
39
{
40
    /**
41
     * @var Font[]
42
     */
43
    protected $fonts = null;
44
45
    /**
46
     * @var PDFObject[]
47
     */
48
    protected $xobjects = null;
49
50
    /**
51
     * @var array
52
     */
53
    protected $dataTm = null;
54
55
    /**
56
     * @return Font[]
57
     */
58 20
    public function getFonts()
59
    {
60 20
        if (null !== $this->fonts) {
61 18
            return $this->fonts;
62
        }
63
64 20
        $resources = $this->get('Resources');
65
66 20
        if (method_exists($resources, 'has') && $resources->has('Font')) {
67 19
            if ($resources->get('Font') instanceof ElementMissing) {
Bug introduced by
The method get() does not exist on Smalot\PdfParser\Element. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

67
            if ($resources->/** @scrutinizer ignore-call */ get('Font') instanceof ElementMissing) {

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
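
One possible way to satisfy this check without the ignore-call annotation is to narrow the type with instanceof, because method_exists() does not tell the analyzer which class it is looking at. The sketch below uses stand-in classes; `Element` and `Dictionary` here are hypothetical and are not the PdfParser types, and `lookupEntry()` is an illustrative name.

```php
<?php

// Stand-in types for illustration only; they are not the PdfParser classes.
class Element
{
}

class Dictionary extends Element
{
    private $entries;

    public function __construct(array $entries)
    {
        $this->entries = $entries;
    }

    public function has(string $name): bool
    {
        return array_key_exists($name, $this->entries);
    }

    public function get(string $name)
    {
        return $this->entries[$name] ?? null;
    }
}

/**
 * Returns the named entry, or null when $resources cannot hold entries.
 */
function lookupEntry(Element $resources, string $name)
{
    // instanceof narrows the type, so has()/get() are known to exist here.
    if ($resources instanceof Dictionary && $resources->has($name)) {
        return $resources->get($name);
    }

    return null;
}
```

Whether this fits Page depends on which class actually declares has() and get() in the library, which this listing does not show.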

68 1
                return [];
69
            }
70
71 18
            if ($resources->get('Font') instanceof Header) {
72 13
                $fonts = $resources->get('Font')->getElements();
73
            } else {
74 7
                $fonts = $resources->get('Font')->getHeader()->getElements();
75
            }
76
77 18
            $table = [];
78
79 18
            foreach ($fonts as $id => $font) {
80 18
                if ($font instanceof Font) {
81 18
                    $table[$id] = $font;
82
83
                    // Also store under the cleaned id value (only numeric)
84 18
                    $id = preg_replace('/[^0-9\.\-_]/', '', $id);
85 18
                    if ('' != $id) {
86 18
                        $table[$id] = $font;
87
                    }
88
                }
89
            }
90
91 18
            return $this->fonts = $table;
92
        }
93
94 4
        return [];
95
    }
96
97
    /**
98
     * @param string $id
99
     *
100
     * @return Font|null
101
     */
102 18
    public function getFont($id)
103
    {
104 18
        $fonts = $this->getFonts();
105
106 18
        if (isset($fonts[$id])) {
107 17
            return $fonts[$id];
108
        }
109
110
        // According to the PDF specs (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 238)
111
        // "The font resource name presented to the Tf operator is arbitrary, as are the names for all kinds of resources"
112
        // Instead, we search for the unfiltered name first and then do this cleaning as a fallback, so all tests still pass.
113
114 3
        if (isset($fonts[$id])) {
115
            return $fonts[$id];
116
        } else {
117 3
            $id = preg_replace('/[^0-9\.\-_]/', '', $id);
118 3
            if (isset($fonts[$id])) {
119 1
                return $fonts[$id];
120
            }
121
        }
122
123 2
        return null;
124
    }
125
126
    /**
127
     * Support for XObject
128
     *
129
     * @return PDFObject[]
130
     */
131 5
    public function getXObjects()
132
    {
133 5
        if (null !== $this->xobjects) {
134 4
            return $this->xobjects;
135
        }
136
137 5
        $resources = $this->get('Resources');
138
139 5
        if (method_exists($resources, 'has') && $resources->has('XObject')) {
140 5
            if ($resources->get('XObject') instanceof Header) {
141 5
                $xobjects = $resources->get('XObject')->getElements();
142
            } else {
143
                $xobjects = $resources->get('XObject')->getHeader()->getElements();
144
            }
145
146 5
            $table = [];
147
148 5
            foreach ($xobjects as $id => $xobject) {
149 5
                $table[$id] = $xobject;
150
151
                // Also store under the cleaned id value (only numeric)
152 5
                $id = preg_replace('/[^0-9\.\-_]/', '', $id);
153 5
                if ('' != $id) {
154 5
                    $table[$id] = $xobject;
155
                }
156
            }
157
158 5
            return $this->xobjects = $table;
159
        }
160
161
        return [];
162
    }
163
164
    /**
165
     * @param string $id
166
     *
167
     * @return PDFObject|null
168
     */
169 5
    public function getXObject($id)
170
    {
171 5
        $xobjects = $this->getXObjects();
172
173 5
        if (isset($xobjects[$id])) {
174 5
            return $xobjects[$id];
175
        }
176
177
        return null;
178
        /*$id = preg_replace('/[^0-9\.\-_]/', '', $id);
179
180
        if (isset($xobjects[$id])) {
181
            return $xobjects[$id];
182
        } else {
183
            return null;
184
        }*/
185
    }
186
187
    /**
188
     * @param Page $page
189
     *
190
     * @return string
191
     */
192 12
    public function getText(self $page = null)
193
    {
194 12
        if ($contents = $this->get('Contents')) {
195 12
            if ($contents instanceof ElementMissing) {
196
                return '';
197 12
            } elseif ($contents instanceof ElementNull) {
198
                return '';
199 12
            } elseif ($contents instanceof PDFObject) {
$contents is never a sub-type of Smalot\PdfParser\PDFObject.
200 9
                $elements = $contents->getHeader()->getElements();
201
202 9
                if (is_numeric(key($elements))) {
203
                    $new_content = '';
204
205
                    foreach ($elements as $element) {
206
                        if ($element instanceof ElementXRef) {
207
                            $new_content .= $element->getObject()->getContent();
208
                        } else {
209
                            $new_content .= $element->getContent();
210
                        }
211
                    }
212
213
                    $header = new Header([], $this->document);
214 9
                    $contents = new PDFObject($this->document, $header, $new_content);
215
                }
216 4
            } elseif ($contents instanceof ElementArray) {
217
                // Create a virtual global content.
218 4
                $new_content = '';
219
220 4
                foreach ($contents->getContent() as $content) {
221 4
                    $new_content .= $content->getContent()."\n";
222
                }
223
224 4
                $header = new Header([], $this->document);
225 4
                $contents = new PDFObject($this->document, $header, $new_content);
226
            }
227
228 12
            return $contents->getText($this);
Bug introduced by
The method getText() does not exist on Smalot\PdfParser\Element. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

228
            return $contents->/** @scrutinizer ignore-call */ getText($this);

229
        }
230
231
        return '';
232
    }
233
234
    /**
235
     * @param Page $page
236
     *
237
     * @return array
238
     */
239 3
    public function getTextArray(self $page = null)
240
    {
241 3
        if ($contents = $this->get('Contents')) {
242 3
            if ($contents instanceof ElementMissing) {
243
                return [];
244 3
            } elseif ($contents instanceof ElementNull) {
245
                return [];
246 3
            } elseif ($contents instanceof PDFObject) {
$contents is never a sub-type of Smalot\PdfParser\PDFObject.
247 3
                $elements = $contents->getHeader()->getElements();
248
249 3
                if (is_numeric(key($elements))) {
250
                    $new_content = '';
251
252
                    /** @var PDFObject $element */
253
                    foreach ($elements as $element) {
254
                        if ($element instanceof ElementXRef) {
255
                            $new_content .= $element->getObject()->getContent();
256
                        } else {
257
                            $new_content .= $element->getContent();
258
                        }
259
                    }
260
261
                    $header = new Header([], $this->document);
262 3
                    $contents = new PDFObject($this->document, $header, $new_content);
263
                }
264
            } elseif ($contents instanceof ElementArray) {
265
                // Create a virtual global content.
266
                $new_content = '';
267
268
                /** @var PDFObject $content */
269
                foreach ($contents->getContent() as $content) {
270
                    $new_content .= $content->getContent()."\n";
271
                }
272
273
                $header = new Header([], $this->document);
274
                $contents = new PDFObject($this->document, $header, $new_content);
275
            }
276
277 3
            return $contents->getTextArray($this);
Bug introduced by
The method getTextArray() does not exist on Smalot\PdfParser\Element. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

277
            return $contents->/** @scrutinizer ignore-call */ getTextArray($this);

278
        }
279
280
        return [];
281
    }
282
283
    /**
284
     * Gets all the text data with its internal representation of the page.
285
     *
286
     * @return array An array with the data and the internal representation
287
     */
288 6
    public function extractRawData()
289
    {
290
        /*
291
         * Now you can get the complete content of the object with the text on it
292
         */
293 6
        $extractedData = [];
294 6
        $content = $this->get('Contents');
295 6
        $values = $content->getContent();
296 6
        if (isset($values) && \is_array($values)) {
297
            $text = '';
298
            foreach ($values as $section) {
299
                $text .= $section->getContent();
300
            }
301
            $sectionsText = $this->getSectionsText($text);
302
            foreach ($sectionsText as $sectionText) {
303
                $commandsText = $this->getCommandsText($sectionText);
304
                foreach ($commandsText as $command) {
305
                    $extractedData[] = $command;
306
                }
307
            }
308
        } else {
309 6
            $sectionsText = $content->getSectionsText($content->getContent());
Bug introduced by
The method getSectionsText() does not exist on Smalot\PdfParser\Element. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

309
            /** @scrutinizer ignore-call */ 
310
            $sectionsText = $content->getSectionsText($content->getContent());

310 6
            foreach ($sectionsText as $sectionText) {
311 6
                $extractedData[] = ['t' => '', 'o' => 'BT', 'c' => ''];
312
313 6
                $commandsText = $content->getCommandsText($sectionText);
Bug introduced by
The method getCommandsText() does not exist on Smalot\PdfParser\Element. ( Ignorable by Annotation )

If this is a false-positive, you can also ignore this issue in your code via the ignore-call  annotation

313
                /** @scrutinizer ignore-call */ 
314
                $commandsText = $content->getCommandsText($sectionText);

314 6
                foreach ($commandsText as $command) {
315 6
                    $extractedData[] = $command;
316
                }
317
            }
318
        }
319
320 6
        return $extractedData;
321
    }
322
323
    /**
324
     * Gets all the decoded text data with its internal representation from a page.
325
     *
326
     * @param array $extractedRawData the extracted data returned by extractRawData or
327
     *                                null if extractRawData should be called
328
     *
329
     * @return array An array with the data and the internal representation
330
     */
331 5
    public function extractDecodedRawData($extractedRawData = null)
332
    {
333 5
        if (!isset($extractedRawData) || !$extractedRawData) {
Bug Best Practice introduced by
The expression $extractedRawData of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
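
A minimal sketch of the explicit form this check suggests, assuming a nullable array argument with a lazily computed fallback. `dataOrFallback()` is a hypothetical helper for illustration, not something Page defines.

```php
<?php

/**
 * Hypothetical helper: returns $data, or the result of $fallback when
 * $data is null or an empty array.
 */
function dataOrFallback(?array $data, callable $fallback): array
{
    // empty() covers both null and [] and states the intent explicitly.
    if (empty($data)) {
        return $fallback();
    }

    return $data;
}
```

Since empty() is true for both null and an empty array, it would replace the combined `!isset(...) || !...` test in a single call without changing which inputs trigger the fallback.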

334 5
            $extractedRawData = $this->extractRawData();
335
        }
336 5
        $currentFont = null;
337 5
        $clippedFont = null;
338 5
        foreach ($extractedRawData as &$command) {
339 5
            if ('Tj' == $command['o'] || 'TJ' == $command['o']) {
340 5
                $data = $command['c'];
341 5
                if (!\is_array($data)) {
342 5
                    $tmpText = '';
343 5
                    if (isset($currentFont)) {
344 5
                        $tmpText = $currentFont->decodeOctal($data);
345
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
346
                    }
347 5
                    $tmpText = str_replace(
348 5
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
349 5
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
350
                            $tmpText
351
                    );
352 5
                    $tmpText = utf8_encode($tmpText);
353 5
                    if (isset($currentFont)) {
354 5
                        $tmpText = $currentFont->decodeContent($tmpText);
355
                    }
356 5
                    $command['c'] = $tmpText;
357 5
                    continue;
358
                }
359 5
                $numText = \count($data);
360 5
                for ($i = 0; $i < $numText; ++$i) {
361 5
                    if (0 != ($i % 2)) {
362 5
                        continue;
363
                    }
364 5
                    $tmpText = $data[$i]['c'];
365 5
                    $decodedText = '';
366 5
                    if (isset($currentFont)) {
367 5
                        $decodedText = $currentFont->decodeOctal($tmpText);
368
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
369
                    }
370 5
                    $decodedText = str_replace(
371 5
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
372 5
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
373
                            $decodedText
374
                    );
375 5
                    $decodedText = utf8_encode($decodedText);
376 5
                    if (isset($currentFont)) {
377 5
                        $decodedText = $currentFont->decodeContent($decodedText);
378
                    }
379 5
                    $command['c'][$i]['c'] = $decodedText;
380 5
                    continue;
381
                }
382 5
            } elseif ('Tf' == $command['o'] || 'TF' == $command['o']) {
383 5
                $fontId = explode(' ', $command['c'])[0];
384 5
                $currentFont = $this->getFont($fontId);
385 5
                continue;
386 5
            } elseif ('Q' == $command['o']) {
387
                $currentFont = $clippedFont;
388 5
            } elseif ('q' == $command['o']) {
389
                $clippedFont = $currentFont;
390
            }
391
        }
392
393 5
        return $extractedRawData;
394
    }
395
396
    /**
397
     * Gets just the Text commands that are involved in text positions and
398
     * Text Matrix (Tm)
399
     *
400
     * It extracts just the PDF commands that are involved with text positions, and
401
     * the Text Matrix (Tm). These are: BT, ET, TL, Td, TD, Tm, T*, Tj, ', ", and TJ
402
     *
403
     * @param array $extractedDecodedRawData The data extracted by extractDecodedRawData.
404
     *                                       If it is null, the method extractDecodedRawData is called.
405
     *
406
     * @return array An array with the text command of the page
407
     */
408 4
    public function getDataCommands($extractedDecodedRawData = null)
409
    {
410 4
        if (!isset($extractedDecodedRawData) || !$extractedDecodedRawData) {
Bug Best Practice introduced by
The expression $extractedDecodedRawData of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

411 4
            $extractedDecodedRawData = $this->extractDecodedRawData();
412
        }
413 4
        $extractedData = [];
414 4
        foreach ($extractedDecodedRawData as $command) {
415 4
            switch ($command['o']) {
416
                /*
417
                 * BT
418
                 * Begin a text object, initializing the Tm and Tlm to identity matrix
419
                 */
420 4
                case 'BT':
421 4
                    $extractedData[] = $command;
422 4
                    break;
423
424
                /*
425
                 * ET
426
                 * End a text object, discarding the text matrix
427
                 */
428 4
                case 'ET':
429
                    $extractedData[] = $command;
430
                    break;
431
432
                /*
433
                 * leading TL
434
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
435
                 * Initial value: 0
436
                 */
437 4
                case 'TL':
438 3
                    $extractedData[] = $command;
439 3
                    break;
440
441
                /*
442
                 * tx ty Td
443
                 * Move to the start of the next line, offset from the start of the
444
                 * current line by tx, ty.
445
                 */
446 4
                case 'Td':
447 4
                    $extractedData[] = $command;
448 4
                    break;
449
450
                /*
451
                 * tx ty TD
452
                 * Move to the start of the next line, offset from the start of the
453
                 * current line by tx, ty. As a side effect, this operator sets the leading
454
                 * parameter in the text state. This operator has the same effect as the
455
                 * code:
456
                 * -ty TL
457
                 * tx ty Td
458
                 */
459 4
                case 'TD':
460
                    $extractedData[] = $command;
461
                    break;
462
463
                /*
464
                 * a b c d e f Tm
465
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
466
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
467
                 * [1 0 0 1 0 0]
468
                 */
469 4
                case 'Tm':
470 3
                    $extractedData[] = $command;
471 3
                    break;
472
473
                /*
474
                 * T*
475
                 * Move to the start of the next line. This operator has the same effect
476
                 * as the code:
477
                 * 0 -Tl Td
478
                 * Where Tl is the current leading parameter in the text state.
479
                 */
480 4
                case 'T*':
481 3
                    $extractedData[] = $command;
482 3
                    break;
483
484
                /*
485
                 * string Tj
486
                 * Show a Text String
487
                 */
488 4
                case 'Tj':
489 4
                    $extractedData[] = $command;
490 4
                    break;
491
492
                /*
493
                 * string '
494
                 * Move to the next line and show a text string. This operator has the
495
                 * same effect as the code:
496
                 * T*
497
                 * string Tj
498
                 */
499 4
                case "'":
500
                    $extractedData[] = $command;
501
                    break;
502
503
                /*
504
                 * aw ac string "
505
                 * Move to the next line and show a text string, using aw as the word
506
                 * spacing and ac as the character spacing. This operator has the same
507
                 * effect as the code:
508
                 * aw Tw
509
                 * ac Tc
510
                 * string '
511
                 * Tw sets the word spacing, Tw, to wordSpace.
512
                 * Tc sets the character spacing, Tc, to charsSpace.
513
                 */
514 4
                case '"':
515
                    $extractedData[] = $command;
516
                    break;
517
518
                /*
519
                 * array TJ
520
                 * Show one or more text strings, allowing individual glyph positioning.
521
                 * Each element of array can be a string or a number. If the element is
522
                 * a string, this operator shows the string. If it is a number, the
523
                 * operator adjusts the text position by that amount; that is, it translates
524
                 * the text matrix, Tm. This amount is subtracted from the current
525
                 * horizontal or vertical coordinate, depending on the writing mode.
526
                 * In the default coordinate system, a positive adjustment has the effect
527
                 * of moving the next glyph painted either to the left or down by the given
528
                 * amount.
529
                 */
530 4
                case 'TJ':
531 4
                    $extractedData[] = $command;
532 4
                    break;
533
                default:
534
            }
535
        }
536
537 4
        return $extractedData;
538
    }
539
540
    /**
541
     * Gets the Text Matrix of the text in the page
542
     *
543
     * Return an array where every item is an array where the first item is the
544
     * Text Matrix (Tm) and the second is a string with the text data.  The Text matrix
545
     * is an array of 6 numbers. The last 2 numbers are the coordinates X and Y of the
546
     * text. The first 4 numbers have to do with scaling, rotation and skew of the text.
547
     *
548
     * @param array $dataCommands the data extracted by getDataCommands
549
     *                            if null getDataCommands is called
550
     *
551
     * @return array an array with the data of the page including the Tm information
552
     *               of any text in the page
553
     */
554 3
    public function getDataTm($dataCommands = null)
555
    {
556 3
        if (!isset($dataCommands) || !$dataCommands) {
Bug Best Practice introduced by
The expression $dataCommands of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

557 3
            $dataCommands = $this->getDataCommands();
558
        }
559
560
        /*
561
         * At the beginning of a text object Tm is the identity matrix
562
         */
563 3
        $defaultTm = ['1', '0', '0', '1', '0', '0'];
564
565
        /*
566
         *  Set the text leading used by T*, ' and " operators
567
         */
568 3
        $defaultTl = 0;
569
570
        /*
571
         * Indices of the X and Y coordinates in the matrix (Tm)
572
         */
573 3
        $x = 4;
574 3
        $y = 5;
575 3
        $Tx = 0;
576 3
        $Ty = 0;
577
578 3
        $Tm = $defaultTm;
579 3
        $Tl = $defaultTl;
580
581 3
        $extractedTexts = $this->getTextArray();
582 3
        $extractedData = [];
583 3
        foreach ($dataCommands as $command) {
584 3
            $currentText = $extractedTexts[\count($extractedData)];
585 3
            switch ($command['o']) {
586
                /*
587
                 * BT
588
                 * Begin a text object, inicializind the Tm and Tlm to identity matrix
589
                 */
590 3
                case 'BT':
591 3
                    $Tm = $defaultTm;
592 3
                    $Tl = $defaultTl; //review this.
593 3
                    $Tx = 0;
594 3
                    $Ty = 0;
595 3
                    break;
596
597
                /*
598
                 * ET
599
                 * End a text object, discarding the text matrix
600
                 */
601 3
                case 'ET':
602
                    $Tm = $defaultTm;
603
                    $Tl = $defaultTl;  //review this
604
                    $Tx = 0;
605
                    $Ty = 0;
606
                    break;
607
608
                /*
609
                 * leading TL
610
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
611
                 * Initial value: 0
612
                 */
613 3
                case 'TL':
614 2
                    $Tl = (float) $command['c'];
615 2
                    break;
616
617
                /*
618
                 * tx ty Td
619
                 * Move to the start of the next line, offset from the start of the
620
                 * current line by tx, ty.
621
                 */
622 3
                case 'Td':
623 3
                    $coord = explode(' ', $command['c']);
624 3
                    $Tx += (float) $coord[0];
625 3
                    $Ty += (float) $coord[1];
626 3
                    $Tm[$x] = (string) $Tx;
627 3
                    $Tm[$y] = (string) $Ty;
628 3
                    break;
629
630
                /*
631
                 * tx ty TD
632
                 * Move to the start of the next line, offset from the start of the
633
                 * current line by tx, ty. As a side effect, this operator sets the leading
634
                 * parameter in the text state. This operator has the same effect as the
635
                 * code:
636
                 * -ty TL
637
                 * tx ty Td
638
                 */
639 3
                case 'TD':
640
                    $coord = explode(' ', $command['c']);
641
                    $Tl = -((float) $coord[1]);
642
                    $Tx += (float) $coord[0];
643
                    $Ty += (float) $coord[1];
644
                    $Tm[$x] = (string) $Tx;
645
                    $Tm[$y] = (string) $Ty;
646
                    break;
647
648
                /*
649
                 * a b c d e f Tm
650
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
651
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
652
                 * [1 0 0 1 0 0]
653
                 */
654 3
                case 'Tm':
                    $Tm = explode(' ', $command['c']);
                    $Tx = (float) $Tm[$x];
                    $Ty = (float) $Tm[$y];
                    break;

                /*
                 * T*
                 * Move to the start of the next line. This operator has the same effect
                 * as the code:
                 * 0 -Tl Td
                 * where Tl is the current leading parameter in the text state.
                 */
                case 'T*':
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * string Tj
                 * Show a text string.
                 */
                case 'Tj':
                    $extractedData[] = [$Tm, $currentText];
                    break;

                /*
                 * string '
                 * Move to the next line and show a text string. This operator has the
                 * same effect as the code:
                 * T*
                 * string Tj
                 */
                case "'":
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $currentText];
                    break;

                /*
                 * aw ac string "
                 * Move to the next line and show a text string, using aw as the word
                 * spacing and ac as the character spacing. This operator has the same
                 * effect as the code:
                 * aw Tw
                 * ac Tc
                 * string '
                 * Tw sets the word spacing, Tw, to wordSpace.
                 * Tc sets the character spacing, Tc, to charSpace.
                 */
                case '"':
                    $data = explode(' ', $currentText);
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $data[2]]; //Verify
                    break;

                /*
                 * array TJ
                 * Show one or more text strings, allowing individual glyph positioning.
                 * Each element of array can be a string or a number. If the element is
                 * a string, this operator shows the string. If it is a number, the
                 * operator adjusts the text position by that amount; that is, it translates
                 * the text matrix, Tm. This amount is subtracted from the current
                 * horizontal or vertical coordinate, depending on the writing mode.
                 * In the default coordinate system, a positive adjustment has the effect
                 * of moving the next glyph painted either to the left or down by the given
                 * amount.
                 */
                case 'TJ':
                    $extractedData[] = [$Tm, $currentText];
                    break;
                default:
            }
        }
        $this->dataTm = $extractedData;

        return $extractedData;
    }

    /**
     * Gets the text items located around the given coordinates (X,Y).
     *
     * If a text item is near the given coordinates (X,Y), judged by its Tm info,
     * it is returned. The extractedData returned by getDataTm() can be used to
     * find the coordinates of a given text, using its Tm info.
     *
     * @param float $x      The X value of the coordinate to search for. If null,
     *                      just the Y value is considered (same row)
     * @param float $y      The Y value of the coordinate to search for. If null,
     *                      just the X value is considered (same column)
     * @param float $xError The tolerance, above or below $x, within which an X is "near"
     * @param float $yError The tolerance, above or below $y, within which a Y is "near"
     *
     * @return array An array of text items that are near the given coordinates. If no
     *               text is "near" the x,y coordinate, an empty array is returned. If
     *               both x and y are null, an empty array is returned as well.
     */
    public function getTextXY($x = null, $y = null, $xError = 0, $yError = 0)
    {
        if (!isset($this->dataTm) || !$this->dataTm) {
            $this->getDataTm();
        }

        if (null !== $x) {
            $x = (float) $x;
        }

        if (null !== $y) {
            $y = (float) $y;
        }

        if (null === $x && null === $y) {
            return [];
        }

        $xError = (float) $xError;
        $yError = (float) $yError;

        $extractedData = [];
        foreach ($this->dataTm as $item) {
            $tm = $item[0];
            $xTm = (float) $tm[4];
            $yTm = (float) $tm[5];
            $text = $item[1];
            // A null coordinate is ignored; every requested axis must fall inside its window.
            $xNear = null === $x || (($xTm >= ($x - $xError)) && ($xTm <= ($x + $xError)));
            $yNear = null === $y || (($yTm >= ($y - $yError)) && ($yTm <= ($y + $yError)));
            if ($xNear && $yNear) {
                $extractedData[] = [$tm, $text];
            }
        }

        return $extractedData;
    }
}
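
The coordinate lookup performed by getTextXY() can be tried out in isolation. The following is a minimal sketch, not part of PdfParser: `filterNear()` and the sample data are hypothetical, and the only things taken from the class above are the `[Tm, text]` item shape and the use of Tm indexes 4 and 5 as the X and Y position. It uses `abs()` as a compact way to express the same tolerance window.

```php
<?php

/**
 * Hypothetical helper (assumption, not a PdfParser API): keeps the items of a
 * getDataTm()-shaped array whose Tm position lies within the given window.
 */
function filterNear(array $dataTm, ?float $x, ?float $y, float $xError = 0.0, float $yError = 0.0): array
{
    // With no coordinate at all there is nothing to search for.
    if (null === $x && null === $y) {
        return [];
    }

    $matches = [];
    foreach ($dataTm as [$tm, $text]) {
        $xTm = (float) $tm[4]; // horizontal position
        $yTm = (float) $tm[5]; // vertical position

        // A null coordinate means "ignore this axis".
        $xNear = null === $x || abs($xTm - $x) <= $xError;
        $yNear = null === $y || abs($yTm - $y) <= $yError;

        if ($xNear && $yNear) {
            $matches[] = [$tm, $text];
        }
    }

    return $matches;
}

// Made-up sample data in the [Tm, text] shape.
$sample = [
    [['1', '0', '0', '1', '72', '700'], 'Invoice'],
    [['1', '0', '0', '1', '72', '680'], 'Date'],
    [['1', '0', '0', '1', '300', '700'], 'No. 42'],
];

$row = filterNear($sample, null, 700.0, 0.0, 1.0);   // same row: Y only
$column = filterNear($sample, 72.0, null, 0.5);      // same column: X only
$cell = filterNear($sample, 300.0, 700.0, 1.0, 1.0); // both axes

echo implode(', ', array_column($row, 1)), PHP_EOL; // prints "Invoice, No. 42"
```

Passing null for one axis turns the lookup into a row or column search, which is why the tolerance for the ignored axis has no effect on the result.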