Passed
Push — fix/replace-pr-403-getFontSpac... ( 933b8a...86462a )
by Konrad
02:41
created

Page::getTextXY()   D

Complexity

Conditions 18
Paths 104

Size

Total Lines 51
Code Lines 33

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 25
CRAP Score 24.009

Importance

Changes 2
Bugs 0
Features 0
Metric Value
cc 18
eloc 33
c 2
b 0
f 0
nc 104
nop 4
dl 0
loc 51
ccs 25
cts 34
cp 0.7352
crap 24.009
rs 4.8333
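The CRAP score in the table can be reproduced from the other figures. This assumes Scrutinizer uses the standard CRAP (Change Risk Anti-Patterns) formula, complexity² × (1 − coverage)³ + complexity, and takes coverage as cp = ccs / cts (25 / 34 ≈ 0.7352, which matches the reported cp). The function name `crapScore` is illustrative only.

```php
<?php
// CRAP score: complexity^2 * (1 - coverage)^3 + complexity,
// with coverage expressed as a fraction between 0 and 1.
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// Figures from the table above: cc = 18, coverage = ccs / cts = 25 / 34.
echo round(crapScore(18, 25 / 34), 3), PHP_EOL; // prints 24.009
```

The formula also shows why the score improves quickly with tests: at full coverage the first term vanishes and the score falls to the cyclomatic complexity itself (18), while lowering the complexity reduces both terms.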

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
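A small sketch of this refactoring, based on the listing below: `extractDecodedRawData()` performs the same `str_replace()` unescaping step twice, once for `Tj` strings and once for each string element of a `TJ` array. Pulling that step into its own function gives it a name that replaces a comment. The name `unescapePdfString` is hypothetical and not part of PdfParser.

```php
<?php
// Hypothetical helper: the escape-sequence cleanup that
// extractDecodedRawData() performs inline (twice), extracted into one function.
function unescapePdfString(string $text): string
{
    // Pairs are applied left to right, exactly as in the inline version.
    return str_replace(
        ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
        ['\\', '(', ')', "\n", "\r", "\t", ' '],
        $text
    );
}

// One call whose name says what the former inline block did.
echo unescapePdfString('Hello\ \(world\)'), PHP_EOL; // prints Hello (world)
```

Because `str_replace()` applies the pairs in order, an escaped backslash followed by a parenthesis (`\\(`) first becomes `\(` and then `(`; the sketch deliberately keeps that behaviour so it stays a pure extraction rather than a behaviour change.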

Commonly applied refactorings include:

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementMissing;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementXRef;

class Page extends PDFObject
{
    /**
     * @var Font[]
     */
    protected $fonts = null;

    /**
     * @var PDFObject[]
     */
    protected $xobjects = null;

    /**
     * @var array
     */
    protected $dataTm = null;

    /**
     * @return Font[]
     */
    public function getFonts()
    {
        if (null !== $this->fonts) {
            return $this->fonts;
        }

        $resources = $this->get('Resources');

        if (method_exists($resources, 'has') && $resources->has('Font')) {
            if ($resources->get('Font') instanceof ElementMissing) {

Bug introduced by
The method get() does not exist on Smalot\PdfParser\Element. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            if ($resources->/** @scrutinizer ignore-call */ get('Font') instanceof ElementMissing) {

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

                return [];
            }

            if ($resources->get('Font') instanceof Header) {
                $fonts = $resources->get('Font')->getElements();
            } else {
                $fonts = $resources->get('Font')->getHeader()->getElements();
            }

            $table = [];

            foreach ($fonts as $id => $font) {
                if ($font instanceof Font) {
                    $table[$id] = $font;

                    // Also store the font under the cleaned id (only digits, '.', '-' and '_' are kept)
                    $id = preg_replace('/[^0-9\.\-_]/', '', $id);
                    if ('' != $id) {
                        $table[$id] = $font;
                    }
                }
            }

            return $this->fonts = $table;
        }

        return [];
    }
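The table built by `getFonts()` indexes every font twice: under its raw resource name and under the name reduced to digits, `.`, `-` and `_`. The sketch below isolates that indexing step outside the class; `cleanResourceId` and `buildResourceTable` are hypothetical names, and plain strings stand in for `Font` objects.

```php
<?php
// Same cleaning rule as getFonts() and getXObjects(): keep only digits,
// '.', '-' and '_'.
function cleanResourceId(string $id): string
{
    return preg_replace('/[^0-9\.\-_]/', '', $id);
}

// Hypothetical helper: index every item under its raw key and, when the
// cleaned key is non-empty, under the cleaned key as well.
function buildResourceTable(array $items): array
{
    $table = [];
    foreach ($items as $id => $item) {
        $table[$id] = $item;
        $cleaned = cleanResourceId((string) $id);
        if ('' !== $cleaned) {
            $table[$cleaned] = $item; // later items overwrite earlier ones
        }
    }
    return $table;
}

print_r(buildResourceTable(['F1' => 'Helvetica', 'TT1' => 'Times']));
```

The example shows the main risk of the scheme: `F1` and `TT1` both reduce to `1`, so the cleaned key silently points at whichever font came last (`Times` here), while the raw keys stay unambiguous.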

    /**
     * @param string $id
     *
     * @return Font|null
     */
    public function getFont($id)
    {
        $fonts = $this->getFonts();

        if (isset($fonts[$id])) {
            return $fonts[$id];
        }

        // According to the PDF specs (https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, page 238)
        // "The font resource name presented to the Tf operator is arbitrary, as are the names for all kinds of resources"
        // Instead of cleaning the name up front, we search for the unfiltered name first and only
        // fall back to the cleaned name, so all tests still pass.

        if (isset($fonts[$id])) {
            return $fonts[$id];
        } else {
            $id = preg_replace('/[^0-9\.\-_]/', '', $id);
            if (isset($fonts[$id])) {
                return $fonts[$id];
            }
        }

        return null;
    }
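The lookup order in `getFont()` can be expressed in a few lines: try the name exactly as the `Tf` operator gives it, then fall back to the cleaned id. The sketch below is a hypothetical standalone helper (`lookupResource`), with plain strings standing in for fonts.

```php
<?php
// Hypothetical helper mirroring getFont(): try the name exactly as given
// (resource names are arbitrary), then fall back to the cleaned id.
function lookupResource(array $table, string $id)
{
    if (isset($table[$id])) {
        return $table[$id];
    }

    $cleaned = preg_replace('/[^0-9\.\-_]/', '', $id);

    return $table[$cleaned] ?? null;
}

$fonts = ['F1' => 'Helvetica', '1' => 'Helvetica'];
var_dump(lookupResource($fonts, 'R1')); // falls back to key '1'
```

The second `isset($fonts[$id])` check in the listed `getFont()` repeats the first one with an unchanged `$id`, so it can never succeed; the sketch omits it without changing the result.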

    /**
     * Support for XObject
     *
     * @return PDFObject[]
     */
    public function getXObjects()
    {
        if (null !== $this->xobjects) {
            return $this->xobjects;
        }

        $resources = $this->get('Resources');

        if (method_exists($resources, 'has') && $resources->has('XObject')) {
            if ($resources->get('XObject') instanceof Header) {
                $xobjects = $resources->get('XObject')->getElements();
            } else {
                $xobjects = $resources->get('XObject')->getHeader()->getElements();
            }

            $table = [];

            foreach ($xobjects as $id => $xobject) {
                $table[$id] = $xobject;

                // Also store the XObject under the cleaned id (only digits, '.', '-' and '_' are kept)
                $id = preg_replace('/[^0-9\.\-_]/', '', $id);
                if ('' != $id) {
                    $table[$id] = $xobject;
                }
            }

            return $this->xobjects = $table;
        }

        return [];
    }

    /**
     * @param string $id
     *
     * @return PDFObject|null
     */
    public function getXObject($id)
    {
        $xobjects = $this->getXObjects();

        if (isset($xobjects[$id])) {
            return $xobjects[$id];
        }

        return null;
        /*$id = preg_replace('/[^0-9\.\-_]/', '', $id);

        if (isset($xobjects[$id])) {
            return $xobjects[$id];
        } else {
            return null;
        }*/
    }

    /**
     * @param Page $page
     *
     * @return string
     */
    public function getText(self $page = null)
    {
        if ($contents = $this->get('Contents')) {
            if ($contents instanceof ElementMissing) {
                return '';
            } elseif ($contents instanceof ElementNull) {
                return '';
            } elseif ($contents instanceof PDFObject) {

introduced by
$contents is never a sub-type of Smalot\PdfParser\PDFObject.

                $elements = $contents->getHeader()->getElements();

                if (is_numeric(key($elements))) {
                    $new_content = '';

                    foreach ($elements as $element) {
                        if ($element instanceof ElementXRef) {
                            $new_content .= $element->getObject()->getContent();
                        } else {
                            $new_content .= $element->getContent();
                        }
                    }

                    $header = new Header([], $this->document);
                    $contents = new PDFObject($this->document, $header, $new_content);
                }
            } elseif ($contents instanceof ElementArray) {
                // Create a virtual global content.
                $new_content = '';

                foreach ($contents->getContent() as $content) {
                    $new_content .= $content->getContent()."\n";
                }

                $header = new Header([], $this->document);
                $contents = new PDFObject($this->document, $header, $new_content);
            }

            return $contents->getText($this);

Bug introduced by
The method getText() does not exist on Smalot\PdfParser\Element. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            return $contents->/** @scrutinizer ignore-call */ getText($this);

        }

        return '';
    }
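When a page's `/Contents` entry is an array, ISO 32000-1 treats it as if all the streams were concatenated, in order, into a single stream; the `ElementArray` branch of `getText()` builds that "virtual global content". A minimal standalone sketch of the same idea, with a hypothetical `mergeContentStreams` helper operating on raw strings:

```php
<?php
// Hypothetical helper: build one "virtual" content stream from the parts
// of a page's /Contents array, as the ElementArray branch of getText() does.
function mergeContentStreams(array $parts): string
{
    $merged = '';
    foreach ($parts as $part) {
        // The newline keeps the last token of one part from fusing with the
        // first token of the next one.
        $merged .= $part."\n";
    }
    return $merged;
}

echo mergeContentStreams(['BT /F1 12 Tf (Hello) Tj', 'ET']);
```

The separator matters: the numeric-key branch of `getText()` concatenates parts with no separator, so a part ending in `(Hello) Tj` followed by one starting with `ET` would produce the single token `TjET`.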

    /**
     * @param Page $page
     *
     * @return array
     */
    public function getTextArray(self $page = null)
    {
        if ($contents = $this->get('Contents')) {
            if ($contents instanceof ElementMissing) {
                return [];
            } elseif ($contents instanceof ElementNull) {
                return [];
            } elseif ($contents instanceof PDFObject) {

introduced by
$contents is never a sub-type of Smalot\PdfParser\PDFObject.

                $elements = $contents->getHeader()->getElements();

                if (is_numeric(key($elements))) {
                    $new_content = '';

                    /** @var PDFObject $element */
                    foreach ($elements as $element) {
                        if ($element instanceof ElementXRef) {
                            $new_content .= $element->getObject()->getContent();
                        } else {
                            $new_content .= $element->getContent();
                        }
                    }

                    $header = new Header([], $this->document);
                    $contents = new PDFObject($this->document, $header, $new_content);
                }
            } elseif ($contents instanceof ElementArray) {
                // Create a virtual global content.
                $new_content = '';

                /** @var PDFObject $content */
                foreach ($contents->getContent() as $content) {
                    $new_content .= $content->getContent()."\n";
                }

                $header = new Header([], $this->document);
                $contents = new PDFObject($this->document, $header, $new_content);
            }

            return $contents->getTextArray($this);

Bug introduced by
The method getTextArray() does not exist on Smalot\PdfParser\Element. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            return $contents->/** @scrutinizer ignore-call */ getTextArray($this);

        }

        return [];
    }

    /**
     * Gets all the text data with its internal representation of the page.
     *
     * @return array An array with the data and the internal representation
     */
    public function extractRawData()
    {
        /*
         * Now you can get the complete content of the object with the text on it
         */
        $extractedData = [];
        $content = $this->get('Contents');
        $values = $content->getContent();
        if (isset($values) && \is_array($values)) {
            $text = '';
            foreach ($values as $section) {
                $text .= $section->getContent();
            }
            $sectionsText = $this->getSectionsText($text);
            foreach ($sectionsText as $sectionText) {
                $commandsText = $this->getCommandsText($sectionText);
                foreach ($commandsText as $command) {
                    $extractedData[] = $command;
                }
            }
        } else {
            $sectionsText = $content->getSectionsText($content->getContent());

Bug introduced by
The method getSectionsText() does not exist on Smalot\PdfParser\Element. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            /** @scrutinizer ignore-call */
            $sectionsText = $content->getSectionsText($content->getContent());

            foreach ($sectionsText as $sectionText) {
                $extractedData[] = ['t' => '', 'o' => 'BT', 'c' => ''];

                $commandsText = $content->getCommandsText($sectionText);

Bug introduced by
The method getCommandsText() does not exist on Smalot\PdfParser\Element. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

                /** @scrutinizer ignore-call */
                $commandsText = $content->getCommandsText($sectionText);

                foreach ($commandsText as $command) {
                    $extractedData[] = $command;
                }
            }
        }

        return $extractedData;
    }
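`extractRawData()` delegates splitting the content stream to `getSectionsText()`, whose implementation is not part of this listing. Assuming the sections of interest are the `BT ... ET` text objects, a rough and deliberately naive illustration of that split looks like this; `splitTextObjects` is a hypothetical name, and the regex does not understand string literals, so a `BT` or `ET` inside `(...)` text would confuse it.

```php
<?php
// Hypothetical, simplified splitter: returns the body of every BT ... ET
// text object in a content stream. It does not understand string literals.
function splitTextObjects(string $content): array
{
    preg_match_all('/\bBT\b(.*?)\bET\b/s', $content, $matches);

    return array_map('trim', $matches[1]);
}

print_r(splitTextObjects("BT /F1 12 Tf (Hi) Tj ET\nq BT (Bye) Tj ET Q"));
```

Operators outside text objects (such as the `q`/`Q` pair here) are dropped, which is also why the `else` branch above re-inserts an explicit `BT` entry before the commands of each section.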

    /**
     * Gets all the decoded text data with its internal representation from a page.
     *
     * @param array $extractedRawData the extracted data returned by extractRawData or
     *                                null if extractRawData should be called
     *
     * @return array An array with the data and the internal representation
     */
    public function extractDecodedRawData($extractedRawData = null)
    {
        if (!isset($extractedRawData) || !$extractedRawData) {

Bug Best Practice introduced by
The expression $extractedRawData of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or !empty(...) instead.

            $extractedRawData = $this->extractRawData();
        }
        $currentFont = null;
        $clippedFont = null;
        foreach ($extractedRawData as &$command) {
            if ('Tj' == $command['o'] || 'TJ' == $command['o']) {
                $data = $command['c'];
                if (!\is_array($data)) {
                    $tmpText = '';
                    if (isset($currentFont)) {
                        $tmpText = $currentFont->decodeOctal($data);
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
                    }
                    $tmpText = str_replace(
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
                            $tmpText
                    );
                    $tmpText = utf8_encode($tmpText);
                    if (isset($currentFont)) {
                        $tmpText = $currentFont->decodeContent($tmpText);
                    }
                    $command['c'] = $tmpText;
                    continue;
                }
                $numText = \count($data);
                for ($i = 0; $i < $numText; ++$i) {
                    if (0 != ($i % 2)) {
                        continue;
                    }
                    $tmpText = $data[$i]['c'];
                    $decodedText = '';
                    if (isset($currentFont)) {
                        $decodedText = $currentFont->decodeOctal($tmpText);
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
                    }
                    $decodedText = str_replace(
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
                            $decodedText
                    );
                    $decodedText = utf8_encode($decodedText);
                    if (isset($currentFont)) {
                        $decodedText = $currentFont->decodeContent($decodedText);
                    }
                    $command['c'][$i]['c'] = $decodedText;
                    continue;
                }
            } elseif ('Tf' == $command['o'] || 'TF' == $command['o']) {
                $fontId = explode(' ', $command['c'])[0];
                $currentFont = $this->getFont($fontId);
                continue;
            } elseif ('Q' == $command['o']) {
                $currentFont = $clippedFont;
            } elseif ('q' == $command['o']) {
                $clippedFont = $currentFont;
            }
        }

        return $extractedRawData;
    }
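In `extractDecodedRawData()`, `q` saves the current font into a single variable and `Q` restores it. Since `q`/`Q` save and restore the whole graphics state (which includes the text font) and may be nested, a stack models them more faithfully. The sketch below is a hypothetical standalone helper, not PdfParser code; it assumes the `Tf` operand string starts with the font name (e.g. `F1 12`), as the `explode(' ', ...)` call above implies.

```php
<?php
// Hypothetical sketch: q pushes the graphics state (which includes the
// current font) and Q pops it, so a stack handles nested q/Q pairs.
function fontsForShowOperators(array $commands): array
{
    $current = null;
    $saved = [];
    $result = [];
    foreach ($commands as $command) {
        switch ($command['o']) {
            case 'q':
                $saved[] = $current;           // push
                break;
            case 'Q':
                if ([] !== $saved) {           // ignore an unbalanced Q
                    $current = array_pop($saved);
                }
                break;
            case 'Tf':
                $current = explode(' ', $command['c'])[0];
                break;
            case 'Tj':
            case 'TJ':
                $result[] = $current;          // font in effect for this string
                break;
        }
    }
    return $result;
}
```

For the sequence `Tf F1`, `q`, `Tf F2`, `q`, `Tf F3`, `Q`, `Tj`, `Q`, `Tj` the stack yields `F2` then `F1`; with a single saved variable, as in the listing, the second `Tj` would still report `F2`.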

    /**
     * Gets only the text commands that are involved in text positioning and
     * the Text Matrix (Tm).
     *
     * It extracts only the PDF commands that are involved in text positioning
     * and the Text Matrix (Tm). These are: BT, ET, TL, Td, TD, Tm, T*, Tj, ', ", and TJ.
     *
     * @param array $extractedDecodedRawData The data extracted by extractDecodedRawData.
     *                                       If it is null, the method extractDecodedRawData is called.
     *
     * @return array An array with the text commands of the page
     */
    public function getDataCommands($extractedDecodedRawData = null)
    {
        if (!isset($extractedDecodedRawData) || !$extractedDecodedRawData) {

Bug Best Practice introduced by
The expression $extractedDecodedRawData of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

            $extractedDecodedRawData = $this->extractDecodedRawData();
        }
        $extractedData = [];
        foreach ($extractedDecodedRawData as $command) {
            switch ($command['o']) {
                /*
                 * BT
                 * Begin a text object, initializing the Tm and Tlm to the identity matrix
                 */
                case 'BT':
                    $extractedData[] = $command;
                    break;

                /*
                 * ET
                 * End a text object, discarding the text matrix
                 */
                case 'ET':
                    $extractedData[] = $command;
                    break;

                /*
                 * leading TL
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
                 * Initial value: 0
                 */
                case 'TL':
                    $extractedData[] = $command;
                    break;

                /*
                 * tx ty Td
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty.
                 */
                case 'Td':
                    $extractedData[] = $command;
                    break;

                /*
                 * tx ty TD
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty. As a side effect, this operator sets the leading
                 * parameter in the text state. This operator has the same effect as the
                 * code:
                 * -ty TL
                 * tx ty Td
                 */
                case 'TD':
                    $extractedData[] = $command;
                    break;

                /*
                 * a b c d e f Tm
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
                 * [1 0 0 1 0 0]
                 */
                case 'Tm':
                    $extractedData[] = $command;
                    break;

                /*
                 * T*
                 * Move to the start of the next line. This operator has the same effect
                 * as the code:
                 * 0 -Tl Td
                 * where Tl is the current leading parameter in the text state.
                 */
                case 'T*':
                    $extractedData[] = $command;
                    break;

                /*
                 * string Tj
                 * Show a text string
                 */
                case 'Tj':
                    $extractedData[] = $command;
                    break;

                /*
                 * string '
                 * Move to the next line and show a text string. This operator has the
                 * same effect as the code:
                 * T*
                 * string Tj
                 */
                case "'":
                    $extractedData[] = $command;
                    break;

                /*
                 * aw ac string "
                 * Move to the next line and show a text string, using aw as the word
                 * spacing and ac as the character spacing. This operator has the same
                 * effect as the code:
                 * aw Tw
                 * ac Tc
                 * string '
                 * Tw sets the word spacing, Tw, to wordSpace.
                 * Tc sets the character spacing, Tc, to charSpace.
                 */
                case '"':
                    $extractedData[] = $command;
                    break;

                /*
                 * array TJ
                 * Show one or more text strings, allowing individual glyph positioning.
                 * Each element of the array can be a string or a number. If the element is
                 * a string, this operator shows the string. If it is a number, the
                 * operator adjusts the text position by that amount; that is, it translates
                 * the text matrix, Tm. This amount is subtracted from the current
                 * horizontal or vertical coordinate, depending on the writing mode.
                 * In the default coordinate system, a positive adjustment has the effect
                 * of moving the next glyph painted either to the left or down by the given
                 * amount.
                 */
                case 'TJ':
                    $extractedData[] = $command;
                    break;
                default:
            }
        }

        return $extractedData;
    }
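The line-positioning operators described above interact through the leading Tl: per ISO 32000-1 (Table 108), `tx ty TD` is equivalent to `-ty TL` followed by `tx ty Td`, and `T*` is equivalent to `0 -Tl Td`, because Tl is stored as a positive number while moving to the next line decreases y. The hypothetical helper below traces those rules. It is not part of PdfParser and, like `getDataTm()`, it adds offsets directly, i.e. it assumes a text matrix without scaling or rotation.

```php
<?php
// Hypothetical helper, not part of PdfParser. Simplification: offsets are
// added directly, as if the text matrix had no scaling or rotation.
function trackLineOrigins(array $ops): array
{
    $x = 0.0;
    $y = 0.0;
    $leading = 0.0; // Tl, initial value 0
    $origins = [];

    foreach ($ops as [$op, $operands]) {
        if ('TL' === $op) {            // leading TL
            $leading = (float) $operands[0];
            continue;
        }
        if ('TD' === $op) {            // tx ty TD  is  -ty TL  tx ty Td
            $leading = -(float) $operands[1];
        }
        if ('Td' === $op || 'TD' === $op) {
            $x += (float) $operands[0];
            $y += (float) $operands[1];
        } elseif ('T*' === $op) {      // T*  is  0 -Tl Td
            $y -= $leading;
        } else {
            continue;                  // not a line-positioning operator
        }
        $origins[] = [$x, $y];
    }

    return $origins;
}

print_r(trackLineOrigins([['Td', [72, 700]], ['TL', [14]], ['T*', []], ['TD', [0, -20]], ['T*', []]]));
```

In the example, `TD` with ty = -20 moves down 20 units and sets the leading to +20, so the following `T*` continues downwards to y = 646.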

    /**
     * Gets the Text Matrix of the text in the page
     *
     * Returns an array in which every item is a two-element array: the Text
     * Matrix (Tm) and a string with the text data. The Text Matrix is an array
     * of 6 numbers; the last 2 are the X and Y coordinates of the text, and the
     * first 4 encode its scaling, rotation and skew.
     *
     * @param array $dataCommands the data extracted by getDataCommands;
     *                            if null, getDataCommands is called
     *
     * @return array an array with the data of the page including the Tm information
     *               of any text in the page
     */
    public function getDataTm($dataCommands = null)
    {
        if (!isset($dataCommands) || !$dataCommands) {

Bug Best Practice introduced by
The expression $dataCommands of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using empty($expr) instead to make it clear that you intend to check for an array without elements.

            $dataCommands = $this->getDataCommands();
        }

        /*
         * At the beginning of a text object Tm is the identity matrix
         */
        $defaultTm = ['1', '0', '0', '1', '0', '0'];

        /*
         * Default text leading, used by the T*, ' and " operators
         */
        $defaultTl = 0;

        /*
         * Indexes of the X and Y coordinates in the text matrix (Tm)
         */
        $x = 4;
        $y = 5;
        $Tx = 0;
        $Ty = 0;

        $Tm = $defaultTm;
        $Tl = $defaultTl;

        $extractedTexts = $this->getTextArray();
        $extractedData = [];
        foreach ($dataCommands as $command) {
            $currentText = $extractedTexts[\count($extractedData)];
            switch ($command['o']) {
                /*
                 * BT
                 * Begin a text object, initializing the Tm and Tlm to the identity matrix
                 */
                case 'BT':
                    $Tm = $defaultTm;
                    $Tl = $defaultTl; //review this.
                    $Tx = 0;
                    $Ty = 0;
                    break;

                /*
                 * ET
                 * End a text object, discarding the text matrix
                 */
                case 'ET':
                    $Tm = $defaultTm;
                    $Tl = $defaultTl;  //review this
                    $Tx = 0;
                    $Ty = 0;
                    break;

                /*
                 * leading TL
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
                 * Initial value: 0
                 */
                case 'TL':
                    $Tl = (float) $command['c'];
                    break;

                /*
                 * tx ty Td
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty.
                 */
                case 'Td':
                    $coord = explode(' ', $command['c']);
                    $Tx += (float) $coord[0];
                    $Ty += (float) $coord[1];
                    $Tm[$x] = (string) $Tx;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * tx ty TD
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty. As a side effect, this operator sets the leading
                 * parameter in the text state. This operator has the same effect as the
                 * code:
                 * -ty TL
                 * tx ty Td
                 */
                case 'TD':
                    $coord = explode(' ', $command['c']);
                    // -ty TL: the leading is the negated vertical offset
                    $Tl = -(float) $coord[1];
                    // tx ty Td: same movement as the Td case
                    $Tx += (float) $coord[0];
                    $Ty += (float) $coord[1];
                    $Tm[$x] = (string) $Tx;
                    $Tm[$y] = (string) $Ty;
                    break;
647
648
                /*
649
                 * a b c d e f Tm
650
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
651
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
652
                 * [1 0 0 1 0 0]
653
                 */
654 3
                case 'Tm':
655 2
                    $Tm = explode(' ', $command['c']);
656 2
                    $Tx = (float) $Tm[$x];
657 2
                    $Ty = (float) $Tm[$y];
658 2
                    break;
659
660
                /*
                 * T*
                 * Move to the start of the next line. This operator has the same effect
                 * as the code:
                 * 0 -Tl Td
                 * where Tl is the current leading parameter in the text state, expressed
                 * as a positive number.
                 */
                case 'T*':
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * string Tj
                 * Show a text string.
                 */
                case 'Tj':
                    $extractedData[] = [$Tm, $currentText];
                    break;

                /*
                 * string '
                 * Move to the next line and show a text string. This operator has the
                 * same effect as the code:
                 * T*
                 * string Tj
                 */
                case "'":
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $currentText];
                    break;

                /*
                 * aw ac string "
                 * Move to the next line and show a text string, using aw as the word
                 * spacing and ac as the character spacing. This operator has the same
                 * effect as the code:
                 * aw Tw
                 * ac Tc
                 * string '
                 * Tw sets the word spacing, Tw, to wordSpace.
                 * Tc sets the character spacing, Tc, to charSpace.
                 */
                case '"':
                    $data = explode(' ', $currentText);
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $data[2]]; //Verify
                    break;

                /*
                 * array TJ
                 * Show one or more text strings, allowing individual glyph positioning.
                 * Each element of the array can be a string or a number. If the element is
                 * a string, this operator shows the string. If it is a number, the
                 * operator adjusts the text position by that amount (expressed in
                 * thousandths of a unit of text space); that is, it translates the text
                 * matrix, Tm. This amount is subtracted from the current horizontal or
                 * vertical coordinate, depending on the writing mode. In the default
                 * coordinate system, a positive adjustment has the effect of moving the
                 * next glyph painted either to the left or down by the given amount.
                 */
                case 'TJ':
                    $extractedData[] = [$Tm, $currentText];
                    break;
                default:
            }
        }
        $this->dataTm = $extractedData;

        return $extractedData;
    }

    /**
     * Gets text data that are around the given coordinates (X,Y)
     *
     * If the text is near the given coordinates (X,Y) (or the TM info),
     * the text is returned. The extractedData returned by getDataTm() can be
     * used to see where a given text is located, using its TM info.
     *
     * @param float|null $x      The X value of the coordinate to search for. If null,
     *                           only the Y value is considered (same row)
     * @param float|null $y      The Y value of the coordinate to search for. If null,
     *                           only the X value is considered (same column)
     * @param float      $xError Tolerance: an X within $x +/- $xError is "near"
     * @param float      $yError Tolerance: a Y within $y +/- $yError is "near"
     *
     * @return array An array of [Tm, text] pairs near the given coordinates. If no
     *               text is "near" the x,y coordinate, an empty array is returned. If
     *               both x and y coordinates are null, an empty array is returned.
     */
    public function getTextXY($x = null, $y = null, $xError = 0, $yError = 0)
    {
        if (empty($this->dataTm)) {
            $this->getDataTm();
        }

        if (null !== $x) {
            $x = (float) $x;
        }

        if (null !== $y) {
            $y = (float) $y;
        }

        if (null === $x && null === $y) {
            return [];
        }

        $xError = (float) $xError;
        $yError = (float) $yError;

        $extractedData = [];
        foreach ($this->dataTm as $item) {
            $tm = $item[0];
            $xTm = (float) $tm[4];
            $yTm = (float) $tm[5];
            $text = $item[1];
            if (null === $y) {
                if (($xTm >= ($x - $xError)) &&
                    ($xTm <= ($x + $xError))) {
                    $extractedData[] = [$tm, $text];
                    continue;
                }
            }
            if (null === $x) {
                if (($yTm >= ($y - $yError)) &&
                    ($yTm <= ($y + $yError))) {
                    $extractedData[] = [$tm, $text];
                    continue;
                }
            }
            if (($xTm >= ($x - $xError)) &&
                ($xTm <= ($x + $xError)) &&
                ($yTm >= ($y - $yError)) &&
                ($yTm <= ($y + $yError))) {
                $extractedData[] = [$tm, $text];
                continue;
            }
        }

        return $extractedData;
    }
}
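The positioning operators documented in getDataTm() (Td, TD, TL, T*) only move the origin of the current text line. The sketch below is a hypothetical, standalone illustration: the LineCursor class and its method names are not part of PdfParser. It assumes an unscaled, unrotated text matrix, roughly the simplification getDataTm() makes, so only the translation components (e, f) are tracked. It shows why TD stores the leading as -ty: a following T* then moves down by the same amount.

```php
<?php

/**
 * Hypothetical helper, not part of PdfParser. It tracks only the translation
 * (e, f) of the text line matrix and assumes an unscaled, unrotated text
 * matrix, so operator offsets are added directly to x and y.
 */
final class LineCursor
{
    public float $x = 0.0;
    public float $y = 0.0;
    /** Text leading (TL); positive when successive lines go down the page. */
    public float $leading = 0.0;

    /** tx ty Td: offset the start of the next line by (tx, ty). */
    public function moveText(float $tx, float $ty): void
    {
        $this->x += $tx;
        $this->y += $ty;
    }

    /** tx ty TD: same effect as "-ty TL" followed by "tx ty Td". */
    public function moveTextSetLeading(float $tx, float $ty): void
    {
        $this->setLeading(-$ty);
        $this->moveText($tx, $ty);
    }

    /** leading TL: set the text leading. */
    public function setLeading(float $leading): void
    {
        $this->leading = $leading;
    }

    /** T*: same effect as "0 -TL Td". */
    public function nextLine(): void
    {
        $this->moveText(0.0, -$this->leading);
    }
}

$cursor = new LineCursor();
$cursor->moveText(72.0, 720.0);          // 72 720 Td
$cursor->moveTextSetLeading(0.0, -14.0); // 0 -14 TD: down 14, TL becomes 14
$cursor->nextLine();                     // T*: down another 14
printf("x=%.1f y=%.1f TL=%.1f\n", $cursor->x, $cursor->y, $cursor->leading);
// prints: x=72.0 y=692.0 TL=14.0
```

Because TD and Td move the line origin identically, the only extra work TD does is record the leading, and every later T* (or the ' and " operators, which begin with T*) reuses it.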
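The matching rule in getTextXY() reduces to a tolerance window on each axis, where a null coordinate leaves that axis unconstrained. The function below, filterNear(), is a hypothetical standalone restatement written for illustration, not a PdfParser API. It assumes items shaped like the [Tm, text] pairs produced by getDataTm(), with Tm[4] and Tm[5] holding the X and Y translation as strings.

```php
<?php

/**
 * Hypothetical restatement of the window test in getTextXY(), not a PdfParser
 * API. An item matches when its translation (tm[4], tm[5]) lies inside
 * [x - xError, x + xError] and [y - yError, y + yError]; a null target
 * coordinate leaves that axis unconstrained.
 *
 * @param array<int, array{0: array<int, string>, 1: string}> $items
 */
function filterNear(array $items, ?float $x, ?float $y, float $xError = 0.0, float $yError = 0.0): array
{
    if (null === $x && null === $y) {
        return [];
    }

    $inRange = static function (?float $target, float $value, float $error): bool {
        // A null target means "do not filter on this axis".
        return null === $target || ($value >= $target - $error && $value <= $target + $error);
    };

    $matches = [];
    foreach ($items as [$tm, $text]) {
        if ($inRange($x, (float) $tm[4], $xError) && $inRange($y, (float) $tm[5], $yError)) {
            $matches[] = [$tm, $text];
        }
    }

    return $matches;
}

// Items shaped like getDataTm() output: [textMatrix, text].
$items = [
    [['1', '0', '0', '1', '72', '700'], 'Name'],
    [['1', '0', '0', '1', '200', '701'], 'Value'],
    [['1', '0', '0', '1', '72', '650'], 'Total'],
];

print_r(array_column(filterNear($items, null, 700.0, 0.0, 2.0), 1)); // same row: Name, Value
print_r(array_column(filterNear($items, 72.0, null), 1));            // same column: Name, Total
```

Passing a Y with a small $yError and a null X selects one text row; passing an X with a null Y selects a column, which corresponds to the "same row" and "same column" cases described in the getTextXY() docblock.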