Passed
Push — master ( a90ece...2f3895 )
by Jeremy
01:27 queued 11s
created

Page::getTextXY()   D

Complexity

Conditions 18
Paths 104

Size

Total Lines 51
Code Lines 33

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 25
CRAP Score 24.009

Importance

Changes 2
Bugs 0 Features 0
Metric Value
cc 18
eloc 33
c 2
b 0
f 0
nc 104
nop 4
dl 0
loc 51
ccs 25
cts 34
cp 0.7352
crap 24.009
rs 4.8333
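
The CRAP score in the table can be reproduced from the other metrics using the usual CRAP formula, complexity^2 * (1 - coverage)^3 + complexity. The sketch below is a minimal check, not Scrutinizer's own code. It assumes that `cc` is the cyclomatic complexity and that `ccs` and `cts` are the covered and total statement counts (an inference from 25 / 34 ≈ 0.7352 = `cp`). The function name `crapScore` is hypothetical.

```php
<?php

/**
 * CRAP (Change Risk Anti-Patterns) score:
 *   complexity^2 * (1 - coverage)^3 + complexity
 *
 * @param int   $complexity cyclomatic complexity of the method ("cc")
 * @param float $coverage   covered fraction of the method, between 0 and 1 ("cp")
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// Assumed meaning of the metrics: ccs / cts = covered / total statements.
$coverage = 25 / 34;

// Prints 24.009, the value reported for Page::getTextXY().
echo round(crapScore(18, $coverage), 3), "\n";
```

With full coverage the score collapses to the complexity itself, so the two levers for lowering it are more tests and fewer branches.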

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
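
As a hedged illustration of that advice, the listing further down repeats a commented step ("Store too on cleaned id value") in several methods. The sketch below shows how such a commented block could become a named function. `cleanResourceId` and `indexByIdAndCleanId` are hypothetical names and are not part of the PdfParser library.

```php
<?php

/**
 * Hypothetical helper extracted from a commented block.
 * Keeps digits, dots, dashes, and underscores; drops everything else.
 */
function cleanResourceId(string $id): string
{
    return preg_replace('/[^0-9.\-_]/', '', $id);
}

/**
 * Indexes each resource under its original id and, when non-empty,
 * under its cleaned id as well.
 */
function indexByIdAndCleanId(array $resources): array
{
    $table = [];
    foreach ($resources as $id => $resource) {
        $table[$id] = $resource;

        $cleanId = cleanResourceId((string) $id);
        if ('' !== $cleanId) {
            $table[$cleanId] = $resource;
        }
    }

    return $table;
}

// Each entry becomes reachable under both keys, e.g. 'F1' and 1.
print_r(indexByIdAndCleanId(['F1' => 'Helvetica', 'TT2' => 'Times']));
```

The comment that used to explain the block now lives in the function name and docblock, and each caller shrinks to a single line.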

Commonly applied refactorings include Extract Method.

<?php

/**
 * @file
 *          This file is part of the PdfParser library.
 *
 * @author  Sébastien MALOT <[email protected]>
 * @date    2017-01-03
 *
 * @license LGPLv3
 * @url     <https://github.com/smalot/pdfparser>
 *
 *  PdfParser is a pdf library written in PHP, extraction oriented.
 *  Copyright (C) 2017 - Sébastien MALOT <[email protected]>
 *
 *  This program is free software: you can redistribute it and/or modify
 *  it under the terms of the GNU Lesser General Public License as published by
 *  the Free Software Foundation, either version 3 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU Lesser General Public License for more details.
 *
 *  You should have received a copy of the GNU Lesser General Public License
 *  along with this program.
 *  If not, see <http://www.pdfparser.org/sites/default/LICENSE.txt>.
 */

namespace Smalot\PdfParser;

use Smalot\PdfParser\Element\ElementArray;
use Smalot\PdfParser\Element\ElementMissing;
use Smalot\PdfParser\Element\ElementNull;
use Smalot\PdfParser\Element\ElementXRef;

class Page extends PDFObject
{
    /**
     * @var Font[]
     */
    protected $fonts = null;

    /**
     * @var PDFObject[]
     */
    protected $xobjects = null;

    /**
     * @var array
     */
    protected $dataTm = null;

    /**
     * @return Font[]
     */
    public function getFonts()
    {
        if (null !== $this->fonts) {
            return $this->fonts;
        }

        $resources = $this->get('Resources');

        if (method_exists($resources, 'has') && $resources->has('Font')) {
            // Scrutinizer issue (bug): "The method get() does not exist on
            // Smalot\PdfParser\Element." This check looks for calls to methods
            // that do not seem to exist on a given type, its inherited classes,
            // or its implemented interfaces. If this is a false positive, it can
            // be ignored with the "@scrutinizer ignore-call" annotation.
            if ($resources->get('Font') instanceof Header) {
                $fonts = $resources->get('Font')->getElements();
            } else {
                $fonts = $resources->get('Font')->getHeader()->getElements();
            }

            $table = [];

            foreach ($fonts as $id => $font) {
                if ($font instanceof Font) {
                    $table[$id] = $font;

                    // Also store the font under its cleaned id
                    // (digits, dots, dashes, and underscores only).
                    $id = preg_replace('/[^0-9\.\-_]/', '', $id);
                    if ('' != $id) {
                        $table[$id] = $font;
                    }
                }
            }

            return $this->fonts = $table;
        }

        return [];
    }

    /**
     * @param string $id
     *
     * @return Font|null
     */
    public function getFont($id)
    {
        $fonts = $this->getFonts();

        if (isset($fonts[$id])) {
            return $fonts[$id];
        }

        // Fall back to the cleaned id (digits, dots, dashes, and underscores only).
        $id = preg_replace('/[^0-9\.\-_]/', '', $id);

        if (isset($fonts[$id])) {
            return $fonts[$id];
        }

        return null;
    }

    /**
     * Support for XObject
     *
     * @return PDFObject[]
     */
    public function getXObjects()
    {
        if (null !== $this->xobjects) {
            return $this->xobjects;
        }

        $resources = $this->get('Resources');

        if (method_exists($resources, 'has') && $resources->has('XObject')) {
            if ($resources->get('XObject') instanceof Header) {
                $xobjects = $resources->get('XObject')->getElements();
            } else {
                $xobjects = $resources->get('XObject')->getHeader()->getElements();
            }

            $table = [];

            foreach ($xobjects as $id => $xobject) {
                $table[$id] = $xobject;

                // Also store the XObject under its cleaned id
                // (digits, dots, dashes, and underscores only).
                $id = preg_replace('/[^0-9\.\-_]/', '', $id);
                if ('' != $id) {
                    $table[$id] = $xobject;
                }
            }

            return $this->xobjects = $table;
        }

        return [];
    }

    /**
     * @param string $id
     *
     * @return PDFObject|null
     */
    public function getXObject($id)
    {
        $xobjects = $this->getXObjects();

        if (isset($xobjects[$id])) {
            return $xobjects[$id];
        }

        return null;
        /*$id = preg_replace('/[^0-9\.\-_]/', '', $id);

        if (isset($xobjects[$id])) {
            return $xobjects[$id];
        } else {
            return null;
        }*/
    }

    /**
     * @param Page $page
     *
     * @return string
     */
    public function getText(self $page = null)
    {
        if ($contents = $this->get('Contents')) {
            if ($contents instanceof ElementMissing) {
                return '';
            } elseif ($contents instanceof ElementNull) {
                return '';
            } elseif ($contents instanceof PDFObject) {
                // Scrutinizer issue: "$contents is never a sub-type of
                // Smalot\PdfParser\PDFObject."
                $elements = $contents->getHeader()->getElements();

                if (is_numeric(key($elements))) {
                    $new_content = '';

                    foreach ($elements as $element) {
                        if ($element instanceof ElementXRef) {
                            $new_content .= $element->getObject()->getContent();
                        } else {
                            $new_content .= $element->getContent();
                        }
                    }

                    $header = new Header([], $this->document);
                    $contents = new PDFObject($this->document, $header, $new_content);
                }
            } elseif ($contents instanceof ElementArray) {
                // Create a virtual global content.
                $new_content = '';

                foreach ($contents->getContent() as $content) {
                    $new_content .= $content->getContent()."\n";
                }

                $header = new Header([], $this->document);
                $contents = new PDFObject($this->document, $header, $new_content);
            }

            // Scrutinizer issue (bug): "The method getText() does not exist on
            // Smalot\PdfParser\Element." If this is a false positive, it can be
            // ignored with the "@scrutinizer ignore-call" annotation.
            return $contents->getText($this);
        }

        return '';
    }

    /**
     * @param Page $page
     *
     * @return array
     */
    public function getTextArray(self $page = null)
    {
        if ($contents = $this->get('Contents')) {
            if ($contents instanceof ElementMissing) {
                return [];
            } elseif ($contents instanceof ElementNull) {
                return [];
            } elseif ($contents instanceof PDFObject) {
                // Scrutinizer issue: "$contents is never a sub-type of
                // Smalot\PdfParser\PDFObject."
                $elements = $contents->getHeader()->getElements();

                if (is_numeric(key($elements))) {
                    $new_content = '';

                    /** @var PDFObject $element */
                    foreach ($elements as $element) {
                        if ($element instanceof ElementXRef) {
                            $new_content .= $element->getObject()->getContent();
                        } else {
                            $new_content .= $element->getContent();
                        }
                    }

                    $header = new Header([], $this->document);
                    $contents = new PDFObject($this->document, $header, $new_content);
                }
            } elseif ($contents instanceof ElementArray) {
                // Create a virtual global content.
                $new_content = '';

                /** @var PDFObject $content */
                foreach ($contents->getContent() as $content) {
                    $new_content .= $content->getContent()."\n";
                }

                $header = new Header([], $this->document);
                $contents = new PDFObject($this->document, $header, $new_content);
            }

            // Scrutinizer issue (bug): "The method getTextArray() does not exist on
            // Smalot\PdfParser\Element." If this is a false positive, it can be
            // ignored with the "@scrutinizer ignore-call" annotation.
            return $contents->getTextArray($this);
        }

        return [];
    }

    /**
     * Gets all the text data of the page with its internal representation.
     *
     * @return array An array with the data and the internal representation
     */
    public function extractRawData()
    {
        /*
         * Now you can get the complete content of the object with the text on it
         */
        $extractedData = [];
        $content = $this->get('Contents');
        $values = $content->getContent();
        if (isset($values) and \is_array($values)) {
            $text = '';
            foreach ($values as $section) {
                $text .= $section->getContent();
            }
            $sectionsText = $this->getSectionsText($text);
            foreach ($sectionsText as $sectionText) {
                $commandsText = $this->getCommandsText($sectionText);
                foreach ($commandsText as $command) {
                    $extractedData[] = $command;
                }
            }
        } else {
            // Scrutinizer issue (bug): "The method getSectionsText() does not exist
            // on Smalot\PdfParser\Element." If this is a false positive, it can be
            // ignored with the "@scrutinizer ignore-call" annotation.
            $sectionsText = $content->getSectionsText($content->getContent());
            foreach ($sectionsText as $sectionText) {
                // Scrutinizer issue (bug): "The method getCommandsText() does not
                // exist on Smalot\PdfParser\Element." Ignorable in the same way.
                $commandsText = $content->getCommandsText($sectionText);
                foreach ($commandsText as $command) {
                    $extractedData[] = $command;
                }
            }
        }

        return $extractedData;
    }

    /**
     * Gets all the decoded text data of a page with its internal representation.
     *
     * @param array $extractedRawData the extracted data returned by extractRawData, or
     *                                null if extractRawData should be called
     *
     * @return array An array with the data and the internal representation
     */
    public function extractDecodedRawData($extractedRawData = null)
    {
        if (!isset($extractedRawData) or !$extractedRawData) {
            $extractedRawData = $this->extractRawData();
        }
        $unicode = true;
        $currentFont = null;
        foreach ($extractedRawData as &$command) {
            if ('Tj' == $command['o'] or 'TJ' == $command['o']) {
                $data = $command['c'];
                if (!\is_array($data)) {
                    // Single string operand (Tj).
                    $tmpText = '';
                    if (isset($currentFont)) {
                        $tmpText = $currentFont->decodeOctal($data);
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
                    }
                    $tmpText = str_replace(
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
                            $tmpText
                    );
                    $tmpText = utf8_encode($tmpText);
                    if (isset($currentFont)) {
                        $tmpText = $currentFont->decodeContent($tmpText, $unicode);
                    }
                    $command['c'] = $tmpText;
                    continue;
                }
                // Array operand (TJ): only the even-indexed entries are decoded.
                $numText = \count($data);
                for ($i = 0; $i < $numText; ++$i) {
                    if (0 != ($i % 2)) {
                        continue;
                    }
                    $tmpText = $data[$i]['c'];
                    $decodedText = '';
                    if (isset($currentFont)) {
                        $decodedText = $currentFont->decodeOctal($tmpText);
                        //$tmpText = $currentFont->decodeHexadecimal($tmpText, false);
                    }
                    $decodedText = str_replace(
                            ['\\\\', '\(', '\)', '\n', '\r', '\t', '\ '],
                            ['\\', '(', ')', "\n", "\r", "\t", ' '],
                            $decodedText
                    );
                    $decodedText = utf8_encode($decodedText);
                    if (isset($currentFont)) {
                        $decodedText = $currentFont->decodeContent($decodedText, $unicode);
                    }
                    $command['c'][$i]['c'] = $decodedText;
                    continue;
                }
            } elseif ('Tf' == $command['o'] or 'TF' == $command['o']) {
                // Font selection: remember the font for the text that follows.
                $fontId = explode(' ', $command['c'])[0];
                $currentFont = $this->getFont($fontId);
                continue;
            }
        }

        return $extractedRawData;
    }

    /**
     * Gets just the Text commands that are involved in text positions and
     * Text Matrix (Tm)
     *
     * It extracts just the PDF commands that are involved with text positions, and
     * the Text Matrix (Tm). These are: BT, ET, TL, Td, TD, Tm, T*, Tj, ', ", and TJ
     *
     * @param array $extractedDecodedRawData The data extracted by extractDecodedRawData.
     *                                       If it is null, the method extractDecodedRawData is called.
     *
     * @return array An array with the text command of the page
     */
    public function getDataCommands($extractedDecodedRawData = null)
    {
        if (!isset($extractedDecodedRawData) or !$extractedDecodedRawData) {
            $extractedDecodedRawData = $this->extractDecodedRawData();
        }
        $extractedData = [];
        foreach ($extractedDecodedRawData as $command) {
            switch ($command['o']) {
                /*
                 * BT
                 * Begin a text object, initializing the Tm and Tlm to identity matrix
                 */
                case 'BT':
                    $extractedData[] = $command;
                    break;

                /*
                 * ET
                 * End a text object, discarding the text matrix
                 */
                case 'ET':
                    $extractedData[] = $command;
                    break;

                /*
                 * leading TL
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
                 * Initial value: 0
                 */
                case 'TL':
                    $extractedData[] = $command;
                    break;

                /*
                 * tx ty Td
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty.
                 */
                case 'Td':
                    $extractedData[] = $command;
                    break;

                /*
                 * tx ty TD
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty. As a side effect, this operator sets the leading
                 * parameter in the text state. This operator has the same effect as the
                 * code:
                 * -ty TL
                 * tx ty Td
                 */
                case 'TD':
                    $extractedData[] = $command;
                    break;

                /*
                 * a b c d e f Tm
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
                 * [1 0 0 1 0 0]
                 */
                case 'Tm':
                    $extractedData[] = $command;
                    break;

                /*
                 * T*
                 * Move to the start of the next line. This operator has the same effect
                 * as the code:
                 * 0 -Tl Td
                 * Where Tl is the current leading parameter in the text state.
                 */
                case 'T*':
                    $extractedData[] = $command;
                    break;

                /*
                 * string Tj
                 * Show a Text String
                 */
                case 'Tj':
                    $extractedData[] = $command;
                    break;

                /*
                 * string '
                 * Move to the next line and show a text string. This operator has the
                 * same effect as the code:
                 * T*
                 * string Tj
                 */
                case "'":
                    $extractedData[] = $command;
                    break;

                /*
                 * aw ac string "
                 * Move to the next line and show a text string, using aw as the word
                 * spacing and ac as the character spacing. This operator has the same
                 * effect as the code:
                 * aw Tw
                 * ac Tc
                 * string '
                 * Tw sets the word spacing, Tw, to wordSpace.
                 * Tc sets the character spacing, Tc, to charsSpace.
                 */
                case '"':
                    $extractedData[] = $command;
                    break;

                /*
                 * array TJ
                 * Show one or more text strings, allowing individual glyph positioning.
                 * Each element of the array can be a string or a number. If the element is
                 * a string, this operator shows the string. If it is a number, the
                 * operator adjusts the text position by that amount; that is, it translates
                 * the text matrix, Tm. This amount is subtracted from the current
                 * horizontal or vertical coordinate, depending on the writing mode.
                 * In the default coordinate system, a positive adjustment has the effect
                 * of moving the next glyph painted either to the left or down by the given
                 * amount.
                 */
                case 'TJ':
                    $extractedData[] = $command;
                    break;
                default:
            }
        }

        return $extractedData;
    }

    /**
     * Gets the Text Matrix of the text in the page
     *
     * Returns an array where every item is an array whose first item is the
     * Text Matrix (Tm) and whose second item is a string with the text data. The Text
     * Matrix is an array of 6 numbers. The last 2 numbers are the X and Y coordinates
     * of the text. The first 4 numbers describe the scaling, rotation, and skew of the text.
     *
     * @param array $dataCommands the data extracted by getDataCommands;
     *                            if null, getDataCommands is called
     *
     * @return array an array with the data of the page including the Tm information
     *               of any text in the page
     */
    public function getDataTm($dataCommands = null)
    {
        if (!isset($dataCommands) or !$dataCommands) {
            $dataCommands = $this->getDataCommands();
        }

        /*
         * At the beginning of a text object Tm is the identity matrix
         */
        $defaultTm = ['1', '0', '0', '1', '0', '0'];

        /*
         *  Set the text leading used by T*, ' and " operators
         */
        $defaultTl = 0;

        /*
         * Indexes of the X and Y coordinates in the matrix (Tm)
         */
        $x = 4;
        $y = 5;
        $Tx = 0;
        $Ty = 0;

        $Tm = $defaultTm;
        $Tl = $defaultTl;

        $extractedData = [];
        foreach ($dataCommands as $command) {
            switch ($command['o']) {
                /*
                 * BT
                 * Begin a text object, initializing the Tm and Tlm to identity matrix
                 */
                case 'BT':
                    // Fixed: Tm was reset to $defaultTl (a number) instead of the identity matrix.
                    $Tm = $defaultTm;
                    $Tl = $defaultTl; //review this.
                    $Tx = 0;
                    $Ty = 0;
                    break;

                /*
                 * ET
                 * End a text object, discarding the text matrix
                 */
                case 'ET':
                    // Fixed: Tm was reset to $defaultTl (a number) instead of the identity matrix.
                    $Tm = $defaultTm;
                    $Tl = $defaultTl;  //review this
                    $Tx = 0;
                    $Ty = 0;
                    break;

                /*
                 * leading TL
                 * Set the text leading, Tl, to leading. Tl is used by the T*, ' and " operators.
                 * Initial value: 0
                 */
                case 'TL':
                    $Tl = (float) $command['c'];
                    break;

                /*
                 * tx ty Td
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty.
                 */
                case 'Td':
                    $coord = explode(' ', $command['c']);
                    $Tx += (float) $coord[0];
                    $Ty += (float) $coord[1];
                    $Tm[$x] = (string) $Tx;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * tx ty TD
                 * Move to the start of the next line, offset from the start of the
                 * current line by tx, ty. As a side effect, this operator sets the leading
                 * parameter in the text state. This operator has the same effect as the
                 * code:
                 * -ty TL
                 * tx ty Td
                 */
                case 'TD':
                    $coord = explode(' ', $command['c']);
                    // Fixed to match the equivalence above: the leading becomes -ty
                    // and ty is added to the position (the original set Tl = ty
                    // and subtracted ty).
                    $Tl = -((float) $coord[1]);
                    $Tx += (float) $coord[0];
                    $Ty += (float) $coord[1];
                    $Tm[$x] = (string) $Tx;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * a b c d e f Tm
                 * Set the text matrix, Tm, and the text line matrix, Tlm. The operands are
                 * all numbers, and the initial value for Tm and Tlm is the identity matrix
                 * [1 0 0 1 0 0]
                 */
                case 'Tm':
                    $Tm = explode(' ', $command['c']);
                    $Tx = (float) $Tm[$x];
                    $Ty = (float) $Tm[$y];
                    break;

                /*
                 * T*
                 * Move to the start of the next line. This operator has the same effect
                 * as the code:
                 * 0 -Tl Td
                 * Where Tl is the current leading parameter in the text state.
                 */
                case 'T*':
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    break;

                /*
                 * string Tj
                 * Show a Text String
                 */
                case 'Tj':
                    $extractedData[] = [$Tm, $command['c']];
                    break;

                /*
                 * string '
                 * Move to the next line and show a text string. This operator has the
                 * same effect as the code:
                 * T*
                 * string Tj
                 */
                case "'":
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $command['c']];
                    break;

                /*
                 * aw ac string "
                 * Move to the next line and show a text string, using aw as the word
                 * spacing and ac as the character spacing. This operator has the same
                 * effect as the code:
                 * aw Tw
                 * ac Tc
                 * string '
                 * Tw sets the word spacing, Tw, to wordSpace.
                 * Tc sets the character spacing, Tc, to charsSpace.
                 */
                case '"':
                    $data = explode(' ', $command['c']);
                    $Ty -= $Tl;
                    $Tm[$y] = (string) $Ty;
                    $extractedData[] = [$Tm, $data[2]]; //Verify
                    break;

                /*
                 * array TJ
                 * Show one or more text strings, allowing individual glyph positioning.
                 * Each element of the array can be a string or a number. If the element is
                 * a string, this operator shows the string. If it is a number, the
                 * operator adjusts the text position by that amount; that is, it translates
                 * the text matrix, Tm. This amount is subtracted from the current
                 * horizontal or vertical coordinate, depending on the writing mode.
                 * In the default coordinate system, a positive adjustment has the effect
                 * of moving the next glyph painted either to the left or down by the given
                 * amount.
                 */
                case 'TJ':
                    $text = [];
                    $data = $command['c'];
                    $numText = \count($data);
                    for ($i = 0; $i < $numText; ++$i) {
                        // Skip numeric (positioning) entries; keep only the strings.
                        if ('n' == $data[$i]['t']) {
                            continue;
                        }
                        $tmpText = $data[$i]['c'];
                        $text[] = $tmpText;
                    }
                    $tjText = ''.implode('', $text);
                    $extractedData[] = [$Tm, $tjText];
                    break;
                default:
            }
        }
        $this->dataTm = $extractedData;

        return $extractedData;
    }

    /**
     * Gets text data that are around the given coordinates (X,Y)
     *
     * If the text is near the given coordinates (X,Y) (or the TM info),
     * the text is returned. The extractedData returned by getDataTm can be used to see
     * where the coordinates of a given text are, using the TM info for it.
     *
     * @param float $x      The X value of the coordinate to search for. If null,
     *                      just the Y value is considered (same row)
     * @param float $y      The Y value of the coordinate to search for. If null,
     *                      just the X value is considered (same column)
     * @param float $xError The tolerance, below or above, within which an X counts as "near"
     * @param float $yError The tolerance, below or above, within which a Y counts as "near"
     *
     * @return array An array of text that are near the given coordinates. If no text
     *               is "near" the x,y coordinate, an empty array is returned. If both x
     *               and y coordinates are null, an empty array is also returned.
     */
    public function getTextXY($x = null, $y = null, $xError = 0, $yError = 0)
    {
        if (!isset($this->dataTm) or !$this->dataTm) {
            $this->getDataTm();
        }

        if (null !== $x) {
            $x = (float) $x;
        }

        if (null !== $y) {
            $y = (float) $y;
        }

        if (null === $x and null === $y) {
            return [];
        }

        $xError = (float) $xError;
        $yError = (float) $yError;

        $extractedData = [];
        foreach ($this->dataTm as $item) {
            $tm = $item[0];
            $xTm = (float) $tm[4];
            $yTm = (float) $tm[5];
            $text = $item[1];
            if (null === $y) {
                if (($xTm >= ($x - $xError)) and
                    ($xTm <= ($x + $xError))) {
                    $extractedData[] = [$tm, $text];
                    continue;
                }
            }
            if (null === $x) {
                if (($yTm >= ($y - $yError)) and
                    ($yTm <= ($y + $yError))) {
                    $extractedData[] = [$tm, $text];
                    continue;
                }
            }
            if (($xTm >= ($x - $xError)) and
                ($xTm <= ($x + $xError)) and
                ($yTm >= ($y - $yError)) and
                ($yTm <= ($y + $yError))) {
                $extractedData[] = [$tm, $text];
                continue;
            }
        }

        return $extractedData;
    }
}
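
One possible way to bring down the condition count reported for Page::getTextXY() is to extract the repeated range check into a helper, so that a null coordinate simply means "no constraint on this axis". The sketch below is a standalone illustration, not the library's code. `isNear` and `filterTextXY` are hypothetical names, and the function takes the `[$tm, $text]` list as a parameter (instead of reading `$this->dataTm`) so that it runs without the library.

```php
<?php

/**
 * True when $value lies within [$target - $error, $target + $error].
 * A null $target means the axis is not constrained.
 */
function isNear(float $value, ?float $target, float $error): bool
{
    return null === $target
        || ($value >= $target - $error && $value <= $target + $error);
}

/**
 * Hypothetical standalone variant of Page::getTextXY().
 *
 * @param array $dataTm items of the form [$tm, $text], as built by getDataTm()
 */
function filterTextXY(array $dataTm, $x = null, $y = null, $xError = 0, $yError = 0): array
{
    if (null === $x && null === $y) {
        return [];
    }

    $x = null === $x ? null : (float) $x;
    $y = null === $y ? null : (float) $y;

    return array_values(array_filter(
        $dataTm,
        function (array $item) use ($x, $y, $xError, $yError): bool {
            [$tm] = $item;

            // Indexes 4 and 5 of the text matrix hold the X and Y coordinates.
            return isNear((float) $tm[4], $x, (float) $xError)
                && isNear((float) $tm[5], $y, (float) $yError);
        }
    ));
}

$dataTm = [
    [['1', '0', '0', '1', '10', '20'], 'a'],
    [['1', '0', '0', '1', '10.5', '40'], 'b'],
    [['1', '0', '0', '1', '50', '20'], 'c'],
];

// Same column (x = 10, tolerance 1), any row.
print_r(array_column(filterTextXY($dataTm, 10, null, 1), 1));
```

The three near-identical branches of the original loop collapse into one expression, which should lower both the condition count and the number of paths that tests have to cover.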