Passed
Push — master ( c2bb70...f0e189 )
by David
02:16
created

Client   C

Complexity

Total Complexity 55

Size/Duplication

Total Lines 592
Duplicated Lines 0 %

Importance

Changes 10
Bugs 0 Features 1
Metric Value
eloc 100
dl 0
loc 592
rs 6
c 10
b 0
f 1
wmc 55

32 Methods

Rating   Name   Duplication   Size   Complexity  
B checkRequest() 0 24 8
A cacheResponse() 0 5 1
A getCachedResponse() 0 3 2
A getMetadata() 0 16 3
A setChecked() 0 3 1
A getText() 0 8 2
A getCallback() 0 3 1
A getVersion() 0 3 1
A getLanguage() 0 3 1
A setCallback() 0 21 3
A getRecursiveMetadata() 0 3 1
A setDownloadRemote() 0 5 1
A isVersionSupported() 0 3 1
A isCacheable() 0 3 1
A getMainText() 0 8 2
A isChecked() 0 3 1
A __construct() 0 3 2
A downloadFile() 0 31 4
A getMIME() 0 3 1
A prepare() 0 3 1
A getEncoding() 0 3 1
A setChunkSize() 0 16 4
A getSupportedMIMETypes() 0 3 1
A isCached() 0 3 1
A getDownloadRemote() 0 3 1
A getSupportedVersions() 0 3 1
A getAvailableParsers() 0 3 1
A setEncoding() 0 5 1
A getHTML() 0 8 2
A make() 0 9 2
A getAvailableDetectors() 0 3 1
A getChunkSize() 0 3 1

How to fix   Complexity   

Complex Class

Complex classes like Client often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share a prefix or suffix.
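
As a rough illustration of that search, the sketch below groups method names from the table above by a shared stem. The `groupByStem()` helper and the choice of stems are assumptions made for this example; they are not part of the analyzed library.

```php
<?php

/**
 * Group method names by the stems they contain (hypothetical helper).
 *
 * @param   string[]    $methods    method names to inspect
 * @param   string[]    $stems      candidate prefixes/suffixes, picked by hand
 * @return  array                   stem => list of matching method names
 */
function groupByStem(array $methods, array $stems)
{
    $groups = [];

    foreach($stems as $stem)
    {
        // case-insensitive match so "cache" also finds isCached and isCacheable
        $groups[$stem] = array_values(array_filter($methods, function($method) use($stem)
        {
            return stripos($method, $stem) !== false;
        }));
    }

    return $groups;
}

// a subset of the method names listed in the report
$methods = [
    'isCached', 'getCachedResponse', 'isCacheable', 'cacheResponse',
    'getCallback', 'setCallback', 'getChunkSize', 'setChunkSize', 'getMIME',
];

$groups = groupByStem($methods, ['cache', 'callback', 'chunk']);
```

The four methods grouped under `cache` operate on the same `$cache` property, which makes them a plausible candidate for extraction.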

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
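
A minimal sketch of Extract Class applied to the cache-related members of Client (`$cache`, `isCached()`, `getCachedResponse()`, `isCacheable()`, `cacheResponse()`). The `ResponseCache` name and its method names are hypothetical; only the `sha1($file)` keying and the `lang`/`meta` list come from the listing.

```php
<?php

/**
 * Hypothetical class extracted from Client: it owns the response cache
 */
class ResponseCache
{
    /**
     * Cached responses indexed by file hash and request type
     *
     * @var array
     */
    protected $items = [];

    /**
     * Check if a request type must be cached (same list as Client::isCacheable)
     */
    public function isCacheable($type)
    {
        return in_array($type, ['lang', 'meta'], true);
    }

    /**
     * Check if a response is cached
     */
    public function has($type, $file)
    {
        return isset($this->items[sha1($file)][$type]);
    }

    /**
     * Get a cached response or null when there is none
     */
    public function get($type, $file)
    {
        return $this->has($type, $file) ? $this->items[sha1($file)][$type] : null;
    }

    /**
     * Cache a response
     */
    public function put($type, $response, $file)
    {
        $this->items[sha1($file)][$type] = $response;
    }
}

$cache = new ResponseCache();
$cache->put('lang', 'en', '/tmp/sample.pdf');
```

Client would then hold one `ResponseCache` instance and delegate to it, which removes four methods and one property from the class.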

While breaking up the class, it is a good idea to analyze how other classes use Client, and based on these observations, apply Extract Interface, too.
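
A sketch of Extract Interface, assuming some callers only need text extraction. The `TextExtractor` interface, the `FakeExtractor` stand-in and the `wordCount()` consumer are invented for illustration; only the `getText()` and `getHTML()` method names come from Client.

```php
<?php

/**
 * Hypothetical interface covering the part of Client that a caller uses
 */
interface TextExtractor
{
    public function getText($file);

    public function getHTML($file);
}

/**
 * Stand-in implementation so the example runs without Apache Tika
 */
class FakeExtractor implements TextExtractor
{
    public function getText($file)
    {
        return "text of $file";
    }

    public function getHTML($file)
    {
        return "<p>text of $file</p>";
    }
}

/**
 * Consumer that depends on the interface instead of the concrete class
 */
function wordCount(TextExtractor $extractor, $file)
{
    return str_word_count($extractor->getText($file));
}

$count = wordCount(new FakeExtractor(), 'report');
```

Callers typed against the narrow interface are unaffected when the remaining parts of Client are split off later.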

<?php

namespace Vaites\ApacheTika;

use Closure;
use Exception;

use Vaites\ApacheTika\Clients\CLIClient;
use Vaites\ApacheTika\Clients\WebClient;
use Vaites\ApacheTika\Metadata\Metadata;

/**
 * Apache Tika client interface
 *
 * @author  David Martínez <[email protected]>
 * @link    https://tika.apache.org/1.23/formats.html
 */
abstract class Client
{
    const MODE = null;

    /**
     * List of supported Apache Tika versions
     *
     * @var array
     */
    protected static $supportedVersions =
    [
        '1.7', '1.8', '1.9', '1.10', '1.11', '1.12', '1.13', '1.14', '1.15',
        '1.16', '1.17', '1.18', '1.19', '1.19.1', '1.20', '1.21', '1.22', '1.23'
    ];

    /**
     * Checked flag
     *
     * @var bool
     */
    protected $checked = false;

    /**
     * Response using callbacks
     *
     * @var string|null
     */
    protected $response = null;

    /**
     * Platform (unix or win)
     *
     * @var string|null
     */
    protected $platform = null;

    /**
     * Cached responses to avoid multiple requests for the same file
     *
     * @var array
     */
    protected $cache = [];

    /**
     * Text encoding
     *
     * @var string|null
     */
    protected $encoding = null;

    /**
     * Callback called on sequential read
     *
     * @var \Closure|null
     */
    protected $callback = null;

    /**
     * Enable or disable appending when using callback
     *
     * @var bool
     */
    protected $callbackAppend = true;

    /**
     * Size of chunks for callback
     *
     * @var int
     */
    protected $chunkSize = 1048576;

    /**
     * Remote download flag
     *
     * @var bool
     */
    protected $downloadRemote = false;

    /**
     * Configure client
     */
    public function __construct()
    {
        $this->platform = defined('PHP_WINDOWS_VERSION_MAJOR') ? 'win' : 'unix';
    }

    /**
     * Get a class instance throwing an exception if check fails
     *
     * @param   string      $param1     path or host
     * @param   string|int  $param2     Java binary path or port for web client
     * @param   array       $options    options for cURL request
     * @param   bool        $check      check JAR file or server connection
     * @return  \Vaites\ApacheTika\Clients\CLIClient|\Vaites\ApacheTika\Clients\WebClient
     * @throws  \Exception
     */
    public static function make($param1 = null, $param2 = null, $options = [], $check = true)
    {
        // use $param1 directly: func_get_arg(0) raises a warning or error when make() is called without arguments
        if(preg_match('/\.jar$/', (string) $param1))
        {
            return new CLIClient($param1, $param2, $check);
        }
        else
        {
            return new WebClient($param1, $param2, $options, $check);
        }
    }

    /**
     * Get a class instance delaying the check
     *
     * @param   string      $param1     path or host
     * @param   string|int  $param2     Java binary path or port for web client
     * @param   array       $options    options for cURL request
     * @return  \Vaites\ApacheTika\Clients\CLIClient|\Vaites\ApacheTika\Clients\WebClient
     * @throws  \Exception
     */
    public static function prepare($param1 = null, $param2 = null, $options = [])
    {
        return self::make($param1, $param2, $options, false);
    }

    /**
     * Get the encoding
     *
     * @return  string|null
     */
    public function getEncoding()
    {
        return $this->encoding;
    }

    /**
     * Set the encoding
     *
     * @param   string   $encoding
     * @return  $this
     * @throws  \Exception
     */
    public function setEncoding($encoding)
    {
        // Scrutinizer (documentation bug): $encoding is a string here, so the $encoding
        // property must be documented as a string and not as a \Closure
        $this->encoding = $encoding;

        return $this;
    }

    /**
     * Get the callback
     *
     * @return  \Closure|null
     */
    public function getCallback()
    {
        return $this->callback;
    }

    /**
     * Set the callback (callable or closure) to call on sequential read
     *
     * @param   mixed   $callback
     * @param   bool    $append
     * @return  $this
     * @throws  \Exception
     */
    public function setCallback($callback, $append = true)
    {
        if($callback instanceof Closure)
        {
            $this->callbackAppend = (bool) $append;
            $this->callback = $callback;
        }
        elseif(is_callable($callback))
        {
            $this->callbackAppend = (bool) $append;
            $this->callback = function($chunk) use($callback)
            {
                return call_user_func_array($callback, [$chunk]);
            };
        }
        else
        {
            throw new Exception('Invalid callback');
        }

        return $this;
    }

    /**
     * Get the chunk size
     *
     * @return  int
     */
    public function getChunkSize()
    {
        return $this->chunkSize;
    }

    /**
     * Set the chunk size for sequential read
     *
     * @param   int     $size
     * @return  $this
     * @throws  \Exception
     */
    public function setChunkSize($size)
    {
        // Scrutinizer reports both MODE comparisons below as always false because MODE is null
        // in this abstract class; static::MODE is resolved at runtime, presumably against a
        // value overridden by the concrete client
        if(static::MODE == 'cli' && is_numeric($size))
        {
            $this->chunkSize = (int)$size;
        }
        elseif(static::MODE == 'web')
        {
            throw new Exception('Chunk size is not supported on web mode');
        }
        else
        {
            throw new Exception("$size is not a valid chunk size");
        }

        return $this;
    }

    /**
     * Get the remote download flag
     *
     * @return  bool
     */
    public function getDownloadRemote()
    {
        return $this->downloadRemote;
    }

    /**
     * Set the remote download flag
     *
     * @param   bool    $download
     * @return  $this
     */
    public function setDownloadRemote($download)
    {
        $this->downloadRemote = (bool) $download;

        return $this;
    }

    /**
     * Gets file metadata, recursively if specified
     *
     * @link    https://wiki.apache.org/tika/TikaJAXRS#Recursive_Metadata_and_Content
     * @param   string          $file
     * @param   string|null     $recursive  text, html, ignore or null
     * @return  \Vaites\ApacheTika\Metadata\Metadata|\Vaites\ApacheTika\Metadata\DocumentMetadata|\Vaites\ApacheTika\Metadata\ImageMetadata
     * @throws  \Exception
     */
    public function getMetadata($file, $recursive = null)
    {
        if(is_null($recursive))
        {
            $response = $this->request('meta', $file);
        }
        elseif(in_array($recursive, ['text', 'html', 'ignore']))
        {
            $response = $this->request("rmeta/$recursive", $file);
        }
        else
        {
            throw new Exception("Unknown recursive type (must be text, html, ignore or null)");
        }

        return Metadata::make($response, $file);
    }

    /**
     * Gets recursive file metadata (alias for getMetadata)
     *
     * @param   string  $file
     * @param   string  $recursive
     * @return  \Vaites\ApacheTika\Metadata\Metadata
     * @throws  \Exception
     */
    public function getRecursiveMetadata($file, $recursive)
    {
        return $this->getMetadata($file, $recursive);
    }

    /**
     * Detect language
     *
     * @param   string  $file
     * @return  string
     * @throws  \Exception
     */
    public function getLanguage($file)
    {
        return $this->request('lang', $file);
    }

    /**
     * Detect MIME type
     *
     * @param   string  $file
     * @return  string
     * @throws  \Exception
     */
    public function getMIME($file)
    {
        return $this->request('mime', $file);
    }

    /**
     * Extracts HTML
     *
     * @param   string  $file
     * @param   mixed   $callback
     * @param   bool    $append
     * @return  string
     * @throws  \Exception
     */
    public function getHTML($file, $callback = null, $append = true)
    {
        if(!is_null($callback))
        {
            $this->setCallback($callback, $append);
        }

        return $this->request('html', $file);
    }

    /**
     * Extracts text
     *
     * @param   string  $file
     * @param   mixed   $callback
     * @param   bool    $append
     * @return  string
     * @throws  \Exception
     */
    public function getText($file, $callback = null, $append = true)
    {
        if(!is_null($callback))
        {
            $this->setCallback($callback, $append);
        }

        return $this->request('text', $file);
    }

    /**
     * Extracts main text
     *
     * @param   string  $file
     * @param   mixed   $callback
     * @param   bool    $append
     * @return  string
     * @throws  \Exception
     */
    public function getMainText($file, $callback = null, $append = true)
    {
        if(!is_null($callback))
        {
            $this->setCallback($callback, $append);
        }

        return $this->request('text-main', $file);
    }

    /**
     * Returns the supported MIME types
     *
     * @return  string
     * @throws  \Exception
     */
    public function getSupportedMIMETypes()
    {
        return $this->request('mime-types');
    }

    /**
     * Returns the available detectors
     *
     * @return  string
     * @throws  \Exception
     */
    public function getAvailableDetectors()
    {
        return $this->request('detectors');
    }

    /**
     * Returns the available parsers
     *
     * @return  string
     * @throws  \Exception
     */
    public function getAvailableParsers()
    {
        return $this->request('parsers');
    }

    /**
     * Returns current Tika version
     *
     * @return  string
     * @throws  \Exception
     */
    public function getVersion()
    {
        return $this->request('version');
    }

    /**
     * Return the list of Apache Tika supported versions
     *
     * @return array
     */
    public static function getSupportedVersions()
    {
        return self::$supportedVersions;
    }

    /**
     * Sets the checked flag
     *
     * @param   bool    $checked
     */
    public function setChecked($checked)
    {
        $this->checked = (bool) $checked;
    }

    /**
     * Checks if instance is checked
     *
     * @return  bool
     */
    public function isChecked()
    {
        return $this->checked;
    }

    /**
     * Check if a response is cached
     *
     * @param   string  $type
     * @param   string  $file
     * @return  bool
     */
    protected function isCached($type, $file)
    {
        return isset($this->cache[sha1($file)][$type]);
    }

    /**
     * Get a cached response
     *
     * @param   string  $type
     * @param   string  $file
     * @return  mixed
     */
    protected function getCachedResponse($type, $file)
    {
        return isset($this->cache[sha1($file)][$type]) ? $this->cache[sha1($file)][$type] : null;
    }

    /**
     * Check if a request type must be cached
     *
     * @param   string  $type
     * @return  bool
     */
    protected function isCacheable($type)
    {
        return in_array($type, ['lang', 'meta']);
    }

    /**
     * Caches a response
     *
     * @param   string  $type
     * @param   mixed   $response
     * @param   string  $file
     * @return  bool
     */
    protected function cacheResponse($type, $response, $file)
    {
        $this->cache[sha1($file)][$type] = $response;

        return true;
    }

    /**
     * Checks if a specific version is supported
     *
     * @param   string  $version
     * @return  bool
     */
    public static function isVersionSupported($version)
    {
        return in_array($version, self::getSupportedVersions());
    }

    /**
     * Check the request before executing
     *
     * @param   string  $type
     * @param   string  $file
     * @return  string
     * @throws  \Exception
     */
    public function checkRequest($type, $file)
    {
        // no checks for getters
        if(in_array($type, ['detectors', 'mime-types', 'parsers', 'version']))
        {
            //
        }
        // invalid local file
        elseif(!preg_match('/^http/', $file) && !file_exists($file))
        {
            throw new Exception("File $file can't be opened");
        }
        // invalid remote file
        elseif(preg_match('/^http/', $file) && !preg_match('/200/', get_headers($file)[0]))
        {
            throw new Exception("File $file can't be opened", 2);
        }
        // download remote file if required only for integrated downloader
        elseif(preg_match('/^http/', $file) && $this->downloadRemote)
        {
            $file = $this->downloadFile($file);
        }

        return $file;
    }

    /**
     * Download file to a temporary folder
     *
     * @link    https://wiki.apache.org/tika/TikaJAXRS#Specifying_a_URL_Instead_of_Putting_Bytes
     * @param   string  $file
     * @return  string
     * @throws  \Exception
     */
    protected function downloadFile($file)
    {
        $dest = tempnam(sys_get_temp_dir(), 'TIKA');

        $fp = fopen($dest, 'w+');

        if($fp === false)
        {
            throw new Exception("$dest can't be opened");
        }

        $ch = curl_init($file);

        // Scrutinizer: curl_init() can also return false, which curl_setopt(), curl_exec(),
        // curl_errno(), curl_error(), curl_getinfo() and curl_close() do not accept
        if($ch === false)
        {
            throw new Exception("$file can't be downloaded");
        }

        curl_setopt($ch, CURLOPT_FILE, $fp);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_exec($ch);

        if(curl_errno($ch))
        {
            throw new Exception(curl_error($ch));
        }

        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

        curl_close($ch);

        if($code != 200)
        {
            throw new Exception("$file can't be downloaded", $code);
        }

        return $dest;
    }

    /**
     * Check Java binary, JAR path or server connection
     *
     * @return  void
     */
    abstract public function check();

    /**
     * Configure and make a request and return its results.
     *
     * @param   string  $type
     * @param   string  $file
     * @return  string
     * @throws  \Exception
     */
    abstract public function request($type, $file = null);
}
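
The "always false" reports on the `static::MODE` comparisons in setChunkSize() follow from MODE being null in the abstract class. The sketch below shows how late static binding changes the value at runtime, assuming the concrete clients override the constant (the listing does not show them). The class names here are made up for the example.

```php
<?php

abstract class BaseClient
{
    const MODE = null;

    /**
     * Return the constant as seen through static:: and through self::
     */
    public function modes()
    {
        // static:: resolves against the runtime class, self:: against BaseClient
        return [static::MODE, self::MODE];
    }
}

// hypothetical subclasses overriding the constant
class CliLikeClient extends BaseClient
{
    const MODE = 'cli';
}

class WebLikeClient extends BaseClient
{
    const MODE = 'web';
}

$cli = (new CliLikeClient())->modes();
$web = (new WebLikeClient())->modes();
```

Under that assumption the comparisons are reachable in the subclasses, so the reported conditions would be false positives rather than dead code.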