Completed
Pull Request — master (#45)
by Robbie
24:33 queued 17:24
created

TikaRestClient   A

Complexity

Total Complexity 13

Size/Duplication

Total Lines 135
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 135
rs 10
c 0
b 0
f 0
wmc 13

5 Methods

Rating   Name                  Duplication   Size   Complexity
A        isAvailable()         0             15     3
A        tika()                0             29     3
A        __construct()         0             12     2
A        getSupportedMimes()   0             14     2
A        getVersion()          0             15     3
<?php

namespace SilverStripe\TextExtraction\Rest;

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;
use Psr\Log\LoggerInterface;
use SilverStripe\Core\Environment;
use SilverStripe\Core\Injector\Injector;

class TikaRestClient extends Client
{
    /**
     * Authentication options to be sent to the Tika server
     *
     * @var array
     */
    protected $options = ['username' => null, 'password' => null];

    /**
     * @var array
     */
    protected $mimes = [];

    /**
     *
     * @param string $baseUrl
     * @param array $config
     */
    public function __construct($baseUrl = '', $config = [])
    {
        $password = Environment::getEnv('SS_TIKA_PASSWORD');

        if (!empty($password)) {
            $this->options = [
                'username' => Environment::getEnv('SS_TIKA_USERNAME'),
                'password' => $password,
            ];
        }

        parent::__construct($config);
    }

    /**
     * Detect if the service is available
     *
     * @return bool
     */
    public function isAvailable()
    {
        try {
            $result = $this->get(null);
            $result->setAuth($this->options['username'], $this->options['password']);
Bug introduced by
The method setAuth() does not exist on Psr\Http\Message\ResponseInterface. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            $result->/** @scrutinizer ignore-call */
                     setAuth($this->options['username'], $this->options['password']);

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
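The `setAuth()` and `send()` calls in `isAvailable()` follow the Guzzle 3 API, where `get()` built a request object. This class extends `GuzzleHttp\Client`, and in Guzzle 6 and later `get()` sends the request immediately and returns a `ResponseInterface`, so credentials have to be passed per request through the `auth` request option instead. A minimal sketch, assuming Guzzle 6+; `tikaAuthOptions()` is a hypothetical helper, not part of this module, that turns the `$options` property filled in by the constructor into that option:

```php
<?php

/**
 * Hypothetical helper: build the Guzzle 6+ 'auth' request option from the
 * ['username' => ..., 'password' => ...] array stored in TikaRestClient::$options.
 */
function tikaAuthOptions(array $credentials): array
{
    // Mirror the constructor: credentials are only used when a password is set.
    if (empty($credentials['password'])) {
        return [];
    }

    // 'auth' => [username, password] requests HTTP basic authentication.
    return ['auth' => [(string) ($credentials['username'] ?? ''), (string) $credentials['password']]];
}
```

In the client this would be used as `$this->get('version', tikaAuthOptions($this->options))`; other options such as `headers` can be merged in with the `+` array operator.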

            $result->send();
Bug introduced by
The method send() does not exist on Psr\Http\Message\ResponseInterface. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            $result->/** @scrutinizer ignore-call */
                     send();

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

            if ($result->getResponse()->getStatusCode() == 200) {
Bug introduced by
The method getResponse() does not exist on Psr\Http\Message\ResponseInterface. Did you maybe mean getReasonPhrase()? ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            if ($result->/** @scrutinizer ignore-call */ getResponse()->getStatusCode() == 200) {

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.
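The `setAuth()`, `send()` and `getResponse()` issues in `isAvailable()` share one cause: with Guzzle 6+, `get()` already returns the response, so there is no `send()` or `getResponse()` step and the status is read with `getStatusCode()` directly. Separately, when the status is not 200 and no exception is thrown (for example a 204), the method reaches its end without a `return` and yields `null` despite its `@return bool`. The sketch below is a minimal model under those assumptions, not the module's code: `$fetchStatus` is a hypothetical stand-in for `$this->get('', $options)->getStatusCode()`, and it catches `RuntimeException`, which Guzzle's `TransferException` (the parent of `RequestException`) extends.

```php
<?php

/**
 * Sketch of an availability probe that always returns a bool.
 * $fetchStatus is a hypothetical stand-in for the real Guzzle 6+ call.
 * $log is an optional hypothetical logger callback.
 */
function isTikaAvailable(callable $fetchStatus, ?callable $log = null): bool
{
    try {
        // The response is returned directly: no send(), no getResponse().
        return $fetchStatus() === 200;
    } catch (RuntimeException $e) {
        // Guzzle's TransferException (and so RequestException) extends RuntimeException.
        if ($log !== null) {
            $log(sprintf('Tika unavailable - %s', $e->getMessage()));
        }
        return false;
    }
}
```

In the class itself the existing `catch (RequestException $ex)` block can stay; the change is to return the comparison result from inside the `try` block.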

                return true;
            }
        } catch (RequestException $ex) {
            $msg = sprintf("Tika unavailable - %s", $ex->getMessage());
            Injector::inst()->get(LoggerInterface::class)->info($msg);

            return false;
        }
    }

    /**
     * Get version code
     *
     * @return float
     */
    public function getVersion()
    {
        $response = $this->get('version');
        $response->setAuth($this->options['username'], $this->options['password']);
        $response->send();
        $version = 0.0;

        // Parse output
        if ($response->getResponse()->getStatusCode() == 200 &&
            preg_match('/Apache Tika (?<version>[\.\d]+)/', $response->getResponse()->getBody(), $matches)
        ) {
            $version = (float)$matches['version'];
        }

        return $version;
    }

    /**
     * Gets supported mime data. May include aliased mime types.
     *
     * @return array
     */
    public function getSupportedMimes()
    {
        if ($this->mimes) {
Bug Best Practice introduced by
The expression $this->mimes of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or ! empty(...) instead.
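A sketch of the same caching logic with the explicit check Scrutinizer suggests. It also replaces the Guzzle 3 `json()` call, which `ResponseInterface` does not provide, with `json_decode()` on the body string. `MimeCache` and its `$fetchJson` callable are hypothetical; assuming Guzzle 6+, the callable would correspond to something like `(string) $this->get('mime-types', ['headers' => ['Accept' => 'application/json']])->getBody()`. Note that the `Accept` header has to sit under `headers`: passed directly as the second argument, as in the listing, it would be read as an unrecognised request option.

```php
<?php

/**
 * Hypothetical sketch of getSupportedMimes() with an explicit emptiness
 * check and json_decode() in place of Guzzle 3's json().
 */
class MimeCache
{
    /** @var array */
    private $mimes = [];

    /** @var callable Returns the raw JSON body of GET mime-types */
    private $fetchJson;

    public function __construct(callable $fetchJson)
    {
        $this->fetchJson = $fetchJson;
    }

    public function getSupportedMimes(): array
    {
        // Explicit check instead of relying on array-to-bool conversion.
        if (!empty($this->mimes)) {
            return $this->mimes;
        }

        $decoded = json_decode(($this->fetchJson)(), true);

        // Fall back to an empty array if the body was not a JSON object or array.
        return $this->mimes = is_array($decoded) ? $decoded : [];
    }
}
```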

            return $this->mimes;
        }

        $response = $this->get(
            'mime-types',
            array('Accept' => 'application/json')
        );
        $response->setAuth($this->options['username'], $this->options['password']);
        $response->send();

        return $this->mimes = $response->getResponse()->json();
    }

    /**
     * Extract text content from a given file.
     * Logs a notice-level error if the document can't be parsed.
     *
     * @param  string $file Full filesystem path to a file to post
     * @return string Content of the file extracted as plain text
     */
    public function tika($file)
    {
        $text = null;
        try {
            $response = $this->put(
                'tika',
                ['Accept' => 'text/plain'],
                file_get_contents($file)
Unused Code introduced by
The call to GuzzleHttp\Client::put() has too many arguments starting with file_get_contents($file). ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            /** @scrutinizer ignore-call */
            $response = $this->put(

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.
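Here the report is not a false positive. With Guzzle 6+, the magic `put()` method reads only a URI and an options array, so `['Accept' => 'text/plain']` would be taken as the options and the file contents passed as the third argument would be dropped. Headers go under the `headers` option and the payload under `body`. A minimal sketch under that assumption; `tikaPutOptions()` is hypothetical, and streaming the file with `fopen()` instead of `file_get_contents()` is a choice made for this sketch, not something the module does:

```php
<?php

/**
 * Hypothetical helper: request options for PUT /tika with Guzzle 6+, where
 * headers and body are options rather than extra positional arguments.
 *
 * @param string $file Full filesystem path to the document
 */
function tikaPutOptions(string $file): array
{
    if (!is_file($file) || !is_readable($file)) {
        throw new InvalidArgumentException(sprintf('Cannot read %s', $file));
    }

    return [
        'headers' => ['Accept' => 'text/plain'],
        // Guzzle accepts a stream resource as the body, so the file does not
        // have to be loaded into a PHP string first.
        'body' => fopen($file, 'rb'),
    ];
}
```

The call would then become `$text = (string) $this->put('tika', $options)->getBody();`, with an `auth` entry added to `$options` when credentials are configured.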

            );
            $response->setAuth($this->options['username'], $this->options['password']);
            $response->send();
            $text = $response->getResponse()->getBody(true);
        } catch (RequestException $e) {
            $msg = sprintf(
                'TikaRestClient was not able to process %s. Response: %s %s.',
                $file,
                $e->getResponse()->getStatusCode(),
                $e->getResponse()->getReasonPhrase()
            );
            // Only available if tika-server was started with --includeStack
            $body = $e->getResponse()->getBody(true);
Unused Code introduced by
The call to Psr\Http\Message\MessageInterface::getBody() has too many arguments starting with true. ( Ignorable by Annotation )

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

            $body = $e->getResponse()->/** @scrutinizer ignore-call */ getBody(true);

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is WordPress. Please note the @ignore annotation hint above.

            if ($body) {
introduced by
$body is of type Psr\Http\Message\StreamInterface, thus it always evaluated to true.
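The `getBody(true)` and always-true `$body` issues come from the same catch block: `getBody(true)` is the Guzzle 3 way of getting the body as a string, while PSR-7's `getBody()` returns a `StreamInterface` object, which is always truthy. In Guzzle 6+, `RequestException::getResponse()` can also return `null` (for instance when the connection is refused), in which case the `getStatusCode()` calls would fail. A minimal sketch of a null-safe message builder under those assumptions; `tikaErrorMessage()` and its "No response received." wording are hypothetical:

```php
<?php

/**
 * Hypothetical null-safe version of the message built in tika()'s catch block.
 * $status and $reason are null when the exception carries no response;
 * $body is the response body already cast to string.
 */
function tikaErrorMessage(string $file, ?int $status, ?string $reason, string $body = ''): string
{
    if ($status === null) {
        $msg = sprintf('TikaRestClient was not able to process %s. No response received.', $file);
    } else {
        $msg = sprintf(
            'TikaRestClient was not able to process %s. Response: %s %s.',
            $file,
            $status,
            (string) $reason
        );
    }

    // Only non-empty if tika-server was started with --includeStack.
    if ($body !== '') {
        $msg .= ' Body: ' . $body;
    }

    return $msg;
}
```

With `$r = $e->getResponse()`, the arguments would be `$r ? $r->getStatusCode() : null`, `$r ? $r->getReasonPhrase() : null` and `$r ? (string) $r->getBody() : ''`.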
                $msg .= ' Body: ' . $body;
            }

            Injector::inst()->get(LoggerInterface::class)->info($msg);
        }

        return $text;
    }
}
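One problem the report does not list: the constructor keeps the Guzzle 3 signature `($baseUrl, $config)` but only forwards `$config`, so `$baseUrl` is silently ignored. Guzzle 6+ `Client::__construct()` takes a single config array and reads the base URL from its `base_uri` key. A minimal sketch, assuming Guzzle 6+; `tikaClientConfig()` is hypothetical, and letting an explicit `base_uri` in `$config` win over the positional argument is a choice made here, not documented behaviour of the module:

```php
<?php

/**
 * Hypothetical helper: merge the legacy $baseUrl argument into a Guzzle 6+
 * client config array, where the base URL lives under 'base_uri'.
 */
function tikaClientConfig(string $baseUrl, array $config = []): array
{
    // An explicit 'base_uri' already present in $config takes precedence.
    if ($baseUrl !== '' && !isset($config['base_uri'])) {
        $config['base_uri'] = $baseUrl;
    }

    return $config;
}
```

The constructor would then end with `parent::__construct(tikaClientConfig($baseUrl, $config));`.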
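`getVersion()` uses the same Guzzle 3 calls as the flagged methods, but its parsing step can be checked on its own. A sketch that pulls it out into a hypothetical `parseTikaVersion()` function, assuming the `version` endpoint returns text such as `Apache Tika 1.24.1`; the regular expression and the `0.0` fallback are taken from the listing:

```php
<?php

/**
 * Hypothetical extraction of getVersion()'s parsing step.
 * $body is assumed to be the text returned by the Tika 'version' endpoint.
 * Returns 0.0 when no version is found, as the original method does.
 */
function parseTikaVersion(string $body): float
{
    if (preg_match('/Apache Tika (?<version>[\.\d]+)/', $body, $matches)) {
        // The float cast keeps only the leading numeric part: "1.24.1" becomes 1.24.
        return (float) $matches['version'];
    }

    return 0.0;
}
```

Because of the float cast, `1.10` becomes `1.1` and compares below `1.9`; if callers compare versions, `version_compare()` on the matched string avoids that.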