Completed
Push — prado-3.3 ( e90646...0b76d5 )
by Fabio
23:37 queued 03:01
created

Zend_Search_Lucene_Search_Query_Phrase   C

Complexity

Total Complexity 57

Size/Duplication

Total Lines 383
Duplicated Lines 8.09 %

Coupling/Cohesion

Components 1
Dependencies 7

Importance

Changes 0
Metric Value
dl 31
loc 383
rs 5.04
c 0
b 0
f 0
wmc 57
lcom 1
cbo 7

11 Methods

Rating  Name                 Duplication  Size  Complexity
B       __construct()        0            31    9
A       setSlop()            0            4     1
A       getSlop()            0            4     1
A       addTerm()            0            15    5
A       getTerms()           0            4     1
A       setWeight()          0            4     1
A       _createWeight()      0            4     1
B       _calculateResult()   31           31    8
B       _exactPhraseFreq()   0            38    8
C       _sloppyPhraseFreq()  0            68    15
B       score()              0            37    7

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.
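
As an illustration of restructuring duplicated code, the document-ID intersection that the report flags in _calculateResult() could be moved into one shared helper. The sketch below is only a minimal example under assumptions: the function name intersectDocIdLists() is hypothetical and not part of Zend_Search_Lucene, and it covers only the plain-array case, not the bitset extension.

```php
<?php
/**
 * Hypothetical shared helper: intersect several lists of document IDs.
 * Returns an array keyed by document ID, mirroring the array_flip()
 * representation used when the bitset extension is not available.
 *
 * @param array $docIdLists List of arrays of integer document IDs.
 * @return array            Document IDs (as keys) present in every list.
 */
function intersectDocIdLists(array $docIdLists)
{
    $result = null;

    foreach ($docIdLists as $docIds) {
        $flipped = array_flip($docIds);

        // The first list seeds the result; later lists narrow it down.
        $result = ($result === null)
            ? $flipped
            : array_intersect_key($result, $flipped);
    }

    return ($result === null) ? array() : $result;
}

// Usage: find the document IDs shared by three term lists.
$common = intersectDocIdLists(array(
    array(1, 2, 5, 9),
    array(2, 3, 5),
    array(5, 2, 7),
));
print_r(array_keys($common));
```

Each query class that currently carries its own copy of the loop could then call the helper once, so a later change to the intersection logic would happen in a single place.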

Common duplication problems, and corresponding solutions are:

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like Zend_Search_Lucene_Search_Query_Phrase often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
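
To make Extract Class concrete, here is a minimal sketch that pulls the exact-phrase counting out of the query class. The class name ExactPhraseMatcher and its method countMatches() are hypothetical, chosen only for illustration. The sketch assumes term positions for a single document are passed in as plain arrays instead of being read from an index reader.

```php
<?php
/**
 * Hypothetical class extracted from the phrase query: it owns the term
 * offsets and counts exact phrase occurrences within a single document.
 */
class ExactPhraseMatcher
{
    /** @var array termId => relative offset within the phrase */
    private $offsets;

    public function __construct(array $offsets)
    {
        $this->offsets = $offsets;
    }

    /**
     * @param array $positions termId => array of positions in one document
     * @return integer         Number of exact phrase occurrences
     */
    public function countMatches(array $positions)
    {
        if (count($this->offsets) === 0) {
            return 0;
        }

        // Anchor on the term with the fewest positions to limit the work.
        $anchorId = null;
        foreach ($this->offsets as $termId => $offset) {
            if ($anchorId === null
                || count($positions[$termId]) < count($positions[$anchorId])) {
                $anchorId = $termId;
            }
        }

        $freq = 0;
        foreach ($positions[$anchorId] as $anchorPos) {
            $found = true;
            foreach ($this->offsets as $termId => $offset) {
                // Where this term must sit if the phrase starts at the anchor.
                $expected = $anchorPos + ($offset - $this->offsets[$anchorId]);
                if (!in_array($expected, $positions[$termId])) {
                    $found = false;
                    break;
                }
            }
            if ($found) {
                $freq++;
            }
        }

        return $freq;
    }
}

// Usage: a two-term phrase (offsets 0 and 1) in a document where the
// first term occurs at positions 3 and 10 and the second at 4 and 20.
$matcher = new ExactPhraseMatcher(array(0 => 0, 1 => 1));
$count = $matcher->countMatches(array(
    0 => array(3, 10),
    1 => array(4, 20),
));
echo $count, "\n";
```

With the counting logic in its own class, the query class would keep only query state and delegate frequency calculation, which should lower its complexity score.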

While breaking up the class, it is a good idea to analyze how other classes use Zend_Search_Lucene_Search_Query_Phrase, and based on these observations, apply Extract Interface, too.
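
A rough sketch of Extract Interface follows. It assumes, purely for illustration, that callers only need the terms and the slop. The names PhraseQueryInterface, SimplePhraseQuery and describeQuery() are hypothetical and do not exist in Zend_Search_Lucene.

```php
<?php
/**
 * Hypothetical interface capturing only what callers are assumed to need.
 */
interface PhraseQueryInterface
{
    public function getTerms();
    public function getSlop();
    public function setSlop($slop);
}

/** Minimal stand-in implementation, for illustration only. */
class SimplePhraseQuery implements PhraseQueryInterface
{
    private $terms;
    private $slop = 0;

    public function __construct(array $terms) { $this->terms = $terms; }
    public function getTerms() { return $this->terms; }
    public function getSlop() { return $this->slop; }
    public function setSlop($slop) { $this->slop = (int) $slop; }
}

// Callers type-hint against the interface, not the concrete class.
function describeQuery(PhraseQueryInterface $query)
{
    return implode(' ', $query->getTerms()) . '~' . $query->getSlop();
}

$query = new SimplePhraseQuery(array('quick', 'fox'));
$query->setSlop(2);
echo describeQuery($query), "\n";
```

Which methods belong in the interface depends on how the class is actually used elsewhere, so the method list above is a guess, not a recommendation.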

<?php
/**
 * Zend Framework
 *
 * LICENSE
 *
 * This source file is subject to version 1.0 of the Zend Framework
 * license, that is bundled with this package in the file LICENSE, and
 * is available through the world-wide-web at the following URL:
 * http://www.zend.com/license/framework/1_0.txt. If you did not receive
 * a copy of the Zend Framework license and are unable to obtain it
 * through the world-wide-web, please send a note to [email protected]
 * so we can mail you a copy immediately.
 *
 * @package    Zend_Search_Lucene
 * @subpackage Search
 * @copyright  Copyright (c) 2005-2006 Zend Technologies USA Inc. (http://www.zend.com)
 * @license    http://www.zend.com/license/framework/1_0.txt Zend Framework License version 1.0
 */


/**
 * Zend_Search_Lucene_Search_Query
 */
require_once 'Zend/Search/Lucene/Search/Query.php';

/**
 * Zend_Search_Lucene_Search_Weight_Phrase
 */
require_once 'Zend/Search/Lucene/Search/Weight/Phrase.php';


/**
 * A Query that matches documents containing a particular sequence of terms.
 *
 * @package    Zend_Search_Lucene
 * @subpackage Search
 * @copyright  Copyright (c) 2005-2006 Zend Technologies USA Inc. (http://www.zend.com)
 * @license    http://www.zend.com/license/framework/1_0.txt Zend Framework License version 1.0
 */
class Zend_Search_Lucene_Search_Query_Phrase extends Zend_Search_Lucene_Search_Query
{
    /**
     * Terms to find.
     * Array of Zend_Search_Lucene_Index_Term objects.
     *
     * @var array
     */
    private $_terms;

    /**
     * Term positions (relative positions of terms within the phrase).
     * Array of integers
     *
     * @var array
     */
    private $_offsets;

    /**
     * Sets the number of other words permitted between words in query phrase.
     * If zero, then this is an exact phrase search.  For larger values this works
     * like a WITHIN or NEAR operator.
     *
     * The slop is in fact an edit-distance, where the units correspond to
     * moves of terms in the query phrase out of position.  For example, to switch
     * the order of two words requires two moves (the first move places the words
     * atop one another), so to permit re-orderings of phrases, the slop must be
     * at least two.
     * More exact matches are scored higher than sloppier matches, thus search
     * results are sorted by exactness.
     *
     * The slop is zero by default, requiring exact matches.
     *
     * @var unknown_type
     */
    private $_slop;

    /**
     * Result vector.
     * Bitset or array of document IDs
     * (depending on Bitset extension availability).
     *
     * @var mixed
     */
    private $_resVector = null;

    /**
     * Terms positions vectors.
     * Array of Arrays:
     * term1Id => (docId => array( pos1, pos2, ... ), ...)
     * term2Id => (docId => array( pos1, pos2, ... ), ...)
     *
     * @var array
     */
    private $_termsPositions = array();

    /**
     * Class constructor.  Create a new phrase query.
     *
     * @param string $field    Field to search.
     * @param array  $terms    Terms to search. Array of strings.
     * @param array  $offsets  Relative term positions. Array of integers.
     * @throws Zend_Search_Lucene_Exception
     */
    public function __construct($terms = null, $offsets = null, $field = null)
    {
        $this->_slop = 0;

Documentation Bug: It seems like 0 of type integer is incompatible with the declared type object<unknown_type> of property $_slop.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.

        if (is_array($terms)) {
            $this->_terms = array();
            foreach ($terms as $termId => $termText) {
                $this->_terms[$termId] = ($field !== null)? new Zend_Search_Lucene_Index_Term($termText, $field):
                                                            new Zend_Search_Lucene_Index_Term($termText);
            }
        } else if ($terms === null) {
            $this->_terms = array();
        } else {
            throw new Zend_Search_Lucene_Exception('terms argument must be array of strings or null');
        }

        if (is_array($offsets)) {
            if (count($this->_terms) != count($offsets)) {
                throw new Zend_Search_Lucene_Exception('terms and offsets arguments must have the same size.');
            }
            $this->_offsets = $offsets;
        } else if ($offsets === null) {
            $this->_offsets = array();
            foreach ($this->_terms as $termId => $term) {
                $position = count($this->_offsets);
                $this->_offsets[$termId] = $position;
            }
        } else {
            throw new Zend_Search_Lucene_Exception('offsets argument must be array of strings or null');
        }
    }

    /**
     * Set slop
     *
     * @param integer $slop
     */
    public function setSlop($slop)
    {
        $this->_slop = $slop;

Documentation Bug: It seems like $slop of type integer is incompatible with the declared type object<unknown_type> of property $_slop.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.

    }


    /**
     * Get slop
     *
     * @return integer
     */
    public function getSlop()
    {
        return $this->_slop;
    }


    /**
     * Adds a term to the end of the query phrase.
     * The relative position of the term is specified explicitly or the one immediately
     * after the last term added.
     *
     * @param Zend_Search_Lucene_Index_Term $term
     * @param integer $position
     */
    public function addTerm(Zend_Search_Lucene_Index_Term $term, $position = null) {
        if ((count($this->_terms) != 0)&&(end($this->_terms)->field != $term->field)) {
            throw new Zend_Search_Lucene_Exception('All phrase terms must be in the same field: ' .
                                                   $term->field . ':' . $term->text);
        }

        $this->_terms[] = $term;
        if ($position !== null) {
            $this->_offsets[] = $position;
        } else if (count($this->_offsets) != 0) {
            $this->_offsets[] = end($this->_offsets) + 1;
        } else {
            $this->_offsets[] = 0;
        }
    }


    /**
     * Returns query term
     *
     * @return array
     */
    public function getTerms()
    {
        return $this->_terms;
    }


    /**
     * Set weight for specified term
     *
     * @param integer $num
     * @param Zend_Search_Lucene_Search_Weight_Term $weight
     */
    public function setWeight($num, $weight)
    {
        $this->_weights[$num] = $weight;

Bug: The property _weights does not seem to exist. Did you mean _weight?

An attempt at access to an undefined property has been detected. This may either be a typographical error or the property has been renamed but there are still references to its old name.

If you really want to allow access to undefined properties, you can define magic methods to allow access. See the PHP core documentation on Overloading.

    }


    /**
     * Constructs an appropriate Weight implementation for this query.
     *
     * @param Zend_Search_Lucene $reader
     * @return Zend_Search_Lucene_Search_Weight
     */
    protected function _createWeight($reader)
    {
        return new Zend_Search_Lucene_Search_Weight_Phrase($this, $reader);
    }


    /**
     * Calculate result vector
     *
     * @param Zend_Search_Lucene $reader
     */
    private function _calculateResult($reader)

Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

    {
        if (extension_loaded('bitset')) {
            foreach( $this->_terms as $termId=>$term ) {
                if($this->_resVector === null) {
                    $this->_resVector = bitset_from_array($reader->termDocs($term));
                } else {
                    $this->_resVector = bitset_intersection(
                                $this->_resVector,
                                bitset_from_array($reader->termDocs($term)) );
                }

                $this->_termsPositions[$termId] = $reader->termPositions($term);
            }
        } else {
            foreach( $this->_terms as $termId=>$term ) {
                if($this->_resVector === null) {
                    $this->_resVector = array_flip($reader->termDocs($term));
                } else {
                    $termDocs = array_flip($reader->termDocs($term));
                    foreach($this->_resVector as $key=>$value) {
                        if (!isset( $termDocs[$key] )) {
                            unset( $this->_resVector[$key] );
                        }
                    }
                }

                $this->_termsPositions[$termId] = $reader->termPositions($term);
            }
        }
    }


    /**
     * Score calculator for exact phrase queries (terms sequence is fixed)
     *
     * @param integer $docId
     * @return float
     */
    public function _exactPhraseFreq($docId)
    {
        $freq = 0;

        // Term Id with lowest cardinality
        $lowCardTermId = null;

        // Calculate $lowCardTermId
        foreach ($this->_terms as $termId => $term) {
            if ($lowCardTermId === null ||
                count($this->_termsPositions[$termId][$docId]) <
                count($this->_termsPositions[$lowCardTermId][$docId]) ) {
                    $lowCardTermId = $termId;
                }
        }

        // Walk through positions of the term with lowest cardinality
        foreach ($this->_termsPositions[$lowCardTermId][$docId] as $lowCardPos) {
            // We expect phrase to be found
            $freq++;

            // Walk through other terms
            foreach ($this->_terms as $termId => $term) {
                if ($termId != $lowCardTermId) {
                    $expectedPosition = $lowCardPos +
                                            ($this->_offsets[$termId] -
                                             $this->_offsets[$lowCardTermId]);

                    if (!in_array($expectedPosition, $this->_termsPositions[$termId][$docId])) {
                        $freq--;  // Phrase wasn't found.
                        break;
                    }
                }
            }
        }

        return $freq;
    }

    /**
     * Score calculator for sloppy phrase queries (terms sequence is fixed)
     *
     * @param integer $docId
     * @param Zend_Search_Lucene $reader
     * @return float
     */
    public function _sloppyPhraseFreq($docId, Zend_Search_Lucene $reader)
    {
        $freq = 0;

        $phraseQueue = array();
        $phraseQueue[0] = array(); // empty phrase
        $lastTerm = null;

        // Walk through the terms to create phrases.
        foreach ($this->_terms as $termId => $term) {
            $queueSize = count($phraseQueue);
            $firstPass = true;

            // Walk through the term positions.
            // Each term position produces a set of phrases.
            foreach ($this->_termsPositions[$termId][$docId] as $termPosition ) {
                if ($firstPass) {
                    for ($count = 0; $count < $queueSize; $count++) {
                        $phraseQueue[$count][$termId] = $termPosition;
                    }
                } else {
                    for ($count = 0; $count < $queueSize; $count++) {
                        if ($lastTerm !== null &&
                            abs( $termPosition - $phraseQueue[$count][$lastTerm] -
                                 ($this->_offsets[$termId] - $this->_offsets[$lastTerm])) > $this->_slop) {
                            continue;
                        }

                        $newPhraseId = count($phraseQueue);
                        $phraseQueue[$newPhraseId]          = $phraseQueue[$count];
                        $phraseQueue[$newPhraseId][$termId] = $termPosition;
                    }

                }

                $firstPass = false;
            }
            $lastTerm = $termId;
        }


        foreach ($phraseQueue as $phrasePos) {
            $minDistance = null;

            for ($shift = -$this->_slop; $shift <= $this->_slop; $shift++) {
                $distance = 0;
                $start = reset($phrasePos) - reset($this->_offsets) + $shift;

                foreach ($this->_terms as $termId => $term) {
                    $distance += abs($phrasePos[$termId] - $this->_offsets[$termId] - $start);

                    if($distance > $this->_slop) {
                        break;
                    }
                }

                if ($minDistance === null || $distance < $minDistance) {
                    $minDistance = $distance;
                }
            }

            if ($minDistance <= $this->_slop) {
                $freq += $reader->getSimilarity()->sloppyFreq($minDistance);
            }
        }

        return $freq;
    }


    /**
     * Score specified document
     *
     * @param integer $docId
     * @param Zend_Search_Lucene $reader
     * @return float
     */
    public function score($docId, $reader)
    {
        // optimize zero-term case
        if (count($this->_terms) == 0) {
            return 0;
        }

        if($this->_resVector === null) {
            $this->_calculateResult($reader);
            $this->_initWeight($reader);
        }

        if ( (extension_loaded('bitset')) ?
                bitset_in($this->_resVector, $docId) :
                isset($this->_resVector[$docId])  ) {
            if ($this->_slop == 0) {
                $freq = $this->_exactPhraseFreq($docId);
            } else {
                $freq = $this->_sloppyPhraseFreq($docId, $reader);
            }

/*
            return $reader->getSimilarity()->tf($freq) *
                   $this->_weight->getValue() *
                   $reader->norm($docId, reset($this->_terms)->field);
*/
            if ($freq != 0) {
                $tf = $reader->getSimilarity()->tf($freq);
                $weight = $this->_weight->getValue();
                $norm = $reader->norm($docId, reset($this->_terms)->field);

                return $tf*$weight*$norm;
            }
        } else {
            return 0;
        }
    }
}
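
The two property-level issues flagged in the listing (the unknown_type annotation on $_slop and the undeclared _weights property) might be addressed roughly as sketched here. This is a trimmed-down, hypothetical class named PhraseQueryPropertiesSketch, not the real Zend_Search_Lucene_Search_Query_Phrase. It assumes that $_slop is meant to hold an integer and that per-term weights are meant to be kept in an array. If nothing ever reads _weights, removing setWeight() could be the better fix.

```php
<?php
/**
 * Hypothetical, trimmed-down sketch (not the real class) showing a
 * concrete @var type for $_slop and an explicitly declared $_weights.
 */
class PhraseQueryPropertiesSketch
{
    /**
     * Number of other words permitted between words in the query phrase.
     *
     * @var integer
     */
    private $_slop = 0;

    /**
     * Per-term weights, indexed by term number (assumed intent).
     *
     * @var array
     */
    private $_weights = array();

    public function setSlop($slop)
    {
        $this->_slop = (int) $slop;
    }

    public function getSlop()
    {
        return $this->_slop;
    }

    public function setWeight($num, $weight)
    {
        $this->_weights[$num] = $weight;
    }

    public function getWeights()
    {
        return $this->_weights;
    }
}

// Usage: both properties are declared, so no undefined-property access occurs.
$sketch = new PhraseQueryPropertiesSketch();
$sketch->setSlop(2);
$sketch->setWeight(0, 1.5);
var_dump($sketch->getSlop(), $sketch->getWeights());
```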