Completed
Push — prado-3.3 ( e90646...0b76d5 )
by Fabio
23:37 queued 03:01
created

Zend_Search_Lucene   C

Complexity

Total Complexity 56

Size/Duplication

Total Lines 510
Duplicated Lines 4.12 %

Coupling/Cohesion

Components 1
Dependencies 14

Importance

Changes 0
Metric Value
dl 21
loc 510
rs 5.5199
c 0
b 0
f 0
wmc 56
lcom 1
cbo 14

19 Methods

Rating   Name   Duplication   Size   Complexity  
B __construct() 0 52 6
A __destruct() 0 8 2
A getIndexWriter() 0 8 2
A getDirectory() 0 4 1
A count() 0 4 1
A find() 0 31 5
A getFieldNames() 0 8 2
B getDocument() 0 56 6
A termDocs() 11 33 5
B termPositions() 10 46 7
A docFreq() 0 12 3
A getSimilarity() 0 4 1
A norm() 0 16 3
A addDocument() 0 8 2
A commit() 0 17 6
A terms() 0 4 1
A hasDeletions() 0 4 1
A delete() 0 2 1
A undeleteAll() 0 2 1

How to fix

Duplicated Code

Duplicated code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:
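In this class the flagged duplication appears to sit in termDocs() and termPositions(), which both walk the .frq postings with the same doc-delta loop. A minimal sketch of pulling that loop into one helper follows. The name readTermFreqs() and the callable standing in for $frqFile->readVInt() are assumptions made for illustration and are not part of Zend_Search_Lucene.

```php
<?php
/**
 * Hypothetical helper: decodes $docFreq postings into a map of
 * document id (relative to the segment) => term frequency.
 * $readVInt is any callable that returns the next integer of the stream.
 */
function readTermFreqs(callable $readVInt, int $docFreq): array
{
    $freqs = [];
    $docId = 0;
    for ($i = 0; $i < $docFreq; $i++) {
        $docDelta = $readVInt();
        // The upper bits carry the distance to the previous document id.
        $docId += $docDelta >> 1;
        // An odd value means "frequency is 1"; otherwise the frequency follows.
        $freqs[$docId] = ($docDelta % 2 === 1) ? 1 : $readVInt();
    }
    return $freqs;
}

// Stand-in for $frqFile->readVInt(): replays a fixed list of integers.
$stream = [3, 4, 5, 7];
$readVInt = function () use (&$stream) {
    return array_shift($stream);
};

$freqs = readTermFreqs($readVInt, 3);
```

With such a helper, termDocs() could return the keys of the map shifted by the segment start id, while termPositions() could use the frequencies to decide how many positions to read from the .prx file.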

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like Zend_Search_Lucene often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields or methods that share the same prefix or suffix. You can also look at the cohesion graph to spot unconnected or weakly connected components.
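The prefix heuristic can be tried mechanically. The sketch below groups method names of this class by their leading lowercase word. groupByPrefix() is a hypothetical helper written for this illustration, and the grouping is only a starting point for a manual review.

```php
<?php
/**
 * Hypothetical helper: groups method names by their leading run of
 * lowercase letters (for example "term" in termDocs) and keeps only
 * the prefixes shared by at least two methods.
 */
function groupByPrefix(array $methods): array
{
    $groups = [];
    foreach ($methods as $name) {
        // Fall back to the full name if it does not start with a lowercase letter.
        preg_match('/^[a-z]+/', $name, $m);
        $groups[$m[0] ?? $name][] = $name;
    }
    return array_filter($groups, function ($group) {
        return count($group) > 1;
    });
}

// Method names taken from the report above (constructor and stubs left out).
$methods = ['getIndexWriter', 'getDirectory', 'count', 'find', 'getFieldNames',
            'getDocument', 'termDocs', 'termPositions', 'docFreq',
            'getSimilarity', 'norm', 'addDocument', 'commit'];

$groups = groupByPrefix($methods);
```

The "get" group mostly collects unrelated accessors, so it is noise here. The "term" group (termDocs, termPositions) is the more interesting candidate, since both methods read the same postings files.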

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
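As one possible Extract Class candidate, getDocument() and norm() both contain the same loop that maps a global document id to a segment and an id inside that segment. The sketch below isolates that lookup. SegmentLocator is a hypothetical class, it works on plain per-segment document counts rather than Zend_Search_Lucene_Index_SegmentInfo objects, and it assumes PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical extracted class: maps a global document id to
 * [segment index, id within that segment].
 */
class SegmentLocator
{
    /** @var int[] number of documents in each segment, in index order */
    private $sizes;

    public function __construct(array $sizes)
    {
        $this->sizes = array_values($sizes);
    }

    /**
     * @return array|null [segmentIndex, localId], or null if $id is out of range
     */
    public function locate(int $id): ?array
    {
        if ($id < 0) {
            return null;
        }
        $start = 0; // global id of the first document in the current segment
        foreach ($this->sizes as $index => $size) {
            if ($id < $start + $size) {
                return [$index, $id - $start];
            }
            $start += $size;
        }
        return null;
    }
}

$locator = new SegmentLocator([3, 2, 4]);
```

Both callers could then share one tested lookup and keep only the file reading that is specific to them.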

While breaking up the class, it is a good idea to analyze how other classes use Zend_Search_Lucene, and based on these observations, apply Extract Interface, too.
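A minimal sketch of Extract Interface, assuming that callers only need read access to document counts and term statistics. IndexReader, ArrayIndex and inverseDocFreq() are hypothetical names, and docFreq() takes a plain string here instead of a Zend_Search_Lucene_Index_Term to keep the example self-contained.

```php
<?php
/** Hypothetical read-only view of an index. */
interface IndexReader
{
    public function count(): int;

    public function docFreq(string $term): int;
}

/** Toy in-memory implementation, used only to show the interface in use. */
class ArrayIndex implements IndexReader
{
    /** @var array[] one list of words per document */
    private $docs;

    public function __construct(array $docs)
    {
        $this->docs = $docs;
    }

    public function count(): int
    {
        return count($this->docs);
    }

    public function docFreq(string $term): int
    {
        $matches = 0;
        foreach ($this->docs as $words) {
            if (in_array($term, $words, true)) {
                $matches++;
            }
        }
        return $matches;
    }
}

/** Client code depends on the interface, not on a concrete index class. */
function inverseDocFreq(IndexReader $index, string $term): float
{
    // An idf-style weight, chosen here purely for illustration.
    return log($index->count() / ($index->docFreq($term) + 1)) + 1.0;
}

$index = new ArrayIndex([['a', 'b'], ['b'], ['c']]);
$weight = inverseDocFreq($index, 'b');
```

Code written against such an interface can be tested with a small stub like ArrayIndex instead of a real index directory.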

<?php
/**
 * Zend Framework
 *
 * LICENSE
 *
 * This source file is subject to version 1.0 of the Zend Framework
 * license, that is bundled with this package in the file LICENSE, and
 * is available through the world-wide-web at the following URL:
 * http://www.zend.com/license/framework/1_0.txt. If you did not receive
 * a copy of the Zend Framework license and are unable to obtain it
 * through the world-wide-web, please send a note to license@zend.com
 * so we can mail you a copy immediately.
 *
 * @package    Zend_Search_Lucene
 * @copyright  Copyright (c) 2005-2006 Zend Technologies USA Inc. (http://www.zend.com)
 * @license    http://www.zend.com/license/framework/1_0.txt Zend Framework License version 1.0
 */


/** Zend_Search_Lucene_Exception */
require_once 'Zend/Search/Lucene/Exception.php';

/** Zend_Search_Lucene_Document */
require_once 'Zend/Search/Lucene/Document.php';

/** Zend_Search_Lucene_Storage_Directory */
require_once 'Zend/Search/Lucene/Storage/Directory/Filesystem.php';

/** Zend_Search_Lucene_Index_Term */
require_once 'Zend/Search/Lucene/Index/Term.php';

/** Zend_Search_Lucene_Index_TermInfo */
require_once 'Zend/Search/Lucene/Index/TermInfo.php';

/** Zend_Search_Lucene_Index_SegmentInfo */
require_once 'Zend/Search/Lucene/Index/SegmentInfo.php';

/** Zend_Search_Lucene_Index_FieldInfo */
require_once 'Zend/Search/Lucene/Index/FieldInfo.php';

/** Zend_Search_Lucene_Index_Writer */
require_once 'Zend/Search/Lucene/Index/Writer.php';

/** Zend_Search_Lucene_Search_QueryParser */
require_once 'Zend/Search/Lucene/Search/QueryParser.php';

/** Zend_Search_Lucene_Search_QueryHit */
require_once 'Zend/Search/Lucene/Search/QueryHit.php';

/** Zend_Search_Lucene_Search_Similarity */
require_once 'Zend/Search/Lucene/Search/Similarity.php';


/**
 * @package    Zend_Search_Lucene
 * @copyright  Copyright (c) 2005-2006 Zend Technologies USA Inc. (http://www.zend.com)
 * @license    http://www.zend.com/license/framework/1_0.txt Zend Framework License version 1.0
 */
class Zend_Search_Lucene
{
    /**
     * File system adapter.
     *
     * @var Zend_Search_Lucene_Storage_Directory
     */
    private $_directory = null;

    /**
     * File system adapter closing option
     *
     * @var boolean
     */
    private $_closeDirOnExit = true;

    /**
     * Writer for this index, not instantiated unless required.
     *
     * @var Zend_Search_Lucene_Index_Writer
     */
    private $_writer = null;

    /**
     * Array of Zend_Search_Lucene_Index_SegmentInfo objects for this index.
     *
     * @var array Zend_Search_Lucene_Index_SegmentInfo
     */
    private $_segmentInfos = array();

    /**
     * Number of documents in this index.
     *
     * @var integer
     */
    private $_docCount = 0;


    /**
     * Opens the index.
     *
     * IndexReader constructor needs Directory as a parameter. It should be
     * a string with a path to the index folder or a Directory object.
     *
     * @param mixed $directory
     * @param boolean $create
     * @throws Zend_Search_Lucene_Exception
     */
    public function __construct($directory = null, $create = false)
    {
        if ($directory === null) {
            throw new Zend_Search_Exception('No index directory specified');
        }

        if ($directory instanceof Zend_Search_Lucene_Storage_Directory_Filesystem) {
            $this->_directory      = $directory;
            $this->_closeDirOnExit = false;
        } else {
            $this->_directory      = new Zend_Search_Lucene_Storage_Directory_Filesystem($directory);
            $this->_closeDirOnExit = true;
        }

        if ($create) {
            $this->_writer = new Zend_Search_Lucene_Index_Writer($this->_directory, true);
        } else {
            $this->_writer = null;
        }

        $this->_segmentInfos = array();

        $segmentsFile = $this->_directory->getFileObject('segments');

        $format = $segmentsFile->readInt();

        if ($format != (int)0xFFFFFFFF) {
            throw new Zend_Search_Lucene_Exception('Wrong segments file format');
        }

        // read version
        $segmentsFile->readLong();

        // read counter
        $segmentsFile->readInt();

        $segments = $segmentsFile->readInt();

        $this->_docCount = 0;

        // read segmentInfos
        for ($count = 0; $count < $segments; $count++) {
            $segName = $segmentsFile->readString();
            $segSize = $segmentsFile->readInt();
            $this->_docCount += $segSize;

            $this->_segmentInfos[$count] =
                                new Zend_Search_Lucene_Index_SegmentInfo($segName,
                                                                         $segSize,
                                                                         $this->_directory);
        }
    }


    /**
     * Object destructor
     */
    public function __destruct()
    {
        $this->commit();

        if ($this->_closeDirOnExit) {
            $this->_directory->close();
        }
    }

    /**
     * Returns an instance of Zend_Search_Lucene_Index_Writer for the index
     *
     * @return Zend_Search_Lucene_Index_Writer
     */
    public function getIndexWriter()
    {
        if (!$this->_writer instanceof Zend_Search_Lucene_Index_Writer) {
            $this->_writer = new Zend_Search_Lucene_Index_Writer($this->_directory);
        }

        return $this->_writer;
    }


    /**
     * Returns the Zend_Search_Lucene_Storage_Directory instance for this index.
     *
     * @return Zend_Search_Lucene_Storage_Directory
     */
    public function getDirectory()
    {
        return $this->_directory;
    }


    /**
     * Returns the total number of documents in this index.
     *
     * @return integer
     */
    public function count()
    {
        return $this->_docCount;
    }


    /**
     * Performs a query against the index and returns an array
     * of Zend_Search_Lucene_Search_QueryHit objects.
     * Input is a string or Zend_Search_Lucene_Search_Query.
     *
     * @param mixed $query
     * @return array Zend_Search_Lucene_Search_QueryHit
     */
    public function find($query)
    {
        if (is_string($query)) {
            $query = Zend_Search_Lucene_Search_QueryParser::parse($query);
        }

        if (!$query instanceof Zend_Search_Lucene_Search_Query) {
            throw new Zend_Search_Lucene_Exception('Query must be a string or Zend_Search_Lucene_Search_Query object');
        }

        $this->commit();

        $hits = array();
        $scores = array();

        $docNum = $this->count();
        for( $count=0; $count < $docNum; $count++ ) {
            $docScore = $query->score( $count, $this);
            if( $docScore != 0 ) {
                $hit = new Zend_Search_Lucene_Search_QueryHit($this);
                $hit->id = $count;
                $hit->score = $docScore;

                $hits[] = $hit;
                $scores[] = $docScore;
            }
        }
        array_multisort($scores, SORT_DESC, SORT_REGULAR, $hits);

        return $hits;
    }


    /**
     * Returns a list of all unique field names that exist in this index.
     *
     * @param boolean $indexed
     * @return array
     */
    public function getFieldNames($indexed = false)
    {
        $result = array();
        foreach( $this->_segmentInfos as $segmentInfo ) {
            $result = array_merge($result, $segmentInfo->getFields($indexed));
        }
        return $result;
    }


    /**
     * Returns a Zend_Search_Lucene_Document object for the document
     * number $id in this index.
     *
     * @param integer|Zend_Search_Lucene_Search_QueryHit $id
     * @return Zend_Search_Lucene_Document
     */
    public function getDocument($id)
    {
        if ($id instanceof Zend_Search_Lucene_Search_QueryHit) {
            /* @var $id Zend_Search_Lucene_Search_QueryHit */
            $id = $id->id;
        }

        if ($id >= $this->_docCount) {
            /**
             * @todo exception here?
             */
            return null;
        }

        $segCount = 0;
        $nextSegmentStartId = $this->_segmentInfos[ 0 ]->count();
        while( $nextSegmentStartId <= $id ) {
            $segCount++;
            $nextSegmentStartId += $this->_segmentInfos[ $segCount ]->count();
        }
        $segmentStartId = $nextSegmentStartId - $this->_segmentInfos[ $segCount ]->count();

        $fdxFile = $this->_segmentInfos[ $segCount ]->openCompoundFile('.fdx');
        $fdxFile->seek( ($id-$segmentStartId)*8, SEEK_CUR );
        $fieldValuesPosition = $fdxFile->readLong();

        $fdtFile = $this->_segmentInfos[ $segCount ]->openCompoundFile('.fdt');
        $fdtFile->seek( $fieldValuesPosition, SEEK_CUR );
        $fieldCount = $fdtFile->readVInt();

        $doc = new Zend_Search_Lucene_Document();
        for( $count = 0; $count < $fieldCount; $count++ ) {
            $fieldNum = $fdtFile->readVInt();
            $bits = $fdtFile->readByte();

            $fieldInfo = $this->_segmentInfos[ $segCount ]->getField($fieldNum);

            if( !($bits & 2) ) { // Text data
                $field = new Zend_Search_Lucene_Field($fieldInfo->name,
                                                      $fdtFile->readString(),
                                                      true,
                                                      $fieldInfo->isIndexed,
                                                      $bits & 1 );
            } else {
                $field = new Zend_Search_Lucene_Field($fieldInfo->name,
                                                      $fdtFile->readBinary(),
                                                      true,
                                                      $fieldInfo->isIndexed,
                                                      $bits & 1 );
            }

            $doc->addField($field);
        }

        return $doc;
    }


    /**
     * Returns an array of all the documents which contain term.
     *
     * @param Zend_Search_Lucene_Index_Term $term
     * @return array
     */
    public function termDocs(Zend_Search_Lucene_Index_Term $term)
    {
        $result = array();
        $segmentStartDocId = 0;

        foreach ($this->_segmentInfos as $segInfo) {
            $termInfo = $segInfo->getTermInfo($term);

            if (!$termInfo instanceof Zend_Search_Lucene_Index_TermInfo) {
                $segmentStartDocId += $segInfo->count();
                continue;
            }

            $frqFile = $segInfo->openCompoundFile('.frq');
            $frqFile->seek($termInfo->freqPointer,SEEK_CUR);
            $docId = 0;
            // Issue (Duplication): This code seems to be duplicated across your project.
            // If you need to duplicate the same code in three or more different places,
            // look into extracting the code into a single class or operation.
            for( $count=0; $count < $termInfo->docFreq; $count++ ) {
                $docDelta = $frqFile->readVInt();
                if( $docDelta % 2 == 1 ) {
                    $docId += ($docDelta-1)/2;
                } else {
                    $docId += $docDelta/2;
                    // read freq
                    $frqFile->readVInt();
                }
                $result[] = $segmentStartDocId + $docId;
            }

            $segmentStartDocId += $segInfo->count();
        }

        return $result;
    }


    /**
     * Returns an array of all term positions in the documents.
     * Return array structure: array( docId => array( pos1, pos2, ...), ...)
     *
     * @param Zend_Search_Lucene_Index_Term $term
     * @return array
     */
    public function termPositions(Zend_Search_Lucene_Index_Term $term)
    {
        $result = array();
        $segmentStartDocId = 0;
        foreach( $this->_segmentInfos as $segInfo ) {
            $termInfo = $segInfo->getTermInfo($term);

            if (!$termInfo instanceof Zend_Search_Lucene_Index_TermInfo) {
                $segmentStartDocId += $segInfo->count();
                continue;
            }

            $frqFile = $segInfo->openCompoundFile('.frq');
            $frqFile->seek($termInfo->freqPointer,SEEK_CUR);
            $freqs = array();
            $docId = 0;

            // Issue (Duplication): This code seems to be duplicated across your project.
            for( $count = 0; $count < $termInfo->docFreq; $count++ ) {
                $docDelta = $frqFile->readVInt();
                if( $docDelta % 2 == 1 ) {
                    $docId += ($docDelta-1)/2;
                    $freqs[ $docId ] = 1;
                } else {
                    $docId += $docDelta/2;
                    $freqs[ $docId ] = $frqFile->readVInt();
                }
            }

            $prxFile = $segInfo->openCompoundFile('.prx');
            $prxFile->seek($termInfo->proxPointer,SEEK_CUR);
            foreach ($freqs as $docId => $freq) {
                $termPosition = 0;
                $positions = array();

                for ($count = 0; $count < $freq; $count++ ) {
                    $termPosition += $prxFile->readVInt();
                    $positions[] = $termPosition;
                }
                $result[ $segmentStartDocId + $docId ] = $positions;
            }

            $segmentStartDocId += $segInfo->count();
        }

        return $result;
    }


    /**
     * Returns the number of documents in this index containing the $term.
     *
     * @param Zend_Search_Lucene_Index_Term $term
     * @return integer
     */
    public function docFreq(Zend_Search_Lucene_Index_Term $term)
    {
        $result = 0;
        foreach ($this->_segmentInfos as $segInfo) {
            $termInfo = $segInfo->getTermInfo($term);
            if ($termInfo !== null) {
                $result += $termInfo->docFreq;
            }
        }

        return $result;
    }


    /**
     * Retrieve similarity used by index reader
     *
     * @return Zend_Search_Lucene_Search_Similarity
     */
    public function getSimilarity()
    {
        return Zend_Search_Lucene_Search_Similarity::getDefault();
    }


    /**
     * Returns a normalization factor for "field, document" pair.
     *
     * @param integer $id
     * @param string $fieldName
     * @return float|null
     */
    public function norm( $id, $fieldName )
    {
        if( $id >= $this->_docCount ) {
            return null;
        }

        $segCount = 0;
        $nextSegmentStartId = $this->_segmentInfos[ 0 ]->count();
        while( $nextSegmentStartId <= $id ) {
            $segCount++;
            $nextSegmentStartId += $this->_segmentInfos[ $segCount ]->count();
        }

        $segmentStartId = $nextSegmentStartId - $this->_segmentInfos[ $segCount ]->count();

        return $this->_segmentInfos[ $segCount ]->norm($id - $segmentStartId, $fieldName);
    }


    /**
     * Adds a document to this index.
     *
     * @param Zend_Search_Lucene_Document $document
     */
    public function addDocument(Zend_Search_Lucene_Document $document)
    {
        if (!$this->_writer instanceof Zend_Search_Lucene_Index_Writer) {
            $this->_writer = new Zend_Search_Lucene_Index_Writer($this->_directory);
        }

        $this->_writer->addDocument($document);
    }


    /**
     * Commit changes resulting from delete() or undeleteAll() operations.
     *
     * @todo delete() and undeleteAll processing.
     */
    public function commit()
    {
        if ($this->_writer !== null) {
            foreach ($this->_writer->commit() as $segmentName => $segmentInfo) {
                if ($segmentInfo !== null) {
                    $this->_segmentInfos[] = $segmentInfo;
                    $this->_docCount += $segmentInfo->count();
                } else {
                    foreach ($this->_segmentInfos as $segId => $segInfo) {
                        if ($segInfo->getName() == $segmentName) {
                            unset($this->_segmentInfos[$segId]);
                        }
                    }
                }
            }
        }
    }


    /*************************************************************************
    @todo UNIMPLEMENTED
    *************************************************************************/

    /**
     * Returns an array of all terms in this index.
     *
     * @todo Implementation
     * @return array
     */
    public function terms()
    {
        return array();
    }


    /**
     * Returns true if any documents have been deleted from this index.
     *
     * @todo Implementation
     * @return boolean
     */
    public function hasDeletions()
    {
        return false;
    }


    // Issue (Bug): There is no parameter named $item_to_del. Was it maybe removed?
    // This check looks for PHPDoc comments describing parameters that do not exist
    // on the corresponding method. The most likely cause is that the parameter was
    // removed, but the annotation was not.
    /**
     * Deletes a document from the index.  $doc may contain a Zend_Search_Lucene_Document
     * or the number of the document to delete.
     *
     * @todo Implementation
     * @param mixed $item_to_del
     */
    public function delete($doc)
    {}


    /**
     * Undeletes all documents currently marked as deleted in this index.
     *
     * @todo Implementation
     */
    public function undeleteAll()
    {}
}