Levenshtein::run()   B
last analyzed

Complexity

Conditions 10
Paths 21

Size

Total Lines 38
Code Lines 18

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 110

Importance

Changes 2
Bugs 0 Features 0
Metric Value
cc 10
eloc 18
c 2
b 0
f 0
nc 21
nop 0
dl 0
loc 38
ccs 0
cts 19
cp 0
crap 110
rs 7.6666

How to fix   Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

1
<?php
2
/*
3
 * Copyright (c) 2019 LinkedData.Center. All rights reserved.
4
 */
5
namespace BOTK\Reasoner;
6
7
/**
8
 * input sensitivity: an acceptable rate for changes (0-1)
9
 * Input stream specification:
10
 *   The input stream is divided in two part, the first part contains, one for row,
11
 *   a couple of object Uri,  string
12
 *   then there is an empty line
13
 *   the second part contains, one for row, a couple of an subject uri and a token to be searched in the string
14
 *   
15
 * Output stream:
16
 *   rules to insert in a graphName, RDF statements that links subject and object having  
17
 *   the minimun Levenshtein distance evaluated on string tokens
18
 */
19
class Levenshtein extends AbstractReasoner
20
{
21
    protected $sensitivity=0.2;
22
   
23
    public function setSensitivity( float $sensitivity )
24
    {
25
        assert( $sentistivity >= 0 && $sentistivity <=1 );
0 ignored issues
show
Comprehensibility Best Practice introduced by
The variable $sentistivity does not exist. Did you maybe mean $sensitivity?
Loading history...
26
        
27
        $this->sensitivity = $sensitivity;
28
        return $this ;
29
    }
30
      
31
 		
32
    /**
33
	 * targets must be an hash of tokenized strings
34
     */
35
	protected function findClosestUriToToken( $token, array $targets) 
36
	{
37
	    $tokenLength=strlen($token);
38
	    $absSensitivity= round($tokenLength * $this->sensitivity);
39
	    $minLength=max( 1, $tokenLength - $absSensitivity);
40
	    $maxLength=$tokenLength + $absSensitivity;
41
	    $closestTarget = null;
42
	    $shortest = -1;
43
	    foreach($targets as $targetUri=>$targetTokens){
44
	        if( $shortest === 0 ) break; // a perfect match found.
45
	        foreach ($targetTokens as $targetToken ){
46
	            $targetTokenLength=strlen($targetToken);
47
	            if( ($targetTokenLength >= $minLength) && ($targetTokenLength <= $maxLength)) {
48
	                $lev = levenshtein($token, $targetToken);
49
	                if ($lev <= $shortest || $shortest < 0) {
50
	                    $closestTarget  = $targetUri;
51
	                    $shortest = $lev;
52
	                }
53
	                if( $shortest === 0 ) break; // a perfect match found.
54
	            }
55
	        }
56
	    }
57
	    
58
	    if( $closestTarget && ($shortest <= $absSensitivity) ){
59
	        return $closestTarget;
60
	    } else {
61
	        return false;
62
	    }
63
	}
64
	
65
	
66
	public function run ()
67
	{
68
	    // analyze first input stream part: read targets
69
	    // only tokens with length > sensitivity are considered.
70
	    $targets=[];
71
	    
72
	    while (($data=fgetcsv($this->inputStream)) && isset($data[1])) {
73
	        assert( !empty($data[0]) && !empty($data[1]) ) ;
74
	        
75
	        $subjectUri=$data[0];
76
	        $string=$data[1];
77
	        
78
	        // tokenize string
79
	        $tokenizedString=[];
80
	        $tok= strtok($string, ' ');
81
	        while ($tok !== false) {
82
	            // ignore too short tokens
83
	            if( floor(strlen($tok)*$this->sensitivity)>0) { $tokenizedString[]=strtolower($tok);}
84
                $tok = strtok(' ');
85
	        }
86
	        
87
	        if( !empty($tokenizedString) ) { $targets[$subjectUri]=$tokenizedString;}
88
	    }
89
	    
90
	    //analyze second input stream part: token to be searched in targets
91
	    fwrite($this->outputStream, "INSERT DATA { GRAPH <$this->graphName> {\n");
92
	    
93
	    while (($data = fgetcsv($this->inputStream)) !== FALSE) {
94
	       
95
	        assert(!empty($data[0]) && !empty($data[1]) ) ;
96
	        
97
	        list ($uri, $token)=$data;
98
	        if( $closestUri=$this->findClosestUriToToken( strtolower($token), $targets) ){
99
	            fprintf($this->outputStream, "<%s> <%s> <%s>.\n", $closestUri, $this->property, $uri);
100
	        }
101
	        
102
	    }
103
	    fwrite($this->outputStream, "}}");	    
104
	}
105
}