Completed
Pull Request — master (#28)
Created 08:00 by unknown

FileTextExtractor::for_file()   Rating: D

Complexity

Conditions 9
Paths 6

Size

Total Lines 27
Code Lines 13

Duplication

Lines 0
Ratio 0 %

Importance

Changes 6
Bugs 1
Features 1

Metric Value
c 6
b 1
f 1
dl 0
loc 27
rs 4.909
cc 9
eloc 13
nc 6
nop 1
<?php

/**
 * A decorator for File or a subclass that provides a method for extracting full-text from the file's external contents.
 * @author mstephens
 *
 */
abstract class FileTextExtractor extends Object
Coding Style Compatibility
PSR1 recommends that each class must be in a namespace of at least one level to avoid collisions.

You can fix this by adding a namespace to your class:

namespace YourVendor;

class YourClass { }

When choosing a vendor namespace, try to pick something that is not too generic to avoid conflicts with other libraries.

{
    /**
     * Set priority from 0-100.
     * The highest priority extractor for a given content type will be selected.
     *
     * @config
     * @var integer
     */
    private static $priority = 50;
Unused Code
The property $priority is not used and could be removed.

This check marks private properties in classes that are never used. Those properties can be removed.

    /**
     * Cache of extractor class names, sorted by priority
     *
     * @var array
     */
    protected static $sorted_extractor_classes = null;

    /**
     * Gets the list of prioritised extractor classes
     *
     * @return array
     */
    protected static function get_extractor_classes()
    {
        // Check cache
        if (self::$sorted_extractor_classes) {
Bug Best Practice
The expression self::$sorted_extractor_classes of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.
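
As a minimal sketch of an explicit check (the class `ExtractorCacheSketch` and its members are hypothetical stand-ins, not part of the code under review), the cache could be compared against `null` instead of relying on array-to-boolean conversion. One difference to keep in mind: with `!== null`, a list that was built once and turned out empty still counts as cached, whereas `! empty()` would rebuild it on every call.

```php
<?php

// Hypothetical stand-in for the cache used by get_extractor_classes().
class ExtractorCacheSketch
{
    /** @var array|null Null until the list has been built once */
    protected static $sorted = null;

    /** @var int Counts rebuilds so the caching effect is observable */
    public static $builds = 0;

    /**
     * @param array $priorities Map of class name => priority (assumed input)
     * @return array Class names, highest priority first
     */
    public static function get(array $priorities)
    {
        // Explicit check: only null means "not built yet"
        if (self::$sorted !== null) {
            return self::$sorted;
        }

        self::$builds++;
        arsort($priorities); // sort by value, descending, keeping keys
        return self::$sorted = array_keys($priorities);
    }
}

// The first call builds the list; the second call returns the cached copy.
$first = ExtractorCacheSketch::get(array('LowExtractor' => 10, 'HighExtractor' => 90));
$second = ExtractorCacheSketch::get(array());
```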

            return self::$sorted_extractor_classes;
        }

        // Generate the sorted list of extractors on demand.
        $classes = ClassInfo::subclassesFor("FileTextExtractor");
        // Drop the first entry, which is the base class itself
        array_shift($classes);
        $classPriorities = array();
        foreach ($classes as $class) {
            $classPriorities[$class] = Config::inst()->get($class, 'priority');
        }
        // Sort by priority, highest first, keeping class names as keys
        arsort($classPriorities);

        // Save classes
        $sortedClasses = array_keys($classPriorities);
        return self::$sorted_extractor_classes = $sortedClasses;
    }

    /**
     * Get the text file extractor for the given class
     *
     * @param string $class
     * @return FileTextExtractor
     */
    protected static function get_extractor($class)
    {
        return Injector::inst()->get($class);
    }

    /**
     * Attempt to detect mime type for given file
     *
     * @param string $path
     * @return string Mime type if found
     */
    protected static function get_mime($path)
    {
        $file = new Symfony\Component\HttpFoundation\File\File($path);

        return $file->getMimeType();
    }

    /**
     * Find the first available extractor that supports the given file.
     * Extractors are tried in priority order; extension is checked before mime type.
     *
     * @param string $path
     * @return FileTextExtractor|null Null if the path is missing, is a directory, or no extractor matches
     */
    public static function for_file($path)
    {
        if (!file_exists($path) || is_dir($path)) {
            return;
        }

        $extension = pathinfo($path, PATHINFO_EXTENSION);
        $mime = self::get_mime($path);
        foreach (self::get_extractor_classes() as $className) {
            $extractor = self::get_extractor($className);

            // Skip unavailable extractors
            if (!$extractor->isAvailable()) {
                continue;
            }

            // Check extension
            if ($extension && $extractor->supportsExtension($extension)) {
                return $extractor;
            }

            // Check mime
            if ($mime && $extractor->supportsMime($mime)) {
                return $extractor;
            }
        }
    }

    /**
     * Checks if the extractor is supported on the current environment,
     * for example if the correct binaries or libraries are available.
     *
     * @return boolean
     */
    abstract public function isAvailable();

    /**
     * Determine if this extractor supports the given extension.
     * If support is determined by mime type only, then this should return false.
     *
     * @param string $extension
     * @return boolean
     */
    abstract public function supportsExtension($extension);

    /**
     * Determine if this extractor supports the given mime type.
     * Will only be called if supportsExtension returns false.
     *
     * @param string $mime
     * @return boolean
     */
    abstract public function supportsMime($mime);

    /**
     * Given a file path, extract the contents as text.
     *
     * @param string $path
     * @return string
     */
    abstract public function getContent($path);
}

class FileTextExtractor_Exception extends Exception
Coding Style Compatibility
PSR1 recommends that each class should be in its own file to aid autoloaders.

Having each class in a dedicated file usually plays nice with PSR autoloaders and is therefore a well established practice. If you use other autoloaders, you might not want to follow this rule.

Coding Style Compatibility
PSR1 recommends that each class must be in a namespace of at least one level to avoid collisions.

You can fix this by adding a namespace to your class:

namespace YourVendor;

class YourClass { }

When choosing a vendor namespace, try to pick something that is not too generic to avoid conflicts with other libraries.

{
}
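
To illustrate how the four abstract methods and the selection loop in `for_file()` fit together, here is a minimal standalone sketch. Everything named `*Sketch*` is hypothetical: `SketchExtractor` stands in for `FileTextExtractor` so the example runs without the framework, the extractor list is passed in directly instead of being resolved through `ClassInfo`, `Config` and `Injector`, and the mime type is supplied by the caller rather than detected.

```php
<?php

// Hypothetical stand-in for FileTextExtractor, reduced to its four abstract
// methods so the sketch runs without the framework.
abstract class SketchExtractor
{
    abstract public function isAvailable();
    abstract public function supportsExtension($extension);
    abstract public function supportsMime($mime);
    abstract public function getContent($path);
}

// Hypothetical extractor for plain-text files.
class PlainTextSketchExtractor extends SketchExtractor
{
    public function isAvailable()
    {
        return true; // needs no external binaries or libraries
    }

    public function supportsExtension($extension)
    {
        return strtolower($extension) === 'txt';
    }

    public function supportsMime($mime)
    {
        return $mime === 'text/plain';
    }

    public function getContent($path)
    {
        return file_get_contents($path);
    }
}

// Hypothetical extractor whose dependency is missing on this machine.
class UnavailableSketchExtractor extends PlainTextSketchExtractor
{
    public function isAvailable()
    {
        return false;
    }
}

/**
 * Mirrors the loop in for_file(): extractors are tried in the given order,
 * unavailable ones are skipped, and extension is checked before mime type.
 */
function sketch_for_file(array $extractors, $path, $mime = null)
{
    if (!file_exists($path) || is_dir($path)) {
        return null;
    }

    $extension = pathinfo($path, PATHINFO_EXTENSION);
    foreach ($extractors as $extractor) {
        if (!$extractor->isAvailable()) {
            continue;
        }
        if ($extension && $extractor->supportsExtension($extension)) {
            return $extractor;
        }
        if ($mime && $extractor->supportsMime($mime)) {
            return $extractor;
        }
    }
    return null;
}

// Usage: write a temporary .txt file and pick an extractor for it.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid('sketch_', true) . '.txt';
file_put_contents($path, 'hello extractor');

$extractors = array(new UnavailableSketchExtractor(), new PlainTextSketchExtractor());
$chosen = sketch_for_file($extractors, $path);
$text = $chosen ? $chosen->getContent($path) : null;
$missing = sketch_for_file($extractors, $path . '.missing');

unlink($path);
```

In this sketch the unavailable extractor is listed first, as if it had the higher priority, yet the plain-text one is chosen because unavailable extractors are skipped before any extension or mime check.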