Completed
Push — master ( da2115...dd0feb )
by Andreas
18:06
created

document::process_datamanager()   D

Complexity

Conditions 21
Paths 119

Size

Total Lines 67
Code Lines 46

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 26
CRAP Score 54.1889

Importance

Changes 1
Bugs 0
Features 0
Metric Value
cc 21
eloc 46
c 1
b 0
f 0
nc 119
nop 0
dl 0
loc 67
ccs 26
cts 45
cp 0.5778
crap 54.1889
rs 4.0083

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
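
As a minimal sketch of this refactoring, assume a longer method that ends with a commented block shortening a text into an abstract. The comment becomes the name of a new function. The names `build_abstract` and `summarize` are hypothetical and only serve as illustration; the 200-character limit mirrors the fallback rule in the class reviewed here.

```php
<?php
// Hypothetical helper extracted from a longer method. The comment that used
// to sit above these lines ("use the first 200 characters as abstract")
// turned into the function name.
function build_abstract(string $content, int $limit = 200): string
{
    if (mb_strlen($content) > $limit) {
        return mb_substr($content, 0, $limit) . ' ...';
    }
    return $content;
}

// Hypothetical caller: after the extraction it reads as one line per step.
function summarize(string $title, string $content): array
{
    return [
        'title' => $title,
        'abstract' => build_abstract($content),
    ];
}
```

The extracted function can now be tested on its own, and the caller no longer needs the comment to be understood.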

Commonly applied refactorings include Extract Method.

<?php
/**
 * @author The Midgard Project, http://www.midgard-project.org
 * @copyright The Midgard Project, http://www.midgard-project.org
 * @license http://www.gnu.org/licenses/lgpl.html GNU Lesser General Public License
 */

namespace midcom\datamanager\indexer;

use Symfony\Component\Form\FormView;
use midcom\datamanager\datamanager;
use midcom_services_indexer_document_midcom;
use midcom_error;

/**
 * This class is geared to ease indexing of datamanager driven documents. The
 * user invoking the indexing must have full read permissions to the object.
 *
 * <b>Basic indexing operation</b>
 *
 * This class uses a number of conventions, see below, to merge an existing
 * datamanager driven document into an indexing capable document. It requires
 * the caller to instantiate the datamanager, as this class would have no
 * idea where to take the schema database from.
 *
 * The RI (the GUID) from the base class is left untouched.
 *
 * <b>Indexing field defaults:</b>
 *
 * Unless you specify anything else explicitly in the schema,
 * the class will merge all text based fields together to form the <i>content</i>
 * field of the index record, to allow for easy searching of the document.
 * This will *not* include any metadata like keywords or summaries.
 *
 * If the schema contains a field <i>abstract</i>, it will also be used as
 * abstract field for the indexing process. In the same way, fields named
 * <i>title</i> or <i>author</i> will be used for the index document's title
 * or author respectively. The contents of abstract, title and author will also
 * be appended to the content field at the end of the object construction,
 * which makes it easier to search over these fields.
 *
 * If no abstract field is present, the first 200 characters of the content
 * area are used instead.
 *
 * Not all types can be indexed; check the types in question for their
 * indexing capabilities. In general, if the system should index any non-text
 * field, it will use the CSV representation for implicit conversion.
 *
 * Metadata processing is done by the base class.
 *
 * <b>Document title:</b>
 *
 * You should either have an auto-indexed title field, or an assortment
 * of other fields manually assigned to index to the title field.
 *
 * <b>Configurability using the Datamanager schema:</b>
 *
 * You can decorate datamanager fields with various directives influencing
 * the indexing. See the Datamanager's schema documentation for details.
 * Basically, you can choose from the following indexing methods using the
 * key 'index_method' for each field:
 *
 * - The default <i>auto</i> mode will use the above guidelines to determine
 *   the indexing destination automatically, adding data to the content, abstract,
 *   title and author fields respectively.
 * - You can specify <i>abstract</i>, <i>content</i>, <i>title</i> or
 *   <i>author</i> to indicate that the field should be used for the indicated
 *   document fields. The content selector may be specified more than once,
 *   indicating that the content of the relevant fields should be merged.
 * - Any date field can be indexed into its own, range-filterable field using
 *   the <i>date</i> method. In this case, two document fields are created:
 *   one named directly after the schema field, containing the filterable
 *   timestamp, and a second one with the _TS postfix, set as noindex and
 *   containing the plain timestamp.
 * - Finally, you can explicitly index a field as a separate document field
 *   using one of the four field types <i>keyword</i>, <i>unindexed</i>,
 *   <i>unstored</i> or <i>text</i>. You can further control if the content
 *   of these fields is also added to the main content field. This is useful
 *   if you want to have fields searchable both by explicit field specification
 *   and the default field for simpler searches. This is controlled by setting
 *   the boolean key 'index_merge_with_content' in the field, which defaults
 *   to true.
 * - <i>noindex</i> will prevent indexing of this field.
 *
 * The document's type is "midcom_datamanager".
 *
 * @see midcom_services_indexer
 */
class document extends midcom_services_indexer_document_midcom
{
    /**
     * The datamanager instance of the document we need to index.
     *
     * @var datamanager
     */
    private $datamanager;

    /**
     * The constructor initializes the member variables and invokes
     * process_datamanager, which will read and process the information
     * out of that instance.
     *
     * The document is ready for indexing after construction. On any
     * critical error, midcom_error is triggered.
     *
     * @param datamanager $datamanager The fully initialized datamanager instance to use
     */
    public function __construct($datamanager)
    {
        parent::__construct($datamanager->get_storage()->get_value());

        $this->_set_type('datamanager');

        $this->datamanager = $datamanager;

        $this->process_datamanager();
        $this->complete_fields();
    }

    /**
     * Completes all fields which are not yet complete:
     *
     * content is completed with author, title and, if necessary, abstract.
     *
     * The title is set to the document's URL in case that no title is set yet. The title
     * is not added to the content field in that case.
     */
    private function complete_fields()
    {
        $this->content .= "{$this->author}\n{$this->title}\n";

        // Add the abstract only if we haven't done so already. The content is
        // the haystack here; the original call had the arguments swapped.
        $abstract = (string) $this->abstract;
        if ($abstract !== '' && strpos($this->content, $abstract) === false) {
            $this->content .= "{$abstract}\n";
        }

        if (! $this->title) {
            $this->title = $this->document_url;
        }
    }

    /**
     * Processes the information contained in the datamanager instance.
     *
     * The function iterates over the fields in the schema, and processes them
     * according to the rules given in the introduction.
     */
    private function process_datamanager()
    {
        $renderer = $this->datamanager->get_renderer('view');
        foreach ($renderer->get_view() as $name => $field) {
            $method = $field->vars['index_method'];
            if ($method == 'auto') {
                $method = $this->resolve_auto_method($field->vars['name']);
            }

            switch ($method) {
                case 'abstract':
                case 'title':
                case 'author':
                    $this->{$method} = $renderer->widget($field);
                    break;

                case 'content':
                    $this->content .= $renderer->widget($field) . "\n";
                    break;

                case 'date':
                    $this->add_as_date_field($field);
                    break;

                case 'attachment':
                    if (!empty($field->vars['value'])) {
                        // Only index the first attachment for now
                        $attachment = array_shift($field->vars['value']);
                        if (   !$attachment instanceof \midcom_db_attachment
                            && !empty($attachment['object'])) {
                            // This is the form edit case
                            // @todo: In create case, nothing is found currently
                            $attachment = $attachment['object'];
                        }
                        if ($attachment instanceof \midcom_db_attachment) {
                            $att_doc = new \midcom_services_indexer_document_attachment($attachment);
                            $this->content .= $att_doc->content;
                            $this->abstract .= $att_doc->abstract;
                        }
                    }

                    break;

                case 'unstored':
                case 'unindexed':
                case 'text':
                case 'keyword':
                    $data = $renderer->widget($field);
                    $function = 'add_' . $method;
                    $this->$function($name, $data);
                    if ($field->vars['index_merge_with_content']) {
                        $this->content .= $data . "\n";
                    }
                    break;

                case 'noindex':
                    break;

                default:
                    throw new midcom_error("Unknown indexing method {$method} for field {$name} discovered, aborting.");
            }
        }

        // Fall back to the first 200 characters of the content as abstract
        if ($this->abstract == '') {
            $this->abstract = $this->html2text($this->content);
            if (mb_strlen($this->abstract) > 200) {
                $this->abstract = mb_substr($this->abstract, 0, 200) . ' ...';
            }
        }
    }

    /**
     * This function tries to convert the $field into a date
     * representation. Unixdate fields are used directly (localtime is used,
     * not GMT), other fields will be parsed with strtotime.
     *
     * Invalid strings which are not parseable using strtotime will be
     * stored as a "0" timestamp.
     *
     * Be aware that this will work only for current dates in range of a
     * UNIX timestamp. For all other cases you should use an ISO 8601 representation,
     * which should work as well with Lucene range queries.
     *
     * @param FormView $field The field that should be stored
     */
    private function add_as_date_field(FormView $field)
    {
        if (is_array($field->vars['value']) && array_key_exists('date', $field->vars['value'])) {
            $timestamp = 0;
            if (!empty($field->vars['value']['date'])) {
                $timestamp = $field->vars['value']['date']->format('U');
            }
            $this->add_date_pair($field->vars['name'], $timestamp);
        } else {
            $string = (string) $field->vars['value'];
            $timestamp = strtotime($string);
            // strtotime() returns false (not -1) when the string cannot be parsed
            if ($timestamp === false) {
                debug_add("The string representation of the field {$field->vars['name']} could not be parsed into a timestamp; treating as 0.", MIDCOM_LOG_INFO);
                debug_print_r('String representation was:', $string);
                $timestamp = 0;
            }
            $this->add_date_pair($field->vars['name'], $timestamp);
        }
    }

    /**
     * @param string $name The field name
     * @return string index method
     */
    private function resolve_auto_method($name)
    {
        if (in_array($name, ['abstract', 'title', 'author'])) {
            return $name;
        }
        return 'content';
    }
}
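
The routing rules that process_datamanager() applies can be sketched in isolation. The following is a simplified stand-in, not the MidCOM API: fields are plain arrays instead of Symfony FormView objects, the function name `index_fields` and the `extra` key are hypothetical, the date and attachment methods are left out, and the html2text step before the abstract fallback is skipped. A missing 'index_method' is assumed to mean 'auto', as the class documentation describes.

```php
<?php
// Simplified stand-in for the index_method routing; all names are hypothetical.
function index_fields(array $fields): array
{
    $doc = ['title' => '', 'author' => '', 'abstract' => '', 'content' => '', 'extra' => []];

    foreach ($fields as $name => $field) {
        $method = $field['index_method'] ?? 'auto';
        if ($method === 'auto') {
            // Same rule as resolve_auto_method(): well-known names map to themselves.
            $method = in_array($name, ['abstract', 'title', 'author'], true) ? $name : 'content';
        }

        switch ($method) {
            case 'abstract':
            case 'title':
            case 'author':
                $doc[$method] = $field['value'];
                break;
            case 'content':
                $doc['content'] .= $field['value'] . "\n";
                break;
            case 'keyword':
            case 'unindexed':
            case 'unstored':
            case 'text':
                // Stored as a separate document field, optionally merged into content.
                $doc['extra'][$name] = [$method, $field['value']];
                if ($field['index_merge_with_content'] ?? true) {
                    $doc['content'] .= $field['value'] . "\n";
                }
                break;
            case 'noindex':
                break;
            default:
                throw new InvalidArgumentException("Unknown indexing method {$method} for field {$name}");
        }
    }

    // Without an abstract field, fall back to the first 200 characters of the content.
    if ($doc['abstract'] === '') {
        $doc['abstract'] = mb_strlen($doc['content']) > 200
            ? mb_substr($doc['content'], 0, 200) . ' ...'
            : $doc['content'];
    }
    return $doc;
}

$doc = index_fields([
    'title'  => ['value' => 'Hello'],
    'body'   => ['value' => 'Some text'],
    'tags'   => ['value' => 'news', 'index_method' => 'keyword', 'index_merge_with_content' => false],
    'secret' => ['value' => 'hidden', 'index_method' => 'noindex'],
]);
```

In this usage example the title field is picked up by the auto rule, `body` lands in the content, `tags` becomes a separate keyword field without being merged into the content, and `secret` is skipped entirely.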