Passed
Push 4 (dace2f...ced2ba) by Damian at 09:35 (created)

CsvBulkLoader (rating F)

Complexity

Total Complexity 65

Size/Duplication

Total Lines 431
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 431
rs 3.3333
c 0
b 0
f 0
wmc 65

9 Methods

Rating  Name                      Duplication  Size  Complexity
A       preview()                 0            3     1
A       getNewSplitFileName()     0            4     1
C       processAll()              0            61    13
B       processChunk()            0            37    6
B       splitFile()               0            57    6
F       processRecord()           0            107   22
D       findExistingObject()      0            41    10
A       getNormalisedColumnMap()  0            16    4
A       hasHeaderRow()            0            3     2
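The wmc value (65) equals the sum of the Complexity column in the method table, which is also why it matches Total Complexity. A quick PHP check, with the table values copied by hand into a hypothetical `$methodComplexity` array, shows the sum and how much of it processRecord() accounts for:

```php
<?php

// Per-method complexity values copied from the method table.
// $methodComplexity is a hypothetical name used only for this illustration.
$methodComplexity = [
    'preview' => 1,
    'getNewSplitFileName' => 1,
    'processAll' => 13,
    'processChunk' => 6,
    'splitFile' => 6,
    'processRecord' => 22,
    'findExistingObject' => 10,
    'getNormalisedColumnMap' => 4,
    'hasHeaderRow' => 2,
];

// WMC (weighted methods per class) is the sum of the per-method complexities.
$wmc = array_sum($methodComplexity);

// processRecord() alone accounts for roughly a third of the total.
$share = $methodComplexity['processRecord'] / $wmc;

printf("wmc = %d, processRecord share = %.0f%%\n", $wmc, $share * 100);
```

This makes processRecord() (rated F) the obvious first target when reducing the class's complexity.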

How to fix: Complexity

Complex Class

Complex classes like CsvBulkLoader often do many different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefixes or suffixes.
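Applied to the method table above, the prefix heuristic can be sketched in a few lines of PHP. `groupByPrefix()` is a hypothetical helper that treats the leading lowercase word of each method name as its prefix and keeps only prefixes shared by at least two methods:

```php
<?php

/**
 * Groups method names by their leading lowercase word (e.g. "process" in
 * "processAll"), one heuristic for spotting a cohesive component to extract.
 */
function groupByPrefix(array $methodNames): array
{
    $groups = [];
    foreach ($methodNames as $name) {
        // Take the leading run of lowercase letters as the prefix.
        preg_match('/^[a-z]+/', $name, $m);
        $prefix = $m[0] ?? $name;
        $groups[$prefix][] = $name;
    }
    // Only prefixes shared by two or more methods hint at a component.
    return array_filter($groups, fn (array $names) => count($names) > 1);
}

// Method names taken from the CsvBulkLoader method table.
$methods = [
    'preview', 'getNewSplitFileName', 'processAll', 'processChunk',
    'splitFile', 'processRecord', 'findExistingObject',
    'getNormalisedColumnMap', 'hasHeaderRow',
];

print_r(groupByPrefix($methods));
```

The result groups processAll(), processChunk() and processRecord() under "process", plus a "get" group; the generic getter prefix is a weak signal on its own. A pure prefix check also misses splitFile() and getNewSplitFileName(), which share "Split" mid-name and are both deprecated file-splitting helpers, so scanning for shared infixes and suffixes is worth doing too.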

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
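getNormalisedColumnMap() reads only `$this->columnMap`, which makes it a small, self-contained candidate for Extract Class; processChunk() also builds a near-identical map inline, so both could share the result. A minimal sketch, assuming a hypothetical `ColumnMapNormaliser` class that receives the map as an argument instead of reading a property:

```php
<?php

/**
 * Hypothetical class extracted from CsvBulkLoader::getNormalisedColumnMap().
 * It turns a column map into one with unique scalar values:
 *  - callback targets ("->methodName") map a column to itself,
 *  - null targets become "_ignore_<column>" so they can be dropped later,
 *  - anything else is kept as the renamed column.
 */
final class ColumnMapNormaliser
{
    public function normalise(array $columnMap): array
    {
        $map = [];
        foreach ($columnMap as $column => $newColumn) {
            if ($newColumn === null) {
                // Checked first so strpos() never receives null.
                $map[$column] = '_ignore_' . $column;
            } elseif (strpos($newColumn, '->') === 0) {
                $map[$column] = $column;
            } else {
                $map[$column] = $newColumn;
            }
        }
        return $map;
    }
}

$normaliser = new ColumnMapNormaliser();
print_r($normaliser->normalise([
    'Name' => 'Title',
    'Tags' => '->importTags',
    'Notes' => null,
]));
```

The null check is moved ahead of the callback check: the original calls strpos() on a null value first, which PHP 8.1 and later report as a deprecation. The mapping produced is otherwise the same.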

While breaking up the class, it is a good idea to analyze how other classes use CsvBulkLoader and, based on these observations, apply Extract Interface too.
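findExistingObject() is one of the class's public methods, so duplicate detection is a natural candidate for an extracted interface. The sketch below assumes that split; `ExistingRecordFinder`, `ArrayRecordFinder` and `findExisting()` are hypothetical names, and returning `null` rather than `false` on no match is a choice made for the sketch. The in-memory implementation mimics only the string form of `$duplicateChecks`, not the callback form:

```php
<?php

/**
 * Hypothetical interface for duplicate detection, modelled on the public
 * CsvBulkLoader::findExistingObject(). Names here are illustrative only.
 */
interface ExistingRecordFinder
{
    /** Returns the matching stored record, or null when nothing matches. */
    public function findExisting(array $record): ?array;
}

/**
 * In-memory stand-in that mimics the string form of $duplicateChecks:
 * fields are tried in order, empty values are skipped, and the first
 * stored row with the same value is returned.
 */
final class ArrayRecordFinder implements ExistingRecordFinder
{
    private array $rows;
    private array $duplicateChecks;

    public function __construct(array $rows, array $duplicateChecks)
    {
        $this->rows = $rows;
        $this->duplicateChecks = $duplicateChecks;
    }

    public function findExisting(array $record): ?array
    {
        foreach ($this->duplicateChecks as $field) {
            if (empty($record[$field])) {
                continue; // same skip rule as findExistingObject()
            }
            foreach ($this->rows as $row) {
                // CSV cells arrive as strings, so compare string forms.
                if (isset($row[$field]) && (string) $row[$field] === (string) $record[$field]) {
                    return $row;
                }
            }
        }
        return null;
    }
}

$finder = new ArrayRecordFinder(
    [['ID' => 1, 'Email' => 'a@example.com'], ['ID' => 2, 'Email' => 'b@example.com']],
    ['Email']
);
$match = $finder->findExisting(['Email' => 'b@example.com', 'Name' => 'B']);
echo $match === null ? 'no match' : 'matched ID ' . $match['ID'], PHP_EOL;
```

Code that depends on the interface rather than on CsvBulkLoader can then be tested without a database.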

<?php

namespace SilverStripe\Dev;

use League\Csv\Reader;
use SilverStripe\Control\Director;
use SilverStripe\ORM\DataObject;

/**
 * Utility class to facilitate complex CSV-imports by defining column-mappings
 * and custom converters.
 *
 * Uses the fgetcsv() function to process CSV input. Accepts a file-handler as
 * input.
 *
 * @see http://tools.ietf.org/html/rfc4180
 *
 * @todo Support for deleting existing records not matched in the import
 * (through relation checks)
 */
class CsvBulkLoader extends BulkLoader
{

    /**
     * Delimiter character (Default: comma).
     *
     * @var string
     */
    public $delimiter = ',';

    /**
     * Enclosure character (Default: doublequote).
     *
     * @var string
     */
    public $enclosure = '"';

    /**
     * Identifies if the CSV has a header row.
     *
     * @var boolean
     */
    public $hasHeaderRow = true;

    /**
     * Number of lines per file when splitting large CSV files.
     *
     * @var int
     *
     * @config
     */
    private static $lines = 1000;

    /**
     * @inheritDoc
     */
    public function preview($filepath)
    {
        return $this->processAll($filepath, true);
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return null|BulkLoader_Result
     */
    protected function processAll($filepath, $preview = false)
    {
        $previousDetectLE = ini_get('auto_detect_line_endings');

        ini_set('auto_detect_line_endings', true);
        // Scrutinizer (Bug): `true` of type true is incompatible with the type string expected by
        // parameter $newvalue of ini_set(). If this is a false positive, ignore it with the
        // ignore-type annotation:
        //     ini_set('auto_detect_line_endings', /** @scrutinizer ignore-type */ true);
        try {
            $filepath = Director::getAbsFile($filepath);
            $csvReader = Reader::createFromPath($filepath, 'r');

            $tabExtractor = function ($row, $rowOffset, $iterator) {
                foreach ($row as &$item) {
                    // [SS-2017-007] Ensure all cells with leading tab and then [@=+] have the tab removed on import
                    if (preg_match("/^\t[\-@=\+]+.*/", $item)) {
                        $item = ltrim($item, "\t");
                    }
                }
                return $row;
            };

            if ($this->columnMap) {
                // Scrutinizer (Bug Best Practice): $this->columnMap (an array) is implicitly converted to a
                // boolean; if an emptiness check is intended, make it explicit with !empty($this->columnMap).
                $headerMap = $this->getNormalisedColumnMap();
                $remapper = function ($row, $rowOffset, $iterator) use ($headerMap, $tabExtractor) {
                    $row = $tabExtractor($row, $rowOffset, $iterator);
                    foreach ($headerMap as $column => $renamedColumn) {
                        if ($column == $renamedColumn) {
                            continue;
                        }
                        if (array_key_exists($column, $row)) {
                            if (strpos($renamedColumn, '_ignore_') !== 0) {
                                $row[$renamedColumn] = $row[$column];
                            }
                            unset($row[$column]);
                        }
                    }
                    return $row;
                };
            } else {
                $remapper = $tabExtractor;
            }

            if ($this->hasHeaderRow) {
                $rows = $csvReader->fetchAssoc(0, $remapper);
            } elseif ($this->columnMap) {
                // Scrutinizer (Bug Best Practice): $this->columnMap (an array) is implicitly converted to a
                // boolean; if an emptiness check is intended, make it explicit with !empty($this->columnMap).
                $rows = $csvReader->fetchAssoc($headerMap, $remapper);
                // Scrutinizer (Comprehensibility Best Practice): the variable $headerMap does not seem to be
                // defined for all execution paths leading up to this point.
            }

            $result = BulkLoader_Result::create();

            foreach ($rows as $row) {
                // Scrutinizer (Comprehensibility Best Practice): the variable $rows does not seem to be
                // defined for all execution paths leading up to this point.
                $this->processRecord($row, $this->columnMap, $result, $preview);
            }
        } catch (\Exception $e) {
            $failedMessage = sprintf("Failed to parse %s", $filepath);
            if (Director::isDev()) {
                $failedMessage = sprintf($failedMessage . " because %s", $e->getMessage());
            }
            print $failedMessage . PHP_EOL;
        } finally {
            ini_set('auto_detect_line_endings', $previousDetectLE);
        }
        return $result;
    }

    protected function getNormalisedColumnMap()
    {
        $map = [];
        foreach ($this->columnMap as $column => $newColumn) {
            if (strpos($newColumn, "->") === 0) {
                $map[$column] = $column;
            } elseif (is_null($newColumn)) {
                // the column map must consist of unique scalar values
                // `null` can be present multiple times and is not scalar
                // so we name it in a standard way so we can remove it later
                $map[$column] = '_ignore_' . $column;
            } else {
                $map[$column] = $newColumn;
            }
        }
        return $map;
    }

    /**
     * Splits a large file up into many smaller files.
     *
     * @param string $path Path to large file to split
     * @param int $lines Number of lines per file
     *
     * @return array List of file paths
     */
    protected function splitFile($path, $lines = null)
    {
        Deprecation::notice('5.0', 'splitFile is deprecated, please process files using a stream');
        $previous = ini_get('auto_detect_line_endings');

        ini_set('auto_detect_line_endings', true);
        // Scrutinizer (Bug): `true` of type true is incompatible with the type string expected by
        // parameter $newvalue of ini_set(). If this is a false positive, ignore it with the
        // ignore-type annotation:
        //     ini_set('auto_detect_line_endings', /** @scrutinizer ignore-type */ true);

        if (!is_int($lines)) {
            $lines = $this->config()->get("lines");
        }

        $new = $this->getNewSplitFileName();

        $to = fopen($new, 'w+');
        $from = fopen($path, 'r');

        $header = null;

        if ($this->hasHeaderRow) {
            $header = fgets($from);
            // Scrutinizer (Bug): $from can also be of type false, but parameter $handle of fgets()
            // only accepts a resource; consider an additional type check. If this is a false
            // positive: $header = fgets(/** @scrutinizer ignore-type */ $from);
            fwrite($to, $header);
            // Scrutinizer (Bug): $to can also be of type false, but parameter $handle of fwrite()
            // only accepts a resource; consider an additional type check. If this is a false
            // positive: fwrite(/** @scrutinizer ignore-type */ $to, $header);
        }

        $files = array();
        $files[] = $new;

        $count = 0;

        while (!feof($from)) {
            // Scrutinizer (Bug): $from can also be of type false, but parameter $handle of feof()
            // only accepts a resource; consider an additional type check. If this is a false
            // positive: while (!feof(/** @scrutinizer ignore-type */ $from)) {
            fwrite($to, fgets($from));

            $count++;

            if ($count >= $lines) {
                fclose($to);
                // Scrutinizer (Bug): $to can also be of type false, but parameter $handle of fclose()
                // only accepts a resource; consider an additional type check. If this is a false
                // positive: fclose(/** @scrutinizer ignore-type */ $to);

                // get a new temporary file name, to write the next lines to
                $new = $this->getNewSplitFileName();

                $to = fopen($new, 'w+');

                if ($this->hasHeaderRow) {
                    // add the headers to the new file
                    fwrite($to, $header);
                }

                $files[] = $new;

                $count = 0;
            }
        }

        fclose($to);

        ini_set('auto_detect_line_endings', $previous);

        return $files;
    }

    /**
     * @return string
     */
    protected function getNewSplitFileName()
    {
        Deprecation::notice('5.0', 'getNewSplitFileName is deprecated, please name your files yourself');
        return TEMP_PATH . DIRECTORY_SEPARATOR . uniqid(str_replace('\\', '_', static::class), true) . '.csv';
    }

    /**
     * @param string $filepath
     * @param boolean $preview
     *
     * @return BulkLoader_Result
     */
    protected function processChunk($filepath, $preview = false)
    {
        Deprecation::notice('5.0', 'processChunk is deprecated, please process rows individually');
        $results = BulkLoader_Result::create();

        $csv = new CSVParser(
            $filepath,
            $this->delimiter,
            $this->enclosure
        );

        // ColumnMap has two uses, depending on whether hasHeaderRow is set
        if ($this->columnMap) {
            // Scrutinizer (Bug Best Practice): $this->columnMap (an array) is implicitly converted to a
            // boolean; if an emptiness check is intended, make it explicit with !empty($this->columnMap).
            // if the map goes to a callback, use the column key as the map
            // value rather than the function name, as multiple keys may use
            // the same callback
            $map = [];
            foreach ($this->columnMap as $k => $v) {
                if (strpos($v, "->") === 0) {
                    $map[$k] = $k;
                } else {
                    $map[$k] = $v;
                }
            }

            if ($this->hasHeaderRow) {
                $csv->mapColumns($map);
            } else {
                $csv->provideHeaderRow($map);
            }
        }

        foreach ($csv as $row) {
            $this->processRecord($row, $this->columnMap, $results, $preview);
        }

        return $results;
    }

    /**
     * @todo Better messages for relation checks and duplicate detection
     * Note that columnMap isn't used.
     *
     * @param array $record
     * @param array $columnMap
     * @param BulkLoader_Result $results
     * @param boolean $preview
     *
     * @return int
     */
    protected function processRecord($record, $columnMap, &$results, $preview = false)
    {
        $class = $this->objectClass;

        // find existing object, or create new one
        $existingObj = $this->findExistingObject($record, $columnMap);
        /** @var DataObject $obj */
        $obj = ($existingObj) ? $existingObj : new $class();
        // Scrutinizer: the condition $existingObj can never be true.
        $schema = DataObject::getSchema();

        // first run: find/create any relations and store them on the object
        // we can't combine runs, as other columns might rely on the relation being present
        foreach ($record as $fieldName => $val) {
            // don't bother querying if the value is not set
            if ($this->isNullValue($val)) {
                continue;
            }

            // checking for existing relations
            if (isset($this->relationCallbacks[$fieldName])) {
                // trigger custom search method for finding a relation based on the given value
                // and write it back to the relation (or create a new object)
                $relationName = $this->relationCallbacks[$fieldName]['relationname'];
                /** @var DataObject $relationObj */
                $relationObj = null;
                if ($this->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $this->{$this->relationCallbacks[$fieldName]['callback']}($obj, $val, $record);
                } elseif ($obj->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
                    $relationObj = $obj->{$this->relationCallbacks[$fieldName]['callback']}($val, $record);
                }
                if (!$relationObj || !$relationObj->exists()) {
                    $relationClass = $schema->hasOneComponent(get_class($obj), $relationName);
                    $relationObj = new $relationClass();
                    //write if we aren't previewing
                    if (!$preview) {
                        $relationObj->write();
                    }
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;
                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            } elseif (strpos($fieldName, '.') !== false) {
                // we have a relation column with dot notation
                list($relationName, $columnName) = explode('.', $fieldName);
                // always gives us a component (either empty or existing)
                $relationObj = $obj->getComponent($relationName);
                if (!$preview) {
                    $relationObj->write();
                }
                $obj->{"{$relationName}ID"} = $relationObj->ID;

                //write if we are not previewing
                if (!$preview) {
                    $obj->write();
                    $obj->flushCache(); // avoid relation caching confusion
                }
            }
        }

        // second run: save data

        foreach ($record as $fieldName => $val) {
            // break out of the loop if we are previewing
            if ($preview) {
                break;
            }

            // look up the mapping to see if this needs to map to callback
            $mapped = $this->columnMap && isset($this->columnMap[$fieldName]);
            // Scrutinizer (Bug Best Practice): $this->columnMap (an array) is implicitly converted to a
            // boolean; if an emptiness check is intended, make it explicit with !empty($this->columnMap).

            if ($mapped && strpos($this->columnMap[$fieldName], '->') === 0) {
                $funcName = substr($this->columnMap[$fieldName], 2);

                $this->$funcName($obj, $val, $record);
            } elseif ($obj->hasMethod("import{$fieldName}")) {
                $obj->{"import{$fieldName}"}($val, $record);
            } else {
                $obj->update(array($fieldName => $val));
            }
        }

        // write record
        if (!$preview) {
            $obj->write();
        }

        // @todo better message support
        $message = '';

        // save to results
        if ($existingObj) {
            // Scrutinizer: the condition $existingObj can never be true.
            $results->addUpdated($obj, $message);
        } else {
            $results->addCreated($obj, $message);
        }

        $objID = $obj->ID;

        $obj->destroy();

        // memory usage
        unset($existingObj, $obj);

        return $objID;
    }

    /**
     * Find an existing object based on one or more uniqueness columns
     * specified via {@link self::$duplicateChecks}.
     *
     * @todo support $columnMap
     *
     * @param array $record CSV data row
     * @param array $columnMap
     * @return DataObject
     */
    public function findExistingObject($record, $columnMap = [])
    {
        $SNG_objectClass = singleton($this->objectClass);
        // checking for existing records (only if not already found)

        foreach ($this->duplicateChecks as $fieldName => $duplicateCheck) {
            $existingRecord = null;
            if (is_string($duplicateCheck)) {
                // Skip current duplicate check if field value is empty
                if (empty($record[$duplicateCheck])) {
                    continue;
                }

                // Check existing record with this value
                $dbFieldValue = $record[$duplicateCheck];
                $existingRecord = DataObject::get($this->objectClass)
                    ->filter($duplicateCheck, $dbFieldValue)
                    ->first();

                if ($existingRecord) {
                    return $existingRecord;
                }
            } elseif (is_array($duplicateCheck) && isset($duplicateCheck['callback'])) {
                if ($this->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $this->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } elseif ($SNG_objectClass->hasMethod($duplicateCheck['callback'])) {
                    $existingRecord = $SNG_objectClass->{$duplicateCheck['callback']}($record[$fieldName], $record);
                } else {
                    user_error("CsvBulkLoader::processRecord():"
                        . " {$duplicateCheck['callback']} not found on importer or object class.", E_USER_ERROR);
                }

                if ($existingRecord) {
                    return $existingRecord;
                }
            } else {
                user_error('CsvBulkLoader::processRecord(): Wrong format for $duplicateChecks', E_USER_ERROR);
            }
        }

        return false;
    }

    /**
     * Determine whether any loaded files should be parsed with a
     * header-row (otherwise we rely on {@link self::$columnMap}).
     *
     * @return boolean
     */
    public function hasHeaderRow()
    {
        return ($this->hasHeaderRow || isset($this->columnMap));
    }
}
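processAll() hands League\Csv two closures: `$tabExtractor`, which applies the [SS-2017-007] rule, and `$remapper`, which renames columns according to getNormalisedColumnMap(). The sketch below pulls both out as plain functions so the per-row behaviour can be exercised without a CSV file. `stripFormulaTabs()` and `remapRow()` are hypothetical names; the `is_string()` guard and the explicit `empty()` check are additions, the latter following Scrutinizer's array-to-boolean advice:

```php
<?php

// A cell that starts with a tab followed by one of - @ = + loses the tab
// (mirrors $tabExtractor).
function stripFormulaTabs(array $row): array
{
    foreach ($row as &$item) {
        // is_string() guard is an addition; the original assumes string cells.
        if (is_string($item) && preg_match("/^\t[\-@=\+]+.*/", $item)) {
            $item = ltrim($item, "\t");
        }
    }
    unset($item); // break the reference left behind by foreach
    return $row;
}

// Renames columns per $headerMap (source column => target column, as built by
// getNormalisedColumnMap()) and drops columns whose target starts with
// "_ignore_" (mirrors $remapper).
function remapRow(array $row, array $headerMap): array
{
    $row = stripFormulaTabs($row);
    // Explicit emptiness check rather than an implicit array-to-bool conversion.
    if (empty($headerMap)) {
        return $row;
    }
    foreach ($headerMap as $column => $renamedColumn) {
        if ($column == $renamedColumn || !array_key_exists($column, $row)) {
            continue;
        }
        if (strpos($renamedColumn, '_ignore_') !== 0) {
            $row[$renamedColumn] = $row[$column];
        }
        unset($row[$column]);
    }
    return $row;
}

print_r(remapRow(
    ['Name' => "\t=SUM(A1:A2)", 'Notes' => 'skip me', 'Email' => 'ada@example.com'],
    ['Name' => 'Title', 'Notes' => '_ignore_Notes', 'Email' => 'Email']
));
```

In processAll() itself, initialising `$rows` to an empty array before the `if`/`elseif` would give the no-header, no-column-map path a defined value, which is the path Scrutinizer flags.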