Completed
Push — 3.7 ( 81b2d8...ef0909 )
by
unknown
09:42
created

CsvBulkLoader   C

Complexity

Total Complexity 53

Size/Duplication

Total Lines 357
Duplicated Lines 0 %

Coupling/Cohesion

Components 1
Dependencies 7

Importance

Changes 0
Metric Value
dl 0
loc 357
rs 6.96
c 0
b 0
f 0
wmc 53
lcom 1
cbo 7

8 Methods

Rating   Name   Duplication   Size   Complexity  
A preview() 0 3 1
A processAll() 0 31 5
B splitFile() 0 56 6
A getNewSplitFileName() 0 3 1
B processChunk() 0 35 6
F processRecord() 0 98 22
B findExistingObject() 0 37 10
A hasHeaderRow() 0 3 2

How to fix   Complexity   

Complex Class

Complex classes like CsvBulkLoader often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes. You can also have a look at the cohesion graph to spot any un-connected, or weakly-connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use CsvBulkLoader, and based on these observations, apply Extract Interface, too.

1
<?php
2
/**
3
 * Utility class to facilitate complex CSV-imports by defining column-mappings
4
 * and custom converters.
5
 *
6
 * Uses the fgetcsv() function to process CSV input. Accepts a file-handler as
7
 * input.
8
 *
9
 * @see http://tools.ietf.org/html/rfc4180
10
 *
11
 * @package framework
12
 * @subpackage bulkloading
13
 *
14
 * @todo Support for deleting existing records not matched in the import
15
 * (through relation checks)
16
 */
17
class CsvBulkLoader extends BulkLoader {
18
19
	/**
20
	 * Delimiter character (Default: comma).
21
	 *
22
	 * @var string
23
	 */
24
	public $delimiter = ',';
25
26
	/**
27
	 * Enclosure character (Default: doublequote)
28
	 *
29
	 * @var string
30
	 */
31
	public $enclosure = '"';
32
33
	/**
34
	 * Identifies if csv the has a header row.
35
	 *
36
	 * @var boolean
37
	 */
38
	public $hasHeaderRow = true;
39
40
	/**
41
	 * Number of lines to split large CSV files into.
42
	 *
43
	 * @var int
44
	 *
45
	 * @config
46
	 */
47
	private static $lines = 1000;
48
49
	/**
50
	 * @inheritDoc
51
	 */
52
	public function preview($filepath) {
53
		return $this->processAll($filepath, true);
54
	}
55
56
	/**
57
	 * @param string $filepath
58
	 * @param boolean $preview
59
	 *
60
	 * @return null|BulkLoader_Result
61
	 */
62
	protected function processAll($filepath, $preview = false) {
63
		$filepath = Director::getAbsFile($filepath);
64
		$files = $this->splitFile($filepath);
65
66
		$result = null;
67
		$last = null;
68
69
		try {
70
			foreach ($files as $file) {
71
				$last = $file;
72
73
				$next = $this->processChunk($file, $preview);
74
75
				if ($result instanceof BulkLoader_Result) {
76
					$result->merge($next);
77
				} else {
78
					$result = $next;
79
				}
80
81
				@unlink($file);
0 ignored issues
show
Security Best Practice introduced by
It seems like you do not handle an error condition here. This can introduce security issues, and is generally not recommended.

If you suppress an error, we recommend checking for the error condition explicitly:

// For example instead of
@mkdir($dir);

// Better use
if (@mkdir($dir) === false) {
    throw new \RuntimeException('The directory '.$dir.' could not be created.');
}
Loading history...
82
			}
83
		} catch (Exception $e) {
84
			$failedMessage = sprintf("Failed to parse %s", $last);
85
			if (Director::isDev()) {
86
				$failedMessage = sprintf($failedMessage . " because %s", $e->getMessage());
87
			}
88
			print $failedMessage . PHP_EOL;
89
		}
90
91
		return $result;
92
	}
93
94
	/**
95
	 * Splits a large file up into many smaller files.
96
	 *
97
	 * @param string $path Path to large file to split
98
	 * @param int $lines Number of lines per file
99
	 *
100
	 * @return array List of file paths
101
	 */
102
	protected function splitFile($path, $lines = null) {
103
		$previous = ini_get('auto_detect_line_endings');
104
105
		ini_set('auto_detect_line_endings', true);
106
107
		if (!is_int($lines)) {
108
			$lines = $this->config()->get("lines");
109
		}
110
111
		$new = $this->getNewSplitFileName();
112
113
		$to = fopen($new, 'w+');
114
		$from = fopen($path, 'r');
115
116
		$header = null;
117
118
		if ($this->hasHeaderRow) {
119
			$header = fgets($from);
120
			fwrite($to, $header);
121
		}
122
123
		$files = array();
124
		$files[] = $new;
125
126
		$count = 0;
127
128
		while (!feof($from)) {
129
			fwrite($to, fgets($from));
130
131
			$count++;
132
133
			if ($count >= $lines) {
134
				fclose($to);
135
136
				// get a new temporary file name, to write the next lines to
137
				$new = $this->getNewSplitFileName();
138
139
				$to = fopen($new, 'w+');
140
141
				if ($this->hasHeaderRow) {
142
					// add the headers to the new file
143
					fwrite($to, $header);
144
				}
145
146
				$files[] = $new;
147
148
				$count = 0;
149
			}
150
		}
151
152
		fclose($to);
153
154
		ini_set('auto_detect_line_endings', $previous);
155
156
		return $files;
157
	}
158
159
	/**
160
	 * @return string
161
	 */
162
	protected function getNewSplitFileName() {
163
		return TEMP_FOLDER . '/' . uniqid('BulkLoader', true) . '.csv';
164
	}
165
166
	/**
167
	 * @param string $filepath
168
	 * @param boolean $preview
169
	 *
170
	 * @return BulkLoader_Result
171
	 */
172
	protected function processChunk($filepath, $preview = false) {
173
		$results = new BulkLoader_Result();
174
175
		$csv = new CSVParser(
176
			$filepath,
177
			$this->delimiter,
178
			$this->enclosure
179
		);
180
181
		// ColumnMap has two uses, depending on whether hasHeaderRow is set
182
		if($this->columnMap) {
0 ignored issues
show
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.

Loading history...
183
			// if the map goes to a callback, use the same key value as the map
184
			// value, rather than function name as multiple keys may use the
185
			// same callback
186
			foreach($this->columnMap as $k => $v) {
187
				if(strpos($v, "->") === 0) {
188
					$map[$k] = $k;
0 ignored issues
show
Coding Style Comprehensibility introduced by
$map was never initialized. Although not strictly required by PHP, it is generally a good practice to add $map = array(); before regardless.

Adding an explicit array definition is generally preferable to implicit array definition as it guarantees a stable state of the code.

Let’s take a look at an example:

foreach ($collection as $item) {
    $myArray['foo'] = $item->getFoo();

    if ($item->hasBar()) {
        $myArray['bar'] = $item->getBar();
    }

    // do something with $myArray
}

As you can see in this example, the array $myArray is initialized the first time when the foreach loop is entered. You can also see that the value of the bar key is only written conditionally; thus, its value might result from a previous iteration.

This might or might not be intended. To make your intention clear, your code more readible and to avoid accidental bugs, we recommend to add an explicit initialization $myArray = array() either outside or inside the foreach loop.

Loading history...
189
				} else {
190
					$map[$k] = $v;
0 ignored issues
show
Bug introduced by
The variable $map does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

function myFunction($a) {
    switch ($a) {
        case 'foo':
            $x = 1;
            break;

        case 'bar':
            $x = 2;
            break;
    }

    // $x is potentially undefined here.
    echo $x;
}

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }
    
  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        echo $x;
    }
    
  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
    
            // We add support for the missing case.
            default:
                $x = '';
                break;
        }
    
        echo $x;
    }
    
Loading history...
191
				}
192
			}
193
194
			if($this->hasHeaderRow) {
195
				$csv->mapColumns($map);
196
			} else {
197
				$csv->provideHeaderRow($map);
198
			}
199
		}
200
201
		foreach($csv as $row) {
202
			$this->processRecord($row, $this->columnMap, $results, $preview);
203
		}
204
205
		return $results;
206
	}
207
208
	/**
209
	 * @todo Better messages for relation checks and duplicate detection
210
	 * Note that columnMap isn't used.
211
	 *
212
	 * @param array $record
213
	 * @param array $columnMap
214
	 * @param BulkLoader_Result $results
215
	 * @param boolean $preview
216
	 *
217
	 * @return int
218
	 */
219
	protected function processRecord($record, $columnMap, &$results, $preview = false) {
220
		$class = $this->objectClass;
221
222
		// find existing object, or create new one
223
		$existingObj = $this->findExistingObject($record, $columnMap);
0 ignored issues
show
Unused Code introduced by
The call to CsvBulkLoader::findExistingObject() has too many arguments starting with $columnMap.

This check compares calls to functions or methods with their respective definitions. If the call has more arguments than are defined, it raises an issue.

If a function is defined several times with a different number of parameters, the check may pick up the wrong definition and report false positives. One codebase where this has been known to happen is Wordpress.

In this case you can add the @ignore PhpDoc annotation to the duplicate definition and it will be ignored.

Loading history...
224
		$obj = ($existingObj) ? $existingObj : new $class();
225
226
		// first run: find/create any relations and store them on the object
227
		// we can't combine runs, as other columns might rely on the relation being present
228
		$relations = array();
229
		foreach($record as $fieldName => $val) {
230
			// don't bother querying of value is not set
231
			if($this->isNullValue($val)) continue;
232
233
			// checking for existing relations
234
			if(isset($this->relationCallbacks[$fieldName])) {
235
				// trigger custom search method for finding a relation based on the given value
236
				// and write it back to the relation (or create a new object)
237
				$relationName = $this->relationCallbacks[$fieldName]['relationname'];
238
				if($this->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
239
					$relationObj = $this->{$this->relationCallbacks[$fieldName]['callback']}($obj, $val, $record);
240
				} elseif($obj->hasMethod($this->relationCallbacks[$fieldName]['callback'])) {
241
					$relationObj = $obj->{$this->relationCallbacks[$fieldName]['callback']}($val, $record);
242
				}
243
				if(!$relationObj || !$relationObj->exists()) {
0 ignored issues
show
Bug introduced by
The variable $relationObj does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

function myFunction($a) {
    switch ($a) {
        case 'foo':
            $x = 1;
            break;

        case 'bar':
            $x = 2;
            break;
    }

    // $x is potentially undefined here.
    echo $x;
}

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }
    
  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        echo $x;
    }
    
  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
    
            // We add support for the missing case.
            default:
                $x = '';
                break;
        }
    
        echo $x;
    }
    
Loading history...
244
					$relationClass = $obj->hasOneComponent($relationName);
245
					$relationObj = new $relationClass();
246
					//write if we aren't previewing
247
					if (!$preview) $relationObj->write();
248
				}
249
				$obj->{"{$relationName}ID"} = $relationObj->ID;
250
				//write if we are not previewing
251
				if (!$preview) {
252
					$obj->write();
253
					$obj->flushCache(); // avoid relation caching confusion
254
				}
255
256
			} elseif(strpos($fieldName, '.') !== false) {
257
				// we have a relation column with dot notation
258
				list($relationName, $columnName) = explode('.', $fieldName);
0 ignored issues
show
Unused Code introduced by
The assignment to $columnName is unused. Consider omitting it like so list($first,,$third).

This checks looks for assignemnts to variables using the list(...) function, where not all assigned variables are subsequently used.

Consider the following code example.

<?php

function returnThreeValues() {
    return array('a', 'b', 'c');
}

list($a, $b, $c) = returnThreeValues();

print $a . " - " . $c;

Only the variables $a and $c are used. There was no need to assign $b.

Instead, the list call could have been.

list($a,, $c) = returnThreeValues();
Loading history...
259
				// always gives us an component (either empty or existing)
260
				$relationObj = $obj->getComponent($relationName);
261
				if (!$preview) $relationObj->write();
262
				$obj->{"{$relationName}ID"} = $relationObj->ID;
263
264
				//write if we are not previewing
265
				if (!$preview) {
266
					$obj->write();
267
					$obj->flushCache(); // avoid relation caching confusion
268
				}
269
			}
270
		}
271
272
		// second run: save data
273
274
		foreach($record as $fieldName => $val) {
275
			// break out of the loop if we are previewing
276
			if ($preview) {
277
				break;
278
			}
279
280
			// look up the mapping to see if this needs to map to callback
281
			$mapped = $this->columnMap && isset($this->columnMap[$fieldName]);
0 ignored issues
show
Bug Best Practice introduced by
The expression $this->columnMap of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(..) or ! empty(...) instead.

Loading history...
282
283
			if($mapped && strpos($this->columnMap[$fieldName], '->') === 0) {
284
				$funcName = substr($this->columnMap[$fieldName], 2);
285
286
				$this->$funcName($obj, $val, $record);
287
			} else if($obj->hasMethod("import{$fieldName}")) {
288
				$obj->{"import{$fieldName}"}($val, $record);
289
			} else {
290
				$obj->update(array($fieldName => $val));
291
			}
292
		}
293
294
		// write record
295
		$id = ($preview) ? 0 : $obj->write();
296
297
		// @todo better message support
298
		$message = '';
299
300
		// save to results
301
		if($existingObj) {
302
			$results->addUpdated($obj, $message);
303
		} else {
304
			$results->addCreated($obj, $message);
305
		}
306
307
		$objID = $obj->ID;
308
309
		$obj->destroy();
310
311
		// memory usage
312
		unset($existingObj);
313
		unset($obj);
314
315
		return $objID;
316
	}
317
318
	/**
319
	 * Find an existing objects based on one or more uniqueness columns
320
	 * specified via {@link self::$duplicateChecks}.
321
	 *
322
	 * @param array $record CSV data column
323
	 *
324
	 * @return mixed
325
	 */
326
	public function findExistingObject($record) {
327
		$SNG_objectClass = singleton($this->objectClass);
328
		// checking for existing records (only if not already found)
329
330
		foreach($this->duplicateChecks as $fieldName => $duplicateCheck) {
331
			if(is_string($duplicateCheck)) {
332
333
				// Skip current duplicate check if field value is empty
334
				if(empty($record[$duplicateCheck])) continue;
335
336
				// Check existing record with this value
337
				$dbFieldValue = $record[$duplicateCheck];
338
				$existingRecord = DataObject::get($this->objectClass)
339
					->filter($duplicateCheck, $dbFieldValue)
340
					->first();
341
342
				if($existingRecord) return $existingRecord;
343
			} elseif(is_array($duplicateCheck) && isset($duplicateCheck['callback'])) {
344
				if($this->hasMethod($duplicateCheck['callback'])) {
345
					$existingRecord = $this->{$duplicateCheck['callback']}($record[$fieldName], $record);
346
				} elseif($SNG_objectClass->hasMethod($duplicateCheck['callback'])) {
347
					$existingRecord = $SNG_objectClass->{$duplicateCheck['callback']}($record[$fieldName], $record);
348
				} else {
349
					user_error("CsvBulkLoader::processRecord():"
350
						. " {$duplicateCheck['callback']} not found on importer or object class.", E_USER_ERROR);
351
				}
352
353
				if($existingRecord) {
354
					return $existingRecord;
0 ignored issues
show
Bug introduced by
The variable $existingRecord does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

function myFunction($a) {
    switch ($a) {
        case 'foo':
            $x = 1;
            break;

        case 'bar':
            $x = 2;
            break;
    }

    // $x is potentially undefined here.
    echo $x;
}

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }
    
  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        echo $x;
    }
    
  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
    
            // We add support for the missing case.
            default:
                $x = '';
                break;
        }
    
        echo $x;
    }
    
Loading history...
355
				}
356
			} else {
357
				user_error('CsvBulkLoader::processRecord(): Wrong format for $duplicateChecks', E_USER_ERROR);
358
			}
359
		}
360
361
		return false;
362
	}
363
364
	/**
365
	 * Determine whether any loaded files should be parsed with a
366
	 * header-row (otherwise we rely on {@link self::$columnMap}.
367
	 *
368
	 * @return boolean
369
	 */
370
	public function hasHeaderRow() {
371
		return ($this->hasHeaderRow || isset($this->columnMap));
372
	}
373
}
374