dirutility.walk.sequential.Crawler.filter()   C

Complexity

Conditions 11

Size

Total Lines 32
Code Lines 18

Duplication

Lines 31
Ratio 96.88 %

Importance

Changes 0
Metric Value
cc 11
eloc 18
nop 1
dl 31
loc 32
rs 5.4
c 0
b 0
f 0

How to fix

Complexity

Complex units like dirutility.walk.sequential.Crawler.filter() often do a lot of different things. To break one down, we need to identify a cohesive component within the enclosing class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
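As a rough illustration of Extract Class applied to this method, the nested "non-empty folder" check inside filter() could move into a small helper of its own. The sketch below is only an assumption about how that might look: FolderInspector and contains_files are hypothetical names, not part of dirutility.

```python
import os


class FolderInspector:
    """Hypothetical helper extracted from Crawler.filter(); not part of dirutility."""

    def __init__(self, base):
        # Top-level directory that the relative paths are resolved against.
        self.base = base

    def contains_files(self, relative_root):
        """Return True if base/relative_root is a folder holding at least one regular file."""
        folder = os.path.join(self.base, relative_root)
        if not os.path.isdir(folder):
            return False
        # os.listdir() returns bare names, so each one is joined with the folder itself.
        return any(os.path.isfile(os.path.join(folder, name)) for name in os.listdir(folder))
```

With a helper like this, the three innermost conditionals in filter() would collapse into a single call such as FolderInspector(directory).contains_files(root), which lowers the condition count without changing what is collected.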

import os

from looptools import Counter


# Analysis note: this code seems to be duplicated in your project.
class Crawler:
    def __init__(self, directory, filters, full_paths, topown, _printer):
        """Sub class of DirPaths used for sequential directory parsing"""
        self.directory = directory
        self.filters = filters
        self.topdown = topown
        self._printer = _printer

        self.filepaths = []

        if full_paths:
            self.add_path = self._add_filepath_absolute
            self._printer('Absolute paths')
        else:
            self.add_path = self._add_filepath_relative
            self._printer('Relative paths')

    def __iter__(self):
        return iter(self.filepaths)

    def __len__(self):
        return len(self.filepaths)

    def _add_filepath_relative(self, directory, fullname):
        self.filepaths.append(fullname)

    def _add_filepath_absolute(self, directory, fullname):
        self.filepaths.append(os.path.join(directory, fullname))

    def crawler(self):
        if self.filters:
            self._printer('Filtering enabled')
            self.filter()
        else:
            self._printer('Filtering disabled')
            self.encompass()
        return self.filepaths

    def encompass(self):
        """
        Called when parallelize is False.
        This function will generate the file names in a directory tree by walking the tree either top-down or
        bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple
        (dirpath, dirnames, filenames).
        """
        self._printer('Standard Walk')
        count = Counter(length=3)
        for directory in self.directory:
            for root, directories, files in os.walk(directory, topdown=self.topdown):
                root = root[len(str(directory)) + 1:]
                self._printer(str(count.up) + ": Explored path - " + str(root), stream=True)
                for filename in files:
                    fullname = os.path.join(root, filename)
                    # Join the two strings in order to form the full filepath.
                    self.add_path(directory, fullname)

    def filter(self):
        """
        Called when parallelize is False.
        Walks each directory tree either top-down or bottom-up and collects only the paths accepted by
        self.filters. For each directory in the tree rooted at directory top (including top itself), os.walk
        yields a 3-tuple (dirpath, dirnames, filenames).
        """
        self._printer('Standard Walk')
        count = Counter(length=3)
        for directory in self.directory:
            self._printer('Searching ' + directory)
            for root, directories, files in os.walk(directory, topdown=self.topdown):
                # Make root relative to the directory being walked
                root = root[len(str(directory)) + 1:]
                self._printer(str(count.up) + ": Explored path - " + str(root), stream=True)
                if self.filters.validate(root):
                    # Check that non-empty folders flag is on and we're at the max directory level
                    if self.filters.non_empty_folders and self.filters.get_level(root) == self.filters.max_level:
                        # Check that the path is not an empty folder
                        folder = os.path.join(directory, root)
                        if os.path.isdir(folder):
                            # Get paths in folder without walking directory
                            paths = os.listdir(folder)

                            # Check that any of the paths are files and not just directories.
                            # os.listdir() returns bare names, so join them with the folder itself
                            # (joining with directory alone misses files below the top level).
                            if any(os.path.isfile(os.path.join(folder, p)) for p in paths):
                                self.add_path(directory, root)
                    else:
                        for filename in files:
                            # Join the two strings in order to form the full filepath.
                            fullname = os.path.join(root, filename)
                            if self.filters.validate(fullname):
                                self.add_path(directory, fullname)
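The duplication reported above appears to come from encompass() and filter() repeating the same os.walk loop and root-trimming step. One possible way to share that loop is a small generator; the sketch below is an assumption rather than dirutility's actual code, and walk_relative is a hypothetical name.

```python
import os


def walk_relative(directories, topdown=True):
    """Yield (directory, relative_root, files) for every folder under each directory.

    Hypothetical helper, not part of dirutility: it factors out the os.walk loop
    and the root-trimming step that encompass() and filter() both repeat.
    """
    for directory in directories:
        for root, _dirnames, files in os.walk(directory, topdown=topdown):
            # Strip the leading "<directory><sep>" so root is relative to directory.
            yield directory, root[len(str(directory)) + 1:], files
```

Both methods could then iterate over walk_relative(self.directory, self.topdown) and keep only their own per-folder logic, which is the part that actually differs between them.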