Conditions | 15 |
Total Lines | 104 |
Lines | 0 |
Ratio | 0 % |
Tests | 39 |
CRAP Score | 15.0035 |
Changes | 0 |
Small methods make your code easier to understand, particularly when combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
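As a minimal sketch of this idea, the listing further down contains the comment "Combine the two rows into one row" above a single assignment. The comment can become the name of an extracted function. The names `combine_rows` and `rows_equal`, and the plain `'start'`/`'end'` keys, are hypothetical and chosen only for illustration; they are not part of the original class.

```python
def rows_equal(prev_row, row, date_keys):
    """Compare two rows while ignoring their date columns (hypothetical helper)."""
    def without_dates(r):
        return {k: v for k, v in r.items() if k not in date_keys}

    return without_dates(prev_row) == without_dates(row)


def combine_rows(prev_row, row, key_end_date):
    """Extend prev_row so that it also covers the interval of row.

    The name is derived from the comment that used to sit above this statement.
    """
    prev_row[key_end_date] = row[key_end_date]
    return prev_row


# Two adjacent rows that differ only in their dates (integer day numbers assumed).
prev_row = {'start': 1, 'end': 5, 'name': 'a'}
row = {'start': 6, 'end': 9, 'name': 'a'}

if rows_equal(prev_row, row, ('start', 'end')):
    merged = combine_rows(prev_row, row, 'end')
```

After the call, `merged` spans days 1 through 9, and the call site reads like the comment it replaced.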
Commonly applied refactorings include:
If many parameters/temporary variables are present:
Complex code like Type2JoinHelper._merge_adjacent_rows() often does a lot of different things. To break the surrounding class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
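A rough sketch of Extract Class, assuming the shared `_key_` prefix of `_key_start_date` and `_key_end_date` in the listing marks a cohesive component: the two fields and the date manipulations that use them move into their own small class. `RowInterval` and its method names are hypothetical, and dates are assumed to be integers (the listing subtracts 1 from a start date, which suggests numeric day values).

```python
class RowInterval:
    """Hypothetical extracted class: knows which keys hold a row's date interval."""

    def __init__(self, key_start_date, key_end_date):
        self.key_start_date = key_start_date
        self.key_end_date = key_end_date

    def bounds(self, row):
        """Return the (start, end) pair of a row."""
        return row[self.key_start_date], row[self.key_end_date]

    def extend_to(self, prev_row, row):
        """Let prev_row end where row ends (rows are identical apart from dates)."""
        prev_row[self.key_end_date] = row[self.key_end_date]

    def truncate_before(self, prev_row, row):
        """Let prev_row end the day before row starts (the later start prevails)."""
        prev_row[self.key_end_date] = row[self.key_start_date] - 1


interval = RowInterval('start', 'end')
prev_row = {'start': 1, 'end': 10, 'name': 'a'}
row = {'start': 5, 'end': 15, 'name': 'b'}

# Overlapping, non-identical rows: shorten the earlier row.
interval.truncate_before(prev_row, row)
```

With such a class in place, the branches of the merge method could delegate to `extend_to` and `truncate_before` instead of repeating the key lookups.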
    def _merge_adjacent_rows(self, rows):
        """
        Resolves adjacent and overlapping rows. With proper reference data overlapping rows MUST NOT occur. However,
        this method can handle overlapping rows. Overlapping rows are resolved as follows:
        * The interval with the most recent begin date prevails for the overlapping period.
        * If the begin dates are the same the interval with the most recent end date prevails.
        * If the begin and end dates are equal the last row in the data set prevails.
        Identical (excluding begin and end date) adjacent rows are replaced with a single row.

        :param list[dict[str,T]] rows: The rows in a group (i.e. with the same natural key).

        :rtype: list[dict[str,T]]
        """
        ret = list()

        prev_row = None
        for row in rows:
            if prev_row:
                relation = Allen.relation(prev_row[self._key_start_date],
                                          prev_row[self._key_end_date],
                                          row[self._key_start_date],
                                          row[self._key_end_date])
                if relation == Allen.X_BEFORE_Y:
                    # Two rows with distinct intervals.
                    # prev_row: |----|
                    # row:               |-----|
                    ret.append(prev_row)
                    prev_row = row

                elif relation == Allen.X_MEETS_Y:
                    # The two rows are adjacent.
                    # prev_row: |-------|
                    # row:               |-------|
                    if self._equal(prev_row, row):
                        # The two rows are identical (except for start and end date) and adjacent. Combine the two rows
                        # into one row.
                        prev_row[self._key_end_date] = row[self._key_end_date]
                    else:
                        # Rows are adjacent but not identical.
                        ret.append(prev_row)
                        prev_row = row

                elif relation == Allen.X_OVERLAPS_WITH_Y:
                    # prev_row overlaps row. Should not occur with proper reference data.
                    # prev_row: |-----------|
                    # row:            |----------|
                    if self._equal(prev_row, row):
                        # The two rows are identical (except for start and end date) and overlapping. Combine the two
                        # rows into one row.
                        prev_row[self._key_end_date] = row[self._key_end_date]
                    else:
                        # Rows are overlapping but not identical.
                        prev_row[self._key_end_date] = row[self._key_start_date] - 1
                        ret.append(prev_row)
                        prev_row = row

                elif relation == Allen.X_STARTS_Y:
                    # prev_row starts row. Should not occur with proper reference data.
                    # prev_row: |------|
                    # row:      |----------------|
                    prev_row = row

                elif relation == Allen.X_EQUAL_Y:
                    # Can happen when the reference data sets are joined without respect for date intervals.
                    # prev_row: |----------------|
                    # row:      |----------------|
                    prev_row = row

                elif relation == Allen.X_DURING_Y_INVERSE:
                    # row during prev_row. Should not occur with proper reference data.
                    # prev_row: |----------------|
                    # row:           |------|
                    # Note: the interval with the most recent start date prevails. Hence, the interval after
                    # row[self._key_end_date] is discarded.
                    if self._equal(prev_row, row):
                        prev_row[self._key_end_date] = row[self._key_end_date]
                    else:
                        prev_row[self._key_end_date] = row[self._key_start_date] - 1
                        ret.append(prev_row)
                        prev_row = row

                elif relation == Allen.X_FINISHES_Y_INVERSE:
                    # row finishes prev_row. Should not occur with proper reference data.
                    # prev_row: |----------------|
                    # row:                |------|
                    if not self._equal(prev_row, row):
                        prev_row[self._key_end_date] = row[self._key_start_date] - 1
                        ret.append(prev_row)
                        prev_row = row

                    # Note: if the two rows are identical (except for start and end date) nothing to do.
                else:
                    # Note: The rows are sorted such that prev_row[self._key_start_date] <= row[self._key_start_date].
                    # Hence the following relations should not occur: X_DURING_Y, X_FINISHES_Y, X_BEFORE_Y_INVERSE,
                    # X_MEETS_Y_INVERSE, X_OVERLAPS_WITH_Y_INVERSE, and X_STARTS_Y_INVERSE. Hence, we covered all 13
                    # relations in Allen's interval algebra.
                    raise ValueError('Data is not sorted properly. Relation: %d' % relation)
            else:
                prev_row = row

        if prev_row:
            ret.append(prev_row)

        return ret