data.datasets.calculate_dlr (rating: B)

Complexity

Total Complexity 46

Size/Duplication

Total Lines 362
Duplicated Lines 8.29 %

Importance

Changes 0
Metric   Value
wmc      46      (total complexity)
eloc     223
dl       30      (duplicated lines)
loc      362     (total lines)
rs       8.72
c        0
b        0
f        0

2 Functions

Rating   Name             Duplication   Size   Complexity
F        dlr()            0             118    14
F        DLR_Regions()    30            188    31

1 Method

Rating   Name                        Duplication   Size   Complexity
A        Calculate_dlr.__init__()    0             6      1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems can usually be solved by extracting the repeated logic into a shared function or class.
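In this module, the duplication flagged further down comes from four near-identical if/elif chains in DLR_Regions() that map the minimum wind speed and maximum temperature of an hour to a DLR factor. A minimal, table-driven sketch of how that repetition could be removed follows; the helper name dlr_factor and the DLR_TABLE constant are illustrative and not part of the module, while the factor values are the ones used in the listing below (NEP 2020, p. 31).

    # Illustrative refactoring sketch (not part of the module): the four
    # near-identical if/elif chains in DLR_Regions() collapse into a single
    # table-driven lookup.
    # Rows: maximum-temperature bands (<= 5, <= 15, <= 25, <= 35 degC);
    # columns: minimum-wind-speed bands (< 3, < 4, < 5, < 6, >= 6 m/s).
    DLR_TABLE = [
        (5, [1.30, 1.35, 1.45, 1.50, 1.50]),
        (15, [1.20, 1.25, 1.35, 1.45, 1.50]),
        (25, [1.10, 1.15, 1.20, 1.30, 1.40]),
        (35, [1.00, 1.05, 1.10, 1.15, 1.25]),
    ]

    def dlr_factor(wind_min, temp_max):
        """Return the hourly DLR factor for one region (NEP 2020, p. 31)."""
        for temp_limit, factors in DLR_TABLE:
            if temp_max <= temp_limit:
                for wind_limit, factor in zip((3, 4, 5, 6), factors):
                    if wind_min < wind_limit:
                        return factor
                return factors[-1]
        return 1.00  # above 35 degC no uprating is applied

With such a helper, the inner assignment loop in DLR_Regions() would reduce to a single line, e.g. dlr.iloc[j, 2 + i * 3] = dlr_factor(dlr.iloc[j, 0 + i * 3], dlr.iloc[j, 1 + i * 3]).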

Complexity

 Tip: Before tackling complexity, make sure to eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.calculate_dlr often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
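In this module most of the complexity sits in the module-level functions dlr() (complexity 14) and DLR_Regions() (complexity 31) rather than in the small Calculate_dlr class, so Extract Method is the more direct move here. As a minimal sketch under that assumption, the per-line decision logic of dlr() could be pulled into a helper; the name s_max_pu_for_line is hypothetical and not part of the module:

    # Hypothetical Extract Method sketch (names are illustrative): the four
    # cases currently handled inline in dlr() become one small, testable
    # function.
    def s_max_pu_for_line(
        s_nom, line_regions, crossborder, dlr_hourly, dlr_hourly_dic, hours=8760
    ):
        """Return the hourly s_max_pu values for a single transmission line."""
        # DLR does not apply to cross-border lines
        if crossborder:
            return [1] * hours
        # Underground cables (identified by their s_nom values) are not uprated
        if s_nom % 280 == 0 or s_nom % 550 == 0 or s_nom % 925 == 0:
            return [1] * hours
        # Lines lying completely within one region use that region's hourly DLR
        if len(line_regions) == 1:
            return list(dlr_hourly_dic[f"R{int(line_regions[0])}-DLR"])
        # Lines crossing several regions get the hourly minimum over those regions
        cols = [f"Reg_{r}" for r in line_regions]
        return list(dlr_hourly[cols].min(axis=1))

Each branch mirrors a case already present in dlr(); the loop body then shrinks to a single append call, and every branch can be tested on its own, which directly lowers the per-function complexity reported above.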

"""
Use the concept of dynamic line rating (DLR) to calculate the
time-dependent capacity of HV transmission lines.
Based mainly on the Planungsgrundsaetze 2020, available at:
<https://www.transnetbw.de/files/pdf/netzentwicklung/netzplanungsgrundsaetze/UENB_PlGrS_Juli2020.pdf>
"""

from pathlib import Path

from shapely.geometry import Point
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr

from egon.data import config, db
from egon.data.datasets import Dataset
from egon.data.datasets.scenario_parameters import get_sector_parameters


class Calculate_dlr(Dataset):
    """Calculate DLR and assign values to each line in the db

    Parameters
    ----------
    *No parameters required*

    *Dependencies*
      * :py:class:`DataBundle <egon.data.datasets.data_bundle.DataBundle>`
      * :py:class:`Osmtgmod <egon.data.datasets.osmtgmod.Osmtgmod>`
      * :py:class:`WeatherData <egon.data.datasets.era5.WeatherData>`
      * :py:class:`FixEhvSubnetworks <egon.data.datasets.FixEhvSubnetworks>`

    *Resulting tables*
      * :py:class:`grid.egon_etrago_line_timeseries
        <egon.data.datasets.etrago_setup.EgonPfHvLineTimeseries>` is filled
    """

    #:
    name: str = "dlr"
    #:
    version: str = "0.0.2"

    def __init__(self, dependencies):
        super().__init__(
            name=self.name,
            version=self.version,
            dependencies=dependencies,
            tasks=(dlr,),
        )

def dlr():
    """Calculate DLR and assign values to each line in the db

    Parameters
    ----------
    *No parameters required*
    """
    cfg = config.datasets()["dlr"]
    for scn in set(config.settings()["egon-data"]["--scenarios"]):
        weather_year = get_sector_parameters("global", scn)["weather_year"]

        regions_shape_path = (
            Path(".")
            / "data_bundle_egon_data"
            / "regions_dynamic_line_rating"
            / "Germany_regions.shp"
        )

        # Calculate hourly DLR per region
        dlr_hourly_dic, dlr_hourly = DLR_Regions(
            weather_year, regions_shape_path
        )

        regions = gpd.read_file(regions_shape_path)
        regions = regions.sort_values(by=["Region"])

        # Connect to the database
        con = db.engine()

        sql = f"""
        SELECT scn_name, line_id, topo, s_nom FROM
        {cfg['sources']['trans_lines']['schema']}.
        {cfg['sources']['trans_lines']['table']}
        """
        df = gpd.GeoDataFrame.from_postgis(
            sql, con, crs="EPSG:4326", geom_col="topo"
        )

        # Clip the transmission lines to each region to find out which
        # regions each line touches
        trans_lines_R = {}
        for i in regions.Region:
            shape_area = regions[regions["Region"] == i]
            trans_lines_R[i] = gpd.clip(df, shape_area)
        trans_lines = df[["s_nom"]]
        trans_lines["in_regions"] = [[] for i in range(len(df))]

        trans_lines[["line_id", "geometry", "scn_name"]] = df[
            ["line_id", "topo", "scn_name"]
        ]
        trans_lines = gpd.GeoDataFrame(trans_lines)
        # Assign to each transmission line the regions it belongs to
        # (the lists in "in_regions" are extended in place)
        for i in trans_lines_R:
            for j in trans_lines_R[i].index:
                trans_lines.loc[j, "in_regions"].append(i)
        trans_lines["crossborder"] = ~trans_lines.within(regions.unary_union)

        DLR = []

        # Assign to each transmission line the final DLR values based on its
        # location and type of line (overhead or underground)
        for i in trans_lines.index:
            # The concept of DLR does not apply to cross-border lines
            if trans_lines.loc[i, "crossborder"]:
                DLR.append([1] * 8760)
                continue
            # Underground lines have DLR = 1
            if (
                trans_lines.loc[i, "s_nom"] % 280 == 0
                or trans_lines.loc[i, "s_nom"] % 550 == 0
                or trans_lines.loc[i, "s_nom"] % 925 == 0
            ):
                DLR.append([1] * 8760)
                continue
            # Lines lying completely within one region get that region's DLR
            if len(trans_lines.loc[i, "in_regions"]) == 1:
                region = int(trans_lines.loc[i, "in_regions"][0])
                DLR.append(dlr_hourly_dic["R" + str(region) + "-DLR"])
                continue
            # For lines crossing two or more regions, the lowest DLR among
            # those regions is assigned for each hour.
            if len(trans_lines.loc[i, "in_regions"]) > 1:
                reg = []
                for j in trans_lines.loc[i, "in_regions"]:
                    reg.append("Reg_" + str(j))
                min_DLR_reg = dlr_hourly[reg].min(axis=1)
                DLR.append(list(min_DLR_reg))

        trans_lines["s_max_pu"] = DLR

        # Delete unnecessary columns
        trans_lines.drop(
            columns=["in_regions", "s_nom", "geometry", "crossborder"],
            inplace=True,
        )

        # Modify column "s_max_pu" to fit the requirements of the target table
        trans_lines["s_max_pu"] = trans_lines.apply(
            lambda x: list(x["s_max_pu"]), axis=1
        )
        trans_lines["temp_id"] = 1

        # Delete existing data
        db.execute_sql(
            f"""
            DELETE FROM {cfg['sources']['line_timeseries']['schema']}.
            {cfg['sources']['line_timeseries']['table']};
            """
        )

        # Insert into database
        trans_lines.to_sql(
            f"{cfg['targets']['line_timeseries']['table']}",
            schema=f"{cfg['targets']['line_timeseries']['schema']}",
            con=db.engine(),
            if_exists="append",
            index=False,
        )

    return 0


def DLR_Regions(weather_year, regions_shape_path):
    """Calculate DLR values for the given regions

    Parameters
    ----------
    weather_year: int, mandatory
        year of the ERA5 weather data used in the calculation
    regions_shape_path: str, mandatory
        path to the shape file with the shape of the regions to analyze
    """
    # Load, index and sort the shapefile with the 9 regions defined by NEP 2020
    regions = gpd.read_file(regions_shape_path)
    regions = regions.set_index(["Region"])
    regions = regions.sort_values(by=["Region"])

    # The weather data downloaded using Atlite is loaded into 'weather_data_raw'.
    file_name = f"germany-{weather_year}-era5.nc"
    weather_info_path = (
        Path(".") / "data_bundle_egon_data" / "cutouts" / file_name
    )
    weather_data_raw = xr.open_mfdataset(str(weather_info_path))
    weather_data_raw = weather_data_raw.rio.write_crs(4326)
    weather_data_raw = weather_data_raw.rio.clip_box(
        minx=5.5, miny=47, maxx=15.5, maxy=55.5
    )

    wind_speed_raw = weather_data_raw.wnd100m.values
    temperature_raw = weather_data_raw.temperature.values
    roughness_raw = weather_data_raw.roughness.values
    index = weather_data_raw.indexes._indexes
    # 'weather_data_raw' has 3 dimensions (time, y, x). All relevant data is
    # flattened into the 2-dimensional array 'weather_data'.
    weather_data = np.zeros(shape=(wind_speed_raw.size, 5))
    count = 0
    for hour in range(index["time"].size):
        for row in range(index["y"].size):
            for column in range(index["x"].size):
                rough = roughness_raw[hour, row, column]
                ws_100m = wind_speed_raw[hour, row, column]
                # Use the log law to extrapolate the wind speed from 100 m to
                # 50 m height: u(50) = u(100) * ln(50 / z0) / ln(100 / z0),
                # with z0 the surface roughness length.
                ws_50m = ws_100m * (np.log(50 / rough) / np.log(100 / rough))
                weather_data[count, 0] = hour
                weather_data[count, 1] = index["y"][row]
                weather_data[count, 2] = index["x"][column]
                weather_data[count, 3] = ws_50m
                weather_data[count, 4] = (
                    temperature_raw[hour, row, column] - 273.15
                )
                count += 1

    weather_data = pd.DataFrame(
        weather_data, columns=["hour", "lat", "lon", "wind_s", "temp"]
    )

    # Build a GeoDataFrame with one point per grid cell (first time step only)
    region_selec = weather_data[0 : index["x"].size * index["y"].size].copy()
    region_selec["geom"] = region_selec.apply(
        lambda x: Point(x["lon"], x["lat"]), axis=1
    )
    region_selec = gpd.GeoDataFrame(region_selec)
    region_selec = region_selec.set_geometry("geom")
    region_selec["region"] = np.zeros(index["x"].size * index["y"].size)

    # Mask the weather information for each region defined by NEP 2020
    for reg in regions.index:
        weather_region = gpd.clip(region_selec, regions.loc[reg][0])
        region_selec["region"][
            region_selec.isin(weather_region).any(axis=1)
        ] = reg

    weather_data["region"] = (
        region_selec["region"].tolist() * index["time"].size
    )
    weather_data = weather_data[weather_data["region"] != 0]

    # Create a dataframe to store the results (min wind speed, max temperature
    # and DLR per region for the 8760 hours of the year)
    time = pd.date_range(
        f"{weather_year}-01-01", f"{weather_year}-12-31 23:00:00", freq="H"
    )
    dlr = pd.DataFrame(
        0,
        columns=[
            "R1-Wind_min", "R1-Temp_max", "R1-DLR",
            "R2-Wind_min", "R2-Temp_max", "R2-DLR",
            "R3-Wind_min", "R3-Temp_max", "R3-DLR",
            "R4-Wind_min", "R4-Temp_max", "R4-DLR",
            "R5-Wind_min", "R5-Temp_max", "R5-DLR",
            "R6-Wind_min", "R6-Temp_max", "R6-DLR",
            "R7-Wind_min", "R7-Temp_max", "R7-DLR",
            "R8-Wind_min", "R8-Temp_max", "R8-DLR",
            "R9-Wind_min", "R9-Temp_max", "R9-DLR",
        ],
        index=time,
    )

    # Calculate and save the min wind speed and max temperature per region and
    # hour. Since the dataframe generated by era5.weather_df_from_era5() is
    # sorted by date, it is faster to calculate the hourly results using
    # blocks of data defined by "step" than to use a filter or a search
    # function.
    for reg, df in weather_data.groupby("region"):
        for t in range(0, len(time)):
            step = df.shape[0] / len(time)
            low_limit = int(t * step)
            up_limit = int(step * (t + 1))
            dlr.iloc[t, 0 + int(reg - 1) * 3] = min(
                df.iloc[low_limit:up_limit, 3]
            )
            dlr.iloc[t, 1 + int(reg - 1) * 3] = max(
                df.iloc[low_limit:up_limit, 4]
            )

    # The next loop uses the min wind speed and max temperature calculated
    # above to set the hourly DLR for each region, based on the table given in
    # NEP 2020, page 31.
    for i in range(0, len(regions)):
        for j in range(0, len(time)):
            if dlr.iloc[j, 1 + i * 3] <= 5:
                if dlr.iloc[j, 0 + i * 3] < 3:
                    dlr.iloc[j, 2 + i * 3] = 1.30
                elif dlr.iloc[j, 0 + i * 3] < 4:
                    dlr.iloc[j, 2 + i * 3] = 1.35
                elif dlr.iloc[j, 0 + i * 3] < 5:
                    dlr.iloc[j, 2 + i * 3] = 1.45
                else:
                    dlr.iloc[j, 2 + i * 3] = 1.50
            elif dlr.iloc[j, 1 + i * 3] <= 15:
                if dlr.iloc[j, 0 + i * 3] < 3:
                    dlr.iloc[j, 2 + i * 3] = 1.20
                elif dlr.iloc[j, 0 + i * 3] < 4:
                    dlr.iloc[j, 2 + i * 3] = 1.25
                elif dlr.iloc[j, 0 + i * 3] < 5:
                    dlr.iloc[j, 2 + i * 3] = 1.35
                elif dlr.iloc[j, 0 + i * 3] < 6:
                    dlr.iloc[j, 2 + i * 3] = 1.45
                else:
                    dlr.iloc[j, 2 + i * 3] = 1.50
            elif dlr.iloc[j, 1 + i * 3] <= 25:
                if dlr.iloc[j, 0 + i * 3] < 3:
                    dlr.iloc[j, 2 + i * 3] = 1.10
                elif dlr.iloc[j, 0 + i * 3] < 4:
                    dlr.iloc[j, 2 + i * 3] = 1.15
                elif dlr.iloc[j, 0 + i * 3] < 5:
                    dlr.iloc[j, 2 + i * 3] = 1.20
                elif dlr.iloc[j, 0 + i * 3] < 6:
                    dlr.iloc[j, 2 + i * 3] = 1.30
                else:
                    dlr.iloc[j, 2 + i * 3] = 1.40
            elif dlr.iloc[j, 1 + i * 3] <= 35:
                if dlr.iloc[j, 0 + i * 3] < 3:
                    dlr.iloc[j, 2 + i * 3] = 1.00
                elif dlr.iloc[j, 0 + i * 3] < 4:
                    dlr.iloc[j, 2 + i * 3] = 1.05
                elif dlr.iloc[j, 0 + i * 3] < 5:
                    dlr.iloc[j, 2 + i * 3] = 1.10
                elif dlr.iloc[j, 0 + i * 3] < 6:
                    dlr.iloc[j, 2 + i * 3] = 1.15
                else:
                    dlr.iloc[j, 2 + i * 3] = 1.25
            else:
                dlr.iloc[j, 2 + i * 3] = 1.00

    DLR_hourly_df_dic = {}
    for i in dlr.columns[range(2, 29, 3)]:  # columns with the DLR values
        DLR_hourly_df_dic[i] = dlr[i].values

    dlr_hourly = pd.DataFrame(index=time)
    for i in range(len(regions)):
        dlr_hourly["Reg_" + str(i + 1)] = dlr.iloc[:, 3 * i + 2]

    return DLR_hourly_df_dic, dlr_hourly