Passed — Pull Request — dev (#826), created by unknown, 01:35

data.datasets.electricity_demand_timeseries.cts_buildings — Rating: B

Complexity

Total Complexity: 48

Size/Duplication

Total Lines: 1251
Duplicated Lines: 5.2 %

Importance

Changes: 0

Metric    Value
wmc       48
eloc      567
dl        65
loc       1251
rs        8.5599
c         0
b         0
f         0

16 Functions

Rating   Name   Duplication   Size   Complexity  
A place_buildings_with_amenities() 0 41 3
B create_synthetic_buildings() 0 67 6
A amenities_without_buildings() 0 60 2
A start_logging() 0 11 1
A cts_heat() 33 33 2
A delete_synthetic_cts_buildings() 0 14 2
B get_cts_electricity_peak_load() 0 56 5
C calc_building_profiles() 0 106 10
A select_cts_buildings() 0 25 1
B cells_with_cts_demand_only() 0 72 2
A calc_building_demand_profile_share() 0 77 2
B buildings_with_amenities() 0 198 2
A cts_electricity() 32 32 2
A calc_census_cell_share() 0 56 4
B cts_buildings() 0 185 1
B buildings_without_amenities() 0 99 2

1 Method

Rating   Name   Duplication   Size   Complexity  
A CtsElectricityBuildings.__init__() 0 9 1


Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems can usually be solved by extracting the shared code into a single function or class that all call sites reuse.
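For example, this report flags cts_electricity() and cts_heat() as duplicating 32 and 33 lines respectively. A minimal sketch of the usual remedy, extracting the shared logic into one parameterized helper (the names and bodies below are illustrative placeholders, not the module's actual code):

```python
# Illustrative sketch only: the names below are placeholders, not the
# module's actual implementation.


def _cell_demand_share(demands):
    """Shared logic both sector functions previously duplicated:
    each census cell's share of the total demand."""
    total = sum(demands.values())
    return {cell: demand / total for cell, demand in demands.items()}


def cts_electricity_share(demands):
    # thin sector-specific wrapper around the shared helper
    return _cell_demand_share(demands)


def cts_heat_share(demands):
    # identical wrapper for the heat sector
    return _cell_demand_share(demands)
```

Once the shared helper exists, both wrappers stay in sync automatically: a fix applied to the helper covers both sectors.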

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.electricity_demand_timeseries.cts_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
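A minimal sketch of Extract Class along these lines, with purely illustrative names (the "profile_"-prefixed members stand in for whatever group of fields/methods shares a prefix in the real class):

```python
# Illustrative sketch: members sharing the "profile_" prefix are moved
# into their own cohesive class; the original class delegates to it.


class ProfileShares:
    """Extracted component holding everything 'profile_' related."""

    def __init__(self, shares):
        self.shares = list(shares)

    def total(self):
        return sum(self.shares)


class CtsDemand:
    """Original class, now composed of the extracted component."""

    def __init__(self, shares):
        self.profile = ProfileShares(shares)

    def profile_total(self):
        # kept for backwards compatibility; delegates to the component
        return self.profile.total()
```

Callers keep using the original class unchanged, while the cohesive logic now lives in one small, separately testable unit.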

import logging

from geoalchemy2 import Geometry
from geoalchemy2.shape import to_shape
from sqlalchemy import REAL, Column, Integer, String, func
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd
import saio

from egon.data import db
from egon.data.datasets import Dataset
from egon.data.datasets.electricity_demand import (
    EgonDemandRegioZensusElectricity,
)
from egon.data.datasets.electricity_demand.temporal import (
    EgonEtragoElectricityCts,
    calc_load_curves_cts,
)
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    BuildingPeakLoads,
    OsmBuildingsSynthetic,
)
from egon.data.datasets.electricity_demand_timeseries.tools import (
    random_ints_until_sum,
    random_point_in_square,
    specific_int_until_sum,
    write_table_to_postgis,
    write_table_to_postgres,
)
from egon.data.datasets.heat_demand import EgonPetaHeat
from egon.data.datasets.zensus_mv_grid_districts import MapZensusGridDistricts
from egon.data.datasets.zensus_vg250 import DestatisZensusPopulationPerHa

engine = db.engine()
Base = declarative_base()

# import db tables
saio.register_schema("openstreetmap", engine=engine)
saio.register_schema("boundaries", engine=engine)


class EgonCtsElectricityDemandBuildingShare(Base):
    __tablename__ = "egon_cts_electricity_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class EgonCtsHeatDemandBuildingShare(Base):
    __tablename__ = "egon_cts_heat_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class CtsBuildings(Base):
    __tablename__ = "egon_cts_buildings"
    __table_args__ = {"schema": "openstreetmap"}

    serial = Column(Integer, primary_key=True)
    id = Column(Integer, index=True)
    zensus_population_id = Column(Integer, index=True)
    geom_building = Column(Geometry("Polygon", 3035))
    n_amenities_inside = Column(Integer)
    source = Column(String)

def start_logging():
    """Start logging into console"""
    log = logging.getLogger()
    log.setLevel(logging.INFO)
    logformat = logging.Formatter(
        "%(asctime)s %(message)s", "%m/%d/%Y %H:%M:%S"
    )
    sh = logging.StreamHandler()
    sh.setFormatter(logformat)
    log.addHandler(sh)
    return log


def amenities_without_buildings():
    """
    Amenities which have no buildings assigned and are in
    a cell with cts demand are determined.

    Returns
    -------
    pd.DataFrame
        Table of amenities without buildings
    """
    from saio.openstreetmap import osm_amenities_not_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
                # TODO can be used for square around amenity
                #  (1 geom_amenity: 1 geom_building)
                #  not unique amenity_ids yet
                osm_amenities_not_in_buildings_filtered.geom_amenity,
                osm_amenities_not_in_buildings_filtered.egon_amenity_id,
                # EgonDemandRegioZensusElectricity.demand,
                # # TODO can be used to generate n random buildings
                # # (n amenities : 1 randombuilding)
                # func.count(
                #     osm_amenities_not_in_buildings_filtered.egon_amenity_id
                # ).label("n_amenities_inside"),
                # DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                func.st_within(
                    osm_amenities_not_in_buildings_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .filter(
                DestatisZensusPopulationPerHa.id
                == EgonDemandRegioZensusElectricity.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035"
                #         ).group_by(
                #             EgonDemandRegioZensusElectricity.zensus_population_id,
                #             DestatisZensusPopulationPerHa.geom,
            )
        )
    # # TODO can be used to generate n random buildings
    # df_cells_with_amenities_not_in_buildings = gpd.read_postgis(
    #     cells_query.statement, cells_query.session.bind, geom_col="geom"
    # )
    #

    # # TODO can be used for square around amenity
    df_amenities_without_buildings = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_amenity",
    )
    return df_amenities_without_buildings

def place_buildings_with_amenities(df, amenities=None, max_amenities=None):
    """
    Building centroids are placed randomly within census cells.
    The number of buildings is derived from n_amenities_inside, the
    selected method and the number of amenities per building.

    Returns
    -------
    df: gpd.GeoDataFrame
        Table of building centroids
    """
    if isinstance(max_amenities, int):
        # amount of amenities is randomly generated within bounds
        # (max_amenities, amenities per cell)
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            random_ints_until_sum, args=[max_amenities]
        )
    if isinstance(amenities, int):
        # Specific amount of amenities per building
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            specific_int_until_sum, args=[amenities]
        )

    # Unnest each building
    df = df.explode(column="n_amenities_inside")

    # building count per cell
    df["building_count"] = df.groupby(["zensus_population_id"]).cumcount() + 1

    # generate random synthetic buildings
    edge_length = 5
    # create random points within census cells
    points = random_point_in_square(geom=df["geom"], tol=edge_length / 2)

    df.reset_index(drop=True, inplace=True)
    # Store center of polygon
    df["geom_point"] = points
    # Drop geometry of census cell
    df = df.drop(columns=["geom"])

    return df


def create_synthetic_buildings(df, points=None, crs="EPSG:3035"):
    """
    Synthetic buildings are generated around points.

    Parameters
    ----------
    df: pd.DataFrame
        Table of census cells
    points: gpd.GeoSeries or str
        List of points to place buildings around or column name of df
    crs: str
        CRS of result table

    Returns
    -------
    df: gpd.GeoDataFrame
        Synthetic buildings
    """

    if isinstance(points, str) and points in df.columns:
        points = df[points]
    elif isinstance(points, gpd.GeoSeries):
        pass
    else:
        raise ValueError("Points are of the wrong type")

    # Create building using a square around point
    edge_length = 5
    df["geom_building"] = points.buffer(distance=edge_length / 2, cap_style=3)

    if "geom_point" not in df.columns:
        df["geom_point"] = df["geom_building"].centroid

    # TODO Check CRS
    df = gpd.GeoDataFrame(
        df,
        crs=crs,
        geometry="geom_building",
    )

    # TODO remove after implementation of egon_building_id
    df.rename(columns={"id": "egon_building_id"}, inplace=True)

    # get max number of building ids from synthetic residential table
    with db.session_scope() as session:
        max_synth_residential_id = session.execute(
            func.max(OsmBuildingsSynthetic.id)
        ).scalar()
    max_synth_residential_id = int(max_synth_residential_id)

    # create sequential ids
    df["egon_building_id"] = range(
        max_synth_residential_id + 1,
        max_synth_residential_id + df.shape[0] + 1,
    )

    df["area"] = df["geom_building"].area
    # set building type of synthetic building
    df["building"] = "cts"
    # TODO remove in #772
    df = df.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )
    return df

def buildings_with_amenities():
    """
    Amenities which are assigned to buildings are determined
    and grouped per building and census cell. Buildings
    covering multiple cells therefore exist multiple times
    but in different census cells. This is necessary to cover
    all cells with a cts demand. The zensus_population_id therefore
    refers to the amenities of the building. The building is assigned to a
    substation via the census cell of its centroid. Therefore a building can
    have demand shares of multiple cells but is assigned to only one cell and
    substation. The buildings are aggregated afterwards during the calculation
    of the profile_share.

    If buildings exist in multiple substations, their amenities are
    summed and kept in one substation only. If, as a result, a census
    cell is uncovered, a synthetic amenity is placed. The buildings are
    aggregated afterwards during the calculation of the profile_share.

    Returns
    -------
    df_buildings_with_amenities: gpd.GeoDataFrame
        Contains all buildings with amenities per zensus cell.
    df_lost_cells: gpd.GeoDataFrame
        Contains synthetic amenities in lost cells. Might be empty
    """

    from saio.boundaries import egon_map_zensus_buildings_filtered_all
    from saio.openstreetmap import osm_amenities_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                osm_amenities_in_buildings_filtered.id,
                osm_amenities_in_buildings_filtered.geom_building,
                osm_amenities_in_buildings_filtered.zensus_population_id,
                # egon_map_zensus_buildings_filtered_all.zensus_population_id,
                MapZensusGridDistricts.bus_id,
            )
            .filter(
                egon_map_zensus_buildings_filtered_all.id
                == osm_amenities_in_buildings_filtered.id
            )
            .filter(
                MapZensusGridDistricts.zensus_population_id
                == egon_map_zensus_buildings_filtered_all.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == egon_map_zensus_buildings_filtered_all.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
        )
    df_amenities_in_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )

    df_amenities_in_buildings["geom_building"] = df_amenities_in_buildings[
        "geom_building"
    ].apply(to_shape)
    # df_amenities_in_buildings["geom_amenity"] = df_amenities_in_buildings[
    #     "geom_amenity"
    # ].apply(to_shape)

    df_amenities_in_buildings["n_amenities_inside"] = 1

    # # add identifier column for buildings in multiple substations
    # df_amenities_in_buildings[
    #     "duplicate_identifier"
    # ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
    #     "n_amenities_inside"
    # ].transform(
    #     "cumsum"
    # )
    # df_amenities_in_buildings = df_amenities_in_buildings.sort_values(
    #     ["id", "duplicate_identifier"]
    # )
    # # sum amenities of buildings with multiple substations
    # df_amenities_in_buildings[
    #     "n_amenities_inside"
    # ] = df_amenities_in_buildings.groupby(["id", "duplicate_identifier"])[
    #     "n_amenities_inside"
    # ].transform(
    #     "sum"
    # )
    #
    # # create column to always go for bus_id with max amenities
    # df_amenities_in_buildings[
    #     "max_amenities"
    # ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
    #     "n_amenities_inside"
    # ].transform(
    #     "sum"
    # )
    # # sort to go for
    # df_amenities_in_buildings.sort_values(
    #     ["id", "max_amenities"], ascending=False, inplace=True
    # )
    #
    # # identify lost zensus cells
    # df_lost_cells = df_amenities_in_buildings.loc[
    #     df_amenities_in_buildings.duplicated(
    #         subset=["id", "duplicate_identifier"], keep="first"
    #     )
    # ]
    # df_lost_cells.drop_duplicates(
    #     subset=["zensus_population_id"], inplace=True
    # )
    #
    # # drop buildings with multiple substation and lower max amenity
    # df_amenities_in_buildings.drop_duplicates(
    #     subset=["id", "duplicate_identifier"], keep="first", inplace=True
    # )
    #
    # # check if lost zensus cells are already covered
    # if not df_lost_cells.empty:
    #     if not (
    #         df_amenities_in_buildings["zensus_population_id"]
    #         .isin(df_lost_cells["zensus_population_id"])
    #         .empty
    #     ):
    #         # query geom data for cell if not
    #         with db.session_scope() as session:
    #             cells_query = session.query(
    #                 DestatisZensusPopulationPerHa.id,
    #                 DestatisZensusPopulationPerHa.geom,
    #             ).filter(
    #                 DestatisZensusPopulationPerHa.id.in_(
    #                     df_lost_cells["zensus_population_id"]
    #                 )
    #             )
    #
    #         df_lost_cells = gpd.read_postgis(
    #             cells_query.statement,
    #             cells_query.session.bind,
    #             geom_col="geom",
    #         )
    #         # TODO maybe adapt method
    #         # place random amenity in cell
    #         df_lost_cells["n_amenities_inside"] = 1
    #         df_lost_cells.rename(
    #             columns={
    #                 "id": "zensus_population_id",
    #             },
    #             inplace=True,
    #         )
    #         df_lost_cells = place_buildings_with_amenities(
    #             df_lost_cells, amenities=1
    #         )
    #         df_lost_cells.rename(
    #             columns={
    #                 # "id": "zensus_population_id",
    #                 "geom_point": "geom_amenity",
    #             },
    #             inplace=True,
    #         )
    #         df_lost_cells.drop(
    #             columns=["building_count", "n_amenities_inside"], inplace=True
    #         )
    #     else:
    #         df_lost_cells = None
    # else:
    #     df_lost_cells = None
    #
    # # drop helper columns
    # df_amenities_in_buildings.drop(
    #     columns=["duplicate_identifier"], inplace=True
    # )

    # sum amenities per building and cell
    df_amenities_in_buildings[
        "n_amenities_inside"
    ] = df_amenities_in_buildings.groupby(["zensus_population_id", "id"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )
    # drop duplicated buildings
    df_buildings_with_amenities = df_amenities_in_buildings.drop_duplicates(
        ["id", "zensus_population_id"]
    )
    df_buildings_with_amenities.reset_index(inplace=True, drop=True)

    df_buildings_with_amenities = df_buildings_with_amenities[
        ["id", "zensus_population_id", "geom_building", "n_amenities_inside"]
    ]
    df_buildings_with_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id"
        },
        inplace=True,
    )

    return df_buildings_with_amenities
    # return df_buildings_with_amenities, df_lost_cells

def buildings_without_amenities():
    """
    Buildings (filtered and synthetic) in cells with
    cts demand but no amenities are determined.

    Returns
    -------
    df_buildings_without_amenities: gpd.GeoDataFrame
        Table of buildings without amenities in zensus cells
        with cts demand.
    """
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
    from saio.openstreetmap import (
        osm_amenities_shops_filtered,
        osm_buildings_filtered,
        osm_buildings_synthetic,
    )

    # buildings_filtered in cts-demand-cells without amenities
    with db.session_scope() as session:

        # Synthetic Buildings
        q_synth_buildings = session.query(
            osm_buildings_synthetic.cell_id.cast(Integer).label(
                "zensus_population_id"
            ),
            osm_buildings_synthetic.id.cast(Integer).label("id"),
            osm_buildings_synthetic.area.label("area"),
            osm_buildings_synthetic.geom_building.label("geom_building"),
            osm_buildings_synthetic.geom_point.label("geom_point"),
        )

        # Buildings filtered
        q_buildings_filtered = session.query(
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
            osm_buildings_filtered.id,
            osm_buildings_filtered.area,
            osm_buildings_filtered.geom_building,
            osm_buildings_filtered.geom_point,
        ).filter(
            osm_buildings_filtered.id
            == egon_map_zensus_buildings_filtered_all.id
        )

        # Amenities + zensus_population_id
        q_amenities = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        # Cells with CTS demand but without amenities
        q_cts_without_amenities = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    q_amenities
                )
            )
            .distinct()
        )

        # Buildings filtered + synthetic buildings residential in
        # cells with CTS demand but without amenities
        cells_query = q_synth_buildings.union(q_buildings_filtered).filter(
            egon_map_zensus_buildings_filtered_all.zensus_population_id.in_(
                q_cts_without_amenities
            )
        )

    # df_buildings_without_amenities = pd.read_sql(
    #     cells_query.statement, cells_query.session.bind, index_col=None)
    df_buildings_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_building",
    )

    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )

    return df_buildings_without_amenities


def select_cts_buildings(df_buildings_wo_amenities, max_n):
    """
    N buildings (filtered and synthetic) in each cell with
    cts demand are selected. Only the first n buildings
    are taken for each cell. The buildings are sorted by surface
    area.

    Returns
    -------
    df_buildings_with_cts_demand: gpd.GeoDataFrame
        Table of buildings
    """

    df_buildings_wo_amenities.sort_values(
        "area", ascending=False, inplace=True
    )
    # select first n ids each census cell if available
    df_buildings_with_cts_demand = (
        df_buildings_wo_amenities.groupby("zensus_population_id")
        .nth(list(range(max_n)))
        .reset_index()
    )
    df_buildings_with_cts_demand.reset_index(drop=True, inplace=True)

    return df_buildings_with_cts_demand

def cells_with_cts_demand_only(df_buildings_without_amenities):
    """
    Cells with cts demand but no amenities or buildings
    are determined.

    Returns
    -------
    df_cells_only_cts_demand: gpd.GeoDataFrame
        Table of cells with cts demand but no amenities or buildings
    """
    from saio.openstreetmap import osm_amenities_shops_filtered

    # cells with amenities
    with db.session_scope() as session:
        sub_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        cells_query = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
                EgonDemandRegioZensusElectricity.scenario,
                EgonDemandRegioZensusElectricity.sector,
                EgonDemandRegioZensusElectricity.demand,
                DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    sub_query
                )
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == DestatisZensusPopulationPerHa.id
            )
        )

    df_cts_cell_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom",
        index_col=None,
    )

    # TODO maybe remove
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={"cell_id": "zensus_population_id"}
    )

    # Census cells with only cts demand
    df_cells_only_cts_demand = df_cts_cell_without_amenities.loc[
        ~df_cts_cell_without_amenities["zensus_population_id"].isin(
            df_buildings_without_amenities["zensus_population_id"].unique()
        )
    ]

    df_cells_only_cts_demand.reset_index(drop=True, inplace=True)

    return df_cells_only_cts_demand


def calc_census_cell_share(scenario="eGon2035", sector="electricity"):
    """
    The profile share for each census cell is calculated by its
    share of annual demand per substation bus. The annual demand
    per cell is defined by DemandRegio/Peta5. The share is identical
    for both scenarios as the annual demand is linearly scaled.

    Parameters
    ----------
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_census_share: pd.DataFrame
    """
    if sector == "electricity":
        demand_table = EgonDemandRegioZensusElectricity
    elif sector == "heat":
        demand_table = EgonPetaHeat
    else:
        # the analysis flagged that demand_table was not defined for all
        # execution paths; fail fast for unknown sectors
        raise ValueError(f"Unknown sector: {sector}")

    with db.session_scope() as session:
        cells_query = (
            session.query(demand_table, MapZensusGridDistricts.bus_id)
            .filter(demand_table.sector == "service")
            .filter(demand_table.scenario == scenario)
            .filter(
                demand_table.zensus_population_id
                == MapZensusGridDistricts.zensus_population_id
            )
        )

    df_demand = pd.read_sql(
        cells_query.statement,
        cells_query.session.bind,
        index_col="zensus_population_id",
    )

    # get demand share of cell per bus
    df_census_share = df_demand["demand"] / df_demand.groupby("bus_id")[
        "demand"
    ].transform("sum")
    df_census_share = df_census_share.rename("cell_share")

    df_census_share = pd.concat(
        [
            df_census_share,
            df_demand[["bus_id", "scenario"]],
        ],
        axis=1,
    )

    df_census_share.reset_index(inplace=True)
    return df_census_share

def calc_building_demand_profile_share(
725
    df_cts_buildings, scenario="eGon2035", sector="electricity"
726
):
727
    """
728
    Share of cts electricity demand profile per bus for every selected building
729
    is calculated. Building-amenity share is multiplied with census cell share
730
    to get the substation bus profile share for each building. The share is
731
    grouped and aggregated per building as some cover multiple cells.
732
733
    Parameters
734
    ----------
735
    df_cts_buildings: gpd.GeoDataFrame
736
        Table of all buildings with cts demand assigned
737
    scenario: str
738
        Scenario for which the share is calculated.
739
    sector: str
740
        Sector for which the share is calculated.
741
742
    Returns
743
    -------
744
    df_building_share: pd.DataFrame
745
        Table of bus profile share per building
746
747
    """
748
749
    def calc_building_amenity_share(df_cts_buildings):
750
        """
751
        Calculate the building share by the number amenities per building
752
        within a census cell.
753
        """
754
        df_building_amenity_share = df_cts_buildings[
755
            "n_amenities_inside"
756
        ] / df_cts_buildings.groupby("zensus_population_id")[
757
            "n_amenities_inside"
758
        ].transform(
759
            "sum"
760
        )
761
        df_building_amenity_share = pd.concat(
762
            [
763
                df_building_amenity_share.rename("building_amenity_share"),
764
                df_cts_buildings[["zensus_population_id", "id"]],
765
            ],
766
            axis=1,
767
        )
768
        return df_building_amenity_share
769
770
    df_building_amenity_share = calc_building_amenity_share(df_cts_buildings)
771
772
    df_census_cell_share = calc_census_cell_share(
773
        scenario=scenario, sector=sector
774
    )
775
776
    df_demand_share = pd.merge(
777
        left=df_building_amenity_share,
778
        right=df_census_cell_share,
779
        left_on="zensus_population_id",
780
        right_on="zensus_population_id",
781
    )
782
    df_demand_share["profile_share"] = df_demand_share[
783
        "building_amenity_share"
784
    ].multiply(df_demand_share["cell_share"])
785
786
    df_demand_share = df_demand_share[
787
        ["id", "bus_id", "scenario", "profile_share"]
788
    ]
789
    # TODO adapt groupby?
790
    # Group and aggregate per building for multi cell buildings
791
    df_demand_share = (
792
        df_demand_share.groupby(["scenario", "id", "bus_id"])
793
        .sum()
794
        .reset_index()
795
    )
796
    if df_demand_share.duplicated("id", keep=False).any():
797
        print(
798
            df_demand_share.loc[df_demand_share.duplicated("id", keep=False)]
799
        )
800
    return df_demand_share
801
802


def calc_building_profiles(
    egon_building_id=None,
    bus_id=None,
    scenario="eGon2035",
    sector="electricity",
):
    """
    Calculate the demand profile for each building. The profile is
    calculated from the demand share of the building per substation bus.

    Parameters
    ----------
    egon_building_id: int
        Id of the building for which the profile is calculated.
        If not given, the profiles are calculated for all buildings.
    bus_id: int
        Id of the substation for which all profiles are calculated.
        If not given, the profiles are calculated for all buildings.
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_building_profiles: pd.DataFrame
        Table of demand profiles per building
    """
    if sector == "electricity":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare,
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO maybe use demand.egon_etrago_electricity_cts
        # with db.session_scope() as session:
        #     cells_query = (
        #         session.query(
        #             EgonEtragoElectricityCts
        #         ).filter(
        #             EgonEtragoElectricityCts.scn_name == scenario)
        #     )
        #
        # df_cts_profiles = pd.read_sql(
        #     cells_query.statement,
        #     cells_query.session.bind,
        # )
        # df_cts_profiles = pd.DataFrame.from_dict(
        #   df_cts_profiles.set_index('bus_id')['p_set'].to_dict(),
        #   orient="index")
        df_cts_profiles = calc_load_curves_cts(scenario)

    elif sector == "heat":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsHeatDemandBuildingShare,
            ).filter(EgonCtsHeatDemandBuildingShare.scenario == scenario)

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO cts heat substation profiles missing
        raise NotImplementedError(
            "CTS heat substation profiles are not available yet."
        )

    else:
        # Guard unknown sectors so that df_demand_share and
        # df_cts_profiles are defined on every execution path below
        raise ValueError(f"Unknown sector: {sector}")

    # get demand share of selected building id
    if isinstance(egon_building_id, int):
        if egon_building_id in df_demand_share["id"].values:
            df_demand_share = df_demand_share.loc[
                df_demand_share["id"] == egon_building_id
            ]
        else:
            raise KeyError(f"Building with id {egon_building_id} not found")
    # TODO maybe add list
    # elif isinstance(egon_building_id, list):

    # get demand share of all buildings for selected bus id
    if isinstance(bus_id, int):
        if bus_id in df_demand_share["bus_id"].values:
            df_demand_share = df_demand_share.loc[
                df_demand_share["bus_id"] == bus_id
            ]
        else:
            raise KeyError(f"Bus with id {bus_id} not found")

    # get demand profile for all buildings for selected demand share
    # TODO takes a few seconds per iteration
    df_building_profiles = pd.DataFrame()
    for bus_id, df in df_demand_share.groupby("bus_id"):
        shares = df.set_index("id", drop=True)["profile_share"]
        profile = df_cts_profiles.loc[:, bus_id]
        # scale the bus profile by each building's share in one step
        building_profiles = np.outer(profile, shares)
        building_profiles = pd.DataFrame(
            building_profiles, index=profile.index, columns=shares.index
        )
        df_building_profiles = pd.concat(
            [df_building_profiles, building_profiles], axis=1
        )

    return df_building_profiles
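The scaling step in the loop above is an outer product: one substation profile of length T times one share per building (B buildings) yields a T×B frame of building profiles. A standalone sketch with dummy data (the profile values, building ids, and shares are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical substation profile (4 timesteps) and two building shares
profile = pd.Series([2.0, 4.0, 3.0, 1.0], name="bus_1")
shares = pd.Series([0.25, 0.75], index=[10, 11], name="profile_share")

# Outer product: rows are timesteps, columns are building ids
building_profiles = pd.DataFrame(
    np.outer(profile, shares), index=profile.index, columns=shares.index
)

# Each column is the bus profile scaled by that building's share,
# so the columns sum back to the original substation profile
assert np.allclose(building_profiles.sum(axis=1), profile)
```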


def delete_synthetic_cts_buildings():
    """
    All synthetic CTS buildings are deleted from the DB. This is necessary
    if the task is run multiple times, as the existing synthetic buildings
    influence the results.
    """
    # import db tables
    from saio.openstreetmap import osm_buildings_synthetic

    # delete all synthetic cts buildings
    with db.session_scope() as session:
        session.query(osm_buildings_synthetic).filter(
            osm_buildings_synthetic.building == "cts"
        ).delete()


def cts_buildings():
    """
    Assigns CTS demand to buildings and calculates the respective demand
    profiles. The demand profiles per substation are disaggregated by the
    annual demand share of each census cell and by the number of amenities
    per building within the cell. If no building data is available,
    synthetic buildings are generated around the amenities. If no amenities
    but CTS demand is available, buildings are randomly selected. If neither
    a building nor an amenity is available, random synthetic buildings are
    generated. The demand share is stored in the database.

    Note
    ----
    Cells with CTS demand, amenities and buildings do not change within
    the scenarios, only the demand itself. Therefore scenario eGon2035
    can be used universally to determine the CTS buildings, but not for
    the demand share.
    """

    log = start_logging()
    log.info("Start logging!")
    # Buildings with amenities
    # df_buildings_with_amenities, df_lost_cells = buildings_with_amenities()
    df_buildings_with_amenities = buildings_with_amenities()
    log.info("Buildings with amenities selected!")

    # Median number of amenities per cell
    median_n_amenities = int(
        df_buildings_with_amenities.groupby("zensus_population_id")[
            "n_amenities_inside"
        ]
        .sum()
        .median()
    )
    # TODO remove
    print(f"Median amenity value: {median_n_amenities}")

    # Remove synthetic CTS buildings if existing
    delete_synthetic_cts_buildings()
    log.info("Old synthetic cts buildings deleted!")

    # Amenities not assigned to buildings
    df_amenities_without_buildings = amenities_without_buildings()
    log.info("Amenities without buildings selected!")

    # # Append lost cells due to duplicated ids, to cover all demand cells
    # if not df_lost_cells.empty:
    #
    #     df_lost_cells["amenities"] = median_n_amenities
    #     # create row for every amenity
    #     df_lost_cells["amenities"] = (
    #         df_lost_cells["amenities"].astype(int).apply(range)
    #     )
    #     df_lost_cells = df_lost_cells.explode("amenities")
    #     df_lost_cells.drop(columns="amenities", inplace=True)
    #     df_amenities_without_buildings = df_amenities_without_buildings.append(
    #         df_lost_cells, ignore_index=True
    #     )
    #     log.info("Lost cells due to substation intersection appended!")

    # One building per amenity
    df_amenities_without_buildings["n_amenities_inside"] = 1
    # Create synthetic buildings for amenities without buildings
    df_synthetic_buildings_with_amenities = create_synthetic_buildings(
        df_amenities_without_buildings, points="geom_amenity"
    )
    log.info("Synthetic buildings created!")

    # TODO write to DB and remove renaming
    write_table_to_postgis(
        df_synthetic_buildings_with_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB!")

    # Cells without amenities but with CTS demand and buildings
    df_buildings_without_amenities = buildings_without_amenities()
    log.info("Buildings without amenities in demand cells identified!")

    # TODO fix ad-hoc bugfix for duplicated buildings
    # drop building ids which have already been used
    mask = df_buildings_without_amenities.loc[
        df_buildings_without_amenities["id"].isin(
            df_buildings_with_amenities["id"]
        )
    ].index
    df_buildings_without_amenities = df_buildings_without_amenities.drop(
        index=mask
    ).reset_index(drop=True)
    log.info(f"{len(mask)} duplicated ids removed!")

    # select median n buildings per cell
    df_buildings_without_amenities = select_cts_buildings(
        df_buildings_without_amenities, max_n=median_n_amenities
    )
    df_buildings_without_amenities["n_amenities_inside"] = 1
    log.info(f"{median_n_amenities} buildings per cell selected!")

    # Create synthetic amenities and buildings in cells with only CTS demand
    df_cells_with_cts_demand_only = cells_with_cts_demand_only(
        df_buildings_without_amenities
    )
    log.info("Cells with only demand identified!")

    # Median n amenities per cell
    df_cells_with_cts_demand_only["amenities"] = median_n_amenities
    # create row for every amenity
    df_cells_with_cts_demand_only["amenities"] = (
        df_cells_with_cts_demand_only["amenities"].astype(int).apply(range)
    )
    df_cells_with_cts_demand_only = df_cells_with_cts_demand_only.explode(
        "amenities"
    )
    df_cells_with_cts_demand_only.drop(columns="amenities", inplace=True)

    # Only one amenity per building
    df_cells_with_cts_demand_only["n_amenities_inside"] = 1
    df_cells_with_cts_demand_only = place_buildings_with_amenities(
        df_cells_with_cts_demand_only, amenities=1
    )
    df_synthetic_buildings_without_amenities = create_synthetic_buildings(
        df_cells_with_cts_demand_only, points="geom_point"
    )
    log.info(f"{median_n_amenities} synthetic buildings per cell created")

    # TODO write to DB and remove (backup) renaming
    write_table_to_postgis(
        df_synthetic_buildings_without_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB")

    # Concat all buildings
    columns = [
        "zensus_population_id",
        "id",
        "geom_building",
        "n_amenities_inside",
        "source",
    ]

    df_buildings_with_amenities["source"] = "bwa"
    df_synthetic_buildings_with_amenities["source"] = "sbwa"
    df_buildings_without_amenities["source"] = "bwoa"
    df_synthetic_buildings_without_amenities["source"] = "sbwoa"

    df_cts_buildings = pd.concat(
        [
            df_buildings_with_amenities[columns],
            df_synthetic_buildings_with_amenities[columns],
            df_buildings_without_amenities[columns],
            df_synthetic_buildings_without_amenities[columns],
        ],
        axis=0,
        ignore_index=True,
    )
    # TODO maybe remove after #772
    df_cts_buildings["id"] = df_cts_buildings["id"].astype(int)

    # Write table to db for debugging
    # TODO remove later
    df_cts_buildings = gpd.GeoDataFrame(
        df_cts_buildings, geometry="geom_building", crs=3035
    )
    df_cts_buildings = df_cts_buildings.reset_index().rename(
        columns={"index": "serial"}
    )
    write_table_to_postgis(
        df_cts_buildings,
        CtsBuildings,
        drop=True,
    )
    log.info("CTS buildings exported to DB!")
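The range/explode pattern used above to create one synthetic amenity row per cell can be illustrated in isolation: the per-cell count is turned into an iterable with `range()`, then `explode()` emits one row per element. A small sketch with invented cell ids and an assumed median of 2:

```python
import pandas as pd

median_n_amenities = 2  # assumed median, stands in for the computed value

df = pd.DataFrame({"zensus_population_id": [1, 2]})
df["amenities"] = median_n_amenities
# range() turns the count into a list-like, explode() makes one row each
df["amenities"] = df["amenities"].astype(int).apply(range)
df = df.explode("amenities").drop(columns="amenities")

# two rows per cell remain, ready to receive one synthetic building each
assert len(df) == 4
```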


def cts_electricity():
    """
    Calculate the CTS electricity demand share of the hvmv substation
    profile for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")
    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="electricity"
    )
    log.info("Profile share for eGon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="electricity"
    )
    log.info("Profile share for eGon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsElectricityDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


def cts_heat():
    """
    Calculate the CTS heat demand share of the hvmv substation profile
    for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")

    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="heat"
    )
    log.info("Profile share for eGon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="heat"
    )
    log.info("Profile share for eGon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsHeatDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


def get_cts_electricity_peak_load():
    """
    Get the peak load of all CTS buildings for both scenarios and store
    it in the DB.
    """
    log = start_logging()
    log.info("Start logging!")
    # Delete rows with cts demand
    with db.session_scope() as session:
        session.query(BuildingPeakLoads).filter(
            BuildingPeakLoads.sector == "cts"
        ).delete()
    log.info("CTS peak load removed from DB!")

    for scenario in ["eGon2035", "eGon100RE"]:

        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        df_cts_profiles = calc_load_curves_cts(scenario=scenario)

        df_peak_load = pd.merge(
            left=df_cts_profiles.max(axis=0).astype(float).rename("max"),
            right=df_demand_share,
            left_on="bus_id",
            right_on="bus_id",
        )

        # Convert unit from MW to W
        df_peak_load["max"] = df_peak_load["max"] * 1e6
        df_peak_load["peak_load_in_w"] = (
            df_peak_load["max"] * df_peak_load["profile_share"]
        )
        log.info(f"Peak load for {scenario} determined!")

        df_peak_load.rename(columns={"id": "building_id"}, inplace=True)
        df_peak_load["sector"] = "cts"

        df_peak_load = df_peak_load[
            ["building_id", "sector", "scenario", "peak_load_in_w"]
        ]

        # Write peak loads into db
        with db.session_scope() as session:
            session.bulk_insert_mappings(
                BuildingPeakLoads,
                df_peak_load.to_dict(orient="records"),
            )
        log.info(f"Peak load for {scenario} exported to DB!")
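The unit handling above can be checked with toy numbers: the per-substation peak of the profile, scaled by 1e6 and by the building's profile share, gives the per-building peak load in W. A sketch with invented values (assuming, as the conversion factor suggests, that the profiles carry MW-scale values):

```python
import pandas as pd

# Hypothetical substation peak of 0.5 (MW-scale) shared by two buildings
df_peak = pd.DataFrame(
    {
        "building_id": [10, 11],
        "profile_share": [0.25, 0.75],
        "max": 0.5,
    }
)
# Scale to W and split the peak by each building's profile share
df_peak["peak_load_in_w"] = df_peak["max"] * 1e6 * df_peak["profile_share"]
```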


class CtsElectricityBuildings(Dataset):
    def __init__(self, dependencies):
        super().__init__(
            name="CtsElectricityBuildings",
            version="0.0.0",
            dependencies=dependencies,
            tasks=(
                cts_buildings,
                {cts_electricity, cts_heat},
                get_cts_electricity_peak_load,
            ),
        )