Passed — Pull Request — dev (#826) — created by unknown — 01:28

data.datasets.electricity_demand_timeseries.cts_buildings — Rating: B

Complexity
    Total Complexity: 49

Size/Duplication
    Total Lines: 1278
    Duplicated Lines: 5.09 %

Importance
    Changes: 0

Metric   Value
wmc      49
eloc     577
dl       65
loc      1278
rs       8.48
c        0
b        0
f        0

16 Functions

Rating   Name   Duplication   Size   Complexity  
A place_buildings_with_amenities() 0 41 3
B create_synthetic_buildings() 0 67 6
A amenities_without_buildings() 0 60 2
A start_logging() 0 11 1
A select_cts_buildings() 0 25 1
B cells_with_cts_demand_only() 0 72 2
A calc_census_cell_share() 0 56 4
B buildings_without_amenities() 0 99 2
A cts_heat() 33 33 2
A delete_synthetic_cts_buildings() 0 14 2
B get_cts_electricity_peak_load() 0 56 5
C calc_building_profiles() 0 106 10
B calc_building_demand_profile_share() 0 104 3
B buildings_with_amenities() 0 198 2
A cts_electricity() 32 32 2
B cts_buildings() 0 185 1

1 Method

Rating   Name   Duplication   Size   Complexity  
A CtsElectricityBuildings.__init__() 0 9 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to restructure code once it is duplicated in three or more places.

In this module, cts_electricity() and cts_heat() are flagged as near-duplicates (33 and 32 duplicated lines in the table above); a common remedy is to extract the shared logic into a single parameterised helper, as sketched below.
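A minimal, hedged sketch of that refactoring, assuming the module's existing helpers (start_logging, calc_building_demand_profile_share, write_table_to_postgres) keep their current signatures; the helper name _calc_and_write_demand_share is hypothetical:

def _calc_and_write_demand_share(sector, share_table):
    """Calculate and store the building demand profile share for one sector."""
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)
    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")
    # calculate the share once per scenario and concatenate, as the current
    # cts_electricity()/cts_heat() bodies do
    df_demand_share = pd.concat(
        [
            calc_building_demand_profile_share(
                df_cts_buildings, scenario=scenario, sector=sector
            )
            for scenario in ["eGon2035", "eGon100RE"]
        ],
        ignore_index=True,
    )
    write_table_to_postgres(df_demand_share, share_table, drop=True)
    log.info("Profile share exported to DB!")

def cts_electricity():
    _calc_and_write_demand_share(
        "electricity", EgonCtsElectricityDemandBuildingShare
    )

def cts_heat():
    _calc_and_write_demand_share("heat", EgonCtsHeatDemandBuildingShare)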

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.electricity_demand_timeseries.cts_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
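In this particular module the complexity sits in functions rather than class fields, so the analogous move is extracting a helper function. A hedged, illustrative sketch for calc_building_profiles() (complexity 10, rated C), which branches on sector to load the per-building demand share; the helper below is hypothetical, while the table and query names mirror the reviewed code:

def _load_demand_share(scenario, sector):
    """Read the per-building demand share for one scenario and sector."""
    share_table = {
        "electricity": EgonCtsElectricityDemandBuildingShare,
        "heat": EgonCtsHeatDemandBuildingShare,
    }[sector]  # unknown sectors raise KeyError instead of falling through
    with db.session_scope() as session:
        cells_query = session.query(share_table).filter(
            share_table.scenario == scenario
        )
    return pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )

Pulling the lookup out like this would also remove the "not defined for all execution paths" issues flagged inside the listing below.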

1
import logging
2
3
from geoalchemy2 import Geometry
4
from geoalchemy2.shape import to_shape
5
from sqlalchemy import REAL, Column, Integer, String, func
6
from sqlalchemy.ext.declarative import declarative_base
7
import geopandas as gpd
8
import numpy as np
9
import pandas as pd
10
import saio
11
12
from egon.data import db
13
from egon.data.datasets import Dataset
14
from egon.data.datasets.electricity_demand import (
15
    EgonDemandRegioZensusElectricity,
16
)
17
from egon.data.datasets.electricity_demand.temporal import (
18
    EgonEtragoElectricityCts,
19
    calc_load_curves_cts,
20
)
21
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
22
    BuildingPeakLoads,
23
    OsmBuildingsSynthetic,
24
)
25
from egon.data.datasets.electricity_demand_timeseries.tools import (
26
    random_ints_until_sum,
27
    random_point_in_square,
28
    specific_int_until_sum,
29
    write_table_to_postgis,
30
    write_table_to_postgres,
31
)
32
from egon.data.datasets.heat_demand import EgonPetaHeat
33
from egon.data.datasets.zensus_mv_grid_districts import MapZensusGridDistricts
34
from egon.data.datasets.zensus_vg250 import DestatisZensusPopulationPerHa
35
36
engine = db.engine()
37
Base = declarative_base()
38
39
# import db tables
40
saio.register_schema("openstreetmap", engine=engine)
41
saio.register_schema("boundaries", engine=engine)
42
43
44
class EgonCtsElectricityDemandBuildingShare(Base):
45
    __tablename__ = "egon_cts_electricity_demand_building_share"
46
    __table_args__ = {"schema": "demand"}
47
48
    id = Column(Integer, primary_key=True)
49
    scenario = Column(String, primary_key=True)
50
    bus_id = Column(Integer, index=True)
51
    profile_share = Column(REAL)
52
53
54
class EgonCtsHeatDemandBuildingShare(Base):
55
    __tablename__ = "egon_cts_heat_demand_building_share"
56
    __table_args__ = {"schema": "demand"}
57
58
    id = Column(Integer, primary_key=True)
59
    scenario = Column(String, primary_key=True)
60
    bus_id = Column(Integer, index=True)
61
    profile_share = Column(REAL)
62
63
64
class CtsBuildings(Base):
65
    __tablename__ = "egon_cts_buildings"
66
    __table_args__ = {"schema": "openstreetmap"}
67
68
    serial = Column(Integer, primary_key=True)
69
    id = Column(Integer, index=True)
70
    zensus_population_id = Column(Integer, index=True)
71
    geom_building = Column(Geometry("Polygon", 3035))
72
    n_amenities_inside = Column(Integer)
73
    source = Column(String)
74
75
76
def start_logging():
77
    """Start logging into console"""
78
    log = logging.getLogger()
79
    log.setLevel(logging.INFO)
80
    logformat = logging.Formatter(
81
        "%(asctime)s %(message)s", "%m/%d/%Y %H:%M:%S"
82
    )
83
    sh = logging.StreamHandler()
84
    sh.setFormatter(logformat)
85
    log.addHandler(sh)
86
    return log
87
88
89
def amenities_without_buildings():
90
    """
91
    Amenities which have no buildings assigned and are in
92
    a cell with cts demand are determined.
93
94
    Returns
95
    -------
96
    pd.DataFrame
97
        Table of amenities without buildings
98
    """
99
    from saio.openstreetmap import osm_amenities_not_in_buildings_filtered
100
101
    with db.session_scope() as session:
102
        cells_query = (
103
            session.query(
104
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
105
                # TODO can be used for square around amenity
106
                #  (1 geom_amenity: 1 geom_building)
107
                #  not unique amenity_ids yet
108
                osm_amenities_not_in_buildings_filtered.geom_amenity,
109
                osm_amenities_not_in_buildings_filtered.egon_amenity_id,
110
                # EgonDemandRegioZensusElectricity.demand,
111
                # # TODO can be used to generate n random buildings
112
                # # (n amenities : 1 randombuilding)
113
                # func.count(
114
                #     osm_amenities_not_in_buildings_filtered.egon_amenity_id
115
                # ).label("n_amenities_inside"),
116
                # DestatisZensusPopulationPerHa.geom,
117
            )
118
            .filter(
119
                func.st_within(
120
                    osm_amenities_not_in_buildings_filtered.geom_amenity,
121
                    DestatisZensusPopulationPerHa.geom,
122
                )
123
            )
124
            .filter(
125
                DestatisZensusPopulationPerHa.id
126
                == EgonDemandRegioZensusElectricity.zensus_population_id
127
            )
128
            .filter(
129
                EgonDemandRegioZensusElectricity.sector == "service",
130
                EgonDemandRegioZensusElectricity.scenario == "eGon2035"
131
                #         ).group_by(
132
                #             EgonDemandRegioZensusElectricity.zensus_population_id,
133
                #             DestatisZensusPopulationPerHa.geom,
134
            )
135
        )
136
    # # TODO can be used to generate n random buildings
137
    # df_cells_with_amenities_not_in_buildings = gpd.read_postgis(
138
    #     cells_query.statement, cells_query.session.bind, geom_col="geom"
139
    # )
140
    #
141
142
    # # TODO can be used for square around amenity
143
    df_amenities_without_buildings = gpd.read_postgis(
144
        cells_query.statement,
145
        cells_query.session.bind,
146
        geom_col="geom_amenity",
147
    )
148
    return df_amenities_without_buildings
149
150
151
def place_buildings_with_amenities(df, amenities=None, max_amenities=None):
152
    """
153
    Building centroids are placed randomly within census cells.
154
    The number of buildings is derived from n_amenities_inside, the selected
155
    method and number of amenities per building.
156
157
    Returns
158
    -------
159
    df: gpd.GeoDataFrame
160
        Table of buildings centroids
161
    """
162
    if isinstance(max_amenities, int):
163
        # amount of amenities is randomly generated within bounds
164
        # (max_amenities, amenities per cell)
165
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
166
            random_ints_until_sum, args=[max_amenities]
167
        )
168
    if isinstance(amenities, int):
169
        # Specific amount of amenities per building
170
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
171
            specific_int_until_sum, args=[amenities]
172
        )
173
174
    # Unnest each building
175
    df = df.explode(column="n_amenities_inside")
176
177
    # building count per cell
178
    df["building_count"] = df.groupby(["zensus_population_id"]).cumcount() + 1
179
180
    # generate random synthetic buildings
181
    edge_length = 5
182
    # create random points within census cells
183
    points = random_point_in_square(geom=df["geom"], tol=edge_length / 2)
184
185
    df.reset_index(drop=True, inplace=True)
186
    # Store center of polygon
187
    df["geom_point"] = points
188
    # Drop geometry of census cell
189
    df = df.drop(columns=["geom"])
190
191
    return df
192
193
194
def create_synthetic_buildings(df, points=None, crs="EPSG:3035"):
195
    """
196
    Synthetic buildings are generated around points.
197
198
    Parameters
199
    ----------
200
    df: pd.DataFrame
201
        Table of census cells
202
    points: gpd.GeoSeries or str
203
        List of points to place buildings around or column name of df
204
    crs: str
205
        CRS of result table
206
207
    Returns
208
    -------
209
    df: gpd.GeoDataFrame
210
        Synthetic buildings
211
    """
212
213
    if isinstance(points, str) and points in df.columns:
214
        points = df[points]
215
    elif isinstance(points, gpd.GeoSeries):
216
        pass
217
    else:
218
        raise ValueError("Points are of the wrong type")
219
220
    # Create building using a square around point
221
    edge_length = 5
222
    df["geom_building"] = points.buffer(distance=edge_length / 2, cap_style=3)
223
224
    if "geom_point" not in df.columns:
225
        df["geom_point"] = df["geom_building"].centroid
226
227
    # TODO Check CRS
228
    df = gpd.GeoDataFrame(
229
        df,
230
        crs=crs,
231
        geometry="geom_building",
232
    )
233
234
    # TODO remove after implementation of egon_building_id
235
    df.rename(columns={"id": "egon_building_id"}, inplace=True)
236
237
    # get max number of building ids from synthetic residential table
238
    with db.session_scope() as session:
239
        max_synth_residential_id = session.execute(
240
            func.max(OsmBuildingsSynthetic.id)
241
        ).scalar()
242
    max_synth_residential_id = int(max_synth_residential_id)
243
244
    # create sequential ids
245
    df["egon_building_id"] = range(
246
        max_synth_residential_id + 1,
247
        max_synth_residential_id + df.shape[0] + 1,
248
    )
249
250
    df["area"] = df["geom_building"].area
251
    # set building type of synthetic building
252
    df["building"] = "cts"
253
    # TODO remove in #772
254
    df = df.rename(
255
        columns={
256
            # "zensus_population_id": "cell_id",
257
            "egon_building_id": "id",
258
        }
259
    )
260
    return df
261
262
263
def buildings_with_amenities():
264
    """
265
    Amenities which are assigned to buildings are determined
266
    and grouped per building and census cell. Buildings
267
    covering multiple cells therefore exist multiple times
268
    but in different census cells. This is necessary to cover
269
    all cells with a cts demand. The zensus_population_id therefore
270
    refers to the amenities of the building. The building is assigned to a
271
    substation via the census cells of its centroid. Therefore a building can
272
    have demand shares of multiple cells but is assigned to only one cell and
273
    substation. The buildings are aggregated afterwards during the calculation
274
    of the profile_share.
275
276
    If buildings exist in multiple
277
    substations, their amenities are summed and assigned and kept in
278
    one substation only. If as a result, a census cell is uncovered,
279
    a synthetic amenity is placed. The buildings are aggregated
280
    afterwards during the calculation of the profile_share.
281
282
    Returns
283
    -------
284
    df_buildings_with_amenities: gpd.GeoDataFrame
285
        Contains all buildings with amenities per zensus cell.
286
    df_lost_cells: gpd.GeoDataFrame
287
        Contains synthetic amenities in lost cells. Might be empty
288
    """
289
290
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
291
    from saio.openstreetmap import osm_amenities_in_buildings_filtered
292
293
    with db.session_scope() as session:
294
        cells_query = (
295
            session.query(
296
                osm_amenities_in_buildings_filtered.id,
297
                osm_amenities_in_buildings_filtered.geom_building,
298
                osm_amenities_in_buildings_filtered.zensus_population_id,
299
                # egon_map_zensus_buildings_filtered_all.zensus_population_id,
300
                # MapZensusGridDistricts.bus_id,
301
            )
302
            .filter(
303
                egon_map_zensus_buildings_filtered_all.id
304
                == osm_amenities_in_buildings_filtered.id
305
            )
306
            # .filter(
307
            #     MapZensusGridDistricts.zensus_population_id
308
            #     == egon_map_zensus_buildings_filtered_all.zensus_population_id
309
            # )
310
            .filter(
311
                EgonDemandRegioZensusElectricity.zensus_population_id
312
                == egon_map_zensus_buildings_filtered_all.zensus_population_id
313
            )
314
            .filter(
315
                EgonDemandRegioZensusElectricity.sector == "service",
316
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
317
            )
318
        )
319
    df_amenities_in_buildings = pd.read_sql(
320
        cells_query.statement, cells_query.session.bind, index_col=None
321
    )
322
323
    df_amenities_in_buildings["geom_building"] = df_amenities_in_buildings[
324
        "geom_building"
325
    ].apply(to_shape)
326
    # df_amenities_in_buildings["geom_amenity"] = df_amenities_in_buildings[
327
    #     "geom_amenity"
328
    # ].apply(to_shape)
329
330
    df_amenities_in_buildings["n_amenities_inside"] = 1
331
332
    # # add identifier column for buildings in multiple substations
333
    # df_amenities_in_buildings[
334
    #     "duplicate_identifier"
335
    # ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
336
    #     "n_amenities_inside"
337
    # ].transform(
338
    #     "cumsum"
339
    # )
340
    # df_amenities_in_buildings = df_amenities_in_buildings.sort_values(
341
    #     ["id", "duplicate_identifier"]
342
    # )
343
    # # sum amenities of buildings with multiple substations
344
    # df_amenities_in_buildings[
345
    #     "n_amenities_inside"
346
    # ] = df_amenities_in_buildings.groupby(["id", "duplicate_identifier"])[
347
    #     "n_amenities_inside"
348
    # ].transform(
349
    #     "sum"
350
    # )
351
    #
352
    # # create column to always go for bus_id with max amenities
353
    # df_amenities_in_buildings[
354
    #     "max_amenities"
355
    # ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
356
    #     "n_amenities_inside"
357
    # ].transform(
358
    #     "sum"
359
    # )
360
    # # sort to go for
361
    # df_amenities_in_buildings.sort_values(
362
    #     ["id", "max_amenities"], ascending=False, inplace=True
363
    # )
364
    #
365
    # # identify lost zensus cells
366
    # df_lost_cells = df_amenities_in_buildings.loc[
367
    #     df_amenities_in_buildings.duplicated(
368
    #         subset=["id", "duplicate_identifier"], keep="first"
369
    #     )
370
    # ]
371
    # df_lost_cells.drop_duplicates(
372
    #     subset=["zensus_population_id"], inplace=True
373
    # )
374
    #
375
    # # drop buildings with multiple substation and lower max amenity
376
    # df_amenities_in_buildings.drop_duplicates(
377
    #     subset=["id", "duplicate_identifier"], keep="first", inplace=True
378
    # )
379
    #
380
    # # check if lost zensus cells are already covered
381
    # if not df_lost_cells.empty:
382
    #     if not (
383
    #         df_amenities_in_buildings["zensus_population_id"]
384
    #         .isin(df_lost_cells["zensus_population_id"])
385
    #         .empty
386
    #     ):
387
    #         # query geom data for cell if not
388
    #         with db.session_scope() as session:
389
    #             cells_query = session.query(
390
    #                 DestatisZensusPopulationPerHa.id,
391
    #                 DestatisZensusPopulationPerHa.geom,
392
    #             ).filter(
393
    #                 DestatisZensusPopulationPerHa.id.in_(
394
    #                     df_lost_cells["zensus_population_id"]
395
    #                 )
396
    #             )
397
    #
398
    #         df_lost_cells = gpd.read_postgis(
399
    #             cells_query.statement,
400
    #             cells_query.session.bind,
401
    #             geom_col="geom",
402
    #         )
403
    #         # TODO maybe adapt method
404
    #         # place random amenity in cell
405
    #         df_lost_cells["n_amenities_inside"] = 1
406
    #         df_lost_cells.rename(
407
    #             columns={
408
    #                 "id": "zensus_population_id",
409
    #             },
410
    #             inplace=True,
411
    #         )
412
    #         df_lost_cells = place_buildings_with_amenities(
413
    #             df_lost_cells, amenities=1
414
    #         )
415
    #         df_lost_cells.rename(
416
    #             columns={
417
    #                 # "id": "zensus_population_id",
418
    #                 "geom_point": "geom_amenity",
419
    #             },
420
    #             inplace=True,
421
    #         )
422
    #         df_lost_cells.drop(
423
    #             columns=["building_count", "n_amenities_inside"], inplace=True
424
    #         )
425
    #     else:
426
    #         df_lost_cells = None
427
    # else:
428
    #     df_lost_cells = None
429
    #
430
    # # drop helper columns
431
    # df_amenities_in_buildings.drop(
432
    #     columns=["duplicate_identifier"], inplace=True
433
    # )
434
435
    # sum amenities per building and cell
436
    df_amenities_in_buildings[
437
        "n_amenities_inside"
438
    ] = df_amenities_in_buildings.groupby(["zensus_population_id", "id"])[
439
        "n_amenities_inside"
440
    ].transform(
441
        "sum"
442
    )
443
    # drop duplicated buildings
444
    df_buildings_with_amenities = df_amenities_in_buildings.drop_duplicates(
445
        ["id", "zensus_population_id"]
446
    )
447
    df_buildings_with_amenities.reset_index(inplace=True, drop=True)
448
449
    df_buildings_with_amenities = df_buildings_with_amenities[
450
        ["id", "zensus_population_id", "geom_building", "n_amenities_inside"]
451
    ]
452
    df_buildings_with_amenities.rename(
453
        columns={
454
            # "zensus_population_id": "cell_id",
455
            "egon_building_id": "id"
456
        },
457
        inplace=True,
458
    )
459
460
    return df_buildings_with_amenities
461
    # return df_buildings_with_amenities, df_lost_cells
462
463
464
def buildings_without_amenities():
465
    """
466
    Buildings (filtered and synthetic) in cells with
467
    cts demand but no amenities are determined.
468
469
    Returns
470
    -------
471
    df_buildings_without_amenities: gpd.GeoDataFrame
472
        Table of buildings without amenities in zensus cells
473
        with cts demand.
474
    """
475
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
476
    from saio.openstreetmap import (
477
        osm_amenities_shops_filtered,
478
        osm_buildings_filtered,
479
        osm_buildings_synthetic,
480
    )
481
482
    # buildings_filtered in cts-demand-cells without amenities
483
    with db.session_scope() as session:
484
485
        # Synthetic Buildings
486
        q_synth_buildings = session.query(
487
            osm_buildings_synthetic.cell_id.cast(Integer).label(
488
                "zensus_population_id"
489
            ),
490
            osm_buildings_synthetic.id.cast(Integer).label("id"),
491
            osm_buildings_synthetic.area.label("area"),
492
            osm_buildings_synthetic.geom_building.label("geom_building"),
493
            osm_buildings_synthetic.geom_point.label("geom_point"),
494
        )
495
496
        # Buildings filtered
497
        q_buildings_filtered = session.query(
498
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
499
            osm_buildings_filtered.id,
500
            osm_buildings_filtered.area,
501
            osm_buildings_filtered.geom_building,
502
            osm_buildings_filtered.geom_point,
503
        ).filter(
504
            osm_buildings_filtered.id
505
            == egon_map_zensus_buildings_filtered_all.id
506
        )
507
508
        # Amenities + zensus_population_id
509
        q_amenities = (
510
            session.query(
511
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
512
            )
513
            .filter(
514
                func.st_within(
515
                    osm_amenities_shops_filtered.geom_amenity,
516
                    DestatisZensusPopulationPerHa.geom,
517
                )
518
            )
519
            .distinct(DestatisZensusPopulationPerHa.id)
520
        )
521
522
        # Cells with CTS demand but without amenities
523
        q_cts_without_amenities = (
524
            session.query(
525
                EgonDemandRegioZensusElectricity.zensus_population_id,
526
            )
527
            .filter(
528
                EgonDemandRegioZensusElectricity.sector == "service",
529
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
530
            )
531
            .filter(
532
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
533
                    q_amenities
534
                )
535
            )
536
            .distinct()
537
        )
538
539
        # Buildings filtered + synthetic buildings residential in
540
        # cells with CTS demand but without amenities
541
        cells_query = q_synth_buildings.union(q_buildings_filtered).filter(
542
            egon_map_zensus_buildings_filtered_all.zensus_population_id.in_(
543
                q_cts_without_amenities
544
            )
545
        )
546
547
    # df_buildings_without_amenities = pd.read_sql(
548
    #     cells_query.statement, cells_query.session.bind, index_col=None)
549
    df_buildings_without_amenities = gpd.read_postgis(
550
        cells_query.statement,
551
        cells_query.session.bind,
552
        geom_col="geom_building",
553
    )
554
555
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
556
        columns={
557
            # "zensus_population_id": "cell_id",
558
            "egon_building_id": "id",
559
        }
560
    )
561
562
    return df_buildings_without_amenities
563
564
565
def select_cts_buildings(df_buildings_wo_amenities, max_n):
566
    """
567
    N Buildings (filtered and synthetic) in each cell with
568
    cts demand are selected. Only the first n buildings
569
    are taken for each cell. The buildings are sorted by surface
570
    area.
571
572
    Returns
573
    -------
574
    df_buildings_with_cts_demand: gpd.GeoDataFrame
575
        Table of buildings
576
    """
577
578
    df_buildings_wo_amenities.sort_values(
579
        "area", ascending=False, inplace=True
580
    )
581
    # select first n ids each census cell if available
582
    df_buildings_with_cts_demand = (
583
        df_buildings_wo_amenities.groupby("zensus_population_id")
584
        .nth(list(range(max_n)))
585
        .reset_index()
586
    )
587
    df_buildings_with_cts_demand.reset_index(drop=True, inplace=True)
588
589
    return df_buildings_with_cts_demand
590
591
592
def cells_with_cts_demand_only(df_buildings_without_amenities):
593
    """
594
    Cells with cts demand but no amenities or buildings
595
    are determined.
596
597
    Returns
598
    -------
599
    df_cells_only_cts_demand: gpd.GeoDataFrame
600
        Table of cells with cts demand but no amenities or buildings
601
    """
602
    from saio.openstreetmap import osm_amenities_shops_filtered
603
604
    # cells with amenities
605
    with db.session_scope() as session:
606
        sub_query = (
607
            session.query(
608
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
609
            )
610
            .filter(
611
                func.st_within(
612
                    osm_amenities_shops_filtered.geom_amenity,
613
                    DestatisZensusPopulationPerHa.geom,
614
                )
615
            )
616
            .distinct(DestatisZensusPopulationPerHa.id)
617
        )
618
619
        cells_query = (
620
            session.query(
621
                EgonDemandRegioZensusElectricity.zensus_population_id,
622
                EgonDemandRegioZensusElectricity.scenario,
623
                EgonDemandRegioZensusElectricity.sector,
624
                EgonDemandRegioZensusElectricity.demand,
625
                DestatisZensusPopulationPerHa.geom,
626
            )
627
            .filter(
628
                EgonDemandRegioZensusElectricity.sector == "service",
629
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
630
            )
631
            .filter(
632
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
633
                    sub_query
634
                )
635
            )
636
            .filter(
637
                EgonDemandRegioZensusElectricity.zensus_population_id
638
                == DestatisZensusPopulationPerHa.id
639
            )
640
        )
641
642
    df_cts_cell_without_amenities = gpd.read_postgis(
643
        cells_query.statement,
644
        cells_query.session.bind,
645
        geom_col="geom",
646
        index_col=None,
647
    )
648
649
    # TODO maybe remove
650
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
651
        columns={"cell_id": "zensus_population_id"}
652
    )
653
654
    # Census cells with only cts demand
655
    df_cells_only_cts_demand = df_cts_cell_without_amenities.loc[
656
        ~df_cts_cell_without_amenities["zensus_population_id"].isin(
657
            df_buildings_without_amenities["zensus_population_id"].unique()
658
        )
659
    ]
660
661
    df_cells_only_cts_demand.reset_index(drop=True, inplace=True)
662
663
    return df_cells_only_cts_demand
664
665
666
def calc_census_cell_share(scenario="eGon2035", sector="electricity"):
667
    """
668
    The profile share for each census cell is calculated by its
669
    share of annual demand per substation bus. The annual demand
670
    per cell is defined by DemandRegio/Peta5. The share is identical for both
671
    scenarios as the annual demand is linearly scaled.
672
673
    Parameters
674
    ----------
675
    scenario: str
676
        Scenario for which the share is calculated.
677
    sector: str
678
        Sector for which the share is calculated.
679
680
    Returns
681
    -------
682
    df_census_share: pd.DataFrame
683
    """
684
    if sector == "electricity":
685
        demand_table = EgonDemandRegioZensusElectricity
686
    elif sector == "heat":
687
        demand_table = EgonPetaHeat
688
689
    with db.session_scope() as session:
690
        cells_query = (
691
            session.query(demand_table, MapZensusGridDistricts.bus_id)
[Scrutinizer issue] The variable demand_table does not seem to be defined for all execution paths.
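    # Editor's note (hedged suggestion, not part of the reviewed source): an
    # explicit else-branch on the sector check above, e.g.
    #     else:
    #         raise ValueError(f"unknown sector: {sector}")
    # would guarantee that demand_table is always bound and resolve this
    # warning.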
692
            .filter(demand_table.sector == "service")
693
            .filter(demand_table.scenario == scenario)
694
            .filter(
695
                demand_table.zensus_population_id
696
                == MapZensusGridDistricts.zensus_population_id
697
            )
698
        )
699
700
    df_demand = pd.read_sql(
701
        cells_query.statement,
702
        cells_query.session.bind,
703
        index_col="zensus_population_id",
704
    )
705
706
    # get demand share of cell per bus
707
    df_census_share = df_demand["demand"] / df_demand.groupby("bus_id")[
708
        "demand"
709
    ].transform("sum")
710
    df_census_share = df_census_share.rename("cell_share")
711
712
    df_census_share = pd.concat(
713
        [
714
            df_census_share,
715
            df_demand[["bus_id", "scenario"]],
716
        ],
717
        axis=1,
718
    )
719
720
    df_census_share.reset_index(inplace=True)
721
    return df_census_share
722
723
724
def calc_building_demand_profile_share(
725
    df_cts_buildings, scenario="eGon2035", sector="electricity"
726
):
727
    """
728
    Share of the cts demand profile per bus for every selected building
729
    is calculated. Building-amenity share is multiplied with census cell share
730
    to get the substation bus profile share for each building. The share is
731
    grouped and aggregated per building as some buildings exceed the shape of
732
    census cells and have amenities assigned from multiple cells. Buildings
733
    therefore get the amenity share of all census cells.
734
735
    Parameters
736
    ----------
737
    df_cts_buildings: gpd.GeoDataFrame
738
        Table of all buildings with cts demand assigned
739
    scenario: str
740
        Scenario for which the share is calculated.
741
    sector: str
742
        Sector for which the share is calculated.
743
744
    Returns
745
    -------
746
    df_building_share: pd.DataFrame
747
        Table of bus profile share per building
748
749
    """
750
751
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
752
753
    def calc_building_amenity_share(df_cts_buildings):
754
        """
755
        Calculate the building share by the number of amenities per building
756
        within a census cell. Building ids can exist multiple time but with
757
        different zensus_population_ids.
758
        """
759
        df_building_amenity_share = df_cts_buildings[
760
            "n_amenities_inside"
761
        ] / df_cts_buildings.groupby("zensus_population_id")[
762
            "n_amenities_inside"
763
        ].transform(
764
            "sum"
765
        )
766
        df_building_amenity_share = pd.concat(
767
            [
768
                df_building_amenity_share.rename("building_amenity_share"),
769
                df_cts_buildings[["zensus_population_id", "id"]],
770
            ],
771
            axis=1,
772
        )
773
        return df_building_amenity_share
774
775
    df_building_amenity_share = calc_building_amenity_share(df_cts_buildings)
776
777
    df_census_cell_share = calc_census_cell_share(
778
        scenario=scenario, sector=sector
779
    )
780
781
    df_demand_share = pd.merge(
782
        left=df_building_amenity_share,
783
        right=df_census_cell_share,
784
        left_on="zensus_population_id",
785
        right_on="zensus_population_id",
786
    )
787
    df_demand_share["profile_share"] = df_demand_share[
788
        "building_amenity_share"
789
    ].multiply(df_demand_share["cell_share"])
790
791
    # TODO bus_id fix
792
    # df_demand_share = df_demand_share[
793
    #     ["id", "bus_id", "scenario", "profile_share"]
794
    # ]
795
    df_demand_share = df_demand_share[["id", "scenario", "profile_share"]]
796
797
    with db.session_scope() as session:
798
        cells_query = session.query(
799
            egon_map_zensus_buildings_filtered_all.id,
800
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
801
            MapZensusGridDistricts.bus_id,
802
        ).filter(
803
            MapZensusGridDistricts.zensus_population_id
804
            == egon_map_zensus_buildings_filtered_all.zensus_population_id
805
        )
806
807
    df_egon_map_zensus_buildings_buses = pd.read_sql(
808
        cells_query.statement,
809
        cells_query.session.bind,
810
        index_col=None,
811
    )
812
    df_demand_share = pd.merge(
813
        left=df_demand_share, right=df_egon_map_zensus_buildings_buses, on="id"
814
    )
815
816
    # TODO adapt groupby?
817
    # Group and aggregate per building for multi cell buildings
818
    df_demand_share = (
819
        df_demand_share.groupby(["scenario", "id", "bus_id"])
820
        .sum()
821
        .reset_index()
822
    )
823
    if df_demand_share.duplicated("id", keep=False).any():
824
        print(
825
            df_demand_share.loc[df_demand_share.duplicated("id", keep=False)]
826
        )
827
    return df_demand_share
828
829
830
def calc_building_profiles(
831
    egon_building_id=None,
832
    bus_id=None,
833
    scenario="eGon2035",
834
    sector="electricity",
835
):
836
    """
837
    Calculate the demand profile for each building. The profile is
838
    calculated by the demand share of the building per substation bus.
839
840
    Parameters
841
    ----------
842
    egon_building_id: int
843
        Id of the building for which the profile is calculated.
844
        If not given, the profiles are calculated for all buildings.
845
    bus_id: int
846
        Id of the substation for which the all profiles are calculated.
847
        If not given, the profiles are calculated for all buildings.
848
    scenario: str
849
        Scenario for which the share is calculated.
850
    sector: str
851
        Sector for which the share is calculated.
852
853
    Returns
854
    -------
855
    df_building_profiles: pd.DataFrame
856
        Table of demand profile per building
857
    """
858
    if sector == "electricity":
859
        with db.session_scope() as session:
860
            cells_query = session.query(
861
                EgonCtsElectricityDemandBuildingShare,
862
            ).filter(
863
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
864
            )
865
866
        df_demand_share = pd.read_sql(
867
            cells_query.statement, cells_query.session.bind, index_col=None
868
        )
869
870
        # TODO maybe use demand.egon_etrago_electricity_cts
871
        # with db.session_scope() as session:
872
        #     cells_query = (
873
        #         session.query(
874
        #             EgonEtragoElectricityCts
875
        #         ).filter(
876
        #             EgonEtragoElectricityCts.scn_name == scenario)
877
        #     )
878
        #
879
        # df_cts_profiles = pd.read_sql(
880
        #     cells_query.statement,
881
        #     cells_query.session.bind,
882
        # )
883
        # df_cts_profiles = pd.DataFrame.from_dict(
884
        #   df_cts_profiles.set_index('bus_id')['p_set'].to_dict(),
885
        #   orient="index")
886
        df_cts_profiles = calc_load_curves_cts(scenario)
887
888
    elif sector == "heat":
889
        with db.session_scope() as session:
890
            cells_query = session.query(
891
                EgonCtsHeatDemandBuildingShare,
892
            ).filter(EgonCtsHeatDemandBuildingShare.scenario == scenario)
893
894
        df_demand_share = pd.read_sql(
895
            cells_query.statement, cells_query.session.bind, index_col=None
896
        )
897
898
        # TODO cts heat substation profiles missing
899
900
    # get demand share of selected building id
901
    if isinstance(egon_building_id, int):
902
        if egon_building_id in df_demand_share["id"]:
[Scrutinizer issue] The variable df_demand_share does not seem to be defined for all execution paths.
903
            df_demand_share = df_demand_share.loc[
904
                df_demand_share["id"] == egon_building_id
905
            ]
906
        else:
907
            raise KeyError(f"Building with id {egon_building_id} not found")
908
    # TODO maybe add list
909
    # elif isinstance(egon_building_id, list):
910
911
    # get demand share of all buildings for selected bus id
912
    if isinstance(bus_id, int):
913
        if bus_id in df_demand_share["bus_id"]:
914
            df_demand_share = df_demand_share.loc[
915
                df_demand_share["bus_id"] == bus_id
916
            ]
917
        else:
918
            raise KeyError(f"Bus with id {bus_id} not found")
919
920
    # get demand profile for all buildings for selected demand share
921
    # TODO takes a few seconds per iteration
922
    df_building_profiles = pd.DataFrame()
923
    for bus_id, df in df_demand_share.groupby("bus_id"):
924
        shares = df.set_index("id", drop=True)["profile_share"]
925
        profile = df_cts_profiles.loc[:, bus_id]
[Scrutinizer issue] The variable df_cts_profiles does not seem to be defined for all execution paths.
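        # Editor's note (hedged suggestion, not part of the reviewed source):
        # this warning and the df_demand_share warning above both stem from the
        # if/elif on sector having no else-branch, and the "heat" branch never
        # assigns df_cts_profiles (see the TODO about missing heat substation
        # profiles). Raising for unexpected sectors, and raising
        # NotImplementedError in the heat branch until those profiles exist,
        # would make both variables well-defined on every path.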
926
        # building_profiles = profile.apply(lambda x: x * shares)
927
        building_profiles = np.outer(profile, shares)
928
        building_profiles = pd.DataFrame(
929
            building_profiles, index=profile.index, columns=shares.index
930
        )
931
        df_building_profiles = pd.concat(
932
            [df_building_profiles, building_profiles], axis=1
933
        )
934
935
    return df_building_profiles
936
937
938
def delete_synthetic_cts_buildings():
939
    """
940
    All synthetic cts buildings are deleted from the DB. This is necessary if
941
    the task is run multiple times as the existing synthetic buildings
942
    influence the results.
943
    """
944
    # import db tables
945
    from saio.openstreetmap import osm_buildings_synthetic
946
947
    # delete all synthetic buildings of type cts
948
    with db.session_scope() as session:
949
        session.query(osm_buildings_synthetic).filter(
950
            osm_buildings_synthetic.building == "cts"
951
        ).delete()
952
953
954
def cts_buildings():
955
    """
956
    Assigns CTS demand to buildings and calculates the respective demand
957
    profiles. The demand profile per substation is disaggregated by the
958
    annual demand share of each census cell and by the number of amenities
959
    per building within the cell. If no building data is available,
960
    synthetic buildings are generated around the amenities. If no amenities
961
    but cts demand is available, buildings are randomly selected. If no
962
    building nor amenity is available, random synthetic buildings are
963
    generated. The demand share is stored in the database.
964
965
    Note:
966
    -----
967
    Cells with CTS demand, amenities and buildings do not change within
968
    the scenarios, only the demand itself. Therefore scenario eGon2035
969
    can be used universally to determine the cts buildings but not for
970
    the demand share.
971
    """
972
973
    log = start_logging()
974
    log.info("Start logging!")
975
    # Buildings with amenities
976
    # df_buildings_with_amenities, df_lost_cells = buildings_with_amenities()
977
    df_buildings_with_amenities = buildings_with_amenities()
978
    log.info("Buildings with amenities selected!")
979
980
    # Median number of amenities per cell
981
    median_n_amenities = int(
982
        df_buildings_with_amenities.groupby("zensus_population_id")[
983
            "n_amenities_inside"
984
        ]
985
        .sum()
986
        .median()
987
    )
988
    # TODO remove
989
    print(f"Median amenity value: {median_n_amenities}")
990
991
    # Remove synthetic CTS buildings if existing
992
    delete_synthetic_cts_buildings()
993
    log.info("Old synthetic cts buildings deleted!")
994
995
    # Amenities not assigned to buildings
996
    df_amenities_without_buildings = amenities_without_buildings()
997
    log.info("Amenities without buildlings selected!")
998
999
    # # Append lost cells due to duplicated ids, to cover all demand cells
1000
    # if not df_lost_cells.empty:
1001
    #
1002
    #     df_lost_cells["amenities"] = median_n_amenities
1003
    #     # create row for every amenity
1004
    #     df_lost_cells["amenities"] = (
1005
    #         df_lost_cells["amenities"].astype(int).apply(range)
1006
    #     )
1007
    #     df_lost_cells = df_lost_cells.explode("amenities")
1008
    #     df_lost_cells.drop(columns="amenities", inplace=True)
1009
    #     df_amenities_without_buildings = df_amenities_without_buildings.append(
1010
    #         df_lost_cells, ignore_index=True
1011
    #     )
1012
    #     log.info("Lost cells due to substation intersection appended!")
1013
1014
    # One building per amenity
1015
    df_amenities_without_buildings["n_amenities_inside"] = 1
1016
    # Create synthetic buildings for amenities without buildings
1017
    df_synthetic_buildings_with_amenities = create_synthetic_buildings(
1018
        df_amenities_without_buildings, points="geom_amenity"
1019
    )
1020
    log.info("Synthetic buildings created!")
1021
1022
    # TODO write to DB and remove renaming
1023
    write_table_to_postgis(
1024
        df_synthetic_buildings_with_amenities.rename(
1025
            columns={
1026
                "zensus_population_id": "cell_id",
1027
                "egon_building_id": "id",
1028
            }
1029
        ),
1030
        OsmBuildingsSynthetic,
1031
        drop=False,
1032
    )
1033
    log.info("Synthetic buildings exported to DB!")
1034
1035
    # Cells without amenities but CTS demand and buildings
1036
    df_buildings_without_amenities = buildings_without_amenities()
1037
    log.info("Buildings without amenities in demand cells identified!")
1038
1039
    # TODO Fix Adhoc Bugfix duplicated buildings
1040
    # drop building ids which have already been used
1041
    mask = df_buildings_without_amenities.loc[
1042
        df_buildings_without_amenities["id"].isin(
1043
            df_buildings_with_amenities["id"]
1044
        )
1045
    ].index
1046
    df_buildings_without_amenities = df_buildings_without_amenities.drop(
1047
        index=mask
1048
    ).reset_index(drop=True)
1049
    log.info(f"{len(mask)} duplicated ids removed!")
1050
1051
    # select median n buildings per cell
1052
    df_buildings_without_amenities = select_cts_buildings(
1053
        df_buildings_without_amenities, max_n=median_n_amenities
1054
    )
1055
    df_buildings_without_amenities["n_amenities_inside"] = 1
1056
    log.info(f"{median_n_amenities} buildings per cell selected!")
1057
1058
    # Create synthetic amenities and buildings in cells with only CTS demand
1059
    df_cells_with_cts_demand_only = cells_with_cts_demand_only(
1060
        df_buildings_without_amenities
1061
    )
1062
    log.info("Cells with only demand identified!")
1063
1064
    # Median n Amenities per cell
1065
    df_cells_with_cts_demand_only["amenities"] = median_n_amenities
1066
    # create row for every amenity
1067
    df_cells_with_cts_demand_only["amenities"] = (
1068
        df_cells_with_cts_demand_only["amenities"].astype(int).apply(range)
1069
    )
1070
    df_cells_with_cts_demand_only = df_cells_with_cts_demand_only.explode(
1071
        "amenities"
1072
    )
1073
    df_cells_with_cts_demand_only.drop(columns="amenities", inplace=True)
1074
1075
    # Only 1 Amenity per Building
1076
    df_cells_with_cts_demand_only["n_amenities_inside"] = 1
1077
    df_cells_with_cts_demand_only = place_buildings_with_amenities(
1078
        df_cells_with_cts_demand_only, amenities=1
1079
    )
1080
    df_synthetic_buildings_without_amenities = create_synthetic_buildings(
1081
        df_cells_with_cts_demand_only, points="geom_point"
1082
    )
1083
    log.info(f"{median_n_amenities} synthetic buildings per cell created")
1084
1085
    # TODO write to DB and remove (backup) renaming
1086
    write_table_to_postgis(
1087
        df_synthetic_buildings_without_amenities.rename(
1088
            columns={
1089
                "zensus_population_id": "cell_id",
1090
                "egon_building_id": "id",
1091
            }
1092
        ),
1093
        OsmBuildingsSynthetic,
1094
        drop=False,
1095
    )
1096
    log.info("Synthetic buildings exported to DB")
1097
1098
    # Concat all buildings
1099
    columns = [
1100
        "zensus_population_id",
1101
        "id",
1102
        "geom_building",
1103
        "n_amenities_inside",
1104
        "source",
1105
    ]
1106
1107
    df_buildings_with_amenities["source"] = "bwa"
1108
    df_synthetic_buildings_with_amenities["source"] = "sbwa"
1109
    df_buildings_without_amenities["source"] = "bwoa"
1110
    df_synthetic_buildings_without_amenities["source"] = "sbwoa"
1111
1112
    df_cts_buildings = pd.concat(
1113
        [
1114
            df_buildings_with_amenities[columns],
1115
            df_synthetic_buildings_with_amenities[columns],
1116
            df_buildings_without_amenities[columns],
1117
            df_synthetic_buildings_without_amenities[columns],
1118
        ],
1119
        axis=0,
1120
        ignore_index=True,
1121
    )
1122
    # TODO maybe remove after #772
1123
    df_cts_buildings["id"] = df_cts_buildings["id"].astype(int)
1124
1125
    # Write table to db for debugging
1126
    # TODO remove later
1127
    df_cts_buildings = gpd.GeoDataFrame(
1128
        df_cts_buildings, geometry="geom_building", crs=3035
1129
    )
1130
    df_cts_buildings = df_cts_buildings.reset_index().rename(
1131
        columns={"index": "serial"}
1132
    )
1133
    write_table_to_postgis(
1134
        df_cts_buildings,
1135
        CtsBuildings,
1136
        drop=True,
1137
    )
1138
    log.info("CTS buildings exported to DB!")
1139
1140
1141
def cts_electricity():
[Scrutinizer issue] This code seems to be duplicated in your project.
1142
    """
1143
    Calculate cts electricity demand share of hvmv substation profile
1144
     for buildings.
1145
    """
1146
    log = start_logging()
1147
    log.info("Start logging!")
1148
    with db.session_scope() as session:
1149
        cells_query = session.query(CtsBuildings)
1150
1151
    df_cts_buildings = pd.read_sql(
1152
        cells_query.statement, cells_query.session.bind, index_col=None
1153
    )
1154
    log.info("CTS buildings from DB imported!")
1155
    df_demand_share_2035 = calc_building_demand_profile_share(
1156
        df_cts_buildings, scenario="eGon2035", sector="electricity"
1157
    )
1158
    log.info("Profile share for egon2035 calculated!")
1159
    df_demand_share_100RE = calc_building_demand_profile_share(
1160
        df_cts_buildings, scenario="eGon100RE", sector="electricity"
1161
    )
1162
    log.info("Profile share for egon100RE calculated!")
1163
    df_demand_share = pd.concat(
1164
        [df_demand_share_2035, df_demand_share_100RE],
1165
        axis=0,
1166
        ignore_index=True,
1167
    )
1168
1169
    write_table_to_postgres(
1170
        df_demand_share, EgonCtsElectricityDemandBuildingShare, drop=True
1171
    )
1172
    log.info("Profile share exported to DB!")
1173
1174
1175
def cts_heat():
[Scrutinizer issue] This code seems to be duplicated in your project.
1176
    """
1177
    Calculate cts heat demand share of hvmv substation profile
1178
     for buildings.
1179
    """
1180
    log = start_logging()
1181
    log.info("Start logging!")
1182
    with db.session_scope() as session:
1183
        cells_query = session.query(CtsBuildings)
1184
1185
    df_cts_buildings = pd.read_sql(
1186
        cells_query.statement, cells_query.session.bind, index_col=None
1187
    )
1188
    log.info("CTS buildings from DB imported!")
1189
1190
    df_demand_share_2035 = calc_building_demand_profile_share(
1191
        df_cts_buildings, scenario="eGon2035", sector="heat"
1192
    )
1193
    log.info("Profile share for egon2035 calculated!")
1194
    df_demand_share_100RE = calc_building_demand_profile_share(
1195
        df_cts_buildings, scenario="eGon100RE", sector="heat"
1196
    )
1197
    log.info("Profile share for egon100RE calculated!")
1198
    df_demand_share = pd.concat(
1199
        [df_demand_share_2035, df_demand_share_100RE],
1200
        axis=0,
1201
        ignore_index=True,
1202
    )
1203
1204
    write_table_to_postgres(
1205
        df_demand_share, EgonCtsHeatDemandBuildingShare, drop=True
1206
    )
1207
    log.info("Profile share exported to DB!")
1208
1209
1210
def get_cts_electricity_peak_load():
1211
    """
1212
    Get peak load of all CTS buildings for both scenarios and store in DB.
1213
    """
1214
    log = start_logging()
1215
    log.info("Start logging!")
1216
    # Delete rows with cts demand
1217
    with db.session_scope() as session:
1218
        session.query(BuildingPeakLoads).filter(
1219
            BuildingPeakLoads.sector == "cts"
1220
        ).delete()
1221
    log.info("CTS Peak load removed from DB!")
1222
1223
    for scenario in ["eGon2035", "eGon100RE"]:
1224
1225
        with db.session_scope() as session:
1226
            cells_query = session.query(
1227
                EgonCtsElectricityDemandBuildingShare
1228
            ).filter(
1229
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
1230
            )
1231
1232
        df_demand_share = pd.read_sql(
1233
            cells_query.statement, cells_query.session.bind, index_col=None
1234
        )
1235
1236
        df_cts_profiles = calc_load_curves_cts(scenario=scenario)
1237
1238
        df_peak_load = pd.merge(
1239
            left=df_cts_profiles.max(axis=0).astype(float).rename("max"),
1240
            right=df_demand_share,
1241
            left_on="bus_id",
1242
            right_on="bus_id",
1243
        )
1244
1245
        # Convert unit from MWh to W
1246
        df_peak_load["max"] = df_peak_load["max"] * 1e6
1247
        df_peak_load["peak_load_in_w"] = (
1248
            df_peak_load["max"] * df_peak_load["profile_share"]
1249
        )
1250
        log.info(f"Peak load for {scenario} determined!")
1251
1252
        df_peak_load.rename(columns={"id": "building_id"}, inplace=True)
1253
        df_peak_load["sector"] = "cts"
1254
1255
        df_peak_load = df_peak_load[
1256
            ["building_id", "sector", "scenario", "peak_load_in_w"]
1257
        ]
1258
1259
        # Write peak loads into db
1260
        with db.session_scope() as session:
1261
            session.bulk_insert_mappings(
1262
                BuildingPeakLoads,
1263
                df_peak_load.to_dict(orient="records"),
1264
            )
1265
        log.info(f"Peak load for {scenario} exported to DB!")
1266
1267
1268
class CtsElectricityBuildings(Dataset):
1269
    def __init__(self, dependencies):
1270
        super().__init__(
1271
            name="CtsElectricityBuildings",
1272
            version="0.0.0",
1273
            dependencies=dependencies,
1274
            tasks=(
1275
                cts_buildings,
1276
                {cts_electricity, cts_heat},
1277
                get_cts_electricity_peak_load,
1278
            ),
1279
        )
1280