Passed
Pull Request — dev (#826) by unknown, created 01:28

data.datasets.electricity_demand_timeseries.cts_buildings (Rating: C)

Complexity

Total Complexity 53

Size/Duplication

Total Lines 1261
Duplicated Lines 5.15 %

Importance

Changes 0
Metric Value
wmc 53
eloc 649
dl 65
loc 1261
rs 6.911
c 0
b 0
f 0

16 Functions

Rating   Name   Duplication   Size   Complexity  
A place_buildings_with_amenities() 0 41 3
B create_synthetic_buildings() 0 67 6
A amenities_without_buildings() 0 60 2
A select_cts_buildings() 0 25 1
B cells_with_cts_demand_only() 0 72 2
A start_logging() 0 11 1
C buildings_with_amenities() 0 183 5
A calc_census_cell_share() 0 56 4
B buildings_without_amenities() 0 99 2
A cts_heat() 33 33 2
A delete_synthetic_cts_buildings() 0 14 2
B get_cts_electricity_peak_load() 0 56 5
C calc_building_profiles() 0 106 10
B calc_building_demand_profile_share() 0 104 3
A cts_electricity() 32 32 2
B cts_buildings() 0 184 2

1 Method

Rating   Name   Duplication   Size   Complexity  
A CtsElectricityBuildings.__init__() 0 9 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

One common duplication problem, and a corresponding solution, is illustrated below.
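A frequent case in this module is a pair of near-identical functions: cts_electricity() and cts_heat(), both flagged as duplicated further down, differ only in the sector passed to calc_building_demand_profile_share() and in the table they write to. The sketch below shows a hypothetical way to fold them into one parametrised helper; the name _export_cts_demand_share is not part of the reviewed code:

def _export_cts_demand_share(sector, target_table):
    """Hypothetical shared helper for cts_electricity() and cts_heat()."""
    log = start_logging()
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)
    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    # one profile-share table per scenario, concatenated and exported
    df_demand_share = pd.concat(
        [
            calc_building_demand_profile_share(
                df_cts_buildings, scenario=scenario, sector=sector
            )
            for scenario in ("eGon2035", "eGon100RE")
        ],
        ignore_index=True,
    )
    write_table_to_postgres(df_demand_share, target_table, drop=True)
    log.info("Profile share exported to DB!")


def cts_electricity():
    _export_cts_demand_share("electricity", EgonCtsElectricityDemandBuildingShare)


def cts_heat():
    _export_cts_demand_share("heat", EgonCtsHeatDemandBuildingShare)

This keeps each task function a one-liner while the shared logic lives in a single place.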

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.electricity_demand_timeseries.cts_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
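As an illustration only, one candidate component in this module is the group of synthetic-building helpers (place_buildings_with_amenities, create_synthetic_buildings, delete_synthetic_cts_buildings). The class below is a hypothetical sketch of how Extract Class could bundle them; it is not part of the reviewed code:

class SyntheticBuildings:
    """Hypothetical extracted component for the synthetic-building steps."""

    def __init__(self, edge_length=5, crs="EPSG:3035"):
        # edge length and CRS currently appear as literals in several functions
        self.edge_length = edge_length
        self.crs = crs

    def place(self, df, amenities=None, max_amenities=None):
        # delegates to the existing module-level helper
        return place_buildings_with_amenities(df, amenities, max_amenities)

    def create(self, df, points=None):
        # delegates to the existing module-level helper
        return create_synthetic_buildings(df, points=points, crs=self.crs)

    def delete_existing(self):
        # delegates to the existing module-level helper
        delete_synthetic_cts_buildings()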

1
import logging
2
3
from geoalchemy2 import Geometry
4
from geoalchemy2.shape import to_shape
5
from sqlalchemy import REAL, Column, Integer, String, func
6
from sqlalchemy.ext.declarative import declarative_base
7
import geopandas as gpd
8
import numpy as np
9
import pandas as pd
10
import saio
11
12
from egon.data import db
13
from egon.data.datasets import Dataset
14
from egon.data.datasets.electricity_demand import (
15
    EgonDemandRegioZensusElectricity,
16
)
17
from egon.data.datasets.electricity_demand.temporal import (
18
    EgonEtragoElectricityCts,
19
    calc_load_curves_cts,
20
)
21
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
22
    BuildingPeakLoads,
23
    OsmBuildingsSynthetic,
24
)
25
from egon.data.datasets.electricity_demand_timeseries.tools import (
26
    random_ints_until_sum,
27
    random_point_in_square,
28
    specific_int_until_sum,
29
    write_table_to_postgis,
30
    write_table_to_postgres,
31
)
32
from egon.data.datasets.heat_demand import EgonPetaHeat
33
from egon.data.datasets.zensus_mv_grid_districts import MapZensusGridDistricts
34
from egon.data.datasets.zensus_vg250 import DestatisZensusPopulationPerHa
35
36
engine = db.engine()
37
Base = declarative_base()
38
39
# import db tables
40
saio.register_schema("openstreetmap", engine=engine)
41
saio.register_schema("boundaries", engine=engine)
42
43
44
class EgonCtsElectricityDemandBuildingShare(Base):
45
    __tablename__ = "egon_cts_electricity_demand_building_share"
46
    __table_args__ = {"schema": "demand"}
47
48
    id = Column(Integer, primary_key=True)
49
    scenario = Column(String, primary_key=True)
50
    bus_id = Column(Integer, index=True)
51
    profile_share = Column(REAL)
52
53
54
class EgonCtsHeatDemandBuildingShare(Base):
55
    __tablename__ = "egon_cts_heat_demand_building_share"
56
    __table_args__ = {"schema": "demand"}
57
58
    id = Column(Integer, primary_key=True)
59
    scenario = Column(String, primary_key=True)
60
    bus_id = Column(Integer, index=True)
61
    profile_share = Column(REAL)
62
63
64
class CtsBuildings(Base):
65
    __tablename__ = "egon_cts_buildings"
66
    __table_args__ = {"schema": "openstreetmap"}
67
68
    serial = Column(Integer, primary_key=True)
69
    id = Column(Integer, index=True)
70
    zensus_population_id = Column(Integer, index=True)
71
    geom_building = Column(Geometry("Polygon", 3035))
72
    n_amenities_inside = Column(Integer)
73
    source = Column(String)
74
75
76
def start_logging():
77
    """Start logging into console"""
78
    log = logging.getLogger()
79
    log.setLevel(logging.INFO)
80
    logformat = logging.Formatter(
81
        "%(asctime)s %(message)s", "%m/%d/%Y %H:%M:%S"
82
    )
83
    sh = logging.StreamHandler()
84
    sh.setFormatter(logformat)
85
    log.addHandler(sh)
86
    return log
87
88
89
def amenities_without_buildings():
90
    """
91
    Amenities which have no buildings assigned and are in
92
    a cell with cts demand are determined.
93
94
    Returns
95
    -------
96
    pd.DataFrame
97
        Table of amenities without buildings
98
    """
99
    from saio.openstreetmap import osm_amenities_not_in_buildings_filtered
100
101
    with db.session_scope() as session:
102
        cells_query = (
103
            session.query(
104
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
105
                # TODO can be used for square around amenity
106
                #  (1 geom_amenity: 1 geom_building)
107
                #  not unique amenity_ids yet
108
                osm_amenities_not_in_buildings_filtered.geom_amenity,
109
                osm_amenities_not_in_buildings_filtered.egon_amenity_id,
110
                # EgonDemandRegioZensusElectricity.demand,
111
                # # TODO can be used to generate n random buildings
112
                # # (n amenities : 1 randombuilding)
113
                # func.count(
114
                #     osm_amenities_not_in_buildings_filtered.egon_amenity_id
115
                # ).label("n_amenities_inside"),
116
                # DestatisZensusPopulationPerHa.geom,
117
            )
118
            .filter(
119
                func.st_within(
120
                    osm_amenities_not_in_buildings_filtered.geom_amenity,
121
                    DestatisZensusPopulationPerHa.geom,
122
                )
123
            )
124
            .filter(
125
                DestatisZensusPopulationPerHa.id
126
                == EgonDemandRegioZensusElectricity.zensus_population_id
127
            )
128
            .filter(
129
                EgonDemandRegioZensusElectricity.sector == "service",
130
                EgonDemandRegioZensusElectricity.scenario == "eGon2035"
131
                #         ).group_by(
132
                #             EgonDemandRegioZensusElectricity.zensus_population_id,
133
                #             DestatisZensusPopulationPerHa.geom,
134
            )
135
        )
136
    # # TODO can be used to generate n random buildings
137
    # df_cells_with_amenities_not_in_buildings = gpd.read_postgis(
138
    #     cells_query.statement, cells_query.session.bind, geom_col="geom"
139
    # )
140
    #
141
142
    # # TODO can be used for square around amenity
143
    df_amenities_without_buildings = gpd.read_postgis(
144
        cells_query.statement,
145
        cells_query.session.bind,
146
        geom_col="geom_amenity",
147
    )
148
    return df_amenities_without_buildings
149
150
151
def place_buildings_with_amenities(df, amenities=None, max_amenities=None):
152
    """
153
    Building centroids are placed randomly within census cells.
154
    The Number of buildings is derived from n_amenity_inside, the selected
155
    method and number of amenities per building.
156
157
    Returns
158
    -------
159
    df: gpd.GeoDataFrame
160
        Table of building centroids
161
    """
162
    if isinstance(max_amenities, int):
163
        # amount of amenities is randomly generated within bounds
164
        # (max_amenities, amenities per cell)
165
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
166
            random_ints_until_sum, args=[max_amenities]
167
        )
168
    if isinstance(amenities, int):
169
        # Specific amount of amenities per building
170
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
171
            specific_int_until_sum, args=[amenities]
172
        )
173
174
    # Unnest each building
175
    df = df.explode(column="n_amenities_inside")
176
177
    # building count per cell
178
    df["building_count"] = df.groupby(["zensus_population_id"]).cumcount() + 1
179
180
    # generate random synthetic buildings
181
    edge_length = 5
182
    # create random points within census cells
183
    points = random_point_in_square(geom=df["geom"], tol=edge_length / 2)
184
185
    df.reset_index(drop=True, inplace=True)
186
    # Store center of polygon
187
    df["geom_point"] = points
188
    # Drop geometry of census cell
189
    df = df.drop(columns=["geom"])
190
191
    return df
192
193
194
def create_synthetic_buildings(df, points=None, crs="EPSG:3035"):
195
    """
196
    Synthetic buildings are generated around points.
197
198
    Parameters
199
    ----------
200
    df: pd.DataFrame
201
        Table of census cells
202
    points: gpd.GeoSeries or str
203
        List of points to place buildings around or column name of df
204
    crs: str
205
        CRS of result table
206
207
    Returns
208
    -------
209
    df: gpd.GeoDataFrame
210
        Synthetic buildings
211
    """
212
213
    if isinstance(points, str) and points in df.columns:
214
        points = df[points]
215
    elif isinstance(points, gpd.GeoSeries):
216
        pass
217
    else:
218
        raise ValueError("Points are of the wrong type")
219
220
    # Create building using a square around point
221
    edge_length = 5
222
    df["geom_building"] = points.buffer(distance=edge_length / 2, cap_style=3)
223
224
    if "geom_point" not in df.columns:
225
        df["geom_point"] = df["geom_building"].centroid
226
227
    # TODO Check CRS
228
    df = gpd.GeoDataFrame(
229
        df,
230
        crs=crs,
231
        geometry="geom_building",
232
    )
233
234
    # TODO remove after implementation of egon_building_id
235
    df.rename(columns={"id": "egon_building_id"}, inplace=True)
236
237
    # get max number of building ids from synthetic residential table
238
    with db.session_scope() as session:
239
        max_synth_residential_id = session.execute(
240
            func.max(OsmBuildingsSynthetic.id)
241
        ).scalar()
242
    max_synth_residential_id = int(max_synth_residential_id)
243
244
    # create sequential ids
245
    df["egon_building_id"] = range(
246
        max_synth_residential_id + 1,
247
        max_synth_residential_id + df.shape[0] + 1,
248
    )
249
250
    df["area"] = df["geom_building"].area
251
    # set building type of synthetic building
252
    df["building"] = "cts"
253
    # TODO remove in #772
254
    df = df.rename(
255
        columns={
256
            # "zensus_population_id": "cell_id",
257
            "egon_building_id": "id",
258
        }
259
    )
260
    return df
261
262
263
def buildings_with_amenities():
264
    """
265
    Amenities which are assigned to buildings are determined
266
    and grouped per building and zensus cell. Buildings
267
    covering multiple cells therefore exist multiple times
268
    but in different zensus cells. This is necessary to cover
269
    all cells with a cts demand. If buildings exist in multiple
270
    substations, their amenities are summed and assigned and kept in
271
    one substation only. If, as a result, a census cell is uncovered,
272
    a synthetic amenity is placed. The buildings are aggregated
273
    afterwards during the calculation of the profile_share.
274
275
    Returns
276
    -------
277
    df_buildings_with_amenities: gpd.GeoDataFrame
278
        Contains all buildings with amenities per zensus cell.
279
    df_lost_cells: gpd.GeoDataFrame
280
        Contains synthetic amenities in lost cells. Might be empty
281
    """
282
283
    from saio.openstreetmap import osm_amenities_in_buildings_filtered
284
285
    with db.session_scope() as session:
286
        cells_query = (
287
            session.query(
288
                osm_amenities_in_buildings_filtered,
289
                MapZensusGridDistricts.bus_id,
290
            )
291
            .filter(
292
                MapZensusGridDistricts.zensus_population_id
293
                == osm_amenities_in_buildings_filtered.zensus_population_id
294
            )
295
            .filter(
296
                EgonDemandRegioZensusElectricity.zensus_population_id
297
                == osm_amenities_in_buildings_filtered.zensus_population_id
298
            )
299
            .filter(
300
                EgonDemandRegioZensusElectricity.sector == "service",
301
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
302
            )
303
        )
304
    df_amenities_in_buildings = pd.read_sql(
305
        cells_query.statement, cells_query.session.bind, index_col=None
306
    )
307
308
    df_amenities_in_buildings["geom_building"] = df_amenities_in_buildings[
309
        "geom_building"
310
    ].apply(to_shape)
311
    df_amenities_in_buildings["geom_amenity"] = df_amenities_in_buildings[
312
        "geom_amenity"
313
    ].apply(to_shape)
314
315
    df_amenities_in_buildings["n_amenities_inside"] = 1
316
317
    # add identifier column for buildings in multiple substations
318
    df_amenities_in_buildings[
319
        "duplicate_identifier"
320
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
321
        "n_amenities_inside"
322
    ].transform(
323
        "cumsum"
324
    )
325
    df_amenities_in_buildings = df_amenities_in_buildings.sort_values(
326
        ["id", "duplicate_identifier"]
327
    )
328
    # sum amenities of buildings with multiple substations
329
    df_amenities_in_buildings[
330
        "n_amenities_inside"
331
    ] = df_amenities_in_buildings.groupby(["id", "duplicate_identifier"])[
332
        "n_amenities_inside"
333
    ].transform(
334
        "sum"
335
    )
336
337
    # create column to always go for bus_id with max amenities
338
    df_amenities_in_buildings[
339
        "max_amenities"
340
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
341
        "n_amenities_inside"
342
    ].transform(
343
        "sum"
344
    )
345
    # sort to go for
346
    df_amenities_in_buildings.sort_values(
347
        ["id", "max_amenities"], ascending=False, inplace=True
348
    )
349
350
    # identify lost zensus cells
351
    df_lost_cells = df_amenities_in_buildings.loc[
352
        df_amenities_in_buildings.duplicated(
353
            subset=["id", "duplicate_identifier"], keep="first"
354
        )
355
    ]
356
    df_lost_cells.drop_duplicates(
357
        subset=["zensus_population_id"], inplace=True
358
    )
359
360
    # drop buildings with multiple substation and lower max amenity
361
    df_amenities_in_buildings.drop_duplicates(
362
        subset=["id", "duplicate_identifier"], keep="first", inplace=True
363
    )
364
365
    # check if lost zensus cells are already covered
366
    if not df_lost_cells.empty:
367
        if not (
368
            df_amenities_in_buildings["zensus_population_id"]
369
            .isin(df_lost_cells["zensus_population_id"])
370
            .empty
371
        ):
372
            # query geom data for cell if not
373
            with db.session_scope() as session:
374
                cells_query = session.query(
375
                    DestatisZensusPopulationPerHa.id,
376
                    DestatisZensusPopulationPerHa.geom,
377
                ).filter(
378
                    DestatisZensusPopulationPerHa.id.in_(
379
                        df_lost_cells["zensus_population_id"]
380
                    )
381
                )
382
383
            df_lost_cells = gpd.read_postgis(
384
                cells_query.statement,
385
                cells_query.session.bind,
386
                geom_col="geom",
387
            )
388
            # TODO maybe adapt method
389
            # place random amenity in cell
390
            df_lost_cells["n_amenities_inside"] = 1
391
            df_lost_cells.rename(
392
                columns={
393
                    "id": "zensus_population_id",
394
                },
395
                inplace=True,
396
            )
397
            df_lost_cells = place_buildings_with_amenities(
398
                df_lost_cells, amenities=1
399
            )
400
            df_lost_cells.rename(
401
                columns={
402
                    # "id": "zensus_population_id",
403
                    "geom_point": "geom_amenity",
404
                },
405
                inplace=True,
406
            )
407
            df_lost_cells.drop(
408
                columns=["building_count", "n_amenities_inside"], inplace=True
409
            )
410
        else:
411
            df_lost_cells = None
412
    else:
413
        df_lost_cells = None
414
415
    # drop helper columns
416
    df_amenities_in_buildings.drop(
417
        columns=["duplicate_identifier"], inplace=True
418
    )
419
420
    # sum amenities per building and cell
421
    df_amenities_in_buildings[
422
        "n_amenities_inside"
423
    ] = df_amenities_in_buildings.groupby(["zensus_population_id", "id"])[
424
        "n_amenities_inside"
425
    ].transform(
426
        "sum"
427
    )
428
    # drop duplicated buildings
429
    df_buildings_with_amenities = df_amenities_in_buildings.drop_duplicates(
430
        ["id", "zensus_population_id"]
431
    )
432
    df_buildings_with_amenities.reset_index(inplace=True, drop=True)
433
434
    df_buildings_with_amenities = df_buildings_with_amenities[
435
        ["id", "zensus_population_id", "geom_building", "n_amenities_inside"]
436
    ]
437
    df_buildings_with_amenities.rename(
438
        columns={
439
            # "zensus_population_id": "cell_id",
440
            "egon_building_id": "id"
441
        },
442
        inplace=True,
443
    )
444
445
    return df_buildings_with_amenities, df_lost_cells
446
447
448
def buildings_without_amenities():
449
    """
450
    Buildings (filtered and synthetic) in cells with
451
    cts demand but no amenities are determined.
452
453
    Returns
454
    -------
455
    df_buildings_without_amenities: gpd.GeoDataFrame
456
        Table of buildings without amenities in zensus cells
457
        with cts demand.
458
    """
459
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
460
    from saio.openstreetmap import (
461
        osm_amenities_shops_filtered,
462
        osm_buildings_filtered,
463
        osm_buildings_synthetic,
464
    )
465
466
    # buildings_filtered in cts-demand-cells without amenities
467
    with db.session_scope() as session:
468
469
        # Synthetic Buildings
470
        q_synth_buildings = session.query(
471
            osm_buildings_synthetic.cell_id.cast(Integer).label(
472
                "zensus_population_id"
473
            ),
474
            osm_buildings_synthetic.id.cast(Integer).label("id"),
475
            osm_buildings_synthetic.area.label("area"),
476
            osm_buildings_synthetic.geom_building.label("geom_building"),
477
            osm_buildings_synthetic.geom_point.label("geom_point"),
478
        )
479
480
        # Buildings filtered
481
        q_buildings_filtered = session.query(
482
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
483
            osm_buildings_filtered.id,
484
            osm_buildings_filtered.area,
485
            osm_buildings_filtered.geom_building,
486
            osm_buildings_filtered.geom_point,
487
        ).filter(
488
            osm_buildings_filtered.id
489
            == egon_map_zensus_buildings_filtered_all.id
490
        )
491
492
        # Amenities + zensus_population_id
493
        q_amenities = (
494
            session.query(
495
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
496
            )
497
            .filter(
498
                func.st_within(
499
                    osm_amenities_shops_filtered.geom_amenity,
500
                    DestatisZensusPopulationPerHa.geom,
501
                )
502
            )
503
            .distinct(DestatisZensusPopulationPerHa.id)
504
        )
505
506
        # Cells with CTS demand but without amenities
507
        q_cts_without_amenities = (
508
            session.query(
509
                EgonDemandRegioZensusElectricity.zensus_population_id,
510
            )
511
            .filter(
512
                EgonDemandRegioZensusElectricity.sector == "service",
513
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
514
            )
515
            .filter(
516
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
517
                    q_amenities
518
                )
519
            )
520
            .distinct()
521
        )
522
523
        # Buildings filtered + synthetic buildings residential in
524
        # cells with CTS demand but without amenities
525
        cells_query = q_synth_buildings.union(q_buildings_filtered).filter(
526
            egon_map_zensus_buildings_filtered_all.zensus_population_id.in_(
527
                q_cts_without_amenities
528
            )
529
        )
530
531
    # df_buildings_without_amenities = pd.read_sql(
532
    #     cells_query.statement, cells_query.session.bind, index_col=None)
533
    df_buildings_without_amenities = gpd.read_postgis(
534
        cells_query.statement,
535
        cells_query.session.bind,
536
        geom_col="geom_building",
537
    )
538
539
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
540
        columns={
541
            # "zensus_population_id": "cell_id",
542
            "egon_building_id": "id",
543
        }
544
    )
545
546
    return df_buildings_without_amenities
547
548
549
def select_cts_buildings(df_buildings_wo_amenities, max_n):
550
    """
551
    N Buildings (filtered and synthetic) in each cell with
552
    cts demand are selected. Only the first n buildings
553
    are taken for each cell. The buildings are sorted by surface
554
    area.
555
556
    Returns
557
    -------
558
    df_buildings_with_cts_demand: gpd.GeoDataFrame
559
        Table of buildings
560
    """
561
562
    df_buildings_wo_amenities.sort_values(
563
        "area", ascending=False, inplace=True
564
    )
565
    # select first n ids each census cell if available
566
    df_buildings_with_cts_demand = (
567
        df_buildings_wo_amenities.groupby("zensus_population_id")
568
        .nth(list(range(max_n)))
569
        .reset_index()
570
    )
571
    df_buildings_with_cts_demand.reset_index(drop=True, inplace=True)
572
573
    return df_buildings_with_cts_demand
574
575
576
def cells_with_cts_demand_only(df_buildings_without_amenities):
577
    """
578
    Cells with cts demand but no amenities or buildings
579
    are determined.
580
581
    Returns
582
    -------
583
    df_cells_only_cts_demand: gpd.GeoDataFrame
584
        Table of cells with cts demand but no amenities or buildings
585
    """
586
    from saio.openstreetmap import osm_amenities_shops_filtered
587
588
    # cells with amenities
589
    with db.session_scope() as session:
590
        sub_query = (
591
            session.query(
592
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
593
            )
594
            .filter(
595
                func.st_within(
596
                    osm_amenities_shops_filtered.geom_amenity,
597
                    DestatisZensusPopulationPerHa.geom,
598
                )
599
            )
600
            .distinct(DestatisZensusPopulationPerHa.id)
601
        )
602
603
        cells_query = (
604
            session.query(
605
                EgonDemandRegioZensusElectricity.zensus_population_id,
606
                EgonDemandRegioZensusElectricity.scenario,
607
                EgonDemandRegioZensusElectricity.sector,
608
                EgonDemandRegioZensusElectricity.demand,
609
                DestatisZensusPopulationPerHa.geom,
610
            )
611
            .filter(
612
                EgonDemandRegioZensusElectricity.sector == "service",
613
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
614
            )
615
            .filter(
616
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
617
                    sub_query
618
                )
619
            )
620
            .filter(
621
                EgonDemandRegioZensusElectricity.zensus_population_id
622
                == DestatisZensusPopulationPerHa.id
623
            )
624
        )
625
626
    df_cts_cell_without_amenities = gpd.read_postgis(
627
        cells_query.statement,
628
        cells_query.session.bind,
629
        geom_col="geom",
630
        index_col=None,
631
    )
632
633
    # TODO maybe remove
634
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
635
        columns={"cell_id": "zensus_population_id"}
636
    )
637
638
    # Census cells with only cts demand
639
    df_cells_only_cts_demand = df_cts_cell_without_amenities.loc[
640
        ~df_cts_cell_without_amenities["zensus_population_id"].isin(
641
            df_buildings_without_amenities["zensus_population_id"].unique()
642
        )
643
    ]
644
645
    df_cells_only_cts_demand.reset_index(drop=True, inplace=True)
646
647
    return df_cells_only_cts_demand
648
649
650
def calc_census_cell_share(scenario="eGon2035", sector="electricity"):
651
    """
652
    The profile share for each census cell is calculated by its
653
    share of annual demand per substation bus. The annual demand
654
    per cell is defined by DemandRegio/Peta5. The share is identical for both
655
    scenarios, as the annual demand is linearly scaled.
656
657
    Parameters
658
    ----------
659
    scenario: str
660
        Scenario for which the share is calculated.
661
    sector: str
662
        Sector for which the share is calculated.
663
664
    Returns
665
    -------
666
    df_census_share: pd.DataFrame
667
    """
668
    if sector == "electricity":
669
        demand_table = EgonDemandRegioZensusElectricity
670
    elif sector == "heat":
671
        demand_table = EgonPetaHeat
672
673
    with db.session_scope() as session:
674
        cells_query = (
675
            session.query(demand_table, MapZensusGridDistricts.bus_id)
Issue introduced by: The variable demand_table does not seem to be defined for all execution paths.
676
            .filter(demand_table.sector == "service")
677
            .filter(demand_table.scenario == scenario)
678
            .filter(
679
                demand_table.zensus_population_id
680
                == MapZensusGridDistricts.zensus_population_id
681
            )
682
        )
683
684
    df_demand = pd.read_sql(
685
        cells_query.statement,
686
        cells_query.session.bind,
687
        index_col="zensus_population_id",
688
    )
689
690
    # get demand share of cell per bus
691
    df_census_share = df_demand["demand"] / df_demand.groupby("bus_id")[
692
        "demand"
693
    ].transform("sum")
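    # Illustrative example (values assumed, not from the reviewed data): if a
    # bus serves three census cells with service demands of 2, 3 and 5 MWh/a,
    # their cell_share values are 0.2, 0.3 and 0.5, so the shares per bus_id
    # sum to 1.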
694
    df_census_share = df_census_share.rename("cell_share")
695
696
    df_census_share = pd.concat(
697
        [
698
            df_census_share,
699
            df_demand[["bus_id", "scenario"]],
700
        ],
701
        axis=1,
702
    )
703
704
    df_census_share.reset_index(inplace=True)
705
    return df_census_share
706
707
708
def calc_building_demand_profile_share(
709
    df_cts_buildings, scenario="eGon2035", sector="electricity"
710
):
711
    """
712
    Share of the cts demand profile per bus for every selected building
713
    is calculated. Building-amenity share is multiplied with census cell share
714
    to get the substation bus profile share for each building. The share is
715
    grouped and aggregated per building as some buildings exceed the shape of
716
    census cells and have amenities assigned from multiple cells. Building
717
    therefore get the amenity share of all census cells.
718
719
    Parameters
720
    ----------
721
    df_cts_buildings: gpd.GeoDataFrame
722
        Table of all buildings with cts demand assigned
723
    scenario: str
724
        Scenario for which the share is calculated.
725
    sector: str
726
        Sector for which the share is calculated.
727
728
    Returns
729
    -------
730
    df_building_share: pd.DataFrame
731
        Table of bus profile share per building
732
733
    """
734
735
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
736
737
    def calc_building_amenity_share(df_cts_buildings):
738
        """
739
        Calculate the building share by the number of amenities per building
740
        within a census cell. Building ids can exist multiple times but with
741
        different zensus_population_ids.
742
        """
743
        df_building_amenity_share = df_cts_buildings[
744
            "n_amenities_inside"
745
        ] / df_cts_buildings.groupby("zensus_population_id")[
746
            "n_amenities_inside"
747
        ].transform(
748
            "sum"
749
        )
750
        df_building_amenity_share = pd.concat(
751
            [
752
                df_building_amenity_share.rename("building_amenity_share"),
753
                df_cts_buildings[["zensus_population_id", "id"]],
754
            ],
755
            axis=1,
756
        )
757
        return df_building_amenity_share
758
759
    df_building_amenity_share = calc_building_amenity_share(df_cts_buildings)
760
761
    df_census_cell_share = calc_census_cell_share(
762
        scenario=scenario, sector=sector
763
    )
764
765
    df_demand_share = pd.merge(
766
        left=df_building_amenity_share,
767
        right=df_census_cell_share,
768
        left_on="zensus_population_id",
769
        right_on="zensus_population_id",
770
    )
771
    df_demand_share["profile_share"] = df_demand_share[
772
        "building_amenity_share"
773
    ].multiply(df_demand_share["cell_share"])
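    # Illustrative example (values assumed): a building holding 2 of the 4
    # amenities in a cell that carries 10 % of its bus' annual demand gets
    # profile_share = 0.5 * 0.1 = 0.05 of that bus' CTS profile.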
774
775
    # TODO bus_id fix
776
    # df_demand_share = df_demand_share[
777
    #     ["id", "bus_id", "scenario", "profile_share"]
778
    # ]
779
    df_demand_share = df_demand_share[["id", "scenario", "profile_share"]]
780
781
    with db.session_scope() as session:
782
        cells_query = session.query(
783
            egon_map_zensus_buildings_filtered_all.id,
784
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
785
            MapZensusGridDistricts.bus_id,
786
        ).filter(
787
            MapZensusGridDistricts.zensus_population_id
788
            == egon_map_zensus_buildings_filtered_all.zensus_population_id
789
        )
790
791
    df_egon_map_zensus_buildings_buses = pd.read_sql(
792
        cells_query.statement,
793
        cells_query.session.bind,
794
        index_col=None,
795
    )
796
    df_demand_share = pd.merge(
797
        left=df_demand_share, right=df_egon_map_zensus_buildings_buses, on="id"
798
    )
799
800
    # TODO adapt groupby?
801
    # Group and aggregate per building for multi cell buildings
802
    df_demand_share = (
803
        df_demand_share.groupby(["scenario", "id", "bus_id"])
804
        .sum()
805
        .reset_index()
806
    )
807
    if df_demand_share.duplicated("id", keep=False).any():
808
        print(
809
            df_demand_share.loc[df_demand_share.duplicated("id", keep=False)]
810
        )
811
    return df_demand_share
812
813
814
def calc_building_profiles(
815
    egon_building_id=None,
816
    bus_id=None,
817
    scenario="eGon2035",
818
    sector="electricity",
819
):
820
    """
821
    Calculate the demand profile for each building. The profile is
822
    calculated by the demand share of the building per substation bus.
823
824
    Parameters
825
    ----------
826
    egon_building_id: int
827
        Id of the building for which the profile is calculated.
828
        If not given, the profiles are calculated for all buildings.
829
    bus_id: int
830
        Id of the substation for which all profiles are calculated.
831
        If not given, the profiles are calculated for all buildings.
832
    scenario: str
833
        Scenario for which the share is calculated.
834
    sector: str
835
        Sector for which the share is calculated.
836
837
    Returns
838
    -------
839
    df_building_profiles: pd.DataFrame
840
        Table of demand profile per building
841
    """
842
    if sector == "electricity":
843
        with db.session_scope() as session:
844
            cells_query = session.query(
845
                EgonCtsElectricityDemandBuildingShare,
846
            ).filter(
847
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
848
            )
849
850
        df_demand_share = pd.read_sql(
851
            cells_query.statement, cells_query.session.bind, index_col=None
852
        )
853
854
        # TODO maybe use demand.egon_etrago_electricity_cts
855
        # with db.session_scope() as session:
856
        #     cells_query = (
857
        #         session.query(
858
        #             EgonEtragoElectricityCts
859
        #         ).filter(
860
        #             EgonEtragoElectricityCts.scn_name == scenario)
861
        #     )
862
        #
863
        # df_cts_profiles = pd.read_sql(
864
        #     cells_query.statement,
865
        #     cells_query.session.bind,
866
        # )
867
        # df_cts_profiles = pd.DataFrame.from_dict(
868
        #   df_cts_profiles.set_index('bus_id')['p_set'].to_dict(),
869
        #   orient="index")
870
        df_cts_profiles = calc_load_curves_cts(scenario)
871
872
    elif sector == "heat":
873
        with db.session_scope() as session:
874
            cells_query = session.query(
875
                EgonCtsHeatDemandBuildingShare,
876
            ).filter(EgonCtsHeatDemandBuildingShare.scenario == scenario)
877
878
        df_demand_share = pd.read_sql(
879
            cells_query.statement, cells_query.session.bind, index_col=None
880
        )
881
882
        # TODO cts heat substation profiles missing
883
884
    # get demand share of selected building id
885
    if isinstance(egon_building_id, int):
886
        if egon_building_id in df_demand_share["id"]:
Issue introduced by: The variable df_demand_share does not seem to be defined for all execution paths.
887
            df_demand_share = df_demand_share.loc[
888
                df_demand_share["id"] == egon_building_id
889
            ]
890
        else:
891
            raise KeyError(f"Building with id {egon_building_id} not found")
892
    # TODO maybe add list
893
    # elif isinstance(egon_building_id, list):
894
895
    # get demand share of all buildings for selected bus id
896
    if isinstance(bus_id, int):
897
        if bus_id in df_demand_share["bus_id"]:
898
            df_demand_share = df_demand_share.loc[
899
                df_demand_share["bus_id"] == bus_id
900
            ]
901
        else:
902
            raise KeyError(f"Bus with id {bus_id} not found")
903
904
    # get demand profile for all buildings for selected demand share
905
    # TODO takes a few seconds per iteration
906
    df_building_profiles = pd.DataFrame()
907
    for bus_id, df in df_demand_share.groupby("bus_id"):
908
        shares = df.set_index("id", drop=True)["profile_share"]
909
        profile = df_cts_profiles.loc[:, bus_id]
Issue introduced by: The variable df_cts_profiles does not seem to be defined for all execution paths.
910
        # building_profiles = profile.apply(lambda x: x * shares)
911
        building_profiles = np.outer(profile, shares)
912
        building_profiles = pd.DataFrame(
913
            building_profiles, index=profile.index, columns=shares.index
914
        )
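        # np.outer(profile, shares) has shape (n_timesteps, n_buildings): each
        # column is the substation profile scaled by that building's
        # profile_share.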
915
        df_building_profiles = pd.concat(
916
            [df_building_profiles, building_profiles], axis=1
917
        )
918
919
    return df_building_profiles
920
921
922
def delete_synthetic_cts_buildings():
923
    """
924
    All synthetic cts buildings are deleted from the DB. This is necessary if
925
    the task is run multiple times as the existing synthetic buildings
926
    influence the results.
927
    """
928
    # import db tables
929
    from saio.openstreetmap import osm_buildings_synthetic
930
931
    # delete synthetic cts buildings
932
    with db.session_scope() as session:
933
        session.query(osm_buildings_synthetic).filter(
934
            osm_buildings_synthetic.building == "cts"
935
        ).delete()
936
937
938
def cts_buildings():
939
    """
940
    Assigns CTS demand to buildings and calculates the respective demand
941
    profiles. The demand profile per substation is disaggregated by the
942
    annual demand share of each census cell and by the number of amenities
943
    per building within the cell. If no building data is available,
944
    synthetic buildings are generated around the amenities. If no amenities
945
    but cts demand is available, buildings are randomly selected. If no
946
    building nor amenity is available, random synthetic buildings are
947
    generated. The demand share is stored in the database.
948
949
    Note:
950
    -----
951
    Cells with CTS demand, amenities and buildings do not change within
952
    the scenarios, only the demand itself. Therefore scenario eGon2035
953
    can be used universally to determine the cts buildings but not for
954
    the demand share.
955
    """
956
957
    log = start_logging()
958
    log.info("Start logging!")
959
    # Buildings with amenities
960
    df_buildings_with_amenities, df_lost_cells = buildings_with_amenities()
961
    log.info("Buildings with amenities selected!")
962
963
    # Median number of amenities per cell
964
    median_n_amenities = int(
965
        df_buildings_with_amenities.groupby("zensus_population_id")[
966
            "n_amenities_inside"
967
        ]
968
        .sum()
969
        .median()
970
    )
971
    # TODO remove
972
    print(f"Median amenity value: {median_n_amenities}")
973
974
    # Remove synthetic CTS buildings if existing
975
    delete_synthetic_cts_buildings()
976
    log.info("Old synthetic cts buildings deleted!")
977
978
    # Amenities not assigned to buildings
979
    df_amenities_without_buildings = amenities_without_buildings()
980
    log.info("Amenities without buildlings selected!")
981
982
    # Append lost cells due to duplicated ids, to cover all demand cells
983
    if not df_lost_cells.empty:
984
985
        df_lost_cells["amenities"] = median_n_amenities
986
        # create row for every amenity
987
        df_lost_cells["amenities"] = (
988
            df_lost_cells["amenities"].astype(int).apply(range)
989
        )
990
        df_lost_cells = df_lost_cells.explode("amenities")
991
        df_lost_cells.drop(columns="amenities", inplace=True)
992
        df_amenities_without_buildings = df_amenities_without_buildings.append(
993
            df_lost_cells, ignore_index=True
994
        )
995
        log.info("Lost cells due to substation intersection appended!")
996
997
    # One building per amenity
998
    df_amenities_without_buildings["n_amenities_inside"] = 1
999
    # Create synthetic buildings for amenities without buildings
1000
    df_synthetic_buildings_with_amenities = create_synthetic_buildings(
1001
        df_amenities_without_buildings, points="geom_amenity"
1002
    )
1003
    log.info("Synthetic buildings created!")
1004
1005
    # TODO write to DB and remove renaming
1006
    write_table_to_postgis(
1007
        df_synthetic_buildings_with_amenities.rename(
1008
            columns={
1009
                "zensus_population_id": "cell_id",
1010
                "egon_building_id": "id",
1011
            }
1012
        ),
1013
        OsmBuildingsSynthetic,
1014
        drop=False,
1015
    )
1016
    log.info("Synthetic buildings exported to DB!")
1017
1018
    # Cells without amenities but CTS demand and buildings
1019
    df_buildings_without_amenities = buildings_without_amenities()
1020
    log.info("Buildings without amenities in demand cells identified!")
1021
1022
    # TODO Fix Adhoc Bugfix duplicated buildings
1023
    # drop building ids which have already been used
1024
    mask = df_buildings_without_amenities.loc[
1025
        df_buildings_without_amenities["id"].isin(
1026
            df_buildings_with_amenities["id"]
1027
        )
1028
    ].index
1029
    df_buildings_without_amenities = df_buildings_without_amenities.drop(
1030
        index=mask
1031
    ).reset_index(drop=True)
1032
    log.info(f"{len(mask)} duplicated ids removed!")
1033
1034
    # select median n buildings per cell
1035
    df_buildings_without_amenities = select_cts_buildings(
1036
        df_buildings_without_amenities, max_n=median_n_amenities
1037
    )
1038
    df_buildings_without_amenities["n_amenities_inside"] = 1
1039
    log.info(f"{median_n_amenities} buildings per cell selected!")
1040
1041
    # Create synthetic amenities and buildings in cells with only CTS demand
1042
    df_cells_with_cts_demand_only = cells_with_cts_demand_only(
1043
        df_buildings_without_amenities
1044
    )
1045
    log.info("Cells with only demand identified!")
1046
1047
    # Median n Amenities per cell
1048
    df_cells_with_cts_demand_only["amenities"] = median_n_amenities
1049
    # create row for every amenity
1050
    df_cells_with_cts_demand_only["amenities"] = (
1051
        df_cells_with_cts_demand_only["amenities"].astype(int).apply(range)
1052
    )
1053
    df_cells_with_cts_demand_only = df_cells_with_cts_demand_only.explode(
1054
        "amenities"
1055
    )
1056
    df_cells_with_cts_demand_only.drop(columns="amenities", inplace=True)
1057
1058
    # Only 1 Amenity per Building
1059
    df_cells_with_cts_demand_only["n_amenities_inside"] = 1
1060
    df_cells_with_cts_demand_only = place_buildings_with_amenities(
1061
        df_cells_with_cts_demand_only, amenities=1
1062
    )
1063
    df_synthetic_buildings_without_amenities = create_synthetic_buildings(
1064
        df_cells_with_cts_demand_only, points="geom_point"
1065
    )
1066
    log.info(f"{median_n_amenities} synthetic buildings per cell created")
1067
1068
    # TODO write to DB and remove (backup) renaming
1069
    write_table_to_postgis(
1070
        df_synthetic_buildings_without_amenities.rename(
1071
            columns={
1072
                "zensus_population_id": "cell_id",
1073
                "egon_building_id": "id",
1074
            }
1075
        ),
1076
        OsmBuildingsSynthetic,
1077
        drop=False,
1078
    )
1079
    log.info("Synthetic buildings exported to DB")
1080
1081
    # Concat all buildings
1082
    columns = [
1083
        "zensus_population_id",
1084
        "id",
1085
        "geom_building",
1086
        "n_amenities_inside",
1087
        "source",
1088
    ]
1089
1090
    df_buildings_with_amenities["source"] = "bwa"
1091
    df_synthetic_buildings_with_amenities["source"] = "sbwa"
1092
    df_buildings_without_amenities["source"] = "bwoa"
1093
    df_synthetic_buildings_without_amenities["source"] = "sbwoa"
1094
1095
    df_cts_buildings = pd.concat(
1096
        [
1097
            df_buildings_with_amenities[columns],
1098
            df_synthetic_buildings_with_amenities[columns],
1099
            df_buildings_without_amenities[columns],
1100
            df_synthetic_buildings_without_amenities[columns],
1101
        ],
1102
        axis=0,
1103
        ignore_index=True,
1104
    )
1105
    # TODO maybe remove after #772
1106
    df_cts_buildings["id"] = df_cts_buildings["id"].astype(int)
1107
1108
    # Write table to db for debugging
1109
    # TODO remove later
1110
    df_cts_buildings = gpd.GeoDataFrame(
1111
        df_cts_buildings, geometry="geom_building", crs=3035
1112
    )
1113
    df_cts_buildings = df_cts_buildings.reset_index().rename(
1114
        columns={"index": "serial"}
1115
    )
1116
    write_table_to_postgis(
1117
        df_cts_buildings,
1118
        CtsBuildings,
1119
        drop=True,
1120
    )
1121
    log.info("CTS buildings exported to DB!")
1122
1123
1124
def cts_electricity():
Duplication introduced by: This code seems to be duplicated in your project.
1125
    """
1126
    Calculate cts electricity demand share of hvmv substation profile
1127
     for buildings.
1128
    """
1129
    log = start_logging()
1130
    log.info("Start logging!")
1131
    with db.session_scope() as session:
1132
        cells_query = session.query(CtsBuildings)
1133
1134
    df_cts_buildings = pd.read_sql(
1135
        cells_query.statement, cells_query.session.bind, index_col=None
1136
    )
1137
    log.info("CTS buildings from DB imported!")
1138
    df_demand_share_2035 = calc_building_demand_profile_share(
1139
        df_cts_buildings, scenario="eGon2035", sector="electricity"
1140
    )
1141
    log.info("Profile share for egon2035 calculated!")
1142
    df_demand_share_100RE = calc_building_demand_profile_share(
1143
        df_cts_buildings, scenario="eGon100RE", sector="electricity"
1144
    )
1145
    log.info("Profile share for egon100RE calculated!")
1146
    df_demand_share = pd.concat(
1147
        [df_demand_share_2035, df_demand_share_100RE],
1148
        axis=0,
1149
        ignore_index=True,
1150
    )
1151
1152
    write_table_to_postgres(
1153
        df_demand_share, EgonCtsElectricityDemandBuildingShare, drop=True
1154
    )
1155
    log.info("Profile share exported to DB!")
1156
1157
1158
def cts_heat():
Duplication introduced by: This code seems to be duplicated in your project.
1159
    """
1160
    Calculate cts heat demand share of hvmv substation profile
1161
     for buildings.
1162
    """
1163
    log = start_logging()
1164
    log.info("Start logging!")
1165
    with db.session_scope() as session:
1166
        cells_query = session.query(CtsBuildings)
1167
1168
    df_cts_buildings = pd.read_sql(
1169
        cells_query.statement, cells_query.session.bind, index_col=None
1170
    )
1171
    log.info("CTS buildings from DB imported!")
1172
1173
    df_demand_share_2035 = calc_building_demand_profile_share(
1174
        df_cts_buildings, scenario="eGon2035", sector="heat"
1175
    )
1176
    log.info("Profile share for egon2035 calculated!")
1177
    df_demand_share_100RE = calc_building_demand_profile_share(
1178
        df_cts_buildings, scenario="eGon100RE", sector="heat"
1179
    )
1180
    log.info("Profile share for egon100RE calculated!")
1181
    df_demand_share = pd.concat(
1182
        [df_demand_share_2035, df_demand_share_100RE],
1183
        axis=0,
1184
        ignore_index=True,
1185
    )
1186
1187
    write_table_to_postgres(
1188
        df_demand_share, EgonCtsHeatDemandBuildingShare, drop=True
1189
    )
1190
    log.info("Profile share exported to DB!")
1191
1192
1193
def get_cts_electricity_peak_load():
1194
    """
1195
    Get peak load of all CTS buildings for both scenarios and store in DB.
1196
    """
1197
    log = start_logging()
1198
    log.info("Start logging!")
1199
    # Delete rows with cts demand
1200
    with db.session_scope() as session:
1201
        session.query(BuildingPeakLoads).filter(
1202
            BuildingPeakLoads.sector == "cts"
1203
        ).delete()
1204
    log.info("CTS Peak load removed from DB!")
1205
1206
    for scenario in ["eGon2035", "eGon100RE"]:
1207
1208
        with db.session_scope() as session:
1209
            cells_query = session.query(
1210
                EgonCtsElectricityDemandBuildingShare
1211
            ).filter(
1212
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
1213
            )
1214
1215
        df_demand_share = pd.read_sql(
1216
            cells_query.statement, cells_query.session.bind, index_col=None
1217
        )
1218
1219
        df_cts_profiles = calc_load_curves_cts(scenario=scenario)
1220
1221
        df_peak_load = pd.merge(
1222
            left=df_cts_profiles.max(axis=0).astype(float).rename("max"),
1223
            right=df_demand_share,
1224
            left_on="bus_id",
1225
            right_on="bus_id",
1226
        )
1227
1228
        # Convert unit from MWh to W
1229
        df_peak_load["max"] = df_peak_load["max"] * 1e6
1230
        df_peak_load["peak_load_in_w"] = (
1231
            df_peak_load["max"] * df_peak_load["profile_share"]
1232
        )
1233
        log.info(f"Peak load for {scenario} determined!")
1234
1235
        df_peak_load.rename(columns={"id": "building_id"}, inplace=True)
1236
        df_peak_load["sector"] = "cts"
1237
1238
        df_peak_load = df_peak_load[
1239
            ["building_id", "sector", "scenario", "peak_load_in_w"]
1240
        ]
1241
1242
        # Write peak loads into db
1243
        with db.session_scope() as session:
1244
            session.bulk_insert_mappings(
1245
                BuildingPeakLoads,
1246
                df_peak_load.to_dict(orient="records"),
1247
            )
1248
        log.info(f"Peak load for {scenario} exported to DB!")
1249
1250
1251
class CtsElectricityBuildings(Dataset):
1252
    def __init__(self, dependencies):
1253
        super().__init__(
1254
            name="CtsElectricityBuildings",
1255
            version="0.0.0",
1256
            dependencies=dependencies,
1257
            tasks=(
1258
                cts_buildings,
1259
                {cts_electricity, cts_heat},
1260
                get_cts_electricity_peak_load,
1261
            ),
1262
        )
1263