Passed
Pull Request — dev (#826)
created by unknown, 01:49

data.datasets.electricity_demand_timeseries.cts_buildings (rating: B)

Complexity

Total Complexity 52

Size/Duplication

Total Lines 1237
Duplicated Lines 5.25 %

Importance

Changes 0
Metric Value
wmc 52
eloc 636
dl 65
loc 1237
rs 7.404
c 0
b 0
f 0

16 Functions

Rating   Name   Duplication   Size   Complexity  
A cts_heat() 33 33 2
A place_buildings_with_amenities() 0 41 3
B create_synthetic_buildings() 0 67 6
A delete_synthetic_cts_buildings() 0 14 2
B get_cts_electricity_peak_load() 0 59 5
C calc_building_profiles() 0 106 10
A amenities_without_buildings() 0 60 2
A select_cts_buildings() 0 25 1
B cells_with_cts_demand_only() 0 72 2
A start_logging() 0 11 1
A calc_building_demand_profile_share() 0 77 2
C buildings_with_amenities() 0 183 5
A cts_electricity() 32 32 2
A calc_census_cell_share() 0 56 4
B cts_buildings() 0 184 2
B buildings_without_amenities() 0 99 2

1 Method

Rating   Name   Duplication   Size   Complexity  
A CtsElectricityBuildings.__init__() 0 9 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.

A common duplication problem is a pair of near-identical functions that differ only in a few parameters; the corresponding solution is to extract the shared logic into a single parameterized helper, as sketched below.
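In this report, for example, cts_electricity() and cts_heat() are flagged with 32 and 33 duplicated lines; they differ only in the sector string and the target table. A minimal sketch of one possible restructuring follows. It assumes the module's existing helpers and table classes from the listing further down (start_logging, calc_building_demand_profile_share, write_table_to_postgres, CtsBuildings and the two demand-building-share tables, plus the module-level pd and db imports); the helper name _calc_and_export_profile_share is hypothetical and not part of the codebase.

def _calc_and_export_profile_share(sector, target_table):
    """Hypothetical shared helper replacing cts_electricity()/cts_heat()."""
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)
    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")
    # identical share calculation for both scenarios; only the sector differs
    df_demand_share = pd.concat(
        [
            calc_building_demand_profile_share(
                df_cts_buildings, scenario=scenario, sector=sector
            )
            for scenario in ("eGon2035", "eGon100RE")
        ],
        ignore_index=True,
    )
    write_table_to_postgres(df_demand_share, target_table, drop=True)
    log.info("Profile share exported to DB!")


def cts_electricity():
    _calc_and_export_profile_share(
        "electricity", EgonCtsElectricityDemandBuildingShare
    )


def cts_heat():
    _calc_and_export_profile_share("heat", EgonCtsHeatDemandBuildingShare)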

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.electricity_demand_timeseries.cts_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
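As a standalone illustration of that approach (all class names below are hypothetical and only loosely modeled on this module's vocabulary, not part of the codebase), grouping by a shared "profile_" prefix and applying Extract Class might look like this:

# Before: one class mixes demand-profile bookkeeping with building geometry.
class CtsBuilding:
    def __init__(self, geom_building, geom_point):
        self.geom_building = geom_building
        self.geom_point = geom_point
        self.profile_share = 0.0
        self.profile_scenario = "eGon2035"

    def profile_peak_load(self, curve):
        return max(curve) * self.profile_share

    def geom_area(self):
        return self.geom_building.area


# After: the "profile_" fields and methods form a cohesive component
# and are extracted into their own class.
class DemandProfileShare:
    def __init__(self, share=0.0, scenario="eGon2035"):
        self.share = share
        self.scenario = scenario

    def peak_load(self, curve):
        return max(curve) * self.share


class CtsBuilding:
    def __init__(self, geom_building, geom_point):
        self.geom_building = geom_building
        self.geom_point = geom_point
        self.profile = DemandProfileShare()  # extracted component

    def geom_area(self):
        return self.geom_building.area

The full source listing reviewed in this report follows.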

import logging

from geoalchemy2 import Geometry
from geoalchemy2.shape import to_shape
from sqlalchemy import REAL, Column, Integer, String, func
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd
import saio

from egon.data import db
from egon.data.datasets import Dataset
from egon.data.datasets.electricity_demand import (
    EgonDemandRegioZensusElectricity,
)
from egon.data.datasets.electricity_demand.temporal import (
    EgonEtragoElectricityCts,
    calc_load_curves_cts,
)
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    BuildingPeakLoads,
    OsmBuildingsSynthetic,
)
from egon.data.datasets.electricity_demand_timeseries.tools import (
    random_ints_until_sum,
    random_point_in_square,
    specific_int_until_sum,
    write_table_to_postgis,
    write_table_to_postgres,
)
from egon.data.datasets.heat_demand import EgonPetaHeat
from egon.data.datasets.zensus_mv_grid_districts import MapZensusGridDistricts
from egon.data.datasets.zensus_vg250 import DestatisZensusPopulationPerHa

engine = db.engine()
Base = declarative_base()

# import db tables
saio.register_schema("openstreetmap", engine=engine)
saio.register_schema("boundaries", engine=engine)


class EgonCtsElectricityDemandBuildingShare(Base):
    __tablename__ = "egon_cts_electricity_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class EgonCtsHeatDemandBuildingShare(Base):
    __tablename__ = "egon_cts_heat_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class CtsBuildings(Base):
    __tablename__ = "egon_cts_buildings"
    __table_args__ = {"schema": "openstreetmap"}

    serial = Column(Integer, primary_key=True)
    id = Column(Integer, index=True)
    zensus_population_id = Column(Integer, index=True)
    geom_building = Column(Geometry("Polygon", 3035))
    n_amenities_inside = Column(Integer)
    source = Column(String)


def start_logging():
    """Start logging into console"""
    log = logging.getLogger()
    log.setLevel(logging.INFO)
    logformat = logging.Formatter(
        "%(asctime)s %(message)s", "%m/%d/%Y %H:%M:%S"
    )
    sh = logging.StreamHandler()
    sh.setFormatter(logformat)
    log.addHandler(sh)
    return log


def amenities_without_buildings():
    """
    Amenities which have no buildings assigned and are in
    a cell with cts demand are determined.

    Returns
    -------
    pd.DataFrame
        Table of amenities without buildings
    """
    from saio.openstreetmap import osm_amenities_not_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
                # TODO can be used for square around amenity
                #  (1 geom_amenity: 1 geom_building)
                #  not unique amenity_ids yet
                osm_amenities_not_in_buildings_filtered.geom_amenity,
                osm_amenities_not_in_buildings_filtered.egon_amenity_id,
                # EgonDemandRegioZensusElectricity.demand,
                # # TODO can be used to generate n random buildings
                # # (n amenities : 1 randombuilding)
                # func.count(
                #     osm_amenities_not_in_buildings_filtered.egon_amenity_id
                # ).label("n_amenities_inside"),
                # DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                func.st_within(
                    osm_amenities_not_in_buildings_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .filter(
                DestatisZensusPopulationPerHa.id
                == EgonDemandRegioZensusElectricity.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035"
                #         ).group_by(
                #             EgonDemandRegioZensusElectricity.zensus_population_id,
                #             DestatisZensusPopulationPerHa.geom,
            )
        )
    # # TODO can be used to generate n random buildings
    # df_cells_with_amenities_not_in_buildings = gpd.read_postgis(
    #     cells_query.statement, cells_query.session.bind, geom_col="geom"
    # )
    #

    # # TODO can be used for square around amenity
    df_amenities_without_buildings = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_amenity",
    )
    return df_amenities_without_buildings


def place_buildings_with_amenities(df, amenities=None, max_amenities=None):
    """
    Building centroids are placed randomly within census cells.
    The number of buildings is derived from n_amenity_inside, the selected
    method and the number of amenities per building.

    Returns
    -------
    df: gpd.GeoDataFrame
        Table of building centroids
    """
    if isinstance(max_amenities, int):
        # amount of amenities is randomly generated within bounds
        # (max_amenities, amenities per cell)
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            random_ints_until_sum, args=[max_amenities]
        )
    if isinstance(amenities, int):
        # Specific amount of amenities per building
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            specific_int_until_sum, args=[amenities]
        )

    # Unnest each building
    df = df.explode(column="n_amenities_inside")

    # building count per cell
    df["building_count"] = df.groupby(["zensus_population_id"]).cumcount() + 1

    # generate random synthetic buildings
    edge_length = 5
    # create random points within census cells
    points = random_point_in_square(geom=df["geom"], tol=edge_length / 2)

    df.reset_index(drop=True, inplace=True)
    # Store center of polygon
    df["geom_point"] = points
    # Drop geometry of census cell
    df = df.drop(columns=["geom"])

    return df


def create_synthetic_buildings(df, points=None, crs="EPSG:3035"):
    """
    Synthetic buildings are generated around points.

    Parameters
    ----------
    df: pd.DataFrame
        Table of census cells
    points: gpd.GeoSeries or str
        List of points to place buildings around or column name of df
    crs: str
        CRS of result table

    Returns
    -------
    df: gpd.GeoDataFrame
        Synthetic buildings
    """

    if isinstance(points, str) and points in df.columns:
        points = df[points]
    elif isinstance(points, gpd.GeoSeries):
        pass
    else:
        raise ValueError("Points are of the wrong type")

    # Create building using a square around point
    edge_length = 5
    df["geom_building"] = points.buffer(distance=edge_length / 2, cap_style=3)

    if "geom_point" not in df.columns:
        df["geom_point"] = df["geom_building"].centroid

    # TODO Check CRS
    df = gpd.GeoDataFrame(
        df,
        crs=crs,
        geometry="geom_building",
    )

    # TODO remove after implementation of egon_building_id
    df.rename(columns={"id": "egon_building_id"}, inplace=True)

    # get max number of building ids from synthetic residential table
    with db.session_scope() as session:
        max_synth_residential_id = session.execute(
            func.max(OsmBuildingsSynthetic.id)
        ).scalar()
    max_synth_residential_id = int(max_synth_residential_id)

    # create sequential ids
    df["egon_building_id"] = range(
        max_synth_residential_id + 1,
        max_synth_residential_id + df.shape[0] + 1,
    )

    df["area"] = df["geom_building"].area
    # set building type of synthetic building
    df["building"] = "cts"
    # TODO remove in #772
    df = df.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )
    return df


def buildings_with_amenities():
    """
    Amenities which are assigned to buildings are determined
    and grouped per building and zensus cell. Buildings
    covering multiple cells therefore exist multiple times
    but in different zensus cells. This is necessary to cover
    all cells with a cts demand. If buildings exist in multiple
    substations, their amenities are summed and assigned and kept in
    one substation only. If, as a result, a census cell is uncovered,
    a synthetic amenity is placed. The buildings are aggregated
    afterwards during the calculation of the profile_share.

    Returns
    -------
    df_buildings_with_amenities: gpd.GeoDataFrame
        Contains all buildings with amenities per zensus cell.
    df_lost_cells: gpd.GeoDataFrame
        Contains synthetic amenities in lost cells. Might be empty
    """

    from saio.openstreetmap import osm_amenities_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                osm_amenities_in_buildings_filtered,
                MapZensusGridDistricts.bus_id,
            )
            .filter(
                MapZensusGridDistricts.zensus_population_id
                == osm_amenities_in_buildings_filtered.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == osm_amenities_in_buildings_filtered.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
        )
    df_amenities_in_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )

    df_amenities_in_buildings["geom_building"] = df_amenities_in_buildings[
        "geom_building"
    ].apply(to_shape)
    df_amenities_in_buildings["geom_amenity"] = df_amenities_in_buildings[
        "geom_amenity"
    ].apply(to_shape)

    df_amenities_in_buildings["n_amenities_inside"] = 1

    # add identifier column for buildings in multiple substations
    df_amenities_in_buildings[
        "duplicate_identifier"
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
        "n_amenities_inside"
    ].transform(
        "cumsum"
    )
    df_amenities_in_buildings = df_amenities_in_buildings.sort_values(
        ["id", "duplicate_identifier"]
    )
    # sum amenities of buildings with multiple substations
    df_amenities_in_buildings[
        "n_amenities_inside"
    ] = df_amenities_in_buildings.groupby(["id", "duplicate_identifier"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )

    # create column to always go for bus_id with max amenities
    df_amenities_in_buildings[
        "max_amenities"
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )
    # sort so the bus_id with the most amenities comes first
    df_amenities_in_buildings.sort_values(
        ["id", "max_amenities"], ascending=False, inplace=True
    )

    # identify lost zensus cells
    df_lost_cells = df_amenities_in_buildings.loc[
        df_amenities_in_buildings.duplicated(
            subset=["id", "duplicate_identifier"], keep="first"
        )
    ]
    df_lost_cells.drop_duplicates(
        subset=["zensus_population_id"], inplace=True
    )

    # drop buildings with multiple substations and lower max amenity
    df_amenities_in_buildings.drop_duplicates(
        subset=["id", "duplicate_identifier"], keep="first", inplace=True
    )

    # check if lost zensus cells are already covered
    if not df_lost_cells.empty:
        if not (
            df_amenities_in_buildings["zensus_population_id"]
            .isin(df_lost_cells["zensus_population_id"])
            .empty
        ):
            # query geom data for cell if not
            with db.session_scope() as session:
                cells_query = session.query(
                    DestatisZensusPopulationPerHa.id,
                    DestatisZensusPopulationPerHa.geom,
                ).filter(
                    DestatisZensusPopulationPerHa.id.in_(
                        df_lost_cells["zensus_population_id"]
                    )
                )

            df_lost_cells = gpd.read_postgis(
                cells_query.statement,
                cells_query.session.bind,
                geom_col="geom",
            )
            # TODO maybe adapt method
            # place random amenity in cell
            df_lost_cells["n_amenities_inside"] = 1
            df_lost_cells.rename(
                columns={
                    "id": "zensus_population_id",
                },
                inplace=True,
            )
            df_lost_cells = place_buildings_with_amenities(
                df_lost_cells, amenities=1
            )
            df_lost_cells.rename(
                columns={
                    # "id": "zensus_population_id",
                    "geom_point": "geom_amenity",
                },
                inplace=True,
            )
            df_lost_cells.drop(
                columns=["building_count", "n_amenities_inside"], inplace=True
            )
        else:
            df_lost_cells = None
    else:
        df_lost_cells = None

    # drop helper columns
    df_amenities_in_buildings.drop(
        columns=["duplicate_identifier"], inplace=True
    )

    # sum amenities per building and cell
    df_amenities_in_buildings[
        "n_amenities_inside"
    ] = df_amenities_in_buildings.groupby(["zensus_population_id", "id"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )
    # drop duplicated buildings
    df_buildings_with_amenities = df_amenities_in_buildings.drop_duplicates(
        ["id", "zensus_population_id"]
    )
    df_buildings_with_amenities.reset_index(inplace=True, drop=True)

    df_buildings_with_amenities = df_buildings_with_amenities[
        ["id", "zensus_population_id", "geom_building", "n_amenities_inside"]
    ]
    df_buildings_with_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id"
        },
        inplace=True,
    )

    return df_buildings_with_amenities, df_lost_cells


def buildings_without_amenities():
    """
    Buildings (filtered and synthetic) in cells with
    cts demand but no amenities are determined.

    Returns
    -------
    df_buildings_without_amenities: gpd.GeoDataFrame
        Table of buildings without amenities in zensus cells
        with cts demand.
    """
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
    from saio.openstreetmap import (
        osm_amenities_shops_filtered,
        osm_buildings_filtered,
        osm_buildings_synthetic,
    )

    # buildings_filtered in cts-demand-cells without amenities
    with db.session_scope() as session:

        # Synthetic Buildings
        q_synth_buildings = session.query(
            osm_buildings_synthetic.cell_id.cast(Integer).label(
                "zensus_population_id"
            ),
            osm_buildings_synthetic.id.cast(Integer).label("id"),
            osm_buildings_synthetic.area.label("area"),
            osm_buildings_synthetic.geom_building.label("geom_building"),
            osm_buildings_synthetic.geom_point.label("geom_point"),
        )

        # Buildings filtered
        q_buildings_filtered = session.query(
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
            osm_buildings_filtered.id,
            osm_buildings_filtered.area,
            osm_buildings_filtered.geom_building,
            osm_buildings_filtered.geom_point,
        ).filter(
            osm_buildings_filtered.id
            == egon_map_zensus_buildings_filtered_all.id
        )

        # Amenities + zensus_population_id
        q_amenities = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        # Cells with CTS demand but without amenities
        q_cts_without_amenities = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    q_amenities
                )
            )
            .distinct()
        )

        # Buildings filtered + synthetic buildings residential in
        # cells with CTS demand but without amenities
        cells_query = q_synth_buildings.union(q_buildings_filtered).filter(
            egon_map_zensus_buildings_filtered_all.zensus_population_id.in_(
                q_cts_without_amenities
            )
        )

    # df_buildings_without_amenities = pd.read_sql(
    #     cells_query.statement, cells_query.session.bind, index_col=None)
    df_buildings_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_building",
    )

    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )

    return df_buildings_without_amenities


def select_cts_buildings(df_buildings_wo_amenities, max_n):
    """
    N buildings (filtered and synthetic) in each cell with
    cts demand are selected. Only the first n buildings
    are taken for each cell. The buildings are sorted by surface
    area.

    Returns
    -------
    df_buildings_with_cts_demand: gpd.GeoDataFrame
        Table of buildings
    """

    df_buildings_wo_amenities.sort_values(
        "area", ascending=False, inplace=True
    )
    # select the first n ids of each census cell if available
    df_buildings_with_cts_demand = (
        df_buildings_wo_amenities.groupby("zensus_population_id")
        .nth(list(range(max_n)))
        .reset_index()
    )
    df_buildings_with_cts_demand.reset_index(drop=True, inplace=True)

    return df_buildings_with_cts_demand


def cells_with_cts_demand_only(df_buildings_without_amenities):
    """
    Cells with cts demand but no amenities or buildings
    are determined.

    Returns
    -------
    df_cells_only_cts_demand: gpd.GeoDataFrame
        Table of cells with cts demand but no amenities or buildings
    """
    from saio.openstreetmap import osm_amenities_shops_filtered

    # cells with amenities
    with db.session_scope() as session:
        sub_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        cells_query = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
                EgonDemandRegioZensusElectricity.scenario,
                EgonDemandRegioZensusElectricity.sector,
                EgonDemandRegioZensusElectricity.demand,
                DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    sub_query
                )
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == DestatisZensusPopulationPerHa.id
            )
        )

    df_cts_cell_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom",
        index_col=None,
    )

    # TODO maybe remove
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={"cell_id": "zensus_population_id"}
    )

    # Census cells with only cts demand
    df_cells_only_cts_demand = df_cts_cell_without_amenities.loc[
        ~df_cts_cell_without_amenities["zensus_population_id"].isin(
            df_buildings_without_amenities["zensus_population_id"].unique()
        )
    ]

    df_cells_only_cts_demand.reset_index(drop=True, inplace=True)

    return df_cells_only_cts_demand


def calc_census_cell_share(scenario="eGon2035", sector="electricity"):
    """
    The profile share for each census cell is calculated by its
    share of annual demand per substation bus. The annual demand
    per cell is defined by DemandRegio/Peta5. The share is identical
    for both scenarios as the annual demand is linearly scaled.

    Parameters
    ----------
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_census_share: pd.DataFrame
    """
    if sector == "electricity":
        demand_table = EgonDemandRegioZensusElectricity
    elif sector == "heat":
        demand_table = EgonPetaHeat

    with db.session_scope() as session:
        cells_query = (
            session.query(demand_table, MapZensusGridDistricts.bus_id)
            # [inspection] The variable demand_table does not seem to be
            # defined for all execution paths.
            .filter(demand_table.sector == "service")
            .filter(demand_table.scenario == scenario)
            .filter(
                demand_table.zensus_population_id
                == MapZensusGridDistricts.zensus_population_id
            )
        )

    df_demand = pd.read_sql(
        cells_query.statement,
        cells_query.session.bind,
        index_col="zensus_population_id",
    )

    # get demand share of cell per bus
    df_census_share = df_demand["demand"] / df_demand.groupby("bus_id")[
        "demand"
    ].transform("sum")
    df_census_share = df_census_share.rename("cell_share")

    df_census_share = pd.concat(
        [
            df_census_share,
            df_demand[["bus_id", "scenario"]],
        ],
        axis=1,
    )

    df_census_share.reset_index(inplace=True)
    return df_census_share


def calc_building_demand_profile_share(
    df_cts_buildings, scenario="eGon2035", sector="electricity"
):
    """
    Share of cts electricity demand profile per bus for every selected building
    is calculated. Building-amenity share is multiplied with census cell share
    to get the substation bus profile share for each building. The share is
    grouped and aggregated per building as some cover multiple cells.

    Parameters
    ----------
    df_cts_buildings: gpd.GeoDataFrame
        Table of all buildings with cts demand assigned
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_building_share: pd.DataFrame
        Table of bus profile share per building

    """

    def calc_building_amenity_share(df_cts_buildings):
        """
        Calculate the building share by the number of amenities per building
        within a census cell.
        """
        df_building_amenity_share = df_cts_buildings[
            "n_amenities_inside"
        ] / df_cts_buildings.groupby("zensus_population_id")[
            "n_amenities_inside"
        ].transform(
            "sum"
        )
        df_building_amenity_share = pd.concat(
            [
                df_building_amenity_share.rename("building_amenity_share"),
                df_cts_buildings[["zensus_population_id", "id"]],
            ],
            axis=1,
        )
        return df_building_amenity_share

    df_building_amenity_share = calc_building_amenity_share(df_cts_buildings)

    df_census_cell_share = calc_census_cell_share(
        scenario=scenario, sector=sector
    )

    df_demand_share = pd.merge(
        left=df_building_amenity_share,
        right=df_census_cell_share,
        left_on="zensus_population_id",
        right_on="zensus_population_id",
    )
    df_demand_share["profile_share"] = df_demand_share[
        "building_amenity_share"
    ].multiply(df_demand_share["cell_share"])

    df_demand_share = df_demand_share[
        ["id", "bus_id", "scenario", "profile_share"]
    ]
    # TODO adapt groupby?
    # Group and aggregate per building for multi cell buildings
    df_demand_share = (
        df_demand_share.groupby(["scenario", "id", "bus_id"])
        .sum()
        .reset_index()
    )
    if df_demand_share.duplicated("id", keep=False).any():
        print(
            df_demand_share.loc[df_demand_share.duplicated("id", keep=False)]
        )
    return df_demand_share


def calc_building_profiles(
    egon_building_id=None,
    bus_id=None,
    scenario="eGon2035",
    sector="electricity",
):
    """
    Calculate the demand profile for each building. The profile is
    calculated by the demand share of the building per substation bus.

    Parameters
    ----------
    egon_building_id: int
        Id of the building for which the profile is calculated.
        If not given, the profiles are calculated for all buildings.
    bus_id: int
        Id of the substation for which all profiles are calculated.
        If not given, the profiles are calculated for all buildings.
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_building_profiles: pd.DataFrame
        Table of demand profile per building
    """
    if sector == "electricity":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare,
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO maybe use demand.egon_etrago_electricity_cts
        # with db.session_scope() as session:
        #     cells_query = (
        #         session.query(
        #             EgonEtragoElectricityCts
        #         ).filter(
        #             EgonEtragoElectricityCts.scn_name == scenario)
        #     )
        #
        # df_cts_profiles = pd.read_sql(
        #     cells_query.statement,
        #     cells_query.session.bind,
        # )
        # df_cts_profiles = pd.DataFrame.from_dict(
        #   df_cts_profiles.set_index('bus_id')['p_set'].to_dict(),
        #   orient="index")
        df_cts_profiles = calc_load_curves_cts(scenario)

    elif sector == "heat":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsHeatDemandBuildingShare,
            ).filter(EgonCtsHeatDemandBuildingShare.scenario == scenario)

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO cts heat substation profiles missing

    # get demand share of selected building id
    if isinstance(egon_building_id, int):
        if egon_building_id in df_demand_share["id"]:
            # [inspection] The variable df_demand_share does not seem to be
            # defined for all execution paths.
            df_demand_share = df_demand_share.loc[
                df_demand_share["id"] == egon_building_id
            ]
        else:
            raise KeyError(f"Building with id {egon_building_id} not found")
    # TODO maybe add list
    # elif isinstance(egon_building_id, list):

    # get demand share of all buildings for selected bus id
    if isinstance(bus_id, int):
        if bus_id in df_demand_share["bus_id"]:
            df_demand_share = df_demand_share.loc[
                df_demand_share["bus_id"] == bus_id
            ]
        else:
            raise KeyError(f"Bus with id {bus_id} not found")

    # get demand profile for all buildings for selected demand share
    # TODO takes a few seconds per iteration
    df_building_profiles = pd.DataFrame()
    for bus_id, df in df_demand_share.groupby("bus_id"):
        shares = df.set_index("id", drop=True)["profile_share"]
        profile = df_cts_profiles.loc[:, bus_id]
        # [inspection] The variable df_cts_profiles does not seem to be
        # defined for all execution paths.
        # building_profiles = profile.apply(lambda x: x * shares)
        building_profiles = np.outer(profile, shares)
        building_profiles = pd.DataFrame(
            building_profiles, index=profile.index, columns=shares.index
        )
        df_building_profiles = pd.concat(
            [df_building_profiles, building_profiles], axis=1
        )

    return df_building_profiles


def delete_synthetic_cts_buildings():
    """
    All synthetic cts buildings are deleted from the DB. This is necessary if
    the task is run multiple times as the existing synthetic buildings
    influence the results.
    """
    # import db tables
    from saio.openstreetmap import osm_buildings_synthetic

    # delete synthetic cts buildings
    with db.session_scope() as session:
        session.query(osm_buildings_synthetic).filter(
            osm_buildings_synthetic.building == "cts"
        ).delete()


def cts_buildings():
    """
    Assigns CTS demand to buildings and calculates the respective demand
    profiles. The demand profiles per substation are disaggregated by the
    annual demand share of each census cell and by the number of amenities
    per building within the cell. If no building data is available,
    synthetic buildings are generated around the amenities. If no amenities
    but cts demand is available, buildings are randomly selected. If neither
    building nor amenity is available, random synthetic buildings are
    generated. The demand share is stored in the database.

    Note:
    -----
    Cells with CTS demand, amenities and buildings do not change within
    the scenarios, only the demand itself. Therefore scenario eGon2035
    can be used universally to determine the cts buildings but not for
    the demand share.
    """

    log = start_logging()
    log.info("Start logging!")
    # Buildings with amenities
    df_buildings_with_amenities, df_lost_cells = buildings_with_amenities()
    log.info("Buildings with amenities selected!")

    # Median number of amenities per cell
    median_n_amenities = int(
        df_buildings_with_amenities.groupby("zensus_population_id")[
            "n_amenities_inside"
        ]
        .sum()
        .median()
    )
    # TODO remove
    print(f"Median amenity value: {median_n_amenities}")

    # Remove synthetic CTS buildings if existing
    delete_synthetic_cts_buildings()
    log.info("Old synthetic cts buildings deleted!")

    # Amenities not assigned to buildings
    df_amenities_without_buildings = amenities_without_buildings()
    log.info("Amenities without buildings selected!")

    # Append lost cells due to duplicated ids, to cover all demand cells
    if not df_lost_cells.empty:

        df_lost_cells["amenities"] = median_n_amenities
        # create row for every amenity
        df_lost_cells["amenities"] = (
            df_lost_cells["amenities"].astype(int).apply(range)
        )
        df_lost_cells = df_lost_cells.explode("amenities")
        df_lost_cells.drop(columns="amenities", inplace=True)
        df_amenities_without_buildings = df_amenities_without_buildings.append(
            df_lost_cells, ignore_index=True
        )
        log.info("Lost cells due to substation intersection appended!")

    # One building per amenity
    df_amenities_without_buildings["n_amenities_inside"] = 1
    # Create synthetic buildings for amenities without buildings
    df_synthetic_buildings_with_amenities = create_synthetic_buildings(
        df_amenities_without_buildings, points="geom_amenity"
    )
    log.info("Synthetic buildings created!")

    # TODO write to DB and remove renaming
    write_table_to_postgis(
        df_synthetic_buildings_with_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB!")

    # Cells without amenities but CTS demand and buildings
    df_buildings_without_amenities = buildings_without_amenities()
    log.info("Buildings without amenities in demand cells identified!")

    # TODO Fix Adhoc Bugfix duplicated buildings
    # drop building ids which have already been used
    mask = df_buildings_without_amenities.loc[
        df_buildings_without_amenities["id"].isin(
            df_buildings_with_amenities["id"]
        )
    ].index
    df_buildings_without_amenities = df_buildings_without_amenities.drop(
        index=mask
    ).reset_index(drop=True)
    log.info(f"{len(mask)} duplicated ids removed!")

    # select median n buildings per cell
    df_buildings_without_amenities = select_cts_buildings(
        df_buildings_without_amenities, max_n=median_n_amenities
    )
    df_buildings_without_amenities["n_amenities_inside"] = 1
    log.info(f"{median_n_amenities} buildings per cell selected!")

    # Create synthetic amenities and buildings in cells with only CTS demand
    df_cells_with_cts_demand_only = cells_with_cts_demand_only(
        df_buildings_without_amenities
    )
    log.info("Cells with only demand identified!")

    # Median n Amenities per cell
    df_cells_with_cts_demand_only["amenities"] = median_n_amenities
    # create row for every amenity
    df_cells_with_cts_demand_only["amenities"] = (
        df_cells_with_cts_demand_only["amenities"].astype(int).apply(range)
    )
    df_cells_with_cts_demand_only = df_cells_with_cts_demand_only.explode(
        "amenities"
    )
    df_cells_with_cts_demand_only.drop(columns="amenities", inplace=True)

    # Only 1 Amenity per Building
    df_cells_with_cts_demand_only["n_amenities_inside"] = 1
    df_cells_with_cts_demand_only = place_buildings_with_amenities(
        df_cells_with_cts_demand_only, amenities=1
    )
    df_synthetic_buildings_without_amenities = create_synthetic_buildings(
        df_cells_with_cts_demand_only, points="geom_point"
    )
    log.info(f"{median_n_amenities} synthetic buildings per cell created")

    # TODO write to DB and remove (backup) renaming
    write_table_to_postgis(
        df_synthetic_buildings_without_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB")

    # Concat all buildings
    columns = [
        "zensus_population_id",
        "id",
        "geom_building",
        "n_amenities_inside",
        "source",
    ]

    df_buildings_with_amenities["source"] = "bwa"
    df_synthetic_buildings_with_amenities["source"] = "sbwa"
    df_buildings_without_amenities["source"] = "bwoa"
    df_synthetic_buildings_without_amenities["source"] = "sbwoa"

    df_cts_buildings = pd.concat(
        [
            df_buildings_with_amenities[columns],
            df_synthetic_buildings_with_amenities[columns],
            df_buildings_without_amenities[columns],
            df_synthetic_buildings_without_amenities[columns],
        ],
        axis=0,
        ignore_index=True,
    )
    # TODO maybe remove after #772
    df_cts_buildings["id"] = df_cts_buildings["id"].astype(int)

    # Write table to db for debugging
    # TODO remove later
    df_cts_buildings = gpd.GeoDataFrame(
        df_cts_buildings, geometry="geom_building", crs=3035
    )
    df_cts_buildings = df_cts_buildings.reset_index().rename(
        columns={"index": "serial"}
    )
    write_table_to_postgis(
        df_cts_buildings,
        CtsBuildings,
        drop=True,
    )
    log.info("CTS buildings exported to DB!")


# [inspection] View Code Duplication: this code seems to be duplicated
# in your project.
def cts_electricity():
    """
    Calculate cts electricity demand share of hvmv substation profile
    for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")
    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="electricity"
    )
    log.info("Profile share for egon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="electricity"
    )
    log.info("Profile share for egon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsElectricityDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


# [inspection] View Code Duplication: this code seems to be duplicated
# in your project.
def cts_heat():
    """
    Calculate cts heat demand share of hvmv substation profile
    for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")

    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="heat"
    )
    log.info("Profile share for egon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="heat"
    )
    log.info("Profile share for egon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsHeatDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


def get_cts_electricity_peak_load():
    """
    Get peak load of all CTS buildings for both scenarios and store in DB.
    """
    log = start_logging()
    log.info("Start logging!")
    # Delete rows with cts demand
    with db.session_scope() as session:
        session.query(BuildingPeakLoads).filter(
            BuildingPeakLoads.sector == "cts"
        ).delete()
    log.info("CTS Peak load removed from DB!")

    for scenario in ["eGon2035", "eGon100RE"]:

        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        df_cts_profiles = calc_load_curves_cts(scenario=scenario)

        df_peak_load = pd.merge(
            left=df_cts_profiles.max(axis=0).rename("max"),
            right=df_demand_share,
            left_on="bus_id",
            right_on="bus_id",
        )

        # Convert unit from MWh to W
        df_peak_load["max"] = df_peak_load["max"] * 1e6
        df_peak_load["peak_load_in_w"] = (
            df_peak_load["max"] * df_peak_load["profile_share"]
        )
        log.info(f"Peak load for {scenario} determined!")

        df_peak_load.rename(columns={"id": "building_id"}, inplace=True)
        df_peak_load["sector"] = "cts"

        df_peak_load = df_peak_load[
            ["building_id", "sector", "scenario", "peak_load_in_w"]
        ]

        # Convert unit to W
        df_peak_load["peak_load_in_w"] = df_peak_load["peak_load_in_w"] * 1e6

        # Write peak loads into db
        with db.session_scope() as session:
            session.bulk_insert_mappings(
                BuildingPeakLoads,
                df_peak_load.to_dict(orient="records"),
            )
        log.info(f"Peak load for {scenario} exported to DB!")


class CtsElectricityBuildings(Dataset):
    def __init__(self, dependencies):
        super().__init__(
            name="CtsElectricityBuildings",
            version="0.0.0",
            dependencies=dependencies,
            tasks=(
                cts_buildings,
                {cts_electricity, cts_heat},
                get_cts_electricity_peak_load,
            ),
        )
1239