Passed
Pull Request — dev (#826)
created by unknown, 01:32

data.datasets.electricity_demand_timeseries.cts_buildings — Rating: C

Complexity

Total Complexity 55

Size/Duplication

Total Lines 1285
Duplicated Lines 5.06 %

Importance

Changes 0
Metric   Value
wmc      55
eloc     665
dl       65
loc      1285
rs       5.9349
c        0
b        0
f        0

17 Functions

Rating   Name   Duplication   Size   Complexity  
A delete_synthetic_cts_buildings() 0 14 2
A place_buildings_with_amenities() 0 41 3
B create_synthetic_buildings() 0 67 6
A amenities_without_buildings() 0 60 2
A select_cts_buildings() 0 25 1
B cells_with_cts_demand_only() 0 72 2
A start_logging() 0 11 1
C buildings_with_amenities() 0 183 5
A calc_census_cell_share() 0 56 4
B buildings_without_amenities() 0 99 2
A cts_heat() 33 33 2
A remove_double_bus_id() 0 19 2
B get_cts_electricity_peak_load() 0 56 5
C calc_building_profiles() 0 106 10
B calc_building_demand_profile_share() 0 105 3
A cts_electricity() 32 32 2
B cts_buildings() 0 186 2

1 Method

Rating   Name   Duplication   Size   Complexity  
A CtsElectricityBuildings.__init__() 0 9 1


Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

A common duplication problem is a set of near-identical functions that differ only in a constant or a table name; the usual solution is to extract the shared logic into a single helper.
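The report above flags cts_electricity() and cts_heat() as duplicated blocks of roughly 32 lines each. As an illustrative sketch of the usual fix (all names here are hypothetical, not taken from this module), the shared logic is extracted into one parameterized helper:

```python
# Hypothetical example: two near-identical functions that differ
# only in a single scaling factor (a stand-in for real duplication).
def scale_electricity(values):
    total = sum(values)
    return [v / total * 100 for v in values]

def scale_heat(values):
    total = sum(values)
    return [v / total * 50 for v in values]

# After the refactoring: one shared helper carries the common logic,
# and the former duplicates become thin wrappers.
def scale(values, factor):
    """Normalize values and apply a sector-specific factor."""
    total = sum(values)
    return [v / total * factor for v in values]

def scale_electricity_refactored(values):
    return scale(values, factor=100)

def scale_heat_refactored(values):
    return scale(values, factor=50)
```

Once both call sites go through the helper, a later fix (for example, guarding against an empty input) only has to be made in one place.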

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.electricity_demand_timeseries.cts_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
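As a minimal sketch of Extract Class (with hypothetical names; the profile_* fields merely mirror the shared-prefix pattern described above), the fields that belong together move into their own component:

```python
# Hypothetical example of the Extract Class refactoring.
class DemandProfile:
    """Extracted component: holds the state that used to live as
    profile_share / profile_scenario fields on the host class."""

    def __init__(self, share, scenario):
        self.share = share
        self.scenario = scenario

    def scaled(self, annual_demand):
        # Behaviour that operates on profile data now lives
        # next to that data instead of on the host class.
        return self.share * annual_demand

class Building:
    def __init__(self, building_id, profile_share, profile_scenario):
        self.building_id = building_id
        # The two profile_* fields travelled together, so they
        # are extracted into one cohesive object.
        self.profile = DemandProfile(profile_share, profile_scenario)
```

The host class keeps a single reference to the extracted component, so callers migrate one attribute access at a time.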

import logging

from geoalchemy2 import Geometry
from geoalchemy2.shape import to_shape
from sqlalchemy import REAL, Column, Integer, String, func
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd
import saio

from egon.data import db
from egon.data.datasets import Dataset
from egon.data.datasets.electricity_demand import (
    EgonDemandRegioZensusElectricity,
)
from egon.data.datasets.electricity_demand.temporal import (
    EgonEtragoElectricityCts,
    calc_load_curves_cts,
)
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    BuildingPeakLoads,
    OsmBuildingsSynthetic,
)
from egon.data.datasets.electricity_demand_timeseries.tools import (
    random_ints_until_sum,
    random_point_in_square,
    specific_int_until_sum,
    write_table_to_postgis,
    write_table_to_postgres,
)
from egon.data.datasets.heat_demand import EgonPetaHeat
from egon.data.datasets.zensus_mv_grid_districts import MapZensusGridDistricts
from egon.data.datasets.zensus_vg250 import DestatisZensusPopulationPerHa

engine = db.engine()
Base = declarative_base()

# import db tables
saio.register_schema("openstreetmap", engine=engine)
saio.register_schema("boundaries", engine=engine)

class EgonCtsElectricityDemandBuildingShare(Base):
    __tablename__ = "egon_cts_electricity_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class EgonCtsHeatDemandBuildingShare(Base):
    __tablename__ = "egon_cts_heat_demand_building_share"
    __table_args__ = {"schema": "demand"}

    id = Column(Integer, primary_key=True)
    scenario = Column(String, primary_key=True)
    bus_id = Column(Integer, index=True)
    profile_share = Column(REAL)


class CtsBuildings(Base):
    __tablename__ = "egon_cts_buildings"
    __table_args__ = {"schema": "openstreetmap"}

    serial = Column(Integer, primary_key=True)
    id = Column(Integer, index=True)
    zensus_population_id = Column(Integer, index=True)
    geom_building = Column(Geometry("Polygon", 3035))
    n_amenities_inside = Column(Integer)
    source = Column(String)

def start_logging():
    """Start logging into console"""
    log = logging.getLogger()
    log.setLevel(logging.INFO)
    logformat = logging.Formatter(
        "%(asctime)s %(message)s", "%m/%d/%Y %H:%M:%S"
    )
    sh = logging.StreamHandler()
    sh.setFormatter(logformat)
    log.addHandler(sh)
    return log

def amenities_without_buildings():
    """
    Amenities which have no buildings assigned and are in
    a cell with cts demand are determined.

    Returns
    -------
    pd.DataFrame
        Table of amenities without buildings
    """
    from saio.openstreetmap import osm_amenities_not_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
                # TODO can be used for square around amenity
                #  (1 geom_amenity: 1 geom_building)
                #  not unique amenity_ids yet
                osm_amenities_not_in_buildings_filtered.geom_amenity,
                osm_amenities_not_in_buildings_filtered.egon_amenity_id,
                # EgonDemandRegioZensusElectricity.demand,
                # # TODO can be used to generate n random buildings
                # # (n amenities : 1 randombuilding)
                # func.count(
                #     osm_amenities_not_in_buildings_filtered.egon_amenity_id
                # ).label("n_amenities_inside"),
                # DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                func.st_within(
                    osm_amenities_not_in_buildings_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .filter(
                DestatisZensusPopulationPerHa.id
                == EgonDemandRegioZensusElectricity.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035"
                #         ).group_by(
                #             EgonDemandRegioZensusElectricity.zensus_population_id,
                #             DestatisZensusPopulationPerHa.geom,
            )
        )
    # # TODO can be used to generate n random buildings
    # df_cells_with_amenities_not_in_buildings = gpd.read_postgis(
    #     cells_query.statement, cells_query.session.bind, geom_col="geom"
    # )
    #

    # # TODO can be used for square around amenity
    df_amenities_without_buildings = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_amenity",
    )
    return df_amenities_without_buildings

def place_buildings_with_amenities(df, amenities=None, max_amenities=None):
    """
    Building centroids are placed randomly within census cells.
    The number of buildings is derived from n_amenity_inside, the selected
    method and the number of amenities per building.

    Returns
    -------
    df: gpd.GeoDataFrame
        Table of building centroids
    """
    if isinstance(max_amenities, int):
        # amount of amenities is randomly generated within bounds
        # (max_amenities, amenities per cell)
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            random_ints_until_sum, args=[max_amenities]
        )
    if isinstance(amenities, int):
        # Specific amount of amenities per building
        df["n_amenities_inside"] = df["n_amenities_inside"].apply(
            specific_int_until_sum, args=[amenities]
        )

    # Unnest each building
    df = df.explode(column="n_amenities_inside")

    # building count per cell
    df["building_count"] = df.groupby(["zensus_population_id"]).cumcount() + 1

    # generate random synthetic buildings
    edge_length = 5
    # create random points within census cells
    points = random_point_in_square(geom=df["geom"], tol=edge_length / 2)

    df.reset_index(drop=True, inplace=True)
    # Store center of polygon
    df["geom_point"] = points
    # Drop geometry of census cell
    df = df.drop(columns=["geom"])

    return df

def create_synthetic_buildings(df, points=None, crs="EPSG:3035"):
    """
    Synthetic buildings are generated around points.

    Parameters
    ----------
    df: pd.DataFrame
        Table of census cells
    points: gpd.GeoSeries or str
        List of points to place buildings around or column name of df
    crs: str
        CRS of result table

    Returns
    -------
    df: gpd.GeoDataFrame
        Synthetic buildings
    """

    if isinstance(points, str) and points in df.columns:
        points = df[points]
    elif isinstance(points, gpd.GeoSeries):
        pass
    else:
        raise ValueError("Points are of the wrong type")

    # Create building using a square around point
    edge_length = 5
    df["geom_building"] = points.buffer(distance=edge_length / 2, cap_style=3)

    if "geom_point" not in df.columns:
        df["geom_point"] = df["geom_building"].centroid

    # TODO Check CRS
    df = gpd.GeoDataFrame(
        df,
        crs=crs,
        geometry="geom_building",
    )

    # TODO remove after implementation of egon_building_id
    df.rename(columns={"id": "egon_building_id"}, inplace=True)

    # get max number of building ids from synthetic residential table
    with db.session_scope() as session:
        max_synth_residential_id = session.execute(
            func.max(OsmBuildingsSynthetic.id)
        ).scalar()
    max_synth_residential_id = int(max_synth_residential_id)

    # create sequential ids
    df["egon_building_id"] = range(
        max_synth_residential_id + 1,
        max_synth_residential_id + df.shape[0] + 1,
    )

    df["area"] = df["geom_building"].area
    # set building type of synthetic building
    df["building"] = "cts"
    # TODO remove in #772
    df = df.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )
    return df

def buildings_with_amenities():
    """
    Amenities which are assigned to buildings are determined
    and grouped per building and zensus cell. Buildings
    covering multiple cells therefore exist multiple times
    but in different zensus cells. This is necessary to cover
    all cells with a cts demand. If buildings exist in multiple
    substations, their amenities are summed, assigned and kept in
    one substation only. If, as a result, a census cell is uncovered,
    a synthetic amenity is placed. The buildings are aggregated
    afterwards during the calculation of the profile_share.

    Returns
    -------
    df_buildings_with_amenities: gpd.GeoDataFrame
        Contains all buildings with amenities per zensus cell.
    df_lost_cells: gpd.GeoDataFrame
        Contains synthetic amenities in lost cells. Might be empty
    """

    from saio.openstreetmap import osm_amenities_in_buildings_filtered

    with db.session_scope() as session:
        cells_query = (
            session.query(
                osm_amenities_in_buildings_filtered,
                MapZensusGridDistricts.bus_id,
            )
            .filter(
                MapZensusGridDistricts.zensus_population_id
                == osm_amenities_in_buildings_filtered.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == osm_amenities_in_buildings_filtered.zensus_population_id
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
        )
    df_amenities_in_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )

    df_amenities_in_buildings["geom_building"] = df_amenities_in_buildings[
        "geom_building"
    ].apply(to_shape)
    df_amenities_in_buildings["geom_amenity"] = df_amenities_in_buildings[
        "geom_amenity"
    ].apply(to_shape)

    df_amenities_in_buildings["n_amenities_inside"] = 1

    # add identifier column for buildings in multiple substations
    df_amenities_in_buildings[
        "duplicate_identifier"
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
        "n_amenities_inside"
    ].transform(
        "cumsum"
    )
    df_amenities_in_buildings = df_amenities_in_buildings.sort_values(
        ["id", "duplicate_identifier"]
    )
    # sum amenities of buildings with multiple substations
    df_amenities_in_buildings[
        "n_amenities_inside"
    ] = df_amenities_in_buildings.groupby(["id", "duplicate_identifier"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )

    # create column to always go for bus_id with max amenities
    df_amenities_in_buildings[
        "max_amenities"
    ] = df_amenities_in_buildings.groupby(["id", "bus_id"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )
    # sort to go for
    df_amenities_in_buildings.sort_values(
        ["id", "max_amenities"], ascending=False, inplace=True
    )

    # identify lost zensus cells
    df_lost_cells = df_amenities_in_buildings.loc[
        df_amenities_in_buildings.duplicated(
            subset=["id", "duplicate_identifier"], keep="first"
        )
    ]
    df_lost_cells.drop_duplicates(
        subset=["zensus_population_id"], inplace=True
    )

    # drop buildings with multiple substation and lower max amenity
    df_amenities_in_buildings.drop_duplicates(
        subset=["id", "duplicate_identifier"], keep="first", inplace=True
    )

    # check if lost zensus cells are already covered
    if not df_lost_cells.empty:
        if not (
            df_amenities_in_buildings["zensus_population_id"]
            .isin(df_lost_cells["zensus_population_id"])
            .empty
        ):
            # query geom data for cell if not
            with db.session_scope() as session:
                cells_query = session.query(
                    DestatisZensusPopulationPerHa.id,
                    DestatisZensusPopulationPerHa.geom,
                ).filter(
                    DestatisZensusPopulationPerHa.id.in_(
                        df_lost_cells["zensus_population_id"]
                    )
                )

            df_lost_cells = gpd.read_postgis(
                cells_query.statement,
                cells_query.session.bind,
                geom_col="geom",
            )
            # TODO maybe adapt method
            # place random amenity in cell
            df_lost_cells["n_amenities_inside"] = 1
            df_lost_cells.rename(
                columns={
                    "id": "zensus_population_id",
                },
                inplace=True,
            )
            df_lost_cells = place_buildings_with_amenities(
                df_lost_cells, amenities=1
            )
            df_lost_cells.rename(
                columns={
                    # "id": "zensus_population_id",
                    "geom_point": "geom_amenity",
                },
                inplace=True,
            )
            df_lost_cells.drop(
                columns=["building_count", "n_amenities_inside"], inplace=True
            )
        else:
            df_lost_cells = None
    else:
        df_lost_cells = None

    # drop helper columns
    df_amenities_in_buildings.drop(
        columns=["duplicate_identifier"], inplace=True
    )

    # sum amenities per building and cell
    df_amenities_in_buildings[
        "n_amenities_inside"
    ] = df_amenities_in_buildings.groupby(["zensus_population_id", "id"])[
        "n_amenities_inside"
    ].transform(
        "sum"
    )
    # drop duplicated buildings
    df_buildings_with_amenities = df_amenities_in_buildings.drop_duplicates(
        ["id", "zensus_population_id"]
    )
    df_buildings_with_amenities.reset_index(inplace=True, drop=True)

    df_buildings_with_amenities = df_buildings_with_amenities[
        ["id", "zensus_population_id", "geom_building", "n_amenities_inside"]
    ]
    df_buildings_with_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id"
        },
        inplace=True,
    )

    return df_buildings_with_amenities, df_lost_cells

def buildings_without_amenities():
    """
    Buildings (filtered and synthetic) in cells with
    cts demand but no amenities are determined.

    Returns
    -------
    df_buildings_without_amenities: gpd.GeoDataFrame
        Table of buildings without amenities in zensus cells
        with cts demand.
    """
    from saio.boundaries import egon_map_zensus_buildings_filtered_all
    from saio.openstreetmap import (
        osm_amenities_shops_filtered,
        osm_buildings_filtered,
        osm_buildings_synthetic,
    )

    # buildings_filtered in cts-demand-cells without amenities
    with db.session_scope() as session:

        # Synthetic Buildings
        q_synth_buildings = session.query(
            osm_buildings_synthetic.cell_id.cast(Integer).label(
                "zensus_population_id"
            ),
            osm_buildings_synthetic.id.cast(Integer).label("id"),
            osm_buildings_synthetic.area.label("area"),
            osm_buildings_synthetic.geom_building.label("geom_building"),
            osm_buildings_synthetic.geom_point.label("geom_point"),
        )

        # Buildings filtered
        q_buildings_filtered = session.query(
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
            osm_buildings_filtered.id,
            osm_buildings_filtered.area,
            osm_buildings_filtered.geom_building,
            osm_buildings_filtered.geom_point,
        ).filter(
            osm_buildings_filtered.id
            == egon_map_zensus_buildings_filtered_all.id
        )

        # Amenities + zensus_population_id
        q_amenities = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        # Cells with CTS demand but without amenities
        q_cts_without_amenities = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    q_amenities
                )
            )
            .distinct()
        )

        # Buildings filtered + synthetic buildings residential in
        # cells with CTS demand but without amenities
        cells_query = q_synth_buildings.union(q_buildings_filtered).filter(
            egon_map_zensus_buildings_filtered_all.zensus_population_id.in_(
                q_cts_without_amenities
            )
        )

    # df_buildings_without_amenities = pd.read_sql(
    #     cells_query.statement, cells_query.session.bind, index_col=None)
    df_buildings_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom_building",
    )

    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={
            # "zensus_population_id": "cell_id",
            "egon_building_id": "id",
        }
    )

    return df_buildings_without_amenities

def select_cts_buildings(df_buildings_wo_amenities, max_n):
    """
    N buildings (filtered and synthetic) in each cell with
    cts demand are selected. Only the first n buildings
    are taken for each cell. The buildings are sorted by surface
    area.

    Returns
    -------
    df_buildings_with_cts_demand: gpd.GeoDataFrame
        Table of buildings
    """

    df_buildings_wo_amenities.sort_values(
        "area", ascending=False, inplace=True
    )
    # select first n ids each census cell if available
    df_buildings_with_cts_demand = (
        df_buildings_wo_amenities.groupby("zensus_population_id")
        .nth(list(range(max_n)))
        .reset_index()
    )
    df_buildings_with_cts_demand.reset_index(drop=True, inplace=True)

    return df_buildings_with_cts_demand

def cells_with_cts_demand_only(df_buildings_without_amenities):
    """
    Cells with cts demand but no amenities or buildings
    are determined.

    Returns
    -------
    df_cells_only_cts_demand: gpd.GeoDataFrame
        Table of cells with cts demand but no amenities or buildings
    """
    from saio.openstreetmap import osm_amenities_shops_filtered

    # cells with amenities
    with db.session_scope() as session:
        sub_query = (
            session.query(
                DestatisZensusPopulationPerHa.id.label("zensus_population_id"),
            )
            .filter(
                func.st_within(
                    osm_amenities_shops_filtered.geom_amenity,
                    DestatisZensusPopulationPerHa.geom,
                )
            )
            .distinct(DestatisZensusPopulationPerHa.id)
        )

        cells_query = (
            session.query(
                EgonDemandRegioZensusElectricity.zensus_population_id,
                EgonDemandRegioZensusElectricity.scenario,
                EgonDemandRegioZensusElectricity.sector,
                EgonDemandRegioZensusElectricity.demand,
                DestatisZensusPopulationPerHa.geom,
            )
            .filter(
                EgonDemandRegioZensusElectricity.sector == "service",
                EgonDemandRegioZensusElectricity.scenario == "eGon2035",
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id.notin_(
                    sub_query
                )
            )
            .filter(
                EgonDemandRegioZensusElectricity.zensus_population_id
                == DestatisZensusPopulationPerHa.id
            )
        )

    df_cts_cell_without_amenities = gpd.read_postgis(
        cells_query.statement,
        cells_query.session.bind,
        geom_col="geom",
        index_col=None,
    )

    # TODO maybe remove
    df_buildings_without_amenities = df_buildings_without_amenities.rename(
        columns={"cell_id": "zensus_population_id"}
    )

    # Census cells with only cts demand
    df_cells_only_cts_demand = df_cts_cell_without_amenities.loc[
        ~df_cts_cell_without_amenities["zensus_population_id"].isin(
            df_buildings_without_amenities["zensus_population_id"].unique()
        )
    ]

    df_cells_only_cts_demand.reset_index(drop=True, inplace=True)

    return df_cells_only_cts_demand

def calc_census_cell_share(scenario="eGon2035", sector="electricity"):
    """
    The profile share for each census cell is calculated by its
    share of annual demand per substation bus. The annual demand
    per cell is defined by DemandRegio/Peta5. The share is identical
    for both scenarios as the annual demand is linearly scaled.

    Parameters
    ----------
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_census_share: pd.DataFrame
    """
    if sector == "electricity":
        demand_table = EgonDemandRegioZensusElectricity
    elif sector == "heat":
        demand_table = EgonPetaHeat
    else:
        # guard against demand_table being undefined for other sectors
        raise ValueError(f"Sector {sector} is not supported.")

    with db.session_scope() as session:
        cells_query = (
            session.query(demand_table, MapZensusGridDistricts.bus_id)
            .filter(demand_table.sector == "service")
            .filter(demand_table.scenario == scenario)
            .filter(
                demand_table.zensus_population_id
                == MapZensusGridDistricts.zensus_population_id
            )
        )

    df_demand = pd.read_sql(
        cells_query.statement,
        cells_query.session.bind,
        index_col="zensus_population_id",
    )

    # get demand share of cell per bus
    df_census_share = df_demand["demand"] / df_demand.groupby("bus_id")[
        "demand"
    ].transform("sum")
    df_census_share = df_census_share.rename("cell_share")

    df_census_share = pd.concat(
        [
            df_census_share,
            df_demand[["bus_id", "scenario"]],
        ],
        axis=1,
    )

    df_census_share.reset_index(inplace=True)
    return df_census_share

def calc_building_demand_profile_share(
    df_cts_buildings, scenario="eGon2035", sector="electricity"
):
    """
    The share of the cts demand profile per bus is calculated for every
    selected building. The building-amenity share is multiplied with the
    census cell share to get the substation bus profile share for each
    building. The share is grouped and aggregated per building, as some
    buildings exceed the shape of census cells and have amenities assigned
    from multiple cells. Buildings therefore get the amenity share of all
    census cells.

    Parameters
    ----------
    df_cts_buildings: gpd.GeoDataFrame
        Table of all buildings with cts demand assigned
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_building_share: pd.DataFrame
        Table of bus profile share per building

    """

    from saio.boundaries import egon_map_zensus_buildings_filtered_all

    def calc_building_amenity_share(df_cts_buildings):
        """
        Calculate the building share by the number of amenities per building
        within a census cell. Building ids can exist multiple times but with
        different zensus_population_ids.
        """
        df_building_amenity_share = df_cts_buildings[
            "n_amenities_inside"
        ] / df_cts_buildings.groupby("zensus_population_id")[
            "n_amenities_inside"
        ].transform(
            "sum"
        )
        df_building_amenity_share = pd.concat(
            [
                df_building_amenity_share.rename("building_amenity_share"),
                df_cts_buildings[["zensus_population_id", "id"]],
            ],
            axis=1,
        )
        return df_building_amenity_share

    df_building_amenity_share = calc_building_amenity_share(df_cts_buildings)

    df_census_cell_share = calc_census_cell_share(
        scenario=scenario, sector=sector
    )

    df_demand_share = pd.merge(
        left=df_building_amenity_share,
        right=df_census_cell_share,
        left_on="zensus_population_id",
        right_on="zensus_population_id",
    )
    df_demand_share["profile_share"] = df_demand_share[
        "building_amenity_share"
    ].multiply(df_demand_share["cell_share"])

    # TODO bus_id fix
    # df_demand_share = df_demand_share[
    #     ["id", "bus_id", "scenario", "profile_share"]
    # ]
    df_demand_share = df_demand_share[["id", "scenario", "profile_share"]]

    # assign bus_id via census cell of building centroid
    with db.session_scope() as session:
        cells_query = session.query(
            egon_map_zensus_buildings_filtered_all.id,
            egon_map_zensus_buildings_filtered_all.zensus_population_id,
            MapZensusGridDistricts.bus_id,
        ).filter(
            MapZensusGridDistricts.zensus_population_id
            == egon_map_zensus_buildings_filtered_all.zensus_population_id
        )

    df_egon_map_zensus_buildings_buses = pd.read_sql(
        cells_query.statement,
        cells_query.session.bind,
        index_col=None,
    )
    df_demand_share = pd.merge(
        left=df_demand_share, right=df_egon_map_zensus_buildings_buses, on="id"
    )

    # TODO adapt groupby?
    # Group and aggregate per building for multi cell buildings
    df_demand_share = (
        df_demand_share.groupby(["scenario", "id", "bus_id"])
        .sum()
        .reset_index()
    )
    if df_demand_share.duplicated("id", keep=False).any():
        print(
            df_demand_share.loc[df_demand_share.duplicated("id", keep=False)]
        )
    return df_demand_share



def calc_building_profiles(
    egon_building_id=None,
    bus_id=None,
    scenario="eGon2035",
    sector="electricity",
):
    """
    Calculate the demand profile for each building. The profile is
    obtained by scaling the substation profile with the demand share of
    the building per substation bus.

    Parameters
    ----------
    egon_building_id: int
        Id of the building for which the profile is calculated.
        If not given, the profiles are calculated for all buildings.
    bus_id: int
        Id of the substation for which all building profiles are
        calculated. If not given, the profiles are calculated for all
        buildings.
    scenario: str
        Scenario for which the share is calculated.
    sector: str
        Sector for which the share is calculated.

    Returns
    -------
    df_building_profiles: pd.DataFrame
        Table of demand profiles per building
    """
    if sector == "electricity":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare,
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO maybe use demand.egon_etrago_electricity_cts
        # with db.session_scope() as session:
        #     cells_query = (
        #         session.query(
        #             EgonEtragoElectricityCts
        #         ).filter(
        #             EgonEtragoElectricityCts.scn_name == scenario)
        #     )
        #
        # df_cts_profiles = pd.read_sql(
        #     cells_query.statement,
        #     cells_query.session.bind,
        # )
        # df_cts_profiles = pd.DataFrame.from_dict(
        #   df_cts_profiles.set_index('bus_id')['p_set'].to_dict(),
        #   orient="index")
        df_cts_profiles = calc_load_curves_cts(scenario)

    elif sector == "heat":
        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsHeatDemandBuildingShare,
            ).filter(EgonCtsHeatDemandBuildingShare.scenario == scenario)

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        # TODO cts heat substation profiles missing
        # (df_cts_profiles is not yet defined for sector "heat")

    else:
        # guard against df_demand_share being undefined further below
        raise ValueError(f"Sector {sector} is not supported.")

    # get demand share of selected building id
    if isinstance(egon_building_id, int):
        # membership has to be checked against the values, not the index
        if egon_building_id in df_demand_share["id"].values:
            df_demand_share = df_demand_share.loc[
                df_demand_share["id"] == egon_building_id
            ]
        else:
            raise KeyError(f"Building with id {egon_building_id} not found")
    # TODO maybe add list
    # elif isinstance(egon_building_id, list):

    # get demand share of all buildings for selected bus id
    if isinstance(bus_id, int):
        if bus_id in df_demand_share["bus_id"].values:
            df_demand_share = df_demand_share.loc[
                df_demand_share["bus_id"] == bus_id
            ]
        else:
            raise KeyError(f"Bus with id {bus_id} not found")

    # get demand profile for all buildings for selected demand share
    # TODO takes a few seconds per iteration
    df_building_profiles = pd.DataFrame()
    for bus_id, df in df_demand_share.groupby("bus_id"):
        shares = df.set_index("id", drop=True)["profile_share"]
        profile = df_cts_profiles.loc[:, bus_id]
        # scale the substation profile with each building's share
        building_profiles = np.outer(profile, shares)
        building_profiles = pd.DataFrame(
            building_profiles, index=profile.index, columns=shares.index
        )
        df_building_profiles = pd.concat(
            [df_building_profiles, building_profiles], axis=1
        )

    return df_building_profiles
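The `np.outer` scaling step above can be illustrated on its own. The bus id, building ids, and profile values below are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical substation profile (3 timesteps) for one bus.
profile = pd.Series([10.0, 20.0, 30.0], name=42)
# Hypothetical building_id -> profile_share mapping for that bus.
shares = pd.Series({101: 0.25, 102: 0.75})

# np.outer yields one column per building: each column is the
# substation profile scaled by that building's share.
building_profiles = pd.DataFrame(
    np.outer(profile, shares), index=profile.index, columns=shares.index
)
print(building_profiles)
```

Because the shares per bus sum to one, summing the building columns recovers the original substation profile.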


def delete_synthetic_cts_buildings():
    """
    Delete all synthetic CTS buildings from the DB. This is necessary if
    the task is run multiple times, as the existing synthetic buildings
    would influence the results.
    """
    # import db tables
    from saio.openstreetmap import osm_buildings_synthetic

    # cells with amenities
    with db.session_scope() as session:
        session.query(osm_buildings_synthetic).filter(
            osm_buildings_synthetic.building == "cts"
        ).delete()


def cts_buildings():
    """
    Assigns CTS demand to buildings and calculates the respective demand
    profiles. The demand profile per substation is disaggregated by the
    annual demand share of each census cell and by the number of
    amenities per building within the cell. If no building data is
    available, synthetic buildings are generated around the amenities.
    If no amenities but CTS demand is available, buildings are randomly
    selected. If neither a building nor an amenity is available, random
    synthetic buildings are generated. The demand share is stored in the
    database.

    Notes
    -----
    Cells with CTS demand, amenities and buildings do not change within
    the scenarios, only the demand itself. Therefore scenario eGon2035
    can be used universally to determine the CTS buildings but not for
    the demand share.
    """
    log = start_logging()
    log.info("Start logging!")
    # Buildings with amenities
    df_buildings_with_amenities, df_lost_cells = buildings_with_amenities()
    log.info("Buildings with amenities selected!")

    # Median number of amenities per cell
    median_n_amenities = int(
        df_buildings_with_amenities.groupby("zensus_population_id")[
            "n_amenities_inside"
        ]
        .sum()
        .median()
    )
    # TODO remove
    print(f"Median amenity value: {median_n_amenities}")

    # Remove synthetic CTS buildings if existing
    delete_synthetic_cts_buildings()
    log.info("Old synthetic cts buildings deleted!")

    # Amenities not assigned to buildings
    df_amenities_without_buildings = amenities_without_buildings()
    log.info("Amenities without buildings selected!")

    # Append lost cells due to duplicated ids, to cover all demand cells
    if not df_lost_cells.empty:

        df_lost_cells["amenities"] = median_n_amenities
        # create row for every amenity
        df_lost_cells["amenities"] = (
            df_lost_cells["amenities"].astype(int).apply(range)
        )
        df_lost_cells = df_lost_cells.explode("amenities")
        df_lost_cells.drop(columns="amenities", inplace=True)
        df_amenities_without_buildings = df_amenities_without_buildings.append(
            df_lost_cells, ignore_index=True
        )
        log.info("Lost cells due to substation intersection appended!")
    # One building per amenity
    df_amenities_without_buildings["n_amenities_inside"] = 1
    # Create synthetic buildings for amenities without buildings
    df_synthetic_buildings_with_amenities = create_synthetic_buildings(
        df_amenities_without_buildings, points="geom_amenity"
    )
    log.info("Synthetic buildings created!")

    # TODO write to DB and remove renaming
    write_table_to_postgis(
        df_synthetic_buildings_with_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB!")

    # Cells without amenities but with CTS demand and buildings
    df_buildings_without_amenities = buildings_without_amenities()
    log.info("Buildings without amenities in demand cells identified!")

    # TODO Fix ad-hoc bugfix for duplicated buildings
    # drop building ids which have already been used
    mask = df_buildings_without_amenities.loc[
        df_buildings_without_amenities["id"].isin(
            df_buildings_with_amenities["id"]
        )
    ].index
    df_buildings_without_amenities = df_buildings_without_amenities.drop(
        index=mask
    ).reset_index(drop=True)
    log.info(f"{len(mask)} duplicated ids removed!")

    # select median n buildings per cell
    df_buildings_without_amenities = select_cts_buildings(
        df_buildings_without_amenities, max_n=median_n_amenities
    )
    df_buildings_without_amenities["n_amenities_inside"] = 1
    log.info(f"{median_n_amenities} buildings per cell selected!")

    # Create synthetic amenities and buildings in cells with only CTS demand
    df_cells_with_cts_demand_only = cells_with_cts_demand_only(
        df_buildings_without_amenities
    )
    log.info("Cells with only demand identified!")
    # Median n amenities per cell
    df_cells_with_cts_demand_only["amenities"] = median_n_amenities
    # create row for every amenity
    df_cells_with_cts_demand_only["amenities"] = (
        df_cells_with_cts_demand_only["amenities"].astype(int).apply(range)
    )
    df_cells_with_cts_demand_only = df_cells_with_cts_demand_only.explode(
        "amenities"
    )
    df_cells_with_cts_demand_only.drop(columns="amenities", inplace=True)

    # Only 1 amenity per building
    df_cells_with_cts_demand_only["n_amenities_inside"] = 1
    df_cells_with_cts_demand_only = place_buildings_with_amenities(
        df_cells_with_cts_demand_only, amenities=1
    )
    df_synthetic_buildings_without_amenities = create_synthetic_buildings(
        df_cells_with_cts_demand_only, points="geom_point"
    )
    log.info(f"{median_n_amenities} synthetic buildings per cell created")

    # TODO write to DB and remove (backup) renaming
    write_table_to_postgis(
        df_synthetic_buildings_without_amenities.rename(
            columns={
                "zensus_population_id": "cell_id",
                "egon_building_id": "id",
            }
        ),
        OsmBuildingsSynthetic,
        drop=False,
    )
    log.info("Synthetic buildings exported to DB")
    # Concat all buildings
    columns = [
        "zensus_population_id",
        "id",
        "geom_building",
        "n_amenities_inside",
        "source",
    ]

    df_buildings_with_amenities["source"] = "bwa"
    df_synthetic_buildings_with_amenities["source"] = "sbwa"
    df_buildings_without_amenities["source"] = "bwoa"
    df_synthetic_buildings_without_amenities["source"] = "sbwoa"

    df_cts_buildings = pd.concat(
        [
            df_buildings_with_amenities[columns],
            df_synthetic_buildings_with_amenities[columns],
            df_buildings_without_amenities[columns],
            df_synthetic_buildings_without_amenities[columns],
        ],
        axis=0,
        ignore_index=True,
    )
    df_cts_buildings = remove_double_bus_id(df_cts_buildings)

    # TODO maybe remove after #772
    df_cts_buildings["id"] = df_cts_buildings["id"].astype(int)

    # Write table to db for debugging
    # TODO remove later
    df_cts_buildings = gpd.GeoDataFrame(
        df_cts_buildings, geometry="geom_building", crs=3035
    )
    df_cts_buildings = df_cts_buildings.reset_index().rename(
        columns={"index": "serial"}
    )
    write_table_to_postgis(
        df_cts_buildings,
        CtsBuildings,
        drop=True,
    )
    log.info("CTS buildings exported to DB!")
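The `astype(int).apply(range)` plus `explode` pattern used twice in `cts_buildings` turns an amenity count per cell into one row per amenity. A minimal sketch with made-up cell ids:

```python
import pandas as pd

# Hypothetical cells: cell 1 gets 2 amenity rows, cell 2 gets 3.
df = pd.DataFrame({"zensus_population_id": [1, 2], "amenities": [2, 3]})

# Replace each count with a range object, then explode it so that every
# element of the range becomes its own row.
df["amenities"] = df["amenities"].astype(int).apply(range)
df = df.explode("amenities")
print(df["zensus_population_id"].tolist())
```

Dropping the exploded helper column afterwards, as the function does, leaves exactly one row per synthetic amenity.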


def remove_double_bus_id(df_cts_buildings):
    """
    Remove buildings that are assigned to more than one substation bus
    id by dropping the first duplicated row per building.
    """
    substation_per_building = df_cts_buildings.groupby("id")[
        "bus_id"
    ].nunique()
    building_id = substation_per_building.loc[
        substation_per_building > 1
    ].index
    df_duplicates = df_cts_buildings.loc[
        df_cts_buildings["id"].isin(building_id)
    ]
    for unique_id in df_duplicates["id"].unique():
        drop_index = df_duplicates[df_duplicates["id"] == unique_id].index[0]
        print(
            "Building dropped because of double substation: "
            f"{df_cts_buildings.loc[drop_index, 'id']}"
        )
        df_cts_buildings.drop(index=drop_index, inplace=True)

    return df_cts_buildings
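The duplicate detection in `remove_double_bus_id` can be traced on a toy frame; ids and bus ids below are made up:

```python
import pandas as pd

# Building 7 is mapped to two substations, building 8 to one.
df = pd.DataFrame({"id": [7, 7, 8], "bus_id": [1, 2, 1]})

# nunique per building id flags buildings with more than one bus_id.
n_bus = df.groupby("id")["bus_id"].nunique()
dup_ids = n_bus.loc[n_bus > 1].index

# As in remove_double_bus_id, drop the first occurrence of each duplicate.
for unique_id in dup_ids:
    df = df.drop(index=df[df["id"] == unique_id].index[0])
print(df)
```

Note that this keeps whichever row comes second in the frame, so the retained bus assignment depends on row order.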


def cts_electricity():
    """
    Calculate the CTS electricity demand share of the hvmv substation
    profile for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")
    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="electricity"
    )
    log.info("Profile share for eGon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="electricity"
    )
    log.info("Profile share for eGon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsElectricityDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


def cts_heat():
    """
    Calculate the CTS heat demand share of the hvmv substation profile
    for buildings.
    """
    log = start_logging()
    log.info("Start logging!")
    with db.session_scope() as session:
        cells_query = session.query(CtsBuildings)

    df_cts_buildings = pd.read_sql(
        cells_query.statement, cells_query.session.bind, index_col=None
    )
    log.info("CTS buildings from DB imported!")

    df_demand_share_2035 = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon2035", sector="heat"
    )
    log.info("Profile share for eGon2035 calculated!")
    df_demand_share_100RE = calc_building_demand_profile_share(
        df_cts_buildings, scenario="eGon100RE", sector="heat"
    )
    log.info("Profile share for eGon100RE calculated!")
    df_demand_share = pd.concat(
        [df_demand_share_2035, df_demand_share_100RE],
        axis=0,
        ignore_index=True,
    )

    write_table_to_postgres(
        df_demand_share, EgonCtsHeatDemandBuildingShare, drop=True
    )
    log.info("Profile share exported to DB!")


def get_cts_electricity_peak_load():
    """
    Get the peak load of all CTS buildings for both scenarios and store
    it in the DB.
    """
    log = start_logging()
    log.info("Start logging!")
    # Delete rows with cts demand
    with db.session_scope() as session:
        session.query(BuildingPeakLoads).filter(
            BuildingPeakLoads.sector == "cts"
        ).delete()
    log.info("CTS peak load removed from DB!")

    for scenario in ["eGon2035", "eGon100RE"]:

        with db.session_scope() as session:
            cells_query = session.query(
                EgonCtsElectricityDemandBuildingShare
            ).filter(
                EgonCtsElectricityDemandBuildingShare.scenario == scenario
            )

        df_demand_share = pd.read_sql(
            cells_query.statement, cells_query.session.bind, index_col=None
        )

        df_cts_profiles = calc_load_curves_cts(scenario=scenario)

        df_peak_load = pd.merge(
            left=df_cts_profiles.max(axis=0).astype(float).rename("max"),
            right=df_demand_share,
            left_on="bus_id",
            right_on="bus_id",
        )

        # Convert unit from MW to W
        df_peak_load["max"] = df_peak_load["max"] * 1e6
        df_peak_load["peak_load_in_w"] = (
            df_peak_load["max"] * df_peak_load["profile_share"]
        )
        log.info(f"Peak load for {scenario} determined!")

        df_peak_load.rename(columns={"id": "building_id"}, inplace=True)
        df_peak_load["sector"] = "cts"

        df_peak_load = df_peak_load[
            ["building_id", "sector", "scenario", "peak_load_in_w"]
        ]

        # Write peak loads into db
        with db.session_scope() as session:
            session.bulk_insert_mappings(
                BuildingPeakLoads,
                df_peak_load.to_dict(orient="records"),
            )
        log.info(f"Peak load for {scenario} exported to DB!")
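The unit conversion in the peak load step is a simple scaling; the sketch below uses made-up numbers and assumes the substation profile maximum is given in MW:

```python
# Hypothetical substation peak and building share.
peak_mw = 2.5          # maximum of the substation profile, assumed MW
profile_share = 0.01   # building's share of the substation profile

# Convert MW to W (factor 1e6), then scale by the building's share,
# mirroring the arithmetic in get_cts_electricity_peak_load.
peak_load_in_w = peak_mw * 1e6 * profile_share
print(peak_load_in_w)
```

Since the shares per bus sum to one, the building peak loads of a substation add up to its converted peak.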


class CtsElectricityBuildings(Dataset):
    def __init__(self, dependencies):
        super().__init__(
            name="CtsElectricityBuildings",
            version="0.0.0",
            dependencies=dependencies,
            tasks=(
                cts_buildings,
                {cts_electricity, cts_heat},
                get_cts_electricity_peak_load,
            ),
        )