Passed — Pull Request — dev (#934) — created by unknown — 01:31

data.datasets.power_plants.pv_rooftop_buildings — overall rating: F

Complexity

Total Complexity 115

Size/Duplication

Total Lines 2649
Duplicated Lines 1.21 %

Importance

Changes 0
Metric Value
wmc 115
eloc 1189
dl 32
loc 2649
rs 0.844
c 0
b 0
f 0

52 Functions

Rating   Name   Duplication   Size   Complexity  
A synthetic_buildings() 0 25 2
A geocoding_data() 0 18 1
A geocoded_data_from_db() 0 23 2
A add_ags_to_gens() 0 24 1
A drop_buildings_outside_muns() 0 25 1
B clean_mastr_data() 0 143 2
A mastr_data() 0 60 2
A geocode_data() 0 46 3
B zip_and_municipality_from_standort() 0 50 7
A merge_geocode_with_mastr() 0 27 1
A municipality_data() 0 14 2
A osm_buildings() 0 25 2
A add_ags_to_buildings() 0 24 1
A timer_func() 0 13 1
A create_geocoded_table() 0 17 1
A load_mastr_data() 0 35 1
A drop_gens_outside_muns() 0 25 1
A geocoder() 0 22 1
A drop_invalid_entries_from_gdf() 0 25 1
A most_plausible() 0 41 4
A egon_building_peak_loads() 0 9 1
A load_building_data() 0 65 3
A get_probability_for_property() 0 46 2
A calculate_building_load_factor() 0 30 1
A determine_end_of_life_gens() 0 28 1
A add_overlay_id_to_buildings() 0 31 1
A cap_per_bus_id() 0 25 1
B desaggregate_pv_in_mv_grid() 0 161 5
A calculate_max_pv_cap_per_building() 0 49 1
A scenario_data() 0 30 2
A drop_buildings_outside_grids() 0 25 1
C building_area_range_per_cap_range() 0 87 8
B allocate_pv() 0 122 6
A sort_and_qcut_df() 0 29 1
A federal_state_data() 0 24 2
A mean_load_factor_per_cap_range() 0 38 3
A grid_districts() 0 32 1
A frame_to_numeric() 0 20 4
A allocate_to_buildings() 0 32 1
A overlay_grid_districts_with_counties() 0 33 1
A validate_output() 0 50 1
A cap_share_per_cap_range() 0 43 3
A drop_unallocated_gens() 0 27 1
B probabilities() 0 57 5
A add_start_up_date() 0 30 1
B add_voltage_level() 0 31 6
A add_buildings_meta_data() 0 55 4
A geocode_mastr_data() 0 28 1
B desaggregate_pv() 0 131 6
A pv_rooftop_to_buildings() 0 47 2
B allocate_scenarios() 0 120 1
A create_scenario_table() 0 17 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.

The most common duplication problem is copy-pasted logic that drifts apart over time; the usual solution is to extract the shared code into a single function or class, as in the sketch below.
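A minimal sketch of that refactoring, assuming a generic Python code base (the helpers below are hypothetical and not taken from this module):

# Before: the same formatting logic is pasted into two places.
def format_generator_label(name, capacity_kw):
    return f"{name}: {round(capacity_kw, 1)} kW"

def format_building_label(name, area_sq_m):
    return f"{name}: {round(area_sq_m, 1)} m2"

# After: one helper owns the shared logic; callers pass what differs.
def format_label(name, value, unit):
    return f"{name}: {round(value, 1)} {unit}"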

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.power_plants.pv_rooftop_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster. A sketch follows below.
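For illustration, a minimal Python sketch of Extract Class, using a hypothetical class named after the geocode_-prefixed functions in this module (geocode_data, geocode_mastr_data); it is not part of the module itself:

# Before: free functions share the geocode_ prefix and pass the same
# configuration around. After: the shared state lives in one extracted class.
class Geocoder:
    def __init__(self, user_agent: str, min_delay_seconds: int):
        self.user_agent = user_agent
        self.min_delay_seconds = min_delay_seconds

    def geocode(self, address: str):
        ...  # issue the rate-limited geocoding request here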

"""
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate new
PV rooftop generators for the scenarios eGon2035 and eGon100RE.

Data cleaning: Drop duplicates and entries with missing critical data. Determine the
most plausible capacity from the multiple values given in the MaStR data. Drop
generators which don't have any plausible capacity data (23.5 MW > P > 0.1 kW).
Randomly and weighted, add a start-up date where it is missing. Extract zip code and
municipality from the 'Standort' given in the MaStR data. Geocode unique zip code and
municipality combinations with Nominatim (one second delay between requests). Drop
generators for which geocoding failed or which are located outside the municipalities
of Germany. Add some visual sanity checks for the cleaned data.

Allocation of MaStR data: Allocate each generator to an existing building from OSM.
Determine the quantile each generator and building is in, depending on the capacity of
the generator and the area of the building polygon. Randomly distribute generators
within each municipality, preferably within the same building area quantile as the
generators are capacity-wise. If not enough buildings exist within a municipality and
quantile, additional buildings from other quantiles are chosen randomly.

Disaggregation of PV rooftop scenarios: The scenario data per federal state is linearly
distributed to the MV grid districts according to the PV rooftop potential per MV grid
district. The rooftop potential is estimated from the building area given by the OSM
buildings. Grid districts which are located in several federal states are allocated PV
capacity according to their respective roof potential in the individual federal states.
The disaggregation of PV plants within a grid district respects existing plants from
MaStR which have not reached their end of life. New PV plants are generated randomly
and weighted, using a breakdown of the MaStR data as generator basis. Plant metadata
(e.g. plant orientation) is also added randomly and weighted, with the MaStR data as
basis.
"""
from __future__ import annotations

from collections import Counter
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Any

from geoalchemy2 import Geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from loguru import logger
from numpy.random import RandomState, default_rng
from pyproj.crs.crs import CRS
from sqlalchemy import BigInteger, Column, Float, Integer, String
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd

from egon.data import config, db
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    OsmBuildingsSynthetic,
)
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
from egon.data.datasets.zensus_vg250 import Vg250Gem

engine = db.engine()
Base = declarative_base()
SEED = int(config.settings()["egon-data"]["--random-seed"])

# TODO: move to yml
# mastr data
MASTR_RELEVANT_COLS = [
    "EinheitMastrNummer",
    "Bruttoleistung",
    "StatisikFlag",
    "Bruttoleistung_extended",
    "Nettonennleistung",
    "InstallierteLeistung",
    "zugeordneteWirkleistungWechselrichter",
    "EinheitBetriebsstatus",
    "Standort",
    "Bundesland",
    "Land",
    "Landkreis",
    "Gemeinde",
    "Postleitzahl",
    "Ort",
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
    "GemeinsamerWechselrichterMitSpeicher",
    "Lage",
    "Leistungsbegrenzung",
    "EinheitlicheAusrichtungUndNeigungswinkel",
    "Hauptausrichtung",
    "HauptausrichtungNeigungswinkel",
    "Nebenausrichtung",
]

MASTR_DTYPES = {
    "EinheitMastrNummer": str,
    "Bruttoleistung": float,
    "StatisikFlag": str,
    "Bruttoleistung_extended": float,
    "Nettonennleistung": float,
    "InstallierteLeistung": float,
    "zugeordneteWirkleistungWechselrichter": float,
    "EinheitBetriebsstatus": str,
    "Standort": str,
    "Bundesland": str,
    "Land": str,
    "Landkreis": str,
    "Gemeinde": str,
    # "Postleitzahl": int,  # fails because of nan values
    "Ort": str,
    "GemeinsamerWechselrichterMitSpeicher": str,
    "Lage": str,
    "Leistungsbegrenzung": str,
    # this would parse nan values as False, which is not always correct
    # "EinheitlicheAusrichtungUndNeigungswinkel": bool,
    "Hauptausrichtung": str,
    "HauptausrichtungNeigungswinkel": str,
    "Nebenausrichtung": str,
    "NebenausrichtungNeigungswinkel": str,
}

MASTR_PARSE_DATES = [
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
]

MASTR_INDEX_COL = "EinheitMastrNummer"

EPSG = 4326
SRID = 3035

# data cleaning
MAX_REALISTIC_PV_CAP = 23500
MIN_REALISTIC_PV_CAP = 0.1
ROUNDING = 1

# geopy
MIN_DELAY_SECONDS = 1
USER_AGENT = "rli_kh_geocoder"

# show additional logging information
VERBOSE = False

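# Note (reviewer remark, not part of the module): Path(__name__) builds a
# path from the module *name* relative to the current working directory;
# Path(__file__) is presumably what was intended for a file-relative path.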
EXPORT_DIR = Path(__name__).resolve().parent / "data"
EXPORT_FILE = "mastr_geocoded.gpkg"
EXPORT_PATH = EXPORT_DIR / EXPORT_FILE
DRIVER = "GPKG"

# Number of quantiles
Q = 5

# Scenario Data
CARRIER = "solar_rooftop"
SCENARIOS = ["eGon2035"]  # , "eGon100RE"]
SCENARIO_TIMESTAMP = {
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
}
PV_ROOFTOP_LIFETIME = pd.Timedelta(30 * 365, unit="D")

# Example module: Trina Vertex S TSM-400DE09M.08 400 Wp
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE

# Estimation of usable roof area
# Factor for the conversion of building area to roof area
# estimation mean roof pitch: 35°
# estimation usable roof share: 80%
# estimation that only the south side of the building is used for pv
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
# AREA_FACTOR = 1.221
# USABLE_ROOF_SHARE = 0.8
# SOUTH_SHARE = 0.5
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
ROOF_FACTOR = 0.5

CAP_RANGES = [
    (0, 30),
    (30, 100),
    (100, float("inf")),
]

MIN_BUILDING_SIZE = 10.0
UPPER_QUNATILE = 0.95
LOWER_QUANTILE = 0.05

COLS_TO_RENAME = {
    "EinheitlicheAusrichtungUndNeigungswinkel": (
        "einheitliche_ausrichtung_und_neigungswinkel"
    ),
    "Hauptausrichtung": "hauptausrichtung",
    "HauptausrichtungNeigungswinkel": "hauptausrichtung_neigungswinkel",
}

COLS_TO_EXPORT = [
    "scenario",
    "building_id",
    "gens_id",
    "capacity",
    "einheitliche_ausrichtung_und_neigungswinkel",
    "hauptausrichtung",
    "hauptausrichtung_neigungswinkel",
    "voltage_level",
]

INCLUDE_SYNTHETIC_BUILDINGS = False
ONLY_BUILDINGS_WITH_DEMAND = True

def timer_func(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        end_time = perf_counter()
        total_time = end_time - start_time
        logger.debug(
            f"Function {func.__name__} took {total_time:.4f} seconds."
        )
        return result

    return timeit_wrapper

@timer_func
def mastr_data(
    index_col: str | int | list[str] | list[int],
    usecols: list[str],
    dtype: dict[str, Any] | None,
    parse_dates: list[str] | None,
) -> pd.DataFrame:
    """
    Read MaStR data from CSV.

    Parameters
    ----------
    index_col : str, int or list of str or int
        Column(s) to use as the row labels of the DataFrame.
    usecols : list of str
        Return a subset of the columns.
    dtype : dict of column (str) -> type (any), optional
        Data type for data or columns.
    parse_dates : list of names (str), optional
        Try to parse given columns to datetime.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing MaStR data.
    """
    mastr_path = Path(
        config.datasets()["power_plants"]["sources"]["mastr_pv"]
    ).resolve()

    mastr_df = pd.read_csv(
        mastr_path,
        index_col=index_col,
        usecols=usecols,
        dtype=dtype,
        parse_dates=parse_dates,
    )

    mastr_df = mastr_df.loc[
        (mastr_df.StatisikFlag == "B")
        & (mastr_df.EinheitBetriebsstatus == "InBetrieb")
        & (mastr_df.Land == "Deutschland")
        & (mastr_df.Lage == "BaulicheAnlagen")
    ]

    if (
        config.settings()["egon-data"]["--dataset-boundary"]
        == "Schleswig-Holstein"
    ):
        init_len = len(mastr_df)

        mastr_df = mastr_df.loc[mastr_df.Bundesland == "SchleswigHolstein"]

        logger.info(
            f"Using only MaStR data within Schleswig-Holstein. "
            f"{init_len - len(mastr_df)} of {init_len} generators are dropped."
        )

    logger.debug("MaStR data loaded.")

    return mastr_df

@timer_func
def clean_mastr_data(
    mastr_df: pd.DataFrame,
    max_realistic_pv_cap: int | float,
    min_realistic_pv_cap: int | float,
    rounding: int,
    seed: int,
) -> pd.DataFrame:
    """
    Clean the MaStR data of implausible entries.

    * Drop MaStR ID duplicates.
    * Drop generators with implausible capacities.
    * Drop generators without any kind of start-up date.
    * Clean up the Standort column and the capacity.

    Parameters
    ----------
    mastr_df : pandas.DataFrame
        DataFrame containing MaStR data.
    max_realistic_pv_cap : int or float
        Maximum capacity which is considered to be realistic.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.
    rounding : int
        Rounding to use when cleaning up capacity. E.g. when
        rounding is 1 a capacity of 9.93 will be rounded to 9.9.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    """
    init_len = len(mastr_df)

    # drop duplicates
    mastr_df = mastr_df.loc[~mastr_df.index.duplicated()]

    # drop invalid entries in standort
    index_to_drop = mastr_df.loc[
        (mastr_df.Standort.isna()) | (mastr_df.Standort.isnull())
    ].index

    mastr_df = mastr_df.loc[~mastr_df.index.isin(index_to_drop)]

    df = mastr_df[
        [
            "Bruttoleistung",
            "Bruttoleistung_extended",
            "Nettonennleistung",
            "zugeordneteWirkleistungWechselrichter",
            "InstallierteLeistung",
        ]
    ].round(rounding)

    # determine the most plausible capacity if multiple ratings are given
    mastr_df = mastr_df.assign(
        capacity=[
            most_plausible(p_tub, min_realistic_pv_cap)
            for p_tub in df.itertuples(index=False)
        ]
    )

    # drop generators without any capacity info
    # and capacity of zero
    # and if the capacity is > 23.5 MW, because
    # Germany's largest rooftop PV system is 23 MW
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
    mastr_df = mastr_df.loc[
        (~mastr_df.capacity.isna())
        & (mastr_df.capacity <= max_realistic_pv_cap)
        & (mastr_df.capacity > min_realistic_pv_cap)
    ]

    # get zip and municipality
    mastr_df[["zip_and_municipality", "drop_this"]] = pd.DataFrame(
        mastr_df.Standort.astype(str)
        .apply(
            zip_and_municipality_from_standort,
            args=(VERBOSE,),
        )
        .tolist(),
        index=mastr_df.index,
    )

    # drop invalid entries
    mastr_df = mastr_df.loc[mastr_df.drop_this].drop(columns="drop_this")

    # add ", Deutschland" just in case
    mastr_df = mastr_df.assign(
        zip_and_municipality=(mastr_df.zip_and_municipality + ", Deutschland")
    )

    # get consistent start-up date
    mastr_df = mastr_df.assign(
        start_up_date=mastr_df.Inbetriebnahmedatum,
    )

    mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
        mastr_df.start_up_date.isna()
    ].assign(
        start_up_date=mastr_df.GeplantesInbetriebnahmedatum.loc[
            mastr_df.start_up_date.isna()
        ]
    )

    # randomly and weighted fill missing start-up dates
    pool = mastr_df.loc[
        ~mastr_df.start_up_date.isna()
    ].start_up_date.to_numpy()

    size = len(mastr_df) - len(pool)

    if size > 0:
        np.random.seed(seed)

        choice = np.random.choice(
            pool,
            size=size,
            replace=False,
        )

        mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
            mastr_df.start_up_date.isna()
        ].assign(start_up_date=choice)

        logger.info(
            f"Randomly and weighted added start-up date to {size} generators."
        )

    mastr_df = mastr_df.assign(
        start_up_date=pd.to_datetime(mastr_df.start_up_date, utc=True)
    )

    end_len = len(mastr_df)
    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} rows from MaStR DataFrame."
    )

    return mastr_df

def zip_and_municipality_from_standort(
    standort: str,
    verbose: bool = False,
) -> tuple[str, bool]:
    """
    Get zip code and municipality from a Standort string split into a list.

    Parameters
    ----------
    standort : str
        Standort as given from MaStR data.
    verbose : bool
        Logs additional info if True.

    Returns
    -------
    tuple of str and bool
        Standort reduced to the zip code and municipality,
        and a flag indicating whether a valid zip code was found.
    """
    if verbose:
        logger.debug(f"Uncleaned String: {standort}")

    standort_list = standort.split()

    found = False
    count = 0

    for count, elem in enumerate(standort_list):
        if len(elem) != 5:
            continue
        if not elem.isnumeric():
            continue

        found = True

        break

    if found:
        cleaned_str = " ".join(standort_list[count:])

        if verbose:
            logger.debug(f"Cleaned String:   {cleaned_str}")

        return cleaned_str, found

    logger.warning(
        "Couldn't identify zip code. This entry will be dropped."
        f" Original standort: {standort}."
    )

    return standort, found

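# Illustrative example (not part of the module): for a Standort such as
# "Musterstraße 1 12345 Musterstadt" the helper above returns
# ("12345 Musterstadt", True); if no five-digit numeric token is found it
# returns the original string and False, and the entry is dropped upstream.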
def most_plausible(
    p_tub: tuple,
    min_realistic_pv_cap: int | float,
) -> float:
    """
    Try to determine the most plausible capacity.

    Try to determine the most plausible capacity from the different
    capacity values given for a generator in the MaStR data.

    Parameters
    ----------
    p_tub : tuple
        Tuple containing the different capacities given in
        the MaStR data.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.

    Returns
    -------
    float
        Capacity of the generator estimated as the most realistic.
    """
    count = Counter(p_tub).most_common(3)

    if len(count) == 1:
        return count[0][0]

    val1 = count[0][0]
    val2 = count[1][0]

    if len(count) == 2:
        min_val = min(val1, val2)
        max_val = max(val1, val2)
    else:
        val3 = count[2][0]

        min_val = min(val1, val2, val3)
        max_val = max(val1, val2, val3)

    if min_val < min_realistic_pv_cap:
        return max_val

    return min_val

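# Illustrative example (not part of the module): for
# p_tub = (9.9, 9.9, 10.0, 0.0, 9.9), Counter(p_tub).most_common(3) yields
# [(9.9, 3), (10.0, 1), (0.0, 1)]; the smallest candidate 0.0 lies below
# min_realistic_pv_cap, so the largest candidate 10.0 is returned.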
def geocoder(
    user_agent: str,
    min_delay_seconds: int,
) -> RateLimiter:
    """
    Set up the Nominatim geocoding class.

    Parameters
    ----------
    user_agent : str
        The app name.
    min_delay_seconds : int
        Delay in seconds to use between requests to Nominatim.
        A minimum of 1 is advised.

    Returns
    -------
    geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    """
    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(
        locator.geocode,
        min_delay_seconds=min_delay_seconds,
    )

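# Illustrative usage (assumed, not shown in this excerpt): the rate-limited
# geocoder would be built from the module constants, e.g.
#   ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)
# keeping consecutive Nominatim requests at least one second apart.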
def geocoding_data(
    clean_mastr_df: pd.DataFrame,
) -> pd.DataFrame:
    """
    Set up the DataFrame to geocode.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    """
    return pd.DataFrame(
        data=clean_mastr_df.zip_and_municipality.unique(),
        columns=["zip_and_municipality"],
    )


@timer_func
def geocode_data(
    geocoding_df: pd.DataFrame,
    ratelimiter: RateLimiter,
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Geocode zip code and municipality.

    Extract latitude, longitude and altitude. Transform latitude and
    longitude to a shapely Point and return a geopandas GeoDataFrame.

    Parameters
    ----------
    geocoding_df : pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    ratelimiter : geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.
    """
    logger.info(f"Geocoding {len(geocoding_df)} locations.")

    geocode_df = geocoding_df.assign(
        location=geocoding_df.zip_and_municipality.apply(ratelimiter)
    )

    geocode_df = geocode_df.assign(
        point=geocode_df.location.apply(
            lambda loc: tuple(loc.point) if loc else None
        )
    )

    geocode_df[["latitude", "longitude", "altitude"]] = pd.DataFrame(
        geocode_df.point.tolist(), index=geocode_df.index
    )

    return gpd.GeoDataFrame(
        geocode_df,
        geometry=gpd.points_from_xy(geocode_df.longitude, geocode_df.latitude),
        crs=f"EPSG:{epsg}",
    )

def merge_geocode_with_mastr(
    clean_mastr_df: pd.DataFrame, geocode_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """
    Merge the geometry from geocoding into the original MaStR data.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing cleaned MaStR data with
        matching geolocation from geocoding.
    """
    return gpd.GeoDataFrame(
        clean_mastr_df.merge(
            geocode_gdf[["zip_and_municipality", "geometry"]],
            how="left",
            left_on="zip_and_municipality",
            right_on="zip_and_municipality",
        ),
        crs=geocode_gdf.crs,
    ).set_index(clean_mastr_df.index)


def drop_invalid_entries_from_gdf(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop invalid entries from a geopandas GeoDataFrame.

    TODO: how to omit the logging from geos here???

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame to be checked for validity.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with rows with invalid geometries
        dropped.
    """
    valid_gdf = gdf.loc[gdf.is_valid]

    logger.debug(
        f"{len(gdf) - len(valid_gdf)} "
        f"({(len(gdf) - len(valid_gdf)) / len(gdf) * 100:g}%) "
        f"of {len(gdf)} values were invalid and are dropped."
    )

    return valid_gdf

@timer_func
def municipality_data() -> gpd.GeoDataFrame:
    """
    Get municipality data from the eGo^n database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with municipality data.
    """
    with db.session_scope() as session:
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="ags"
    )


@timer_func
def add_ags_to_gens(
    valid_mastr_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to the generators.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with the AGS ID added.
    """
    return valid_mastr_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_gens_outside_muns(
    valid_mastr_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all generators outside of the municipalities.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with generators without an AGS ID dropped.
    """
    gdf = valid_mastr_gdf.loc[~valid_mastr_gdf.ags.isna()]

    logger.debug(
        f"{len(valid_mastr_gdf) - len(gdf)} "
        f"({(len(valid_mastr_gdf) - len(gdf)) / len(valid_mastr_gdf) * 100:g}%) "
        f"of {len(valid_mastr_gdf)} values are outside of the municipalities"
        " and are therefore dropped."
    )

    return gdf

class EgonMastrPvRoofGeocoded(Base):
    __tablename__ = "egon_mastr_pv_roof_geocoded"
    __table_args__ = {"schema": "supply"}

    zip_and_municipality = Column(String, primary_key=True, index=True)
    location = Column(String)
    point = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    altitude = Column(Float)
    geometry = Column(Geometry(srid=EPSG))


def create_geocoded_table(geocode_gdf):
    """
    Create the geocoded MaStR PV rooftop table.

    Parameters
    ----------
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoding information on PV rooftop locations.
    """
    EgonMastrPvRoofGeocoded.__table__.drop(bind=engine, checkfirst=True)
    EgonMastrPvRoofGeocoded.__table__.create(bind=engine, checkfirst=True)

    geocode_gdf.to_postgis(
        name=EgonMastrPvRoofGeocoded.__table__.name,
        schema=EgonMastrPvRoofGeocoded.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
        # dtype={}
    )

def geocoded_data_from_db(
    epsg: str | int,
) -> gpd.GeoDataFrame:
    """
    Read the geocoded MaStR data from the eGo^n database.

    Parameters
    ----------
    epsg : str or int
        EPSG ID of the CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the geocoded MaStR data.
    """
    with db.session_scope() as session:
        query = session.query(
            EgonMastrPvRoofGeocoded.zip_and_municipality,
            EgonMastrPvRoofGeocoded.geometry,
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, geom_col="geometry"
    ).to_crs(f"EPSG:{epsg}")


def load_mastr_data():
    """
    Read PV rooftop data from the MaStR CSV.

    Note: the source will be replaced as soon as the MaStR data is available
    in the database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocode_gdf = geocoded_data_from_db(EPSG)

    mastr_gdf = merge_geocode_with_mastr(clean_mastr_df, geocode_gdf)

    valid_mastr_gdf = drop_invalid_entries_from_gdf(mastr_gdf)

    municipalities_gdf = municipality_data()

    valid_mastr_gdf = add_ags_to_gens(valid_mastr_gdf, municipalities_gdf)

    return drop_gens_outside_muns(valid_mastr_gdf)

class OsmBuildingsFiltered(Base):
    __tablename__ = "osm_buildings_filtered"
    __table_args__ = {"schema": "openstreetmap"}

    osm_id = Column(BigInteger)
    amenity = Column(String)
    building = Column(String)
    name = Column(String)
    geom = Column(Geometry(srid=SRID), index=True)
    area = Column(Float)
    geom_point = Column(Geometry(srid=SRID), index=True)
    tags = Column(HSTORE)
    id = Column(BigInteger, primary_key=True, index=True)


@timer_func
def osm_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read OSM buildings data from the eGo^n database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsFiltered.id,
            OsmBuildingsFiltered.area,
            OsmBuildingsFiltered.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

@timer_func
def synthetic_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read synthetic buildings data from the eGo^n database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing synthetic buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsSynthetic.id,
            OsmBuildingsSynthetic.area,
            OsmBuildingsSynthetic.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

@timer_func
def add_ags_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to the buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with the AGS ID added.
    """
    return buildings_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_buildings_outside_muns(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of the municipalities.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with buildings without an AGS ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the municipalities "
        "and are therefore dropped."
    )

    return gdf

def egon_building_peak_loads():
    sql = """
    SELECT building_id
    FROM demand.egon_building_electricity_peak_loads
    WHERE scenario = 'eGon2035'
    """

    return (
        db.select_dataframe(sql).building_id.astype(int).sort_values().unique()
    )

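# Note (reviewer remark, not part of the module): the scenario is hard-coded
# to 'eGon2035' here, although SCENARIOS also anticipates 'eGon100RE'.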
@timer_func
def load_building_data():
    """
    Read buildings from the database.

    Tables:

    * `openstreetmap.osm_buildings_filtered` (from OSM)
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)

    Column `id` is used for both tables as it is unique, hence both datasets
    can be concatenated. If INCLUDE_SYNTHETIC_BUILDINGS is False, synthetic
    buildings will not be loaded.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.
    """

    municipalities_gdf = municipality_data()

    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)

    if INCLUDE_SYNTHETIC_BUILDINGS:
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)

        buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [
                    osm_buildings_gdf,
                    synthetic_buildings_gdf,
                ]
            ),
            geometry="geom",
            crs=osm_buildings_gdf.crs,
        ).rename(columns={"area": "building_area"})

    else:
        buildings_gdf = osm_buildings_gdf.rename(
            columns={"area": "building_area"}
        )

    if ONLY_BUILDINGS_WITH_DEMAND:
        building_ids = egon_building_peak_loads()

        init_len = len(building_ids)

        building_ids = np.intersect1d(
            building_ids, buildings_gdf.index.to_numpy()
        )

        end_len = len(building_ids)

        logger.debug(
            f"{end_len / init_len * 100:g} % ({end_len} / {init_len}) of "
            "buildings have a peak load."
        )

        buildings_gdf = buildings_gdf.loc[building_ids]

    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)

    logger.debug("Loaded buildings.")

    return drop_buildings_outside_muns(buildings_ags_gdf)

@timer_func
def sort_and_qcut_df(
    df: pd.DataFrame | gpd.GeoDataFrame,
    col: str,
    q: int,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Determine the quantile of a given attribute in a (Geo)DataFrame.

    Sort the (Geo)DataFrame in ascending order for the given attribute.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame
        (Geo)DataFrame to sort and qcut.
    col : str
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
    q : int
        Number of quantiles.

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
        Sorted and qcut (Geo)DataFrame.
    """
    df = df.sort_values(col, ascending=True)

    return df.assign(
        quant=pd.qcut(
            df[col],
            q=q,
            labels=range(q),
        )
    )

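# Illustrative example (not part of the module): with q=5, pd.qcut buckets
# the sorted values into five labels 0..4 of roughly equal size, so the
# smallest ~20 % of building areas get label 0 and the largest label 4.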
@timer_func
def allocate_pv(
    q_mastr_gdf: gpd.GeoDataFrame,
    q_buildings_gdf: gpd.GeoDataFrame,
    seed: int,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate the MaStR PV generators to the OSM buildings.

    This will determine a building for each PV generator if there are more
    buildings than generators within a given AGS. Primarily, generators are
    allocated to buildings within the same quantile. Multiple assignment
    is excluded.

    Parameters
    ----------
    q_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded and qcut MaStR data.
    q_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing qcut OSM buildings data.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    rng = default_rng(seed=seed)

    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan)
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan)

    ags_list = q_buildings_gdf.ags.unique()

    num_ags = len(ags_list)

    t0 = perf_counter()

    for count, ags in enumerate(ags_list):

        buildings = q_buildings_gdf.loc[
            (q_buildings_gdf.ags == ags) & (q_buildings_gdf.gens_id.isna())
        ]
        gens = q_mastr_gdf.loc[
            (q_mastr_gdf.ags == ags) & (q_mastr_gdf.building_id.isna())
        ]

        len_build = len(buildings)
        len_gens = len(gens)

        if len_build < len_gens:
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
            logger.error(
                f"There are {len_gens} generators and only {len_build}"
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
                "generators were truncated to match the amount of buildings."
            )

            assert len_build == len(gens)

        for quant in gens.quant.unique():
            q_buildings = buildings.loc[
                (buildings.quant == quant) & (buildings.gens_id.isna())
            ]
            q_gens = gens.loc[gens.quant == quant]

            len_build = len(q_buildings)
            len_gens = len(q_gens)

            if len_build < len_gens:
                delta = len_gens - len_build

                logger.warning(
                    f"There are {len_gens} generators and only {len_build} "
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
                    f"buildings from AGS {ags} will be added randomly."
                )

                add_buildings = pd.Index(
                    rng.choice(
                        buildings.loc[
                            (buildings.quant != quant)
                            & (buildings.gens_id.isna())
                        ].index,
                        size=delta,
                        replace=False,
                    )
                )

                q_buildings = buildings.loc[
                    q_buildings.index.append(add_buildings)
                ]

                assert len(q_buildings) == len_gens

            chosen_buildings = pd.Index(
                rng.choice(
                    q_buildings.index,
                    size=len_gens,
                    replace=False,
                )
            )

            # q_mastr_gdf.loc[q_gens.index, "building_id"] = chosen_buildings
            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index

        if count % 100 == 0:
            logger.debug(
                f"Allocation of {count / num_ags * 100:g} % of AGS done. It took "
                f"{perf_counter() - t0:g} seconds."
            )

            t0 = perf_counter()

    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]

    q_mastr_gdf.loc[
        assigned_buildings.gens_id, "building_id"
    ] = assigned_buildings.index

    logger.debug("Allocated status quo generators to buildings.")

    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)

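# Illustrative walk-through (not part of the module): in an AGS where, say,
# quantile 3 holds more generators than free quantile-3 buildings, the
# shortfall is first filled by sampling unassigned buildings from the other
# quantiles; one building is then drawn per generator without replacement.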
def frame_to_numeric(
    df: pd.DataFrame | gpd.GeoDataFrame,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Try to convert all columns of a DataFrame to numeric, ignoring errors.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
    """
    if str(df.index.dtype) == "object":
        df.index = pd.to_numeric(df.index, errors="ignore")

    for col in df.columns:
        if str(df[col].dtype) == "object":
            df[col] = pd.to_numeric(df[col], errors="ignore")

    return df

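# Note (reviewer remark, not part of the module): pandas has deprecated
# errors="ignore" for pd.to_numeric in recent releases; wrapping the call in
# a try/except would be the forward-compatible equivalent.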
def validate_output(
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
) -> None:
    """
    Validate the output.

    * Validate that there are exactly as many buildings with a PV system as
      there are PV systems with a building.
    * Validate that the building IDs with a PV system are the same building
      IDs as assigned to the PV systems.
    * Validate that the PV system IDs with a building are the same PV system
      IDs as assigned to the buildings.

    Parameters
    ----------
    desagg_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.
    desagg_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    assert len(
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].building_id.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].index.unique()
        )
    ).all()
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].index.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].gens_id.unique()
        )
    ).all()

    logger.debug("Validated output.")

def drop_unallocated_gens(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop generators which did not get allocated.

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with generators dropped which did
        not get allocated.
    """
    init_len = len(gdf)
    gdf = gdf.loc[~gdf.building_id.isna()]
    end_len = len(gdf)

    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} unallocated rows from MaStR DataFrame."
    )

    return gdf

@timer_func
def allocate_to_buildings(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate status quo PV rooftop generators to buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    logger.debug("Starting allocation of status quo.")

    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
        q_mastr_gdf, q_buildings_gdf, SEED
    )

    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)

    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf

@timer_func
def grid_districts(
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Load MV grid district geo data from the eGo^n database as a
    geopandas.GeoDataFrame.

    Parameters
    ----------
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MV grid district ID and geo shapes data.
    """
    gdf = db.select_geodataframe(
        """
        SELECT bus_id, geom
        FROM grid.egon_mv_grid_district
        ORDER BY bus_id
        """,
        index_col="bus_id",
        geom_col="geom",
        epsg=epsg,
    )

    gdf.index = gdf.index.astype(int)

    logger.debug("Grid districts loaded.")

    return gdf

def scenario_data(
    carrier: str = "solar_rooftop",
    scenario: str = "eGon2035",
) -> pd.DataFrame:
    """
    Get scenario capacity data from the eGo^n database.

    Parameters
    ----------
    carrier : str
        Carrier type to filter table by.
    scenario : str
        Scenario to filter table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with scenario capacity data in GW.
    """
    with db.session_scope() as session:
        query = session.query(EgonScenarioCapacities).filter(
            EgonScenarioCapacities.carrier == carrier,
            EgonScenarioCapacities.scenario_name == scenario,
        )

    df = pd.read_sql(
        query.statement, query.session.bind, index_col="index"
    ).sort_index()

    logger.debug("Scenario capacity data loaded.")

    return df

class Vg250Lan(Base):
    __tablename__ = "vg250_lan"
    __table_args__ = {"schema": "boundaries"}

    id = Column(BigInteger, primary_key=True, index=True)
    ade = Column(BigInteger)
    gf = Column(BigInteger)
    bsg = Column(BigInteger)
    ars = Column(String)
    ags = Column(String)
    sdv_ars = Column(String)
    gen = Column(String)
    bez = Column(String)
    ibz = Column(BigInteger)
    bem = Column(String)
    nbd = Column(String)
    sn_l = Column(String)
    sn_r = Column(String)
    sn_k = Column(String)
    sn_v1 = Column(String)
    sn_v2 = Column(String)
    sn_g = Column(String)
    fk_s3 = Column(String)
    nuts = Column(String)
    ars_0 = Column(String)
    ags_0 = Column(String)
    wsk = Column(String)
    debkg_id = Column(String)
    rs = Column(String)
    sdv_rs = Column(String)
    rs_0 = Column(String)
    geometry = Column(Geometry(srid=EPSG), index=True)

def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
    """
    Get federal state data from the eGo^n database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with federal state data.
    """
    with db.session_scope() as session:
        query = session.query(
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
        )

    gdf = gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

    logger.debug("Federal State data loaded.")

    return gdf

@timer_func
def overlay_grid_districts_with_counties(
    mv_grid_district_gdf: gpd.GeoDataFrame,
    federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Calculate the intersections of MV grid districts and counties.

    Parameters
    ----------
    mv_grid_district_gdf : gpd.GeoDataFrame
        GeoDataFrame containing MV grid district ID and geo shapes data.
    federal_state_gdf : gpd.GeoDataFrame
        GeoDataFrame with federal state data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the intersection shapes between MV grid
        districts and counties.
    """
    logger.debug(
        "Calculating intersection overlay between mv grid districts and "
        "counties. This may take a while..."
    )

    gdf = gpd.overlay(
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
        mv_grid_district_gdf.reset_index(),
        how="intersection",
        keep_geom_type=True,
    )

    logger.debug("Done!")

    return gdf

@timer_func
def add_overlay_id_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    grid_federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the overlay ID to the buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    grid_federal_state_gdf : geopandas.GeoDataFrame
        GeoDataFrame with intersection shapes between counties and grid
        districts.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with overlay ID added.
    """
    gdf = (
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
        .sjoin(
            grid_federal_state_gdf,
            how="left",
            predicate="intersects",
        )
        .rename(columns={"index_right": "overlay_id"})
    )

    logger.debug("Added overlay ID to OSM buildings.")

    return gdf

def drop_buildings_outside_grids(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of the grid areas.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with buildings without a bus ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the grid areas "
        "and are therefore dropped."
    )

    return gdf

def cap_per_bus_id(
    scenario: str,
) -> pd.DataFrame:
    """
    Get a table with the total PV rooftop capacity per grid district.

    Parameters
    ----------
    scenario : str
        Scenario name.

    Returns
    -------
    pandas.DataFrame
        DataFrame with the total rooftop capacity per MV grid.
    """
    targets = config.datasets()["solar_rooftop"]["targets"]

    sql = f"""
    SELECT bus as bus_id, p_nom as capacity
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
    WHERE carrier = 'solar_rooftop'
    AND scn_name = '{scenario}'
    """

    return db.select_dataframe(sql, index_col="bus_id")

    # overlay_gdf = overlay_gdf.assign(capacity=np.nan)
    #
    # for cap, nuts in scenario_df[["capacity", "nuts"]].itertuples(index=False):
    #     nuts_gdf = overlay_gdf.loc[overlay_gdf.nuts == nuts]
    #
    #     capacity = nuts_gdf.building_area.multiply(
    #         cap / nuts_gdf.building_area.sum()
    #     )
    #
    #     overlay_gdf.loc[nuts_gdf.index] = overlay_gdf.loc[
    #         nuts_gdf.index
    #     ].assign(capacity=capacity.multiply(conversion).to_numpy())
    #
    # return overlay_gdf[["bus_id", "capacity"]].groupby("bus_id").sum()

def determine_end_of_life_gens(
    mastr_gdf: gpd.GeoDataFrame,
    scenario_timestamp: pd.Timestamp,
    pv_rooftop_lifetime: pd.Timedelta,
) -> gpd.GeoDataFrame:
    """
    Determine if an old PV system has reached its end of life.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    scenario_timestamp : pandas.Timestamp
        Timestamp at which the scenario takes place.
    pv_rooftop_lifetime : pandas.Timedelta
        Average expected lifetime of PV rooftop systems.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data and info if the system
        has reached its end of life.
    """
    mastr_gdf = mastr_gdf.assign(
        age=scenario_timestamp - mastr_gdf.start_up_date
    )

    logger.debug("Determined if PV rooftop systems reached their end of life.")

    return mastr_gdf.assign(end_of_life=pv_rooftop_lifetime < mastr_gdf.age)

def calculate_max_pv_cap_per_building(
    buildings_gdf: gpd.GeoDataFrame,
    mastr_gdf: gpd.GeoDataFrame,
    pv_cap_per_sq_m: float | int,
    roof_factor: float | int,
) -> gpd.GeoDataFrame:
    """
    Calculate the estimated maximum possible PV capacity per building.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    pv_cap_per_sq_m : float, int
        Average expected installable PV capacity per square meter.
    roof_factor : float, int
        Average share of the building area that is usable for PV.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with estimated maximum PV
        capacity.
    """
    gdf = (
        buildings_gdf.reset_index()
        .merge(
            mastr_gdf[
                [
                    "capacity",
                    "end_of_life",
                    "building_id",
                    "EinheitlicheAusrichtungUndNeigungswinkel",
                    "Hauptausrichtung",
                    "HauptausrichtungNeigungswinkel",
                ]
            ],
            how="left",
            left_on="id",
            right_on="building_id",
        )
        .set_index("id")
        .drop(columns="building_id")
    )

    return gdf.assign(
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
        bus_id=gdf.bus_id.astype(int),
    )

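
# Illustrative sketch (not executed): with placeholder values of 170 W/m²
# installable capacity and half of the building area usable, a 100 m²
# building yields a maximum of 100 * 0.5 * 0.17 = 8.5 kW. These constants
# are examples only, not the module-level ROOF_FACTOR / PV_CAP_PER_SQ_M.
#
# >>> building_area, roof_factor, pv_cap_per_sq_m = 100.0, 0.5, 0.17
# >>> building_area * roof_factor * pv_cap_per_sq_m
# 8.5
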
def calculate_building_load_factor(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    rounding: int = 4,
) -> gpd.GeoDataFrame:
    """
    Calculate the roof load factor from existing PV systems.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    rounding : int
        Number of decimal places to round the load factor to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data with calculated load
        factor.
    """
    gdf = mastr_gdf.merge(
        buildings_gdf[["max_cap", "building_area"]].loc[
            ~buildings_gdf["max_cap"].isna()
        ],
        how="left",
        left_on="building_id",
        right_index=True,
    )

    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))

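
# Illustrative sketch (not executed): the load factor relates installed to
# maximum possible capacity, e.g. a 6 kW system on a roof with an estimated
# potential of 8.5 kW:
#
# >>> round(6.0 / 8.5, 4)
# 0.7059
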
def get_probability_for_property(
    mastr_gdf: gpd.GeoDataFrame,
    cap_range: tuple[int | float, int | float],
    prop: str,
) -> tuple[np.ndarray, np.ndarray]:
    """
    Calculate the probability of the different options of a property of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_range : tuple(int, int)
        Capacity range of PV plants to look at.
    prop : str
        Property to calculate probabilities for. String needs to be in
        columns of mastr_gdf.

    Returns
    -------
    tuple
        numpy.ndarray
            Unique values of the property.
        numpy.ndarray
            Probabilities per unique value.
    """
    cap_range_gdf = mastr_gdf.loc[
        (mastr_gdf.capacity > cap_range[0])
        & (mastr_gdf.capacity <= cap_range[1])
    ]

    if prop == "load_factor":
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]

    # pd.Series.isna already covers isnull (they are aliases), so one check
    # suffices; additionally drop literal "None" strings.
    count = Counter(
        cap_range_gdf[prop].loc[
            ~cap_range_gdf[prop].isna() & ~(cap_range_gdf[prop] == "None")
        ]
    )

    values = np.array(list(count.keys()))
    probabilities = np.fromiter(count.values(), dtype=float)
    probabilities = probabilities / np.sum(probabilities)

    return values, probabilities

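
# Illustrative sketch (not executed): empirical frequencies are normalised
# to probabilities, mirroring the Counter-based logic above.
#
# >>> from collections import Counter
# >>> count = Counter(["S", "S", "SW", "O"])
# >>> p = np.fromiter(count.values(), dtype=float)
# >>> (p / p.sum()).round(2)
# array([0.5 , 0.25, 0.25])
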
@timer_func
def probabilities(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    properties: list[str] | None = None,
) -> dict:
    """
    Calculate the probability of the different options of properties of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple
        should start with zero and the last one should end with infinity.
    properties : list(str)
        List of properties to calculate probabilities for. Strings need to
        be in columns of mastr_gdf.

    Returns
    -------
    dict
        Dictionary with values and probabilities per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]
    if properties is None:
        properties = [
            "EinheitlicheAusrichtungUndNeigungswinkel",
            "Hauptausrichtung",
            "HauptausrichtungNeigungswinkel",
            "load_factor",
        ]

    prob_dict = {}

    for cap_range in cap_ranges:
        prob_dict[cap_range] = {
            "values": {},
            "probabilities": {},
        }

        for prop in properties:
            v, p = get_probability_for_property(
                mastr_gdf,
                cap_range,
                prop,
            )

            prob_dict[cap_range]["values"][prop] = v
            prob_dict[cap_range]["probabilities"][prop] = p

    return prob_dict

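
# Illustrative sketch (not executed): the nested result is keyed by
# capacity range first, then by "values"/"probabilities" and property name.
#
# >>> prob_dict = probabilities(mastr_gdf)  # doctest: +SKIP
# >>> values = prob_dict[(0, 30)]["values"]["load_factor"]
# >>> p = prob_dict[(0, 30)]["probabilities"]["load_factor"]
# >>> default_rng(seed=42).choice(a=values, p=p)  # draw one load factor
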
def cap_share_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the share of PV capacity from the total PV capacity within
    capacity ranges.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple
        should start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    cap_share_dict = {}

    total_cap = mastr_gdf.capacity.sum()

    for cap_range in cap_ranges:
        cap_share = (
            mastr_gdf.loc[
                (mastr_gdf.capacity > cap_range[0])
                & (mastr_gdf.capacity <= cap_range[1])
            ].capacity.sum()
            / total_cap
        )

        cap_share_dict[cap_range] = cap_share

    return cap_share_dict

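
# Illustrative sketch (not executed): because the capacity ranges partition
# (0, inf], the shares add up to one.
#
# >>> cap_share_dict = {(0, 30): 0.5, (30, 100): 0.25, (100, float("inf")): 0.25}
# >>> sum(cap_share_dict.values())
# 1.0
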
def mean_load_factor_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the mean roof load factor per capacity range from existing PV
    plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple
        should start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with mean roof load factor per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    load_factor_dict = {}

    for cap_range in cap_ranges:
        load_factor = mastr_gdf.loc[
            (mastr_gdf.load_factor <= 1)
            & (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ].load_factor.mean()

        load_factor_dict[cap_range] = load_factor

    return load_factor_dict

def building_area_range_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    min_building_size: int | float = 10.0,
    upper_quantile: float = 0.95,
    lower_quantile: float = 0.05,
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
    """
    Estimate the typical building area range per capacity range.

    The area limits are taken from quantiles of the building areas of
    existing PV plants and are smoothed at the borders of neighbouring
    capacity ranges so the ranges do not overlap.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple
        should start with zero and the last one should end with infinity.
    min_building_size : int, float
        Minimal building size to consider for PV plants.
    upper_quantile : float
        Upper quantile to estimate the maximum building size per capacity
        range.
    lower_quantile : float
        Lower quantile to estimate the minimum building size per capacity
        range.

    Returns
    -------
    dict
        Dictionary with the estimated building area range per capacity
        range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    building_area_range_dict = {}

    n_ranges = len(cap_ranges)

    for count, cap_range in enumerate(cap_ranges):
        cap_range_gdf = mastr_gdf.loc[
            (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ]

        if count == 0:
            building_area_range_dict[cap_range] = (
                min_building_size,
                cap_range_gdf.building_area.quantile(upper_quantile),
            )
        elif count == n_ranges - 1:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                float("inf"),
            )
        else:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                cap_range_gdf.building_area.quantile(upper_quantile),
            )

    values = list(building_area_range_dict.values())

    building_area_range_normed_dict = {}

    for count, (cap_range, (min_area, max_area)) in enumerate(
        building_area_range_dict.items()
    ):
        if count == 0:
            building_area_range_normed_dict[cap_range] = (
                min_area,
                np.mean((values[count + 1][0], max_area)),
            )
        elif count == n_ranges - 1:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                max_area,
            )
        else:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                np.mean((values[count + 1][0], max_area)),
            )

    return building_area_range_normed_dict

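
# Illustrative sketch (not executed): the border between two neighbouring
# ranges becomes the mean of the upper quantile of the lower range and the
# lower quantile of the upper range, so the ranges meet without overlap:
#
# >>> lower_range_max, upper_range_min = 180.0, 120.0
# >>> np.mean((lower_range_max, upper_range_min))
# 150.0
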
@timer_func
def desaggregate_pv_in_mv_grid(
    buildings_gdf: gpd.GeoDataFrame,
    pv_cap: float | int,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within a given grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing buildings within the grid district.
    pv_cap : float, int
        PV capacity to desaggregate.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with the estimated building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    bus_id = int(buildings_gdf.bus_id.iat[0])

    rng = default_rng(seed=kwargs["seed"])
    random_state = RandomState(seed=kwargs["seed"])

    results_df = pd.DataFrame(columns=buildings_gdf.columns)

    for cap_range, share in kwargs["cap_share_dict"].items():
        pv_cap_range = pv_cap * share

        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]

        cap_range_buildings_gdf = buildings_gdf.loc[
            ~buildings_gdf.index.isin(results_df.index)
            & (buildings_gdf.building_area > b_area_min)
            & (buildings_gdf.building_area <= b_area_max)
        ]

        mean_load_factor = kwargs["load_factor_dict"][cap_range]
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
            load_factor=np.nan,
            capacity=np.nan,
        )

        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        if total_mean_cap == 0:
            logger.warning(
                f"There are no matching roofs for capacity range {cap_range} "
                f"kW in grid {bus_id}. Using all buildings as fallback."
            )

            cap_range_buildings_gdf = buildings_gdf.loc[
                ~buildings_gdf.index.isin(results_df.index)
            ]

            if len(cap_range_buildings_gdf) == 0:
                logger.warning(
                    "There are no roofs available for capacity range "
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
                )
                cap_range_buildings_gdf = buildings_gdf.copy()

            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
                load_factor=np.nan,
                capacity=np.nan,
            )

            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        elif total_mean_cap < pv_cap_range:
            logger.warning(
                f"The average roof utilization of the roof area in grid "
                f"{bus_id} and capacity range {cap_range} kW is not "
                "sufficient. The roof utilization will be above average."
            )

        frac = max(
            pv_cap_range / total_mean_cap,
            1 / len(cap_range_buildings_gdf),
        )

        samples_gdf = cap_range_buildings_gdf.sample(
            frac=min(1, frac),
            random_state=random_state,
        )

        cap_range_dict = kwargs["prob_dict"][cap_range]

        values_dict = cap_range_dict["values"]
        p_dict = cap_range_dict["probabilities"]

        load_factors = rng.choice(
            a=values_dict["load_factor"],
            size=len(samples_gdf),
            p=p_dict["load_factor"],
        )

        samples_gdf = samples_gdf.assign(
            load_factor=load_factors,
            capacity=(
                samples_gdf.building_area
                * load_factors
                * kwargs["pv_cap_per_sq_m"]
            ).clip(lower=0.4),
        )

        missing_factor = pv_cap_range / samples_gdf.capacity.sum()

        samples_gdf = samples_gdf.assign(
            capacity=(samples_gdf.capacity * missing_factor),
            load_factor=(samples_gdf.load_factor * missing_factor),
        )

        assert np.isclose(
            samples_gdf.capacity.sum(),
            pv_cap_range,
            rtol=1e-03,
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"

        results_df = pd.concat(
            [
                results_df,
                samples_gdf,
            ],
        )

    total_missing_factor = pv_cap / results_df.capacity.sum()

    results_df = results_df.assign(
        capacity=(results_df.capacity * total_missing_factor),
    )

    assert np.isclose(
        results_df.capacity.sum(),
        pv_cap,
        rtol=1e-03,
    ), f"{results_df.capacity.sum()} != {pv_cap}"

    return gpd.GeoDataFrame(
        results_df,
        # Use the CRS of the input instead of samples_gdf, which would be
        # undefined if cap_share_dict were empty and the loop never ran.
        crs=buildings_gdf.crs,
        geometry="geom",
    )

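
# Illustrative sketch (not executed): after sampling, capacities are
# rescaled so that their sum matches the target of the capacity range, e.g.
# sampled capacities of 3 kW and 5 kW scaled to a 10 kW target:
#
# >>> sampled = np.array([3.0, 5.0])
# >>> missing_factor = 10.0 / sampled.sum()
# >>> (sampled * missing_factor).sum()
# 10.0
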
@timer_func
def desaggregate_pv(
    buildings_gdf: gpd.GeoDataFrame,
    cap_df: pd.DataFrame,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within the given grid districts.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    cap_df : pandas.DataFrame
        DataFrame with total rooftop capacity per MV grid.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with the estimated building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]

    building_bus_ids = set(buildings_gdf.bus_id)
    cap_bus_ids = set(cap_df.index)

    logger.debug(
        f"Bus IDs from buildings: {len(building_bus_ids)}\nBus IDs from "
        f"capacity: {len(cap_bus_ids)}"
    )

    # Bus IDs that only appear on one of the two sides.
    missing = building_bus_ids ^ cap_bus_ids

    logger.debug(str(missing))

    # assert set(buildings_gdf.bus_id.unique()) == set(cap_df.index)

    for bus_id in buildings_gdf.bus_id.unique():
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]

        pv_installed_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.end_of_life
        ]

        pv_installed = pv_installed_gdf.capacity.sum()

        pot_buildings_gdf = buildings_grid_gdf.drop(
            index=pv_installed_gdf.index
        )

        if len(pot_buildings_gdf) == 0:
            logger.error(
                f"In grid {bus_id} there are no potential buildings to "
                "allocate PV capacity to. The grid is skipped. This message "
                "should only appear during test runs with few buildings."
            )

            continue

        pv_target = cap_df.at[bus_id, "capacity"] * 1000

        logger.debug(f"pv_target: {pv_target}")

        pv_missing = pv_target - pv_installed

        if pv_missing <= 0:
            logger.info(
                f"In grid {bus_id} more PV is installed ({pv_installed:g}) "
                "in the status quo than is allocated within the scenario "
                f"({pv_target:g}). No new generators are added."
            )

            continue

        if pot_buildings_gdf.max_cap.sum() < pv_missing:
            logger.error(
                f"In grid {bus_id} there is less PV potential ("
                f"{pot_buildings_gdf.max_cap.sum():g} kW) than allocated PV "
                f"capacity ({pv_missing:g} kW). The grid is skipped. This "
                "message should only appear during test runs with few "
                "buildings."
            )

            continue

        gdf = desaggregate_pv_in_mv_grid(
            buildings_gdf=pot_buildings_gdf,
            pv_cap=pv_missing,
            **kwargs,
        )

        logger.debug(f"New cap in grid {bus_id}: {gdf.capacity.sum()}")
        logger.debug(f"Installed cap in grid {bus_id}: {pv_installed}")
        logger.debug(
            f"Total cap in grid {bus_id}: {gdf.capacity.sum() + pv_installed}"
        )

        allocated_buildings_gdf = pd.concat(
            [
                allocated_buildings_gdf,
                gdf,
            ]
        )

    logger.debug("Desaggregated scenario.")
    logger.debug(f"Scenario capacity: {cap_df.capacity.sum():g}")
    logger.debug(
        f"Generator capacity: {allocated_buildings_gdf.capacity.sum():g}"
    )

    return gpd.GeoDataFrame(
        allocated_buildings_gdf,
        # Use the CRS of the input instead of gdf, which is undefined when
        # every grid district is skipped.
        crs=buildings_gdf.crs,
        geometry="geom",
    )

@timer_func
def add_buildings_meta_data(
    buildings_gdf: gpd.GeoDataFrame,
    prob_dict: dict,
    seed: int,
) -> gpd.GeoDataFrame:
    """
    Randomly add additional metadata to desaggregated PV plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    rng = default_rng(seed=seed)
    buildings_gdf = buildings_gdf.reset_index().rename(
        columns={
            "index": "building_id",
        }
    )

    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
        cap_range_gdf = buildings_gdf.loc[
            (buildings_gdf.capacity >= min_cap)
            & (buildings_gdf.capacity < max_cap)
        ]

        for key, values in cap_range_prob_dict["values"].items():
            if key == "load_factor":
                continue

            # pd.Series.isna already covers isnull (they are aliases).
            gdf = cap_range_gdf.loc[
                cap_range_gdf[key].isna() | (cap_range_gdf[key] == "None")
            ]

            key_vals = rng.choice(
                a=values,
                size=len(gdf),
                p=cap_range_prob_dict["probabilities"][key],
            )

            buildings_gdf.loc[gdf.index, key] = key_vals

    return buildings_gdf

def add_voltage_level(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add a voltage level derived from the generator capacity to the power
    plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with a voltage level per
        generator.
    """

    def voltage_levels(p: float) -> int:
        if p < 100:
            return 7
        elif p < 200:
            return 6
        elif p < 5500:
            return 5
        elif p < 20000:
            return 4
        elif p < 120000:
            return 3
        return 1

    return buildings_gdf.assign(
        voltage_level=buildings_gdf.capacity.apply(voltage_levels)
    )

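
# Illustrative sketch (not executed): capacities in kW map onto voltage
# levels 7 (low voltage) down to 1, e.g. with the nested helper above:
#
# >>> [voltage_levels(p) for p in (30, 150, 1_000, 10_000, 100_000, 200_000)]
# [7, 6, 5, 4, 3, 1]
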
def add_start_up_date(
    buildings_gdf: gpd.GeoDataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    seed: int,
):
    """
    Randomly add a start-up date, drawn uniformly between two timestamps,
    to the new PV generators.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    start : pandas.Timestamp
        Minimum Timestamp to use.
    end : pandas.Timestamp
        Maximum Timestamp to use.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with a start-up date
        added.
    """
    rng = default_rng(seed=seed)

    date_range = pd.date_range(start=start, end=end, freq="1D")

    return buildings_gdf.assign(
        start_up_date=rng.choice(date_range, size=len(buildings_gdf))
    )

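
# Illustrative sketch (not executed): one candidate day per building is
# drawn from a daily date range between the two timestamps.
#
# >>> days = pd.date_range("2022-01-01", "2035-01-01", freq="1D")
# >>> default_rng(seed=42).choice(days, size=3)  # doctest: +SKIP
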
@timer_func
def allocate_scenarios(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    last_scenario_gdf: gpd.GeoDataFrame,
    scenario: str,
):
    """
    Desaggregate and allocate scenario PV rooftop ramp-ups onto buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    last_scenario_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings matched with PV generators
        from the temporally preceding scenario.
    scenario : str
        Scenario to desaggregate and allocate.

    Returns
    -------
    tuple
        geopandas.GeoDataFrame
            GeoDataFrame containing OSM buildings matched with PV
            generators.
        pandas.DataFrame
            DataFrame containing PV rooftop capacity per grid id.
    """
    grid_districts_gdf = grid_districts(EPSG)

    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)

    grid_federal_state_gdf = overlay_grid_districts_with_counties(
        grid_districts_gdf,
        federal_state_gdf,
    )

    buildings_overlay_gdf = add_overlay_id_to_buildings(
        buildings_gdf,
        grid_federal_state_gdf,
    )

    valid_buildings_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)

    cap_per_bus_id_df = cap_per_bus_id(scenario)

    logger.debug(
        f"cap_per_bus_id_df total capacity: {cap_per_bus_id_df.capacity.sum()}"
    )

    last_scenario_gdf = determine_end_of_life_gens(
        last_scenario_gdf,
        SCENARIO_TIMESTAMP[scenario],
        PV_ROOFTOP_LIFETIME,
    )

    buildings_gdf = calculate_max_pv_cap_per_building(
        valid_buildings_gdf,
        last_scenario_gdf,
        PV_CAP_PER_SQ_M,
        ROOF_FACTOR,
    )

    mastr_gdf = calculate_building_load_factor(
        mastr_gdf,
        buildings_gdf,
    )

    probabilities_dict = probabilities(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    cap_share_dict = cap_share_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    load_factor_dict = mean_load_factor_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    building_area_range_dict = building_area_range_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
        min_building_size=MIN_BUILDING_SIZE,
        upper_quantile=UPPER_QUNATILE,
        lower_quantile=LOWER_QUANTILE,
    )

    allocated_buildings_gdf = desaggregate_pv(
        buildings_gdf=buildings_gdf,
        cap_df=cap_per_bus_id_df,
        prob_dict=probabilities_dict,
        cap_share_dict=cap_share_dict,
        building_area_range_dict=building_area_range_dict,
        load_factor_dict=load_factor_dict,
        seed=SEED,
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
    )

    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)

    meta_buildings_gdf = frame_to_numeric(
        add_buildings_meta_data(
            allocated_buildings_gdf,
            probabilities_dict,
            SEED,
        )
    )

    return (
        add_start_up_date(
            meta_buildings_gdf,
            start=last_scenario_gdf.start_up_date.max(),
            end=SCENARIO_TIMESTAMP[scenario],
            seed=SEED,
        ),
        cap_per_bus_id_df,
    )

class EgonPowerPlantPvRoofBuildingScenario(Base):
    __tablename__ = "egon_power_plants_pv_roof_building"
    __table_args__ = {"schema": "supply"}

    index = Column(Integer, primary_key=True, index=True)
    scenario = Column(String)
    building_id = Column(Integer)
    gens_id = Column(String, nullable=True)
    capacity = Column(Float)
    einheitliche_ausrichtung_und_neigungswinkel = Column(Float)
    hauptausrichtung = Column(String)
    hauptausrichtung_neigungswinkel = Column(String)
    voltage_level = Column(Integer)

def create_scenario_table(buildings_gdf):
    """Create mapping table pv_unit <-> building for scenario"""
    EgonPowerPlantPvRoofBuildingScenario.__table__.drop(
        bind=engine, checkfirst=True
    )
    EgonPowerPlantPvRoofBuildingScenario.__table__.create(
        bind=engine, checkfirst=True
    )

    buildings_gdf.rename(columns=COLS_TO_RENAME)[
        COLS_TO_EXPORT
    ].reset_index().to_sql(
        name=EgonPowerPlantPvRoofBuildingScenario.__table__.name,
        schema=EgonPowerPlantPvRoofBuildingScenario.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
    )

def geocode_mastr_data():
    """
    Read PV rooftop data from the MaStR CSV and geocode it.

    TODO: the source will be replaced as soon as the MaStR data is
    available in the DB.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocoding_df = geocoding_data(clean_mastr_df)

    ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)

    geocode_gdf = geocode_data(geocoding_df, ratelimiter, EPSG)

    create_geocoded_table(geocode_gdf)

def pv_rooftop_to_buildings():
    """Main script, executed as task"""

    mastr_gdf = load_mastr_data()

    buildings_gdf = load_building_data()

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
        mastr_gdf, buildings_gdf
    )

    all_buildings_gdf = (
        desagg_mastr_gdf.assign(scenario="status_quo")
        .reset_index()
        .rename(columns={"geometry": "geom", "EinheitMastrNummer": "gens_id"})
    )

    scenario_buildings_gdf = all_buildings_gdf.copy()

    cap_per_bus_id_df = pd.DataFrame()

    for scenario in SCENARIOS:
        logger.debug(f"Desaggregating scenario {scenario}.")
        (
            scenario_buildings_gdf,
            cap_per_bus_id_scenario_df,
        ) = allocate_scenarios(
            desagg_mastr_gdf,
            desagg_buildings_gdf,
            scenario_buildings_gdf,
            scenario,
        )

        all_buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
            ),
            crs=scenario_buildings_gdf.crs,
            geometry="geom",
        )

        cap_per_bus_id_df = pd.concat(
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
        )

    # export scenario
    create_scenario_table(add_voltage_level(all_buildings_gdf))
2649