Passed
Pull Request — dev (#809)
created by unknown, 01:34

data.datasets.power_plants.pv_rooftop_buildings   F

Complexity

Total Complexity 115

Size/Duplication

Total Lines 2647
Duplicated Lines 1.21 %

Importance

Changes 0
Metric Value
wmc 115
eloc 1165
dl 32
loc 2647
rs 0.94
c 0
b 0
f 0

53 Functions

Rating   Name   Duplication   Size   Complexity  
A synthetic_buildings() 0 25 2
A geocoding_data() 0 18 1
A geocoded_data_from_db() 0 23 2
A add_ags_to_gens() 0 24 1
A drop_buildings_outside_muns() 0 25 1
B clean_mastr_data() 0 143 2
A mastr_data() 0 60 2
A geocode_data() 0 46 3
B zip_and_municipality_from_standort() 0 50 7
A merge_geocode_with_mastr() 0 27 1
A municipality_data() 0 14 2
A osm_buildings() 0 25 2
A add_ags_to_buildings() 0 24 1
A timer_func() 0 11 1
A create_geocoded_table() 0 17 1
A load_mastr_data() 0 35 1
A drop_gens_outside_muns() 0 25 1
A geocoder() 0 22 1
A drop_invalid_entries_from_gdf() 0 25 1
A most_plausible() 0 41 4
A egon_building_peak_loads() 0 7 1
A load_building_data() 0 52 3
A get_probability_for_property() 0 46 2
A calculate_building_load_factor() 0 30 1
A determine_end_of_life_gens() 0 28 1
A add_start_up_date() 0 30 1
A buildings_area_per_overlay_id() 0 33 1
B add_voltage_level() 0 31 6
A add_overlay_id_to_buildings() 0 31 1
A cap_per_bus_id() 0 33 1
B desaggregate_pv_in_mv_grid() 0 151 5
A calculate_max_pv_cap_per_building() 0 49 1
A scenario_data() 0 30 2
A drop_buildings_outside_grids() 0 25 1
C building_area_range_per_cap_range() 0 87 8
A add_buildings_meta_data() 0 55 4
B allocate_pv() 0 125 6
A sort_and_qcut_df() 0 29 1
A federal_state_data() 0 24 2
A geocode_mastr_data() 0 28 1
A mean_load_factor_per_cap_range() 0 38 3
B desaggregate_pv() 0 104 5
A grid_districts() 0 32 1
A frame_to_numeric() 0 20 4
A pv_rooftop_to_buildings() 0 47 2
A allocate_to_buildings() 0 32 1
A overlay_grid_districts_with_counties() 0 33 1
B allocate_scenarios() 0 126 1
A validate_output() 0 50 1
A create_scenario_table() 0 17 1
A cap_share_per_cap_range() 0 43 3
A drop_unallocated_gens() 0 27 1
B probabilities() 0 57 5


Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.
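This report flags 1.21 % duplicated lines. For instance, `drop_gens_outside_muns()` and `drop_buildings_outside_muns()` in the file below share a nearly identical filter-and-log body. A sketch of extracting that into one shared helper (the helper name `drop_rows_without_ags` is invented here, and `print` stands in for the module's `logger.debug`):

```python
import pandas as pd


def drop_rows_without_ags(df: pd.DataFrame, what: str) -> pd.DataFrame:
    """Drop rows lacking an AGS ID and report how many were removed.

    Hypothetical helper replacing the duplicated bodies of
    drop_gens_outside_muns() and drop_buildings_outside_muns().
    """
    kept = df.loc[~df.ags.isna()]
    dropped = len(df) - len(kept)
    print(
        f"{dropped} ({dropped / len(df) * 100:g}%) of {len(df)} {what} "
        "are outside of the municipalities and are therefore dropped."
    )
    return kept
```

Each original function then reduces to a one-line call, e.g. `drop_rows_without_ags(valid_mastr_gdf, "generators")`.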

Common duplication problems, and their corresponding solutions, are:

Complexity

 Tip:   Before tackling complexity, make sure you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.power_plants.pv_rooftop_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
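A minimal sketch of Extract Class under invented names (none of these classes exist in the module): fields and methods sharing the geocoding prefix are pulled into their own component, and the original class delegates to it:

```python
class Geocoding:
    """Extracted component: geocoding-related state and behavior."""

    def __init__(self, user_agent: str, min_delay_seconds: int) -> None:
        self.user_agent = user_agent
        self.min_delay_seconds = min_delay_seconds

    def geocode(self, address: str) -> tuple[float, float]:
        # Placeholder for the real lookup (e.g. a Nominatim request).
        return (0.0, 0.0)


class PvRooftopPipeline:
    """The formerly large class now delegates to the extracted component."""

    def __init__(self) -> None:
        self.geocoding = Geocoding(user_agent="my-app", min_delay_seconds=1)

    def locate(self, address: str) -> tuple[float, float]:
        return self.geocoding.geocode(address)
```

The extracted class can then be tested and reused independently of the pipeline.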

"""
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate new
PV rooftop generators for the scenarios eGon2035 and eGon100RE.

Data cleaning: Drop duplicates and entries with missing critical data. Determine the
most plausible capacity from the multiple values given in the MaStR data. Drop
generators without any plausible capacity data (0.1 kW < P <= 23.5 MW). Randomly and
weighted, add a start-up date where it is missing. Extract zip code and municipality
from the 'Standort' given in the MaStR data. Geocode unique zip code and municipality
combinations with Nominatim (1 s delay). Drop generators for which geocoding failed or
which are located outside the municipalities of Germany. Add some visual sanity checks
for the cleaned data.

Allocation of MaStR data: Allocate each generator to an existing building from OSM.
Determine the quantile each generator and building is in, depending on the capacity of
the generator and the area of the building polygon. Randomly distribute generators
within each municipality, preferably within the same building area quantile as the
generators are capacity-wise. If not enough buildings exist within a municipality and
quantile, additional buildings are chosen randomly from other quantiles.

Disaggregation of PV rooftop scenarios: The scenario data per federal state is
distributed linearly to the MV grid districts according to the PV rooftop potential
per MV grid district. The rooftop potential is estimated from the building area given
by the OSM buildings. Grid districts located in several federal states are allocated
PV capacity according to their respective roof potential in the individual federal
states. The disaggregation of PV plants within a grid district respects existing
plants from MaStR which have not reached their end of life. New PV plants are
generated randomly and weighted, using a breakdown of the MaStR data as generator
basis. Plant metadata (e.g. plant orientation) is also added randomly and weighted,
with MaStR data as basis.
"""
from __future__ import annotations

from collections import Counter
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Any

from geoalchemy2 import Geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from loguru import logger
from numpy.random import RandomState, default_rng
from pyproj.crs.crs import CRS
from sqlalchemy import BigInteger, Column, Float, Integer, String
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd

from egon.data import config, db
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    OsmBuildingsSynthetic,
)
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
from egon.data.datasets.zensus_vg250 import Vg250Gem

engine = db.engine()
Base = declarative_base()
SEED = int(config.settings()["egon-data"]["--random-seed"])

# TODO: move to yml
# MaStR data
MASTR_RELEVANT_COLS = [
    "EinheitMastrNummer",
    "Bruttoleistung",
    "StatisikFlag",
    "Bruttoleistung_extended",
    "Nettonennleistung",
    "InstallierteLeistung",
    "zugeordneteWirkleistungWechselrichter",
    "EinheitBetriebsstatus",
    "Standort",
    "Bundesland",
    "Land",
    "Landkreis",
    "Gemeinde",
    "Postleitzahl",
    "Ort",
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
    "GemeinsamerWechselrichterMitSpeicher",
    "Lage",
    "Leistungsbegrenzung",
    "EinheitlicheAusrichtungUndNeigungswinkel",
    "Hauptausrichtung",
    "HauptausrichtungNeigungswinkel",
    "Nebenausrichtung",
]

MASTR_DTYPES = {
    "EinheitMastrNummer": str,
    "Bruttoleistung": float,
    "StatisikFlag": str,
    "Bruttoleistung_extended": float,
    "Nettonennleistung": float,
    "InstallierteLeistung": float,
    "zugeordneteWirkleistungWechselrichter": float,
    "EinheitBetriebsstatus": str,
    "Standort": str,
    "Bundesland": str,
    "Land": str,
    "Landkreis": str,
    "Gemeinde": str,
    # "Postleitzahl": int,  # fails because of nan values
    "Ort": str,
    "GemeinsamerWechselrichterMitSpeicher": str,
    "Lage": str,
    "Leistungsbegrenzung": str,
    # this would parse nan values as False, which is not always correct
    # "EinheitlicheAusrichtungUndNeigungswinkel": bool,
    "Hauptausrichtung": str,
    "HauptausrichtungNeigungswinkel": str,
    "Nebenausrichtung": str,
    "NebenausrichtungNeigungswinkel": str,
}

MASTR_PARSE_DATES = [
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
]

MASTR_INDEX_COL = "EinheitMastrNummer"

EPSG = 4326
SRID = 3035

# data cleaning
MAX_REALISTIC_PV_CAP = 23500
MIN_REALISTIC_PV_CAP = 0.1
ROUNDING = 1

# geopy
MIN_DELAY_SECONDS = 1
USER_AGENT = "rli_kh_geocoder"

# show additional logging information
VERBOSE = False

EXPORT_DIR = Path(__name__).resolve().parent / "data"
EXPORT_FILE = "mastr_geocoded.gpkg"
EXPORT_PATH = EXPORT_DIR / EXPORT_FILE
DRIVER = "GPKG"

# Number of quantiles
Q = 5

# Scenario Data
CARRIER = "solar_rooftop"
SCENARIOS = ["eGon2035"]  # , "eGon100RE"]
SCENARIO_TIMESTAMP = {
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
}
PV_ROOFTOP_LIFETIME = pd.Timedelta(30 * 365, unit="D")

# Example module: Trina Vertex S TSM-400DE09M.08, 400 Wp
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE

# Estimation of usable roof area
# Factor for the conversion of building area to roof area
# estimated mean roof pitch: 35°
# estimated usable roof share: 80%
# estimation that only the south side of the building is used for PV
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
# AREA_FACTOR = 1.221
# USABLE_ROOF_SHARE = 0.8
# SOUTH_SHARE = 0.5
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
ROOF_FACTOR = 0.5

CAP_RANGES = [
    (0, 30),
    (30, 100),
    (100, float("inf")),
]

MIN_BUILDING_SIZE = 10.0
UPPER_QUNATILE = 0.95
LOWER_QUANTILE = 0.05

COLS_TO_RENAME = {
    "EinheitlicheAusrichtungUndNeigungswinkel": (
        "einheitliche_ausrichtung_und_neigungswinkel"
    ),
    "Hauptausrichtung": "hauptausrichtung",
    "HauptausrichtungNeigungswinkel": "hauptausrichtung_neigungswinkel",
}

COLS_TO_EXPORT = [
    "scenario",
    "building_id",
    "gens_id",
    "capacity",
    "einheitliche_ausrichtung_und_neigungswinkel",
    "hauptausrichtung",
    "hauptausrichtung_neigungswinkel",
    "voltage_level",
]

INCLUDE_SYNTHETIC_BUILDINGS = False
ONLY_BUILDINGS_WITH_DEMAND = True


def timer_func(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        end_time = perf_counter()
        total_time = end_time - start_time
        print(f"Function {func.__name__} took {total_time:.4f} seconds.")
        return result

    return timeit_wrapper


@timer_func
def mastr_data(
    index_col: str | int | list[str] | list[int],
    usecols: list[str],
    dtype: dict[str, Any] | None,
    parse_dates: list[str] | None,
) -> pd.DataFrame:
    """
    Read MaStR data from CSV.

    Parameters
    ----------
    index_col : str, int or list of str or int
        Column(s) to use as the row labels of the DataFrame.
    usecols : list of str
        Return a subset of the columns.
    dtype : dict of column (str) -> type (any), optional
        Data type for data or columns.
    parse_dates : list of names (str), optional
        Try to parse the given columns to datetime.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing MaStR data.
    """
    mastr_path = Path(
        config.datasets()["power_plants"]["sources"]["mastr_pv"]
    ).resolve()

    mastr_df = pd.read_csv(
        mastr_path,
        index_col=index_col,
        usecols=usecols,
        dtype=dtype,
        parse_dates=parse_dates,
    )

    mastr_df = mastr_df.loc[
        (mastr_df.StatisikFlag == "B")
        & (mastr_df.EinheitBetriebsstatus == "InBetrieb")
        & (mastr_df.Land == "Deutschland")
        & (mastr_df.Lage == "BaulicheAnlagen")
    ]

    if (
        config.settings()["egon-data"]["--dataset-boundary"]
        == "Schleswig-Holstein"
    ):
        init_len = len(mastr_df)

        mastr_df = mastr_df.loc[mastr_df.Bundesland == "SchleswigHolstein"]

        logger.info(
            f"Using only MaStR data within Schleswig-Holstein. "
            f"{init_len - len(mastr_df)} of {init_len} generators are dropped."
        )

    logger.debug("MaStR data loaded.")

    return mastr_df


@timer_func
def clean_mastr_data(
    mastr_df: pd.DataFrame,
    max_realistic_pv_cap: int | float,
    min_realistic_pv_cap: int | float,
    rounding: int,
    seed: int,
) -> pd.DataFrame:
    """
    Clean the MaStR data of implausible entries.

    * Drop MaStR ID duplicates.
    * Drop generators with implausible capacities.
    * Drop generators without any kind of start-up date.
    * Clean up the Standort column and the capacity.

    Parameters
    ----------
    mastr_df : pandas.DataFrame
        DataFrame containing MaStR data.
    max_realistic_pv_cap : int or float
        Maximum capacity which is considered to be realistic.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.
    rounding : int
        Rounding to use when cleaning up the capacity. E.g. when
        rounding is 1, a capacity of 9.93 will be rounded to 9.9.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    """
    init_len = len(mastr_df)

    # drop duplicates
    mastr_df = mastr_df.loc[~mastr_df.index.duplicated()]

    # drop invalid entries in Standort
    index_to_drop = mastr_df.loc[
        (mastr_df.Standort.isna()) | (mastr_df.Standort.isnull())
    ].index

    mastr_df = mastr_df.loc[~mastr_df.index.isin(index_to_drop)]

    df = mastr_df[
        [
            "Bruttoleistung",
            "Bruttoleistung_extended",
            "Nettonennleistung",
            "zugeordneteWirkleistungWechselrichter",
            "InstallierteLeistung",
        ]
    ].round(rounding)

    # use only the smallest capacity rating if multiple are given
    mastr_df = mastr_df.assign(
        capacity=[
            most_plausible(p_tub, min_realistic_pv_cap)
            for p_tub in df.itertuples(index=False)
        ]
    )

    # drop generators without any capacity info,
    # with a capacity of zero,
    # or with a capacity > 23.5 MW, because
    # Germany's largest rooftop PV plant is 23 MW
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
    mastr_df = mastr_df.loc[
        (~mastr_df.capacity.isna())
        & (mastr_df.capacity <= max_realistic_pv_cap)
        & (mastr_df.capacity > min_realistic_pv_cap)
    ]

    # get zip code and municipality
    mastr_df[["zip_and_municipality", "drop_this"]] = pd.DataFrame(
        mastr_df.Standort.astype(str)
        .apply(
            zip_and_municipality_from_standort,
            args=(VERBOSE,),
        )
        .tolist(),
        index=mastr_df.index,
    )

    # drop invalid entries
    mastr_df = mastr_df.loc[mastr_df.drop_this].drop(columns="drop_this")

    # add ", Deutschland" just in case
    mastr_df = mastr_df.assign(
        zip_and_municipality=(mastr_df.zip_and_municipality + ", Deutschland")
    )

    # get a consistent start-up date
    mastr_df = mastr_df.assign(
        start_up_date=mastr_df.Inbetriebnahmedatum,
    )

    mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
        mastr_df.start_up_date.isna()
    ].assign(
        start_up_date=mastr_df.GeplantesInbetriebnahmedatum.loc[
            mastr_df.start_up_date.isna()
        ]
    )

    # randomly and weighted fill missing start-up dates
    pool = mastr_df.loc[
        ~mastr_df.start_up_date.isna()
    ].start_up_date.to_numpy()

    size = len(mastr_df) - len(pool)

    if size > 0:
        np.random.seed(seed)

        choice = np.random.choice(
            pool,
            size=size,
            replace=False,
        )

        mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
            mastr_df.start_up_date.isna()
        ].assign(start_up_date=choice)

        logger.info(
            f"Randomly and weighted added start-up dates to {size} generators."
        )

    mastr_df = mastr_df.assign(
        start_up_date=pd.to_datetime(mastr_df.start_up_date, utc=True)
    )

    end_len = len(mastr_df)
    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} rows from the MaStR DataFrame."
    )

    return mastr_df


def zip_and_municipality_from_standort(
    standort: str,
    verbose: bool = False,
) -> tuple[str, bool]:
    """
    Get the zip code and municipality from the Standort string.

    Parameters
    ----------
    standort : str
        Standort as given in the MaStR data.
    verbose : bool
        Logs additional info if True.

    Returns
    -------
    tuple of str and bool
        The Standort reduced to zip code and municipality, and a flag
        indicating whether a zip code could be identified.
    """
    if verbose:
        logger.debug(f"Uncleaned String: {standort}")

    standort_list = standort.split()

    found = False
    count = 0

    for count, elem in enumerate(standort_list):
        if len(elem) != 5:
            continue
        if not elem.isnumeric():
            continue

        found = True

        break

    if found:
        cleaned_str = " ".join(standort_list[count:])

        if verbose:
            logger.debug(f"Cleaned String:   {cleaned_str}")

        return cleaned_str, found

    logger.warning(
        "Couldn't identify zip code. This entry will be dropped."
        f" Original Standort: {standort}."
    )

    return standort, found


def most_plausible(
    p_tub: tuple,
    min_realistic_pv_cap: int | float,
) -> float:
    """
    Try to determine the most plausible capacity.

    Try to determine the most plausible capacity for a given
    generator from the MaStR data.

    Parameters
    ----------
    p_tub : tuple
        Tuple containing the different capacities given in
        the MaStR data.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.

    Returns
    -------
    float
        Capacity of the generator estimated as the most realistic.
    """
    count = Counter(p_tub).most_common(3)

    if len(count) == 1:
        return count[0][0]

    val1 = count[0][0]
    val2 = count[1][0]

    if len(count) == 2:
        min_val = min(val1, val2)
        max_val = max(val1, val2)
    else:
        val3 = count[2][0]

        min_val = min(val1, val2, val3)
        max_val = max(val1, val2, val3)

    if min_val < min_realistic_pv_cap:
        return max_val

    return min_val


def geocoder(
    user_agent: str,
    min_delay_seconds: int,
) -> RateLimiter:
    """
    Set up the Nominatim geocoding class.

    Parameters
    ----------
    user_agent : str
        The app name.
    min_delay_seconds : int
        Delay in seconds to use between requests to Nominatim.
        A minimum of 1 is advised.

    Returns
    -------
    geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    """
    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(
        locator.geocode,
        min_delay_seconds=min_delay_seconds,
    )


def geocoding_data(
    clean_mastr_df: pd.DataFrame,
) -> pd.DataFrame:
    """
    Set up the DataFrame to geocode.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    """
    return pd.DataFrame(
        data=clean_mastr_df.zip_and_municipality.unique(),
        columns=["zip_and_municipality"],
    )


@timer_func
def geocode_data(
    geocoding_df: pd.DataFrame,
    ratelimiter: RateLimiter,
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Geocode zip code and municipality.

    Extract latitude, longitude and altitude. Transform latitude and
    longitude to a shapely Point and return a geopandas GeoDataFrame.

    Parameters
    ----------
    geocoding_df : pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    ratelimiter : geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.
    """
    logger.info(f"Geocoding {len(geocoding_df)} locations.")

    geocode_df = geocoding_df.assign(
        location=geocoding_df.zip_and_municipality.apply(ratelimiter)
    )

    geocode_df = geocode_df.assign(
        point=geocode_df.location.apply(
            lambda loc: tuple(loc.point) if loc else None
        )
    )

    geocode_df[["latitude", "longitude", "altitude"]] = pd.DataFrame(
        geocode_df.point.tolist(), index=geocode_df.index
    )

    return gpd.GeoDataFrame(
        geocode_df,
        geometry=gpd.points_from_xy(geocode_df.longitude, geocode_df.latitude),
        crs=f"EPSG:{epsg}",
    )


def merge_geocode_with_mastr(
    clean_mastr_df: pd.DataFrame, geocode_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """
    Merge the geometry to the original MaStR data.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing cleaned MaStR data with
        matching geolocation from geocoding.
    """
    return gpd.GeoDataFrame(
        clean_mastr_df.merge(
            geocode_gdf[["zip_and_municipality", "geometry"]],
            how="left",
            on="zip_and_municipality",
        ),
        crs=geocode_gdf.crs,
    ).set_index(clean_mastr_df.index)


def drop_invalid_entries_from_gdf(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop invalid entries from a geopandas GeoDataFrame.

    TODO: how to omit the logging from GEOS here?

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame to be checked for validity.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with rows with invalid geometries dropped.
    """
    valid_gdf = gdf.loc[gdf.is_valid]

    logger.debug(
        f"{len(gdf) - len(valid_gdf)} "
        f"({(len(gdf) - len(valid_gdf)) / len(gdf) * 100:g}%) "
        f"of {len(gdf)} values were invalid and are dropped."
    )

    return valid_gdf


@timer_func
def municipality_data() -> gpd.GeoDataFrame:
    """
    Get municipality data from the eGo^n database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with municipality data.
    """
    with db.session_scope() as session:
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="ags"
    )


@timer_func
def add_ags_to_gens(
    valid_mastr_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add the AGS ID to the generators.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with the AGS ID added.
    """
    return valid_mastr_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_gens_outside_muns(
    valid_mastr_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all generators outside of the municipalities.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with generators without an AGS ID dropped.
    """
    gdf = valid_mastr_gdf.loc[~valid_mastr_gdf.ags.isna()]

    logger.debug(
        f"{len(valid_mastr_gdf) - len(gdf)} "
        f"({(len(valid_mastr_gdf) - len(gdf)) / len(valid_mastr_gdf) * 100:g}%) "
        f"of {len(valid_mastr_gdf)} values are outside of the municipalities"
        " and are therefore dropped."
    )

    return gdf


class EgonMastrPvRoofGeocoded(Base):
    __tablename__ = "egon_mastr_pv_roof_geocoded"
    __table_args__ = {"schema": "supply"}

    zip_and_municipality = Column(String, primary_key=True, index=True)
    location = Column(String)
    point = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    altitude = Column(Float)
    geometry = Column(Geometry(srid=EPSG))


def create_geocoded_table(geocode_gdf):
    """
    Create the geocoded MaStR PV rooftop table.

    Parameters
    ----------
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoding information on PV rooftop locations.
    """
    EgonMastrPvRoofGeocoded.__table__.drop(bind=engine, checkfirst=True)
    EgonMastrPvRoofGeocoded.__table__.create(bind=engine, checkfirst=True)

    geocode_gdf.to_postgis(
        name=EgonMastrPvRoofGeocoded.__table__.name,
        schema=EgonMastrPvRoofGeocoded.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
        # dtype={}
    )


def geocoded_data_from_db(
    epsg: str | int,
) -> gpd.GeoDataFrame:
    """
    Read the geocoded data from the eGo^n database.

    Parameters
    ----------
    epsg : str or int
        EPSG ID of the CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the geocoded zip code and
        municipality combinations.
    """
    with db.session_scope() as session:
        query = session.query(
            EgonMastrPvRoofGeocoded.zip_and_municipality,
            EgonMastrPvRoofGeocoded.geometry,
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, geom_col="geometry"
    ).to_crs(f"EPSG:{epsg}")


def load_mastr_data():
    """
    Read the PV rooftop data from the MaStR CSV.

    Note: The source will be replaced as soon as the MaStR data is
    available in the database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocode_gdf = geocoded_data_from_db(EPSG)

    mastr_gdf = merge_geocode_with_mastr(clean_mastr_df, geocode_gdf)

    valid_mastr_gdf = drop_invalid_entries_from_gdf(mastr_gdf)

    municipalities_gdf = municipality_data()

    valid_mastr_gdf = add_ags_to_gens(valid_mastr_gdf, municipalities_gdf)

    return drop_gens_outside_muns(valid_mastr_gdf)


class OsmBuildingsFiltered(Base):
    __tablename__ = "osm_buildings_filtered"
    __table_args__ = {"schema": "openstreetmap"}

    osm_id = Column(BigInteger)
    amenity = Column(String)
    building = Column(String)
    name = Column(String)
    geom = Column(Geometry(srid=SRID), index=True)
    area = Column(Float)
    geom_point = Column(Geometry(srid=SRID), index=True)
    tags = Column(HSTORE)
    id = Column(BigInteger, primary_key=True, index=True)


@timer_func
def osm_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read OSM buildings data from the eGo^n database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsFiltered.id,
            OsmBuildingsFiltered.area,
            OsmBuildingsFiltered.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def synthetic_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read synthetic buildings data from the eGo^n database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing synthetic buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsSynthetic.id,
            OsmBuildingsSynthetic.area,
            OsmBuildingsSynthetic.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


906
@timer_func
def add_ags_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with the AGS ID added.
    """
    return buildings_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})

def drop_buildings_outside_muns(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of municipalities.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without
        an AGS ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the municipalities "
        "and are therefore dropped."
    )

    return gdf

def egon_building_peak_loads():
    """Read the building IDs from demand.egon_building_peak_loads."""
    sql = """
    SELECT building_id
    FROM demand.egon_building_peak_loads
    """

    return db.select_dataframe(sql).building_id.astype(int).sort_values()

@timer_func
def load_building_data():
    """
    Read buildings from the database.

    Tables:

    * `openstreetmap.osm_buildings_filtered` (from OSM)
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)

    Use column `id` for both tables as it stays unique when both datasets
    are concatenated. If INCLUDE_SYNTHETIC_BUILDINGS is False, synthetic
    buildings will not be loaded.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without
        an AGS ID dropped.
    """
    municipalities_gdf = municipality_data()

    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)

    if INCLUDE_SYNTHETIC_BUILDINGS:
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)

        buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [
                    osm_buildings_gdf,
                    synthetic_buildings_gdf,
                ]
            ),
            geometry="geom",
            crs=osm_buildings_gdf.crs,
        ).rename(columns={"area": "building_area"})

    else:
        buildings_gdf = osm_buildings_gdf.rename(
            columns={"area": "building_area"}
        )

    if ONLY_BUILDINGS_WITH_DEMAND:
        building_ids = egon_building_peak_loads()

        buildings_gdf = buildings_gdf.loc[building_ids]

    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)

    logger.debug("Loaded buildings.")

    return drop_buildings_outside_muns(buildings_ags_gdf)

@timer_func
def sort_and_qcut_df(
    df: pd.DataFrame | gpd.GeoDataFrame,
    col: str,
    q: int,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Determine the quantile of a given attribute in a (Geo)DataFrame.

    Sort the (Geo)DataFrame in ascending order for the given attribute.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame
        (Geo)DataFrame to sort and qcut.
    col : str
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
    q : int
        Number of quantiles.

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
        Sorted and qcut (Geo)DataFrame.
    """
    df = df.sort_values(col, ascending=True)

    return df.assign(
        quant=pd.qcut(
            df[col],
            q=q,
            labels=range(q),
        )
    )

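# Note on the binning above: `pd.qcut` splits the sorted values into `q`
# equally sized quantile bins labelled ``0..q-1``. A minimal sketch with
# hypothetical capacity values:

```python
import pandas as pd

# Hypothetical capacities; qcut assigns each row to one of q
# equally sized quantile bins, labelled 0..q-1.
df = pd.DataFrame({"capacity": [1.0, 2.0, 5.0, 8.0, 9.0, 20.0]})
df = df.sort_values("capacity", ascending=True)
df["quant"] = pd.qcut(df["capacity"], q=3, labels=range(3))
print(df["quant"].tolist())  # → [0, 0, 1, 1, 2, 2]
```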
@timer_func
def allocate_pv(
    q_mastr_gdf: gpd.GeoDataFrame,
    q_buildings_gdf: gpd.GeoDataFrame,
    seed: int,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate the MaStR PV generators to the OSM buildings.

    This determines a building for each PV generator if there are more
    buildings than generators within a given AGS. Generators are primarily
    matched to buildings within the same quantile. Multiple assignment is
    excluded.

    Parameters
    ----------
    q_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded and qcut MaStR data.
    q_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing qcut OSM buildings data.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    tuple with two geopandas.GeoDataFrames
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    rng = default_rng(seed=seed)

    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan)
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan)

    ags_list = q_buildings_gdf.ags.unique()

    num_ags = len(ags_list)

    t0 = perf_counter()

    for count, ags in enumerate(ags_list):

        buildings = q_buildings_gdf.loc[
            (q_buildings_gdf.ags == ags) & (q_buildings_gdf.gens_id.isna())
        ]
        gens = q_mastr_gdf.loc[
            (q_mastr_gdf.ags == ags) & (q_mastr_gdf.building_id.isna())
        ]

        len_build = len(buildings)
        len_gens = len(gens)

        if len_build < len_gens:
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
            logger.error(
                f"There are {len_gens} generators and only {len_build}"
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
                "generators were truncated to match the amount of buildings."
            )

            assert len_build == len(gens)

        for quant in gens.quant.unique():
            q_buildings = buildings.loc[
                (buildings.quant == quant) & (buildings.gens_id.isna())
            ]
            q_gens = gens.loc[gens.quant == quant]

            len_build = len(q_buildings)
            len_gens = len(q_gens)

            if len_build < len_gens:
                delta = len_gens - len_build

                logger.warning(
                    f"There are {len_gens} generators and only {len_build} "
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
                    f"buildings from AGS {ags} will be added randomly."
                )

                add_buildings = pd.Index(
                    rng.choice(
                        buildings.loc[
                            (buildings.quant != quant)
                            & (buildings.gens_id.isna())
                        ].index,
                        size=delta,
                        replace=False,
                    )
                )

                q_buildings = buildings.loc[
                    q_buildings.index.append(add_buildings)
                ]

                assert len(q_buildings) == len_gens

            chosen_buildings = pd.Index(
                rng.choice(
                    q_buildings.index,
                    size=len_gens,
                    replace=False,
                )
            )

            # q_mastr_gdf.loc[q_gens.index] = q_mastr_gdf.loc[
            #     q_gens.index
            # ].assign(building_id=chosen_buildings)

            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index

        if count % 100 == 0:
            logger.debug(
                f"Allocation of {count / num_ags * 100:g} % of AGS done. "
                f"It took {perf_counter() - t0:g} seconds."
            )

            t0 = perf_counter()

    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]

    q_mastr_gdf.loc[
        assigned_buildings.gens_id, "building_id"
    ] = assigned_buildings.index

    logger.debug("Allocated status quo generators to buildings.")

    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)

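# The core of the per-quantile step in `allocate_pv` is a single draw
# without replacement: each generator gets exactly one distinct building.
# A small sketch (the IDs are made up):

```python
import numpy as np
from numpy.random import default_rng

rng = default_rng(seed=0)
building_ids = np.array([10, 11, 12, 13, 14])
n_gens = 3

# One distinct building per generator, as in the per-quantile step.
chosen_buildings = rng.choice(building_ids, size=n_gens, replace=False)
assert len(set(chosen_buildings)) == n_gens  # no double assignment
```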
def frame_to_numeric(
    df: pd.DataFrame | gpd.GeoDataFrame,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Try to convert all columns of a DataFrame to numeric, ignoring errors.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
    """
    if str(df.index.dtype) == "object":
        df.index = pd.to_numeric(df.index, errors="ignore")

    for col in df.columns:
        if str(df[col].dtype) == "object":
            df[col] = pd.to_numeric(df[col], errors="ignore")

    return df

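# A minimal sketch of what the column loop in `frame_to_numeric` does.
# Note that `errors="ignore"` is deprecated in recent pandas releases, so
# the sketch spells out the same idea forward-compatibly: keep the column
# unchanged when conversion fails.

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2"], "b": ["x", "y"]})

# Numeric-looking strings become numbers, others stay object dtype.
for col in df.columns:
    if str(df[col].dtype) == "object":
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass

assert df["a"].dtype.kind in "iu"  # numeric strings converted
assert df["b"].dtype == object     # non-numeric column left as-is
```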
def validate_output(
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
) -> None:
    """
    Validate output.

    * Validate that there are exactly as many buildings with a PV system as
      there are PV systems with a building
    * Validate that the building IDs with a PV system are the same building
      IDs as assigned to the PV systems
    * Validate that the PV system IDs with a building are the same PV system
      IDs as assigned to the buildings

    Parameters
    ----------
    desagg_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.
    desagg_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    assert len(
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].building_id.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].index.unique()
        )
    ).all()
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].index.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].gens_id.unique()
        )
    ).all()

    logger.debug("Validated output.")

def drop_unallocated_gens(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop generators which did not get allocated.

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with generators dropped which
        did not get allocated.
    """
    init_len = len(gdf)
    gdf = gdf.loc[~gdf.building_id.isna()]
    end_len = len(gdf)

    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} unallocated rows from MaStR DataFrame."
    )

    return gdf

@timer_func
def allocate_to_buildings(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate status quo PV rooftop generators to buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without
        an AGS ID dropped.

    Returns
    -------
    tuple with two geopandas.GeoDataFrames
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    logger.debug("Starting allocation of status quo.")

    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
        q_mastr_gdf, q_buildings_gdf, SEED
    )

    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)

    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf

@timer_func
def grid_districts(
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Load MV grid district geo data from eGo^n Database as
    geopandas.GeoDataFrame.

    Parameters
    ----------
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MV grid district ID and geo shapes data.
    """
    gdf = db.select_geodataframe(
        """
        SELECT bus_id, geom
        FROM grid.egon_mv_grid_district
        ORDER BY bus_id
        """,
        index_col="bus_id",
        geom_col="geom",
        epsg=epsg,
    )

    gdf.index = gdf.index.astype(int)

    logger.debug("Grid districts loaded.")

    return gdf

def scenario_data(
    carrier: str = "solar_rooftop",
    scenario: str = "eGon2035",
) -> pd.DataFrame:
    """
    Get scenario capacity data from eGo^n Database.

    Parameters
    ----------
    carrier : str
        Carrier type to filter table by.
    scenario : str
        Scenario to filter table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with scenario capacity data in GW.
    """
    with db.session_scope() as session:
        query = session.query(EgonScenarioCapacities).filter(
            EgonScenarioCapacities.carrier == carrier,
            EgonScenarioCapacities.scenario_name == scenario,
        )

    df = pd.read_sql(
        query.statement, query.session.bind, index_col="index"
    ).sort_index()

    logger.debug("Scenario capacity data loaded.")

    return df

class Vg250Lan(Base):
    __tablename__ = "vg250_lan"
    __table_args__ = {"schema": "boundaries"}

    id = Column(BigInteger, primary_key=True, index=True)
    ade = Column(BigInteger)
    gf = Column(BigInteger)
    bsg = Column(BigInteger)
    ars = Column(String)
    ags = Column(String)
    sdv_ars = Column(String)
    gen = Column(String)
    bez = Column(String)
    ibz = Column(BigInteger)
    bem = Column(String)
    nbd = Column(String)
    sn_l = Column(String)
    sn_r = Column(String)
    sn_k = Column(String)
    sn_v1 = Column(String)
    sn_v2 = Column(String)
    sn_g = Column(String)
    fk_s3 = Column(String)
    nuts = Column(String)
    ars_0 = Column(String)
    ags_0 = Column(String)
    wsk = Column(String)
    debkg_id = Column(String)
    rs = Column(String)
    sdv_rs = Column(String)
    rs_0 = Column(String)
    geometry = Column(Geometry(srid=EPSG), index=True)

def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
    """
    Get federal state data from eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with federal state data.
    """
    with db.session_scope() as session:
        query = session.query(
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
        )

    gdf = gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

    logger.debug("Federal state data loaded.")

    return gdf

@timer_func
def overlay_grid_districts_with_counties(
    mv_grid_district_gdf: gpd.GeoDataFrame,
    federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Calculate the intersections of MV grid districts and counties.

    Parameters
    ----------
    mv_grid_district_gdf : gpd.GeoDataFrame
        GeoDataFrame containing MV grid district ID and geo shapes data.
    federal_state_gdf : gpd.GeoDataFrame
        GeoDataFrame with federal state data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the intersection shapes between MV grid
        districts and counties.
    """
    logger.debug(
        "Calculating intersection overlay between mv grid districts and "
        "counties. This may take a while..."
    )

    gdf = gpd.overlay(
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
        mv_grid_district_gdf.reset_index(),
        how="intersection",
        keep_geom_type=True,
    )

    logger.debug("Done!")

    return gdf

@timer_func
def add_overlay_id_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    grid_federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the overlay ID to buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    grid_federal_state_gdf : geopandas.GeoDataFrame
        GeoDataFrame with intersection shapes between counties and grid
        districts.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with overlay ID added.
    """
    gdf = (
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
        .sjoin(
            grid_federal_state_gdf,
            how="left",
            predicate="intersects",
        )
        .rename(columns={"index_right": "overlay_id"})
    )

    logger.debug("Added overlay ID to OSM buildings.")

    return gdf

def drop_buildings_outside_grids(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of grid areas.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without
        a bus ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the grid areas "
        "and are therefore dropped."
    )

    return gdf

def buildings_area_per_overlay_id(
    valid_buildings_gdf: gpd.GeoDataFrame,
    grid_federal_state_gdf: gpd.GeoDataFrame,
) -> pd.DataFrame:
    """
    Calculate the total buildings area per overlay ID.

    TODO: This is very simplified at the moment. If possible add some kind
     of weights to the area per building to differentiate between single-
     family and multi-family houses as well as agricultural and industrial
     buildings.

    Parameters
    ----------
    valid_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with overlay ID added.
    grid_federal_state_gdf : geopandas.GeoDataFrame
        GeoDataFrame with intersection shapes between counties and grid
        districts.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with grid data and total buildings area.
    """
    return grid_federal_state_gdf.merge(
        valid_buildings_gdf[
            [
                "building_area",
                "overlay_id",
            ]
        ]
        .groupby("overlay_id")
        .sum(),
        how="left",
        left_index=True,
        right_index=True,
    )

def cap_per_bus_id(
    scenario: str,
    # overlay_gdf: gpd.GeoDataFrame,
    # scenario_df: pd.DataFrame,
    # conversion: int | float = 10**3,
) -> pd.DataFrame:
    """
    Get the total PV rooftop capacity per grid district for a scenario.

    TODO: The previous implementation (kept below for reference) was very
     simplified as it assumed an even distribution of generators depending
     on the available buildings area per grid district.

    Parameters
    ----------
    scenario : str
        Scenario name to filter the generators table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with total rooftop capacity per MV grid.
    """
    targets = config.datasets()["solar_rooftop"]["targets"]

    sql = f"""
    SELECT bus as bus_id, p_nom as capacity
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
    WHERE carrier = 'solar_rooftop'
    AND scn_name = '{scenario}'
    """

    return db.select_dataframe(sql)

    # overlay_gdf = overlay_gdf.assign(capacity=np.nan)
    #
    # for cap, nuts in scenario_df[["capacity", "nuts"]].itertuples(index=False):
    #     nuts_gdf = overlay_gdf.loc[overlay_gdf.nuts == nuts]
    #
    #     capacity = nuts_gdf.building_area.multiply(
    #         cap / nuts_gdf.building_area.sum()
    #     )
    #
    #     overlay_gdf.loc[nuts_gdf.index] = overlay_gdf.loc[
    #         nuts_gdf.index
    #     ].assign(capacity=capacity.multiply(conversion).to_numpy())
    #
    # return overlay_gdf[["bus_id", "capacity"]].groupby("bus_id").sum()

def determine_end_of_life_gens(
    mastr_gdf: gpd.GeoDataFrame,
    scenario_timestamp: pd.Timestamp,
    pv_rooftop_lifetime: pd.Timedelta,
) -> gpd.GeoDataFrame:
    """
    Determine if an old PV system has reached its end of life.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    scenario_timestamp : pandas.Timestamp
        Timestamp at which the scenario takes place.
    pv_rooftop_lifetime : pandas.Timedelta
        Average expected lifetime of PV rooftop systems.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data and info if the system
        has reached its end of life.
    """
    mastr_gdf = mastr_gdf.assign(
        age=scenario_timestamp - mastr_gdf.start_up_date
    )

    logger.debug("Determined if pv rooftop systems reached their end of life.")

    return mastr_gdf.assign(end_of_life=pv_rooftop_lifetime < mastr_gdf.age)

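# The end-of-life check above is plain Timestamp arithmetic; a sketch
# with hypothetical commissioning dates and lifetime:

```python
import pandas as pd

start_up_date = pd.Series(pd.to_datetime(["2000-06-01", "2020-06-01"]))
scenario_timestamp = pd.Timestamp("2035-01-01")
pv_rooftop_lifetime = pd.Timedelta("7300 days")  # roughly 20 years

# Systems older than the assumed lifetime are flagged end-of-life.
age = scenario_timestamp - start_up_date
end_of_life = pv_rooftop_lifetime < age
print(end_of_life.tolist())  # → [True, False]
```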
def calculate_max_pv_cap_per_building(
    buildings_gdf: gpd.GeoDataFrame,
    mastr_gdf: gpd.GeoDataFrame,
    pv_cap_per_sq_m: float | int,
    roof_factor: float | int,
) -> gpd.GeoDataFrame:
    """
    Calculate the estimated maximum possible PV capacity per building.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    roof_factor : float, int
        Average share of the roof area usable for PV.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with estimated maximum
        PV capacity.
    """
    gdf = (
        buildings_gdf.reset_index()
        .merge(
            mastr_gdf[
                [
                    "capacity",
                    "end_of_life",
                    "building_id",
                    "EinheitlicheAusrichtungUndNeigungswinkel",
                    "Hauptausrichtung",
                    "HauptausrichtungNeigungswinkel",
                ]
            ],
            how="left",
            left_on="id",
            right_on="building_id",
        )
        .set_index("id")
        .drop(columns="building_id")
    )

    return gdf.assign(
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
        bus_id=gdf.bus_id.astype(int),
    )

def calculate_building_load_factor(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    rounding: int = 4,
) -> gpd.GeoDataFrame:
    """
    Calculate the roof load factor from existing PV systems.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    rounding : int
        Number of decimal places to round the load factor to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data with calculated load
        factor.
    """
    gdf = mastr_gdf.merge(
        buildings_gdf[["max_cap", "building_area"]].loc[
            ~buildings_gdf["max_cap"].isna()
        ],
        how="left",
        left_on="building_id",
        right_index=True,
    )

    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))

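# The load factor above is simply installed capacity over the estimated
# maximum capacity; a sketch with made-up numbers:

```python
import pandas as pd

gdf = pd.DataFrame({"capacity": [5.0, 12.0], "max_cap": [10.0, 15.0]})
# Installed capacity relative to the estimated roof potential.
gdf = gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(4))
print(gdf.load_factor.tolist())  # → [0.5, 0.8]
```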
def get_probability_for_property(
    mastr_gdf: gpd.GeoDataFrame,
    cap_range: tuple[int | float, int | float],
    prop: str,
) -> tuple[np.array, np.array]:
    """
    Calculate the probability of the different options of a property of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_range : tuple(int, int)
        Capacity range of PV plants to look at.
    prop : str
        Property to calculate probabilities for. String needs to be in
        columns of mastr_gdf.

    Returns
    -------
    tuple
        numpy.array
            Unique values of property.
        numpy.array
            Probabilities per unique value.
    """
    cap_range_gdf = mastr_gdf.loc[
        (mastr_gdf.capacity > cap_range[0])
        & (mastr_gdf.capacity <= cap_range[1])
    ]

    if prop == "load_factor":
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]

    count = Counter(
        cap_range_gdf[prop].loc[
            ~cap_range_gdf[prop].isna()
            & ~cap_range_gdf[prop].isnull()
            & ~(cap_range_gdf[prop] == "None")
        ]
    )

    values = np.array(list(count.keys()))
    probabilities = np.fromiter(count.values(), dtype=float)
    probabilities = probabilities / np.sum(probabilities)

    return values, probabilities

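# The value/probability pairs above come from a plain `Counter`; a sketch
# with hypothetical orientation values:

```python
from collections import Counter

import numpy as np

# Hypothetical property values of existing plants.
count = Counter(["south", "south", "east", "west"])

values = np.array(list(count.keys()))
probabilities = np.fromiter(count.values(), dtype=float)
probabilities = probabilities / np.sum(probabilities)

# Counter preserves first-seen order: south, east, west.
print(probabilities.tolist())  # → [0.5, 0.25, 0.25]
```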
@timer_func
1789
def probabilities(
1790
    mastr_gdf: gpd.GeoDataFrame,
1791
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1792
    properties: list[str] | None = None,
1793
) -> dict:
1794
    """
1795
    Calculate the probability of the different options of properties of the
1796
    existing PV plants.
1797
    Parameters
1798
    -----------
1799
    mastr_gdf : geopandas.GeoDataFrame
1800
        GeoDataFrame containing geocoded MaStR data.
1801
    cap_ranges : list(tuple(int, int))
1802
        List of capacity ranges to distinguish between. The first tuple should
1803
        start with a zero and the last one should end with infinite.
1804
    properties : list(str)
1805
        List of properties to calculate probabilities for. Strings needs to be
1806
        in columns of mastr_gdf.
1807
    Returns
1808
    -------
1809
    dict
1810
        Dictionary with values and probabilities per capacity range.
1811
    """
1812
    if cap_ranges is None:
1813
        cap_ranges = [
1814
            (0, 30),
1815
            (30, 100),
1816
            (100, float("inf")),
1817
        ]
1818
    if properties is None:
1819
        properties = [
1820
            "EinheitlicheAusrichtungUndNeigungswinkel",
1821
            "Hauptausrichtung",
1822
            "HauptausrichtungNeigungswinkel",
1823
            "load_factor",
1824
        ]
1825
1826
    prob_dict = {}
1827
1828
    for cap_range in cap_ranges:
1829
        prob_dict[cap_range] = {
1830
            "values": {},
1831
            "probabilities": {},
1832
        }
1833
1834
        for prop in properties:
1835
            v, p = get_probability_for_property(
1836
                mastr_gdf,
1837
                cap_range,
1838
                prop,
1839
            )
1840
1841
            prob_dict[cap_range]["values"][prop] = v
1842
            prob_dict[cap_range]["probabilities"][prop] = p
1843
1844
    return prob_dict


def cap_share_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the share of PV capacity from the total PV capacity within
    capacity ranges.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    cap_share_dict = {}

    total_cap = mastr_gdf.capacity.sum()

    for cap_range in cap_ranges:
        cap_share = (
            mastr_gdf.loc[
                (mastr_gdf.capacity > cap_range[0])
                & (mastr_gdf.capacity <= cap_range[1])
            ].capacity.sum()
            / total_cap
        )

        cap_share_dict[cap_range] = cap_share

    return cap_share_dict
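Since the default capacity ranges partition `(0, inf)`, the computed shares sum to 1. A toy illustration (made-up capacities in kW, pandas only):

```python
# Toy illustration of cap_share_per_cap_range with made-up capacities.
import pandas as pd

mastr_df = pd.DataFrame({"capacity": [10.0, 25.0, 60.0, 150.0]})

cap_ranges = [(0, 30), (30, 100), (100, float("inf"))]

total_cap = mastr_df.capacity.sum()  # 245 kW

cap_share_dict = {
    cap_range: mastr_df.loc[
        (mastr_df.capacity > cap_range[0])
        & (mastr_df.capacity <= cap_range[1])
    ].capacity.sum()
    / total_cap
    for cap_range in cap_ranges
}
```

Here the shares are 35/245, 60/245 and 150/245 for the three ranges.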


def mean_load_factor_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the mean roof load factor per capacity range from existing PV
    plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with the mean roof load factor per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    load_factor_dict = {}

    for cap_range in cap_ranges:
        load_factor = mastr_gdf.loc[
            (mastr_gdf.load_factor <= 1)
            & (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ].load_factor.mean()

        load_factor_dict[cap_range] = load_factor

    return load_factor_dict


def building_area_range_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    min_building_size: int | float = 10.0,
    upper_quantile: float = 0.95,
    lower_quantile: float = 0.05,
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
    """
    Estimate the normal building area range per capacity range from existing
    PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.
    min_building_size : int, float
        Minimal building size to consider for PV plants.
    upper_quantile : float
        Upper quantile to estimate the maximum building size per capacity
        range.
    lower_quantile : float
        Lower quantile to estimate the minimum building size per capacity
        range.

    Returns
    -------
    dict
        Dictionary with the estimated normal building area range per capacity
        range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    building_area_range_dict = {}

    n_ranges = len(cap_ranges)

    for count, cap_range in enumerate(cap_ranges):
        cap_range_gdf = mastr_gdf.loc[
            (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ]

        if count == 0:
            building_area_range_dict[cap_range] = (
                min_building_size,
                cap_range_gdf.building_area.quantile(upper_quantile),
            )
        elif count == n_ranges - 1:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                float("inf"),
            )
        else:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                cap_range_gdf.building_area.quantile(upper_quantile),
            )

    values = list(building_area_range_dict.values())

    building_area_range_normed_dict = {}

    for count, (cap_range, (min_area, max_area)) in enumerate(
        building_area_range_dict.items()
    ):
        if count == 0:
            building_area_range_normed_dict[cap_range] = (
                min_area,
                np.mean((values[count + 1][0], max_area)),
            )
        elif count == n_ranges - 1:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                max_area,
            )
        else:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                np.mean((values[count + 1][0], max_area)),
            )

    return building_area_range_normed_dict
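The second loop above de-overlaps the quantile-based ranges by averaging each border with the neighbouring range's border. A numeric sketch with made-up areas (m²) shows the effect; the dict literal below is illustrative, not real quantile output:

```python
# Sketch of the boundary smoothing in building_area_range_per_cap_range:
# adjacent range borders are averaged so the building-area ranges of
# neighbouring capacity ranges no longer overlap. Made-up numbers (m²).
import numpy as np

raw_ranges = {
    (0, 30): (10.0, 300.0),         # min_building_size .. upper quantile
    (30, 100): (150.0, 1000.0),     # lower .. upper quantile (overlaps!)
    (100, float("inf")): (800.0, float("inf")),
}

values = list(raw_ranges.values())
n_ranges = len(raw_ranges)

normed = {}
for count, (cap_range, (min_area, max_area)) in enumerate(raw_ranges.items()):
    if count == 0:
        normed[cap_range] = (
            min_area,
            np.mean((values[count + 1][0], max_area)),
        )
    elif count == n_ranges - 1:
        normed[cap_range] = (
            np.mean((values[count - 1][1], min_area)),
            max_area,
        )
    else:
        normed[cap_range] = (
            np.mean((values[count - 1][1], min_area)),
            np.mean((values[count + 1][0], max_area)),
        )
```

After smoothing, the upper border of each range equals the lower border of the next (10–225, 225–900, 900–inf), so every building area falls into exactly one range.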


@timer_func
def desaggregate_pv_in_mv_grid(
    buildings_gdf: gpd.GeoDataFrame,
    pv_cap: float | int,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity onto buildings within a given grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing buildings within the grid district.
    pv_cap : float, int
        PV capacity to desaggregate.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with the estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with the mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    bus_id = int(buildings_gdf.bus_id.iat[0])

    rng = default_rng(seed=kwargs["seed"])
    random_state = RandomState(seed=kwargs["seed"])

    results_df = pd.DataFrame(columns=buildings_gdf.columns)

    for cap_range, share in kwargs["cap_share_dict"].items():
        pv_cap_range = pv_cap * share

        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]

        cap_range_buildings_gdf = buildings_gdf.loc[
            ~buildings_gdf.index.isin(results_df.index)
            & (buildings_gdf.building_area > b_area_min)
            & (buildings_gdf.building_area <= b_area_max)
        ]

        mean_load_factor = kwargs["load_factor_dict"][cap_range]
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
            load_factor=np.nan,
            capacity=np.nan,
        )

        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        if total_mean_cap == 0:
            logger.warning(
                f"There are no matching roofs for capacity range {cap_range} "
                f"kW in grid {bus_id}. Using all buildings as fallback."
            )

            cap_range_buildings_gdf = buildings_gdf.loc[
                ~buildings_gdf.index.isin(results_df.index)
            ]

            if len(cap_range_buildings_gdf) == 0:
                logger.warning(
                    "There are no roofs available for capacity range "
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
                )
                cap_range_buildings_gdf = buildings_gdf.copy()

            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
                load_factor=np.nan,
                capacity=np.nan,
            )

            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        elif total_mean_cap < pv_cap_range:
            logger.warning(
                f"Average roof utilization of the roof area in grid {bus_id} "
                f"and capacity range {cap_range} kW is not sufficient. The "
                "roof utilization will be above average."
            )

        frac = max(
            pv_cap_range / total_mean_cap,
            1 / len(cap_range_buildings_gdf),
        )

        samples_gdf = cap_range_buildings_gdf.sample(
            frac=min(1, frac),
            random_state=random_state,
        )

        cap_range_dict = kwargs["prob_dict"][cap_range]

        values_dict = cap_range_dict["values"]
        p_dict = cap_range_dict["probabilities"]

        load_factors = rng.choice(
            a=values_dict["load_factor"],
            size=len(samples_gdf),
            p=p_dict["load_factor"],
        )

        samples_gdf = samples_gdf.assign(
            load_factor=load_factors,
            capacity=samples_gdf.building_area
            * load_factors
            * kwargs["pv_cap_per_sq_m"],
        )

        missing_factor = pv_cap_range / samples_gdf.capacity.sum()

        samples_gdf = samples_gdf.assign(
            capacity=(samples_gdf.capacity * missing_factor),
            load_factor=(samples_gdf.load_factor * missing_factor),
        )

        assert np.isclose(
            samples_gdf.capacity.sum(),
            pv_cap_range,
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"

        results_df = pd.concat(
            [
                results_df,
                samples_gdf,
            ],
        )

    assert np.isclose(
        results_df.capacity.sum(),
        pv_cap,
    ), f"{results_df.capacity.sum()} != {pv_cap}"

    return gpd.GeoDataFrame(
        results_df,
        # use the CRS of the input frame; samples_gdf is undefined if
        # cap_share_dict is empty and the loop is never entered
        crs=buildings_gdf.crs,
        geometry="geom",
    )
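The rescaling step inside the loop (`missing_factor`) guarantees the per-range assertion holds: sampled capacities and load factors are scaled so their sum hits the range's target exactly. A minimal numeric sketch with made-up values (kW):

```python
# Numeric sketch of the rescaling step in desaggregate_pv_in_mv_grid:
# scale the sampled capacities by missing_factor so their sum matches
# the target capacity of the range. Made-up numbers (kW).
import numpy as np

pv_cap_range = 100.0                     # target capacity for this range
capacity = np.array([20.0, 30.0, 70.0])  # capacities drawn for the sample
load_factor = np.array([0.4, 0.5, 0.7])

missing_factor = pv_cap_range / capacity.sum()  # 100 / 120

capacity = capacity * missing_factor
load_factor = load_factor * missing_factor

assert np.isclose(capacity.sum(), pv_cap_range)
```

Note that the load factors are scaled by the same factor, so if the sample undershoots the target, the assumed roof utilization rises proportionally.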


@timer_func
def desaggregate_pv(
    buildings_gdf: gpd.GeoDataFrame,
    cap_df: pd.DataFrame,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity onto buildings within each grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    cap_df : pandas.DataFrame
        DataFrame with the total rooftop capacity per MV grid.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with the estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with the mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]

    for bus_id in buildings_gdf.bus_id.unique():
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]

        pv_installed_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.end_of_life
        ]

        pv_installed = pv_installed_gdf.capacity.sum()

        pot_buildings_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.index.isin(pv_installed_gdf.index)
        ]

        if len(pot_buildings_gdf) == 0:
            logger.error(
                f"In grid {bus_id} there are no potential buildings to "
                "allocate PV capacity to. The grid is skipped. This message "
                "should only appear during test runs with few buildings."
            )

            continue

        pv_target = cap_df.at[
            bus_id,
            "capacity",
        ]

        pv_missing = pv_target - pv_installed

        if pv_missing <= 0:
            logger.info(
                f"In grid {bus_id} there is more PV installed in the status "
                "quo than allocated within the scenario. No new generators "
                "are added."
            )

            continue

        if pot_buildings_gdf.max_cap.sum() < pv_missing:
            logger.error(
                f"In grid {bus_id} there is less PV potential ("
                f"{pot_buildings_gdf.max_cap.sum():g} kW) than allocated PV "
                f"capacity ({pv_missing:g} kW). The grid is skipped. This "
                "message should only appear during test runs with few "
                "buildings."
            )

            continue

        gdf = desaggregate_pv_in_mv_grid(
            buildings_gdf=pot_buildings_gdf,
            pv_cap=pv_missing,
            **kwargs,
        )

        allocated_buildings_gdf = pd.concat(
            [
                allocated_buildings_gdf,
                gdf,
            ]
        )

    logger.debug("Desaggregated scenario.")

    return gpd.GeoDataFrame(
        allocated_buildings_gdf,
        # use the CRS of the input frame; gdf is undefined when every grid
        # is skipped
        crs=buildings_gdf.crs,
        geometry="geom",
    )


@timer_func
def add_buildings_meta_data(
    buildings_gdf: gpd.GeoDataFrame,
    prob_dict: dict,
    seed: int,
) -> gpd.GeoDataFrame:
    """
    Randomly add additional metadata to desaggregated PV plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    rng = default_rng(seed=seed)
    buildings_gdf = buildings_gdf.reset_index().rename(
        columns={
            "index": "building_id",
        }
    )

    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
        cap_range_gdf = buildings_gdf.loc[
            (buildings_gdf.capacity >= min_cap)
            & (buildings_gdf.capacity < max_cap)
        ]

        for key, values in cap_range_prob_dict["values"].items():
            if key == "load_factor":
                continue

            gdf = cap_range_gdf.loc[
                cap_range_gdf[key].isna()
                | cap_range_gdf[key].isnull()
                | (cap_range_gdf[key] == "None")
            ]

            key_vals = rng.choice(
                a=values,
                size=len(gdf),
                p=cap_range_prob_dict["probabilities"][key],
            )

            buildings_gdf.loc[gdf.index, key] = key_vals

    return buildings_gdf


def add_voltage_level(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add the voltage level derived from the generator capacity to the power
    plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with the voltage level per
        generator.
    """

    def voltage_levels(p: float) -> int:
        if p < 100:
            return 7
        elif p < 200:
            return 6
        elif p < 5500:
            return 5
        elif p < 20000:
            return 4
        elif p < 120000:
            return 3
        return 1

    return buildings_gdf.assign(
        voltage_level=buildings_gdf.capacity.apply(voltage_levels)
    )
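The capacity thresholds map onto voltage levels as follows (capacities in kW; the example capacities are illustrative):

```python
# The capacity-to-voltage-level mapping used in add_voltage_level,
# applied to a few example capacities (kW).
def voltage_levels(p: float) -> int:
    if p < 100:
        return 7
    elif p < 200:
        return 6
    elif p < 5500:
        return 5
    elif p < 20000:
        return 4
    elif p < 120000:
        return 3
    return 1

# expected level per example capacity
examples = {10: 7, 150: 6, 1000: 5, 10_000: 4, 100_000: 3, 500_000: 1}
```

Note there is no level 2; capacities of 120 MW and above map directly to level 1.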


def add_start_up_date(
    buildings_gdf: gpd.GeoDataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    seed: int,
):
    """
    Randomly add a start-up date, drawn uniformly between start and end, to
    the new PV generators.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    start : pandas.Timestamp
        Minimum timestamp to use.
    end : pandas.Timestamp
        Maximum timestamp to use.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with start-up dates added.
    """
    rng = default_rng(seed=seed)

    date_range = pd.date_range(start=start, end=end, freq="1D")

    return buildings_gdf.assign(
        start_up_date=rng.choice(date_range, size=len(buildings_gdf))
    )
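The draw itself can be sketched standalone; the seed and date bounds below are arbitrary examples, not the module's constants:

```python
# Sketch of the start-up date draw in add_start_up_date: dates are sampled
# uniformly from a daily date range between start and end.
import pandas as pd
from numpy.random import default_rng

rng = default_rng(seed=5)  # arbitrary example seed

start = pd.Timestamp("2021-01-01")
end = pd.Timestamp("2021-12-31")

date_range = pd.date_range(start=start, end=end, freq="1D")

# one date per (hypothetical) generator
start_up_dates = rng.choice(date_range, size=10)
```

Every drawn date lies within `[start, end]`, and the same seed reproduces the same dates.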


@timer_func
def allocate_scenarios(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    last_scenario_gdf: gpd.GeoDataFrame,
    scenario: str,
):
    """
    Desaggregate and allocate scenario PV rooftop ramp-ups onto buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    last_scenario_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings matched with PV generators from
        the temporally preceding scenario.
    scenario : str
        Scenario to desaggregate and allocate.

    Returns
    -------
    tuple
        geopandas.GeoDataFrame
            GeoDataFrame containing OSM buildings matched with PV generators.
        pandas.DataFrame
            DataFrame containing PV rooftop capacity per grid id.
    """
    grid_districts_gdf = grid_districts(EPSG)

    # federal_state_gdf = federal_state_data(grid_districts_gdf.crs)
    #
    # grid_federal_state_gdf = overlay_grid_districts_with_counties(
    #     grid_districts_gdf,
    #     federal_state_gdf,
    # )
    #
    # buildings_overlay_gdf = add_overlay_id_to_buildings(
    #     buildings_gdf,
    #     grid_federal_state_gdf,
    # )

    valid_buildings_gdf = drop_buildings_outside_grids(
        buildings_gdf
    )  # buildings_overlay_gdf)

    # buildings_area_per_overlay_gdf = buildings_area_per_overlay_id(
    #     valid_buildings_gdf,
    #     grid_federal_state_gdf,
    # )

    cap_per_bus_id_df = cap_per_bus_id(scenario)
    #     buildings_area_per_overlay_gdf,
    #     scenario_data(CARRIER, scenario),
    # )

    last_scenario_gdf = determine_end_of_life_gens(
        last_scenario_gdf,
        SCENARIO_TIMESTAMP[scenario],
        PV_ROOFTOP_LIFETIME,
    )

    buildings_gdf = calculate_max_pv_cap_per_building(
        valid_buildings_gdf,
        last_scenario_gdf,
        PV_CAP_PER_SQ_M,
        ROOF_FACTOR,
    )

    mastr_gdf = calculate_building_load_factor(
        mastr_gdf,
        buildings_gdf,
    )

    probabilities_dict = probabilities(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    cap_share_dict = cap_share_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    load_factor_dict = mean_load_factor_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    building_area_range_dict = building_area_range_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
        min_building_size=MIN_BUILDING_SIZE,
        upper_quantile=UPPER_QUNATILE,
        lower_quantile=LOWER_QUANTILE,
    )

    allocated_buildings_gdf = desaggregate_pv(
        buildings_gdf=buildings_gdf,
        cap_df=cap_per_bus_id_df,
        prob_dict=probabilities_dict,
        cap_share_dict=cap_share_dict,
        building_area_range_dict=building_area_range_dict,
        load_factor_dict=load_factor_dict,
        seed=SEED,
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
    )

    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)

    meta_buildings_gdf = frame_to_numeric(
        add_buildings_meta_data(
            allocated_buildings_gdf,
            probabilities_dict,
            SEED,
        )
    )

    return (
        add_start_up_date(
            meta_buildings_gdf,
            start=last_scenario_gdf.start_up_date.max(),
            end=SCENARIO_TIMESTAMP[scenario],
            seed=SEED,
        ),
        cap_per_bus_id_df,
    )


class EgonPowerPlantPvRoofBuildingScenario(Base):
    __tablename__ = "egon_power_plants_pv_roof_building"
    __table_args__ = {"schema": "supply"}

    index = Column(Integer, primary_key=True, index=True)
    scenario = Column(String)
    building_id = Column(Integer)
    gens_id = Column(String, nullable=True)
    capacity = Column(Float)
    einheitliche_ausrichtung_und_neigungswinkel = Column(Float)
    hauptausrichtung = Column(String)
    hauptausrichtung_neigungswinkel = Column(String)
    voltage_level = Column(Integer)


def create_scenario_table(buildings_gdf):
    """Create mapping table pv_unit <-> building for scenario"""
    EgonPowerPlantPvRoofBuildingScenario.__table__.drop(
        bind=engine, checkfirst=True
    )
    EgonPowerPlantPvRoofBuildingScenario.__table__.create(
        bind=engine, checkfirst=True
    )

    buildings_gdf.rename(columns=COLS_TO_RENAME)[
        COLS_TO_EXPORT
    ].reset_index().to_sql(
        name=EgonPowerPlantPvRoofBuildingScenario.__table__.name,
        schema=EgonPowerPlantPvRoofBuildingScenario.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
    )


def geocode_mastr_data():
    """
    Read PV rooftop data from MaStR CSV.

    TODO: the source will be replaced as soon as the MaStR data is available
    in DB.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocoding_df = geocoding_data(clean_mastr_df)

    ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)

    geocode_gdf = geocode_data(geocoding_df, ratelimiter, EPSG)

    create_geocoded_table(geocode_gdf)


def pv_rooftop_to_buildings():
    """Main script, executed as task"""

    mastr_gdf = load_mastr_data()

    buildings_gdf = load_building_data()

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
        mastr_gdf, buildings_gdf
    )

    all_buildings_gdf = (
        desagg_mastr_gdf.assign(scenario="status_quo")
        .reset_index()
        .rename(columns={"geometry": "geom", "EinheitMastrNummer": "gens_id"})
    )

    scenario_buildings_gdf = all_buildings_gdf.copy()

    cap_per_bus_id_df = pd.DataFrame()

    for scenario in SCENARIOS:
        logger.debug(f"Desaggregating scenario {scenario}.")
        (
            scenario_buildings_gdf,
            cap_per_bus_id_scenario_df,
        ) = allocate_scenarios(  # noqa: F841
            desagg_mastr_gdf,
            desagg_buildings_gdf,
            scenario_buildings_gdf,
            scenario,
        )

        all_buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
            ),
            crs=scenario_buildings_gdf.crs,
            geometry="geom",
        )

        cap_per_bus_id_df = pd.concat(
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
        )

    # export scenario
    create_scenario_table(add_voltage_level(all_buildings_gdf))