Passed · Pull Request — dev (#934) · by unknown · created 01:38
data.datasets.power_plants.pv_rooftop_buildings (rating: F)

Complexity

Total Complexity 117

Size/Duplication

Total Lines 2660
Duplicated Lines 1.2 %

Importance

Changes 0
Metric                        Value
wmc (weighted method count)   117
eloc                          1195
dl (duplicated lines)         32
loc (lines of code)           2660
rs                            0.82
c                             0
b                             0
f                             0

52 Functions

Rating   Name   Duplication   Size   Complexity  
A get_probability_for_property() 0 46 2
A calculate_building_load_factor() 0 30 1
A determine_end_of_life_gens() 0 28 1
A calculate_max_pv_cap_per_building() 0 49 1
C building_area_range_per_cap_range() 0 87 8
A mean_load_factor_per_cap_range() 0 38 3
A cap_share_per_cap_range() 0 43 3
B probabilities() 0 57 5
A geocode_mastr_data() 0 28 1
A pv_rooftop_to_buildings() 0 47 2
A create_scenario_table() 0 17 1
A add_start_up_date() 0 30 1
B add_voltage_level() 0 31 6
A add_buildings_meta_data() 0 55 4
B allocate_scenarios() 0 120 1
A egon_building_peak_loads() 0 9 1
A load_building_data() 0 65 3
A synthetic_buildings() 0 25 2
A geocoding_data() 0 18 1
A geocoded_data_from_db() 0 23 2
A add_overlay_id_to_buildings() 0 31 1
A cap_per_bus_id() 0 25 1
B desaggregate_pv_in_mv_grid() 0 161 5
A add_ags_to_gens() 0 24 1
A drop_buildings_outside_muns() 0 25 1
B clean_mastr_data() 0 143 2
A mastr_data() 0 60 2
A scenario_data() 0 30 2
A geocode_data() 0 46 3
B zip_and_municipality_from_standort() 0 50 7
A merge_geocode_with_mastr() 0 27 1
A drop_buildings_outside_grids() 0 25 1
A municipality_data() 0 14 2
A osm_buildings() 0 25 2
A add_ags_to_buildings() 0 24 1
A timer_func() 0 13 1
C allocate_pv() 0 125 7
A sort_and_qcut_df() 0 29 1
A federal_state_data() 0 24 2
A create_geocoded_table() 0 17 1
C desaggregate_pv() 0 137 7
A grid_districts() 0 32 1
A load_mastr_data() 0 35 1
A frame_to_numeric() 0 20 4
A drop_gens_outside_muns() 0 25 1
A geocoder() 0 22 1
A allocate_to_buildings() 0 32 1
A overlay_grid_districts_with_counties() 0 33 1
A drop_invalid_entries_from_gdf() 0 25 1
A validate_output() 0 50 1
A drop_unallocated_gens() 0 27 1
A most_plausible() 0 41 4

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule of thumb is to restructure code once it is duplicated in three or more places.

Common duplication problems have corresponding, well-known solutions; the most frequent one is extracting the repeated logic into a shared helper, as sketched below.
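In this module, the clearest instance is the nearly identical "dropped N (x%) of M values" logging block repeated in drop_invalid_entries_from_gdf(), drop_gens_outside_muns(), drop_buildings_outside_muns() and drop_buildings_outside_grids(). A minimal sketch of a shared helper (hypothetical, not part of the module):

    def log_dropped_share(init_len: int, kept_len: int, reason: str) -> None:
        # Hypothetical helper collapsing the repeated logging blocks.
        logger.debug(
            f"{init_len - kept_len} "
            f"({(init_len - kept_len) / init_len * 100:g}%) "
            f"of {init_len} values are {reason} and are therefore dropped."
        )

    # e.g. inside drop_gens_outside_muns():
    #     log_dropped_share(
    #         len(valid_mastr_gdf), len(gdf), "outside of the municipalities"
    #     )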

Complexity

 Tip:   Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.power_plants.pv_rooftop_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
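As a sketch of what Extract Class could look like here (a hypothetical refactoring built on the module's own geocoder() and geocode_data() functions, not a change that was made): the geocoding-related functions share a common prefix and could be grouped into one component:

    class MastrGeocoder:
        # Hypothetical extracted class bundling the geocoding concern.
        def __init__(self, user_agent: str, min_delay_seconds: int):
            self.ratelimiter = geocoder(user_agent, min_delay_seconds)

        def geocode(
            self, geocoding_df: pd.DataFrame, epsg: int
        ) -> gpd.GeoDataFrame:
            return geocode_data(geocoding_df, self.ratelimiter, epsg)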

"""
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate new
PV rooftop generators for scenarios eGon2035 and eGon100RE.

Data cleaning: Drop duplicates and entries with missing critical data. Determine the
most plausible capacity from the multiple capacity values given in the MaStR data.
Drop generators without plausible capacity data (0.1 kW < P <= 23.5 MW). Randomly and
weighted add a start-up date where it is missing. Extract the zip code and
municipality from the 'Standort' given in the MaStR data. Geocode unique zip code and
municipality combinations with Nominatim (1 second delay). Drop generators for which
geocoding failed or which are located outside the municipalities of Germany. Add some
visual sanity checks for cleaned data.

Allocation of MaStR data: Allocate each generator to an existing building from OSM.
Determine the quantile each generator and building is in, depending on the capacity
of the generator and the area of the building polygon. Randomly distribute generators
within each municipality, preferably within the same building area quantile as the
generators are capacity-wise. If not enough buildings exist within a municipality and
quantile, additional buildings from other quantiles are chosen randomly.

Disaggregation of PV rooftop scenarios: The scenario data per federal state is
linearly distributed to the MV grid districts according to the PV rooftop potential
per MV grid district. The rooftop potential is estimated from the building area given
by the OSM buildings. Grid districts which are located in several federal states are
allocated PV capacity according to their respective roof potential in the individual
federal states. The disaggregation of PV plants within a grid district respects
existing plants from MaStR which have not reached their end of life. New PV plants
are randomly and weighted generated using a breakdown of MaStR data as generator
basis. Plant metadata (e.g. plant orientation) is also added randomly and weighted,
with MaStR data as basis.
"""
from __future__ import annotations

from collections import Counter
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Any

from geoalchemy2 import Geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from loguru import logger
from numpy.random import RandomState, default_rng
from pyproj.crs.crs import CRS
from sqlalchemy import BigInteger, Column, Float, Integer, String
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd

from egon.data import config, db
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    OsmBuildingsSynthetic,
)
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
from egon.data.datasets.zensus_vg250 import Vg250Gem

engine = db.engine()
Base = declarative_base()
SEED = int(config.settings()["egon-data"]["--random-seed"])

# TODO: move to yml
# mastr data
MASTR_RELEVANT_COLS = [
    "EinheitMastrNummer",
    "Bruttoleistung",
    "StatisikFlag",
    "Bruttoleistung_extended",
    "Nettonennleistung",
    "InstallierteLeistung",
    "zugeordneteWirkleistungWechselrichter",
    "EinheitBetriebsstatus",
    "Standort",
    "Bundesland",
    "Land",
    "Landkreis",
    "Gemeinde",
    "Postleitzahl",
    "Ort",
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
    "GemeinsamerWechselrichterMitSpeicher",
    "Lage",
    "Leistungsbegrenzung",
    "EinheitlicheAusrichtungUndNeigungswinkel",
    "Hauptausrichtung",
    "HauptausrichtungNeigungswinkel",
    "Nebenausrichtung",
]

MASTR_DTYPES = {
    "EinheitMastrNummer": str,
    "Bruttoleistung": float,
    "StatisikFlag": str,
    "Bruttoleistung_extended": float,
    "Nettonennleistung": float,
    "InstallierteLeistung": float,
    "zugeordneteWirkleistungWechselrichter": float,
    "EinheitBetriebsstatus": str,
    "Standort": str,
    "Bundesland": str,
    "Land": str,
    "Landkreis": str,
    "Gemeinde": str,
    # "Postleitzahl": int,  # fails because of nan values
    "Ort": str,
    "GemeinsamerWechselrichterMitSpeicher": str,
    "Lage": str,
    "Leistungsbegrenzung": str,
    # this would parse nan values as False, which is not always correct
    # "EinheitlicheAusrichtungUndNeigungswinkel": bool,
    "Hauptausrichtung": str,
    "HauptausrichtungNeigungswinkel": str,
    "Nebenausrichtung": str,
    "NebenausrichtungNeigungswinkel": str,
}

MASTR_PARSE_DATES = [
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
]

MASTR_INDEX_COL = "EinheitMastrNummer"

EPSG = 4326
SRID = 3035

# data cleaning
MAX_REALISTIC_PV_CAP = 23500
MIN_REALISTIC_PV_CAP = 0.1
ROUNDING = 1

# geopy
MIN_DELAY_SECONDS = 1
USER_AGENT = "rli_kh_geocoder"

# show additional logging information
VERBOSE = False

EXPORT_DIR = Path(__name__).resolve().parent / "data"
EXPORT_FILE = "mastr_geocoded.gpkg"
EXPORT_PATH = EXPORT_DIR / EXPORT_FILE
DRIVER = "GPKG"

# Number of quantiles
Q = 5

# Scenario Data
CARRIER = "solar_rooftop"
SCENARIOS = ["eGon2035"]  # , "eGon100RE"]
SCENARIO_TIMESTAMP = {
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
}
PV_ROOFTOP_LIFETIME = pd.Timedelta(30 * 365, unit="D")

# Example module: Trina Vertex S TSM-400DE09M.08, 400 Wp
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE

# Estimation of usable roof area
# Factor for the conversion of building area to roof area
# estimated mean roof pitch: 35°
# estimated usable roof share: 80%
# estimation that only the south side of the building is used for PV
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
# AREA_FACTOR = 1.221
# USABLE_ROOF_SHARE = 0.8
# SOUTH_SHARE = 0.5
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
ROOF_FACTOR = 0.5

CAP_RANGES = [
    (0, 30),
    (30, 100),
    (100, float("inf")),
]

MIN_BUILDING_SIZE = 10.0
UPPER_QUNATILE = 0.95
LOWER_QUANTILE = 0.05

COLS_TO_RENAME = {
    "EinheitlicheAusrichtungUndNeigungswinkel": (
        "einheitliche_ausrichtung_und_neigungswinkel"
    ),
    "Hauptausrichtung": "hauptausrichtung",
    "HauptausrichtungNeigungswinkel": "hauptausrichtung_neigungswinkel",
}

COLS_TO_EXPORT = [
    "scenario",
    "building_id",
    "gens_id",
    "capacity",
    "einheitliche_ausrichtung_und_neigungswinkel",
    "hauptausrichtung",
    "hauptausrichtung_neigungswinkel",
    "voltage_level",
]

# TODO
INCLUDE_SYNTHETIC_BUILDINGS = True
ONLY_BUILDINGS_WITH_DEMAND = True
TEST_RUN = False


def timer_func(func):
    """Log the execution time of the decorated function."""

    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        end_time = perf_counter()
        total_time = end_time - start_time
        logger.debug(
            f"Function {func.__name__} took {total_time:.4f} seconds."
        )
        return result

    return timeit_wrapper


@timer_func
def mastr_data(
    index_col: str | int | list[str] | list[int],
    usecols: list[str],
    dtype: dict[str, Any] | None,
    parse_dates: list[str] | None,
) -> pd.DataFrame:
    """
    Read MaStR data from csv.

    Parameters
    ----------
    index_col : str, int or list of str or int
        Column(s) to use as the row labels of the DataFrame.
    usecols : list of str
        Return a subset of the columns.
    dtype : dict of column (str) -> type (any), optional
        Data type for data or columns.
    parse_dates : list of names (str), optional
        Try to parse given columns to datetime.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing MaStR data.
    """
    mastr_path = Path(
        config.datasets()["power_plants"]["sources"]["mastr_pv"]
    ).resolve()

    mastr_df = pd.read_csv(
        mastr_path,
        index_col=index_col,
        usecols=usecols,
        dtype=dtype,
        parse_dates=parse_dates,
    )

    mastr_df = mastr_df.loc[
        (mastr_df.StatisikFlag == "B")
        & (mastr_df.EinheitBetriebsstatus == "InBetrieb")
        & (mastr_df.Land == "Deutschland")
        & (mastr_df.Lage == "BaulicheAnlagen")
    ]

    if (
        config.settings()["egon-data"]["--dataset-boundary"]
        == "Schleswig-Holstein"
    ):
        init_len = len(mastr_df)

        mastr_df = mastr_df.loc[mastr_df.Bundesland == "SchleswigHolstein"]

        logger.info(
            f"Using only MaStR data within Schleswig-Holstein. "
            f"{init_len - len(mastr_df)} of {init_len} generators are dropped."
        )

    logger.debug("MaStR data loaded.")

    return mastr_df


@timer_func
def clean_mastr_data(
    mastr_df: pd.DataFrame,
    max_realistic_pv_cap: int | float,
    min_realistic_pv_cap: int | float,
    rounding: int,
    seed: int,
) -> pd.DataFrame:
    """
    Clean the MaStR data from implausible data.

    * Drop MaStR ID duplicates.
    * Drop generators with implausible capacities.
    * Drop generators without any kind of start-up date.
    * Clean up the Standort column and the capacity.

    Parameters
    ----------
    mastr_df : pandas.DataFrame
        DataFrame containing MaStR data.
    max_realistic_pv_cap : int or float
        Maximum capacity which is considered to be realistic.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.
    rounding : int
        Rounding to use when cleaning up capacity. E.g. when
        rounding is 1 a capacity of 9.93 will be rounded to 9.9.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    """
    init_len = len(mastr_df)

    # drop duplicates
    mastr_df = mastr_df.loc[~mastr_df.index.duplicated()]

    # drop invalid entries in Standort
    index_to_drop = mastr_df.loc[
        (mastr_df.Standort.isna()) | (mastr_df.Standort.isnull())
    ].index

    mastr_df = mastr_df.loc[~mastr_df.index.isin(index_to_drop)]

    df = mastr_df[
        [
            "Bruttoleistung",
            "Bruttoleistung_extended",
            "Nettonennleistung",
            "zugeordneteWirkleistungWechselrichter",
            "InstallierteLeistung",
        ]
    ].round(rounding)

    # determine the most plausible capacity if multiple ratings are given
    mastr_df = mastr_df.assign(
        capacity=[
            most_plausible(p_tub, min_realistic_pv_cap)
            for p_tub in df.itertuples(index=False)
        ]
    )

    # drop generators without any capacity info,
    # with a capacity of zero,
    # or with a capacity > 23.5 MW, because
    # Germany's largest rooftop PV system is 23 MW
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
    mastr_df = mastr_df.loc[
        (~mastr_df.capacity.isna())
        & (mastr_df.capacity <= max_realistic_pv_cap)
        & (mastr_df.capacity > min_realistic_pv_cap)
    ]

    # get zip code and municipality
    mastr_df[["zip_and_municipality", "drop_this"]] = pd.DataFrame(
        mastr_df.Standort.astype(str)
        .apply(
            zip_and_municipality_from_standort,
            args=(VERBOSE,),
        )
        .tolist(),
        index=mastr_df.index,
    )

    # drop invalid entries
    mastr_df = mastr_df.loc[mastr_df.drop_this].drop(columns="drop_this")

    # add ", Deutschland" just in case
    mastr_df = mastr_df.assign(
        zip_and_municipality=(mastr_df.zip_and_municipality + ", Deutschland")
    )

    # get consistent start-up date
    mastr_df = mastr_df.assign(
        start_up_date=mastr_df.Inbetriebnahmedatum,
    )

    mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
        mastr_df.start_up_date.isna()
    ].assign(
        start_up_date=mastr_df.GeplantesInbetriebnahmedatum.loc[
            mastr_df.start_up_date.isna()
        ]
    )

    # randomly and weighted fill missing start-up dates
    pool = mastr_df.loc[
        ~mastr_df.start_up_date.isna()
    ].start_up_date.to_numpy()

    size = len(mastr_df) - len(pool)

    if size > 0:
        np.random.seed(seed)

        choice = np.random.choice(
            pool,
            size=size,
            replace=False,
        )

        mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
            mastr_df.start_up_date.isna()
        ].assign(start_up_date=choice)

        logger.info(
            f"Randomly and weighted added start-up dates to {size} generators."
        )

    mastr_df = mastr_df.assign(
        start_up_date=pd.to_datetime(mastr_df.start_up_date, utc=True)
    )

    end_len = len(mastr_df)
    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} rows from the MaStR DataFrame."
    )

    return mastr_df


def zip_and_municipality_from_standort(
    standort: str,
    verbose: bool = False,
) -> tuple[str, bool]:
    """
    Get the zip code and municipality from the Standort string.

    Parameters
    ----------
    standort : str
        Standort as given in the MaStR data.
    verbose : bool
        Logs additional info if True.

    Returns
    -------
    tuple of str and bool
        The Standort reduced to zip code and municipality, and a flag
        indicating whether a zip code was found.
    """
    if verbose:
        logger.debug(f"Uncleaned String: {standort}")

    standort_list = standort.split()

    found = False
    count = 0

    for count, elem in enumerate(standort_list):
        if len(elem) != 5:
            continue
        if not elem.isnumeric():
            continue

        found = True

        break

    if found:
        cleaned_str = " ".join(standort_list[count:])

        if verbose:
            logger.debug(f"Cleaned String:   {cleaned_str}")

        return cleaned_str, found

    logger.warning(
        "Couldn't identify zip code. This entry will be dropped."
        f" Original standort: {standort}."
    )

    return standort, found


def most_plausible(
    p_tub: tuple,
    min_realistic_pv_cap: int | float,
) -> float:
    """
    Try to determine the most plausible capacity of a given
    generator from the MaStR data.

    Parameters
    ----------
    p_tub : tuple
        Tuple containing the different capacities given in
        the MaStR data.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.

    Returns
    -------
    float
        Capacity of the generator estimated as the most realistic.
    """
    count = Counter(p_tub).most_common(3)

    if len(count) == 1:
        return count[0][0]

    val1 = count[0][0]
    val2 = count[1][0]

    if len(count) == 2:
        min_val = min(val1, val2)
        max_val = max(val1, val2)
    else:
        val3 = count[2][0]

        min_val = min(val1, val2, val3)
        max_val = max(val1, val2, val3)

    if min_val < min_realistic_pv_cap:
        return max_val

    return min_val
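
# Example (hypothetical values): most_plausible((9.9, 9.9, 10.0), 0.1) counts
# the ratings and returns the smallest one (9.9), since it is realistic;
# most_plausible((0.0, 5.0), 0.1) discards the implausible 0.0 and returns 5.0.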


def geocoder(
    user_agent: str,
    min_delay_seconds: int,
) -> RateLimiter:
    """
    Set up the Nominatim geocoding class.

    Parameters
    ----------
    user_agent : str
        The app name.
    min_delay_seconds : int
        Delay in seconds to use between requests to Nominatim.
        A minimum of 1 is advised.

    Returns
    -------
    geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    """
    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(
        locator.geocode,
        min_delay_seconds=min_delay_seconds,
    )
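
# A sketch of how the geocoding helpers compose (assumed usage; the module's
# own wiring is in geocode_mastr_data(), which is not shown in this excerpt):
#     ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)
#     geocode_gdf = geocode_data(
#         geocoding_data(clean_mastr_df), ratelimiter, EPSG
#     )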


def geocoding_data(
    clean_mastr_df: pd.DataFrame,
) -> pd.DataFrame:
    """
    Set up a DataFrame to geocode.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    """
    return pd.DataFrame(
        data=clean_mastr_df.zip_and_municipality.unique(),
        columns=["zip_and_municipality"],
    )


@timer_func
def geocode_data(
    geocoding_df: pd.DataFrame,
    ratelimiter: RateLimiter,
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Geocode zip code and municipality.

    Extract latitude, longitude and altitude. Transform latitude and
    longitude to a shapely Point and return a geopandas GeoDataFrame.

    Parameters
    ----------
    geocoding_df : pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    ratelimiter : geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.
    """
    logger.info(f"Geocoding {len(geocoding_df)} locations.")

    geocode_df = geocoding_df.assign(
        location=geocoding_df.zip_and_municipality.apply(ratelimiter)
    )

    geocode_df = geocode_df.assign(
        point=geocode_df.location.apply(
            lambda loc: tuple(loc.point) if loc else None
        )
    )

    geocode_df[["latitude", "longitude", "altitude"]] = pd.DataFrame(
        geocode_df.point.tolist(), index=geocode_df.index
    )

    return gpd.GeoDataFrame(
        geocode_df,
        geometry=gpd.points_from_xy(geocode_df.longitude, geocode_df.latitude),
        crs=f"EPSG:{epsg}",
    )


def merge_geocode_with_mastr(
    clean_mastr_df: pd.DataFrame, geocode_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """
    Merge the geocoded geometry onto the original MaStR data.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing cleaned MaStR data with
        matching geolocation from geocoding.
    """
    return gpd.GeoDataFrame(
        clean_mastr_df.merge(
            geocode_gdf[["zip_and_municipality", "geometry"]],
            how="left",
            left_on="zip_and_municipality",
            right_on="zip_and_municipality",
        ),
        crs=geocode_gdf.crs,
    ).set_index(clean_mastr_df.index)


def drop_invalid_entries_from_gdf(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop invalid entries from a geopandas GeoDataFrame.

    TODO: how to omit the logging from GEOS here?

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame to be checked for validity.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with rows with invalid geometries dropped.
    """
    valid_gdf = gdf.loc[gdf.is_valid]

    logger.debug(
        f"{len(gdf) - len(valid_gdf)} "
        f"({(len(gdf) - len(valid_gdf)) / len(gdf) * 100:g}%) "
        f"of {len(gdf)} values were invalid and are dropped."
    )

    return valid_gdf


@timer_func
def municipality_data() -> gpd.GeoDataFrame:
    """
    Get municipality data from the eGo^n Database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with municipality data.
    """
    with db.session_scope() as session:
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="ags"
    )


@timer_func
def add_ags_to_gens(
    valid_mastr_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to the generators.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with the AGS ID added.
    """
    return valid_mastr_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_gens_outside_muns(
    valid_mastr_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all generators outside of the municipalities.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with generators without an AGS ID dropped.
    """
    gdf = valid_mastr_gdf.loc[~valid_mastr_gdf.ags.isna()]

    logger.debug(
        f"{len(valid_mastr_gdf) - len(gdf)} "
        f"({(len(valid_mastr_gdf) - len(gdf)) / len(valid_mastr_gdf) * 100:g}%) "
        f"of {len(valid_mastr_gdf)} values are outside of the municipalities"
        " and are therefore dropped."
    )

    return gdf


class EgonMastrPvRoofGeocoded(Base):
    __tablename__ = "egon_mastr_pv_roof_geocoded"
    __table_args__ = {"schema": "supply"}

    zip_and_municipality = Column(String, primary_key=True, index=True)
    location = Column(String)
    point = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    altitude = Column(Float)
    geometry = Column(Geometry(srid=EPSG))


def create_geocoded_table(geocode_gdf):
    """
    Create the geocoded MaStR PV rooftop table.

    Parameters
    ----------
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoding information on PV rooftop locations.
    """
    EgonMastrPvRoofGeocoded.__table__.drop(bind=engine, checkfirst=True)
    EgonMastrPvRoofGeocoded.__table__.create(bind=engine, checkfirst=True)

    geocode_gdf.to_postgis(
        name=EgonMastrPvRoofGeocoded.__table__.name,
        schema=EgonMastrPvRoofGeocoded.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
        # dtype={}
    )


def geocoded_data_from_db(
    epsg: str | int,
) -> gpd.GeoDataFrame:
    """
    Read the geocoded MaStR PV rooftop data from the eGo^n Database.

    Parameters
    ----------
    epsg : str or int
        EPSG ID of the CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR PV rooftop data.
    """
    with db.session_scope() as session:
        query = session.query(
            EgonMastrPvRoofGeocoded.zip_and_municipality,
            EgonMastrPvRoofGeocoded.geometry,
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, geom_col="geometry"
    ).to_crs(f"EPSG:{epsg}")


def load_mastr_data():
    """
    Read PV rooftop data from the MaStR CSV.

    Note: the source will be replaced as soon as the MaStR data is available
    in the DB.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocode_gdf = geocoded_data_from_db(EPSG)

    mastr_gdf = merge_geocode_with_mastr(clean_mastr_df, geocode_gdf)

    valid_mastr_gdf = drop_invalid_entries_from_gdf(mastr_gdf)

    municipalities_gdf = municipality_data()

    valid_mastr_gdf = add_ags_to_gens(valid_mastr_gdf, municipalities_gdf)

    return drop_gens_outside_muns(valid_mastr_gdf)


class OsmBuildingsFiltered(Base):
    __tablename__ = "osm_buildings_filtered"
    __table_args__ = {"schema": "openstreetmap"}

    osm_id = Column(BigInteger)
    amenity = Column(String)
    building = Column(String)
    name = Column(String)
    geom = Column(Geometry(srid=SRID), index=True)
    area = Column(Float)
    geom_point = Column(Geometry(srid=SRID), index=True)
    tags = Column(HSTORE)
    id = Column(BigInteger, primary_key=True, index=True)


@timer_func
def osm_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read OSM buildings data from the eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsFiltered.id,
            OsmBuildingsFiltered.area,
            OsmBuildingsFiltered.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def synthetic_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read synthetic buildings data from the eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing synthetic buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsSynthetic.id,
            OsmBuildingsSynthetic.area,
            OsmBuildingsSynthetic.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def add_ags_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to the buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with the AGS ID added.
    """
    return buildings_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_buildings_outside_muns(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of the municipalities.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with buildings without an AGS ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the municipalities "
        "and are therefore dropped."
    )

    return gdf


def egon_building_peak_loads():
    """Get the IDs of buildings with an electricity peak load in eGon2035."""
    sql = """
    SELECT building_id
    FROM demand.egon_building_electricity_peak_loads
    WHERE scenario = 'eGon2035'
    """

    return (
        db.select_dataframe(sql).building_id.astype(int).sort_values().unique()
    )


@timer_func
def load_building_data():
    """
    Read buildings from the DB.

    Tables:

    * `openstreetmap.osm_buildings_filtered` (from OSM)
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)

    Use the column `id` for both as it is unique, hence both datasets can be
    concatenated. If INCLUDE_SYNTHETIC_BUILDINGS is False, synthetic buildings
    will not be loaded.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.
    """

    municipalities_gdf = municipality_data()

    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)

    if INCLUDE_SYNTHETIC_BUILDINGS:
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)

        buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [
                    osm_buildings_gdf,
                    synthetic_buildings_gdf,
                ]
            ),
            geometry="geom",
            crs=osm_buildings_gdf.crs,
        ).rename(columns={"area": "building_area"})

    else:
        buildings_gdf = osm_buildings_gdf.rename(
            columns={"area": "building_area"}
        )

    if ONLY_BUILDINGS_WITH_DEMAND:
        building_ids = egon_building_peak_loads()

        init_len = len(building_ids)

        building_ids = np.intersect1d(
            building_ids, buildings_gdf.index.to_numpy()
        )

        end_len = len(building_ids)

        logger.debug(
            f"{end_len / init_len * 100:g} % ({end_len} / {init_len}) of "
            f"buildings have a peak load."
        )

        buildings_gdf = buildings_gdf.loc[building_ids]

    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)

    logger.debug("Loaded buildings.")

    return drop_buildings_outside_muns(buildings_ags_gdf)


@timer_func
def sort_and_qcut_df(
    df: pd.DataFrame | gpd.GeoDataFrame,
    col: str,
    q: int,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Determine the quantile of a given attribute in a (Geo)DataFrame.

    Sort the (Geo)DataFrame in ascending order for the given attribute.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame
        (Geo)DataFrame to sort and qcut.
    col : str
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
    q : int
        Number of quantiles.

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
        Sorted and qcut (Geo)DataFrame.
    """
    df = df.sort_values(col, ascending=True)

    return df.assign(
        quant=pd.qcut(
            df[col],
            q=q,
            labels=range(q),
        )
    )
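
# Example (hypothetical data, not part of the module): with q=5 bins over
# five distinct capacities, each row gets its own quantile label 0..4:
#     demo_df = pd.DataFrame({"capacity": [1, 5, 10, 50, 100]})
#     sort_and_qcut_df(demo_df, col="capacity", q=5).quant.tolist()
#     # -> [0, 1, 2, 3, 4]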


@timer_func
def allocate_pv(
    q_mastr_gdf: gpd.GeoDataFrame,
    q_buildings_gdf: gpd.GeoDataFrame,
    seed: int,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate the MaStR PV generators to the OSM buildings.

    This determines a building for each PV generator if there are more
    buildings than generators within a given AGS. Generators are
    preferentially assigned to buildings within the same quantile. Multiple
    assignment is excluded.

    Parameters
    ----------
    q_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded and qcut MaStR data.
    q_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing qcut OSM buildings data.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    tuple of two geopandas.GeoDataFrames
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    rng = default_rng(seed=seed)

    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan)
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan)

    ags_list = q_buildings_gdf.ags.unique()

    if TEST_RUN:
        ags_list = ags_list[:250]

    num_ags = len(ags_list)

    t0 = perf_counter()

    for count, ags in enumerate(ags_list):

        buildings = q_buildings_gdf.loc[
            (q_buildings_gdf.ags == ags) & (q_buildings_gdf.gens_id.isna())
        ]
        gens = q_mastr_gdf.loc[
            (q_mastr_gdf.ags == ags) & (q_mastr_gdf.building_id.isna())
        ]

        len_build = len(buildings)
        len_gens = len(gens)

        if len_build < len_gens:
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
            logger.error(
                f"There are {len_gens} generators and only {len_build}"
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
                "generators were truncated to match the amount of buildings."
            )

            assert len_build == len(gens)

        for quant in gens.quant.unique():
            q_buildings = buildings.loc[
                (buildings.quant == quant) & (buildings.gens_id.isna())
            ]
            q_gens = gens.loc[gens.quant == quant]

            len_build = len(q_buildings)
            len_gens = len(q_gens)

            if len_build < len_gens:
                delta = len_gens - len_build

                logger.warning(
                    f"There are {len_gens} generators and only {len_build} "
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
                    f"buildings from AGS {ags} will be added randomly."
                )

                add_buildings = pd.Index(
                    rng.choice(
                        buildings.loc[
                            (buildings.quant != quant)
                            & (buildings.gens_id.isna())
                        ].index,
                        size=delta,
                        replace=False,
                    )
                )

                q_buildings = buildings.loc[
                    q_buildings.index.append(add_buildings)
                ]

                assert len(q_buildings) == len_gens

            chosen_buildings = pd.Index(
                rng.choice(
                    q_buildings.index,
                    size=len_gens,
                    replace=False,
                )
            )

            # q_mastr_gdf.loc[q_gens.index, "building_id"] = chosen_buildings
            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index

        if count % 100 == 0:
            logger.debug(
                f"Allocation of {count / num_ags * 100:g} % of AGS done. It took "
                f"{perf_counter() - t0:g} seconds."
            )

            t0 = perf_counter()

    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]

    q_mastr_gdf.loc[
        assigned_buildings.gens_id, "building_id"
    ] = assigned_buildings.index

    logger.debug("Allocated status quo generators to buildings.")

    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)


def frame_to_numeric(
    df: pd.DataFrame | gpd.GeoDataFrame,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Try to convert all columns of a DataFrame to numeric, ignoring errors.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
    """
    if str(df.index.dtype) == "object":
        df.index = pd.to_numeric(df.index, errors="ignore")

    for col in df.columns:
        if str(df[col].dtype) == "object":
            df[col] = pd.to_numeric(df[col], errors="ignore")

    return df


def validate_output(
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
) -> None:
    """
    Validate the output.

    * Validate that there are exactly as many buildings with a PV system as
      there are PV systems with a building.
    * Validate that the building IDs with a PV system are the same building
      IDs as assigned to the PV systems.
    * Validate that the PV system IDs with a building are the same PV system
      IDs as assigned to the buildings.

    Parameters
    ----------
    desagg_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.
    desagg_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    assert len(
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].building_id.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].index.unique()
        )
    ).all()
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].index.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].gens_id.unique()
        )
    ).all()

    logger.debug("Validated output.")


def drop_unallocated_gens(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop generators which did not get allocated.

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with those generators dropped
        which did not get allocated.
    """
    init_len = len(gdf)
    gdf = gdf.loc[~gdf.building_id.isna()]
    end_len = len(gdf)

    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} unallocated rows from the MaStR DataFrame."
    )

    return gdf


@timer_func
def allocate_to_buildings(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate status quo PV rooftop generators to buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.

    Returns
    -------
    tuple of two geopandas.GeoDataFrames
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    logger.debug("Starting allocation of status quo.")

    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
        q_mastr_gdf, q_buildings_gdf, SEED
    )

    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)

    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf


@timer_func
def grid_districts(
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Load the MV grid district geodata from the eGo^n Database as a
    geopandas.GeoDataFrame.

    Parameters
    ----------
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the MV grid district IDs and geo shapes data.
    """
    gdf = db.select_geodataframe(
        """
        SELECT bus_id, geom
        FROM grid.egon_mv_grid_district
        ORDER BY bus_id
        """,
        index_col="bus_id",
        geom_col="geom",
        epsg=epsg,
    )

    gdf.index = gdf.index.astype(int)

    logger.debug("Grid districts loaded.")

    return gdf


def scenario_data(
    carrier: str = "solar_rooftop",
    scenario: str = "eGon2035",
) -> pd.DataFrame:
    """
    Get scenario capacity data from the eGo^n Database.

    Parameters
    ----------
    carrier : str
        Carrier type to filter the table by.
    scenario : str
        Scenario to filter the table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with scenario capacity data in GW.
    """
    with db.session_scope() as session:
        query = session.query(EgonScenarioCapacities).filter(
            EgonScenarioCapacities.carrier == carrier,
            EgonScenarioCapacities.scenario_name == scenario,
        )

    df = pd.read_sql(
        query.statement, query.session.bind, index_col="index"
    ).sort_index()

    logger.debug("Scenario capacity data loaded.")

    return df


class Vg250Lan(Base):
    __tablename__ = "vg250_lan"
    __table_args__ = {"schema": "boundaries"}

    id = Column(BigInteger, primary_key=True, index=True)
    ade = Column(BigInteger)
    gf = Column(BigInteger)
    bsg = Column(BigInteger)
    ars = Column(String)
    ags = Column(String)
    sdv_ars = Column(String)
    gen = Column(String)
    bez = Column(String)
    ibz = Column(BigInteger)
    bem = Column(String)
    nbd = Column(String)
    sn_l = Column(String)
    sn_r = Column(String)
    sn_k = Column(String)
    sn_v1 = Column(String)
    sn_v2 = Column(String)
    sn_g = Column(String)
    fk_s3 = Column(String)
    nuts = Column(String)
    ars_0 = Column(String)
    ags_0 = Column(String)
    wsk = Column(String)
    debkg_id = Column(String)
    rs = Column(String)
    sdv_rs = Column(String)
    rs_0 = Column(String)
    geometry = Column(Geometry(srid=EPSG), index=True)


def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
    """
    Get federal state data from the eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with federal state data.
    """
    with db.session_scope() as session:
        query = session.query(
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
        )

    gdf = gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

    logger.debug("Federal State data loaded.")

    return gdf


@timer_func
def overlay_grid_districts_with_counties(
    mv_grid_district_gdf: gpd.GeoDataFrame,
    federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Calculate the intersections of MV grid districts and counties.

    Parameters
    ----------
    mv_grid_district_gdf : gpd.GeoDataFrame
        GeoDataFrame containing the MV grid district IDs and geo shapes data.
    federal_state_gdf : gpd.GeoDataFrame
        GeoDataFrame with federal state data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the intersection shapes between MV grid
        districts and counties.
    """
    logger.debug(
        "Calculating intersection overlay between mv grid districts and "
        "counties. This may take a while..."
    )

    gdf = gpd.overlay(
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
        mv_grid_district_gdf.reset_index(),
        how="intersection",
        keep_geom_type=True,
    )

    logger.debug("Done!")

    return gdf


@timer_func
def add_overlay_id_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    grid_federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the overlay ID to the buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    grid_federal_state_gdf : geopandas.GeoDataFrame
        GeoDataFrame with intersection shapes between counties and grid
        districts.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with the overlay ID added.
    """
    gdf = (
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
        .sjoin(
            grid_federal_state_gdf,
            how="left",
            predicate="intersects",
        )
        .rename(columns={"index_right": "overlay_id"})
    )

    logger.debug("Added overlay ID to OSM buildings.")

    return gdf


def drop_buildings_outside_grids(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of the grid areas.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with buildings without a bus ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the grid areas "
        "and are therefore dropped."
    )

    return gdf


def cap_per_bus_id(
    scenario: str,
) -> pd.DataFrame:
    """
    Get a table with the total PV rooftop capacity per grid district.

    Parameters
    ----------
    scenario : str
        Scenario name.

    Returns
    -------
    pandas.DataFrame
        DataFrame with the total rooftop capacity per MV grid.
    """
    targets = config.datasets()["solar_rooftop"]["targets"]

    sql = f"""
    SELECT bus as bus_id, p_nom as capacity
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
    WHERE carrier = 'solar_rooftop'
    AND scn_name = '{scenario}'
    """

    return db.select_dataframe(sql, index_col="bus_id")

    # overlay_gdf = overlay_gdf.assign(capacity=np.nan)
    #
    # for cap, nuts in scenario_df[["capacity", "nuts"]].itertuples(index=False):
    #     nuts_gdf = overlay_gdf.loc[overlay_gdf.nuts == nuts]
    #
    #     capacity = nuts_gdf.building_area.multiply(
    #         cap / nuts_gdf.building_area.sum()
    #     )
    #
    #     overlay_gdf.loc[nuts_gdf.index] = overlay_gdf.loc[
    #         nuts_gdf.index
    #     ].assign(capacity=capacity.multiply(conversion).to_numpy())
    #
    # return overlay_gdf[["bus_id", "capacity"]].groupby("bus_id").sum()


def determine_end_of_life_gens(
    mastr_gdf: gpd.GeoDataFrame,
    scenario_timestamp: pd.Timestamp,
    pv_rooftop_lifetime: pd.Timedelta,
) -> gpd.GeoDataFrame:
    """
    Determine if an old PV system has reached its end of life.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    scenario_timestamp : pandas.Timestamp
        Timestamp at which the scenario takes place.
    pv_rooftop_lifetime : pandas.Timedelta
        Average expected lifetime of PV rooftop systems.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data and info on whether the
        system has reached its end of life.
    """
    mastr_gdf = mastr_gdf.assign(
        age=scenario_timestamp - mastr_gdf.start_up_date
    )

    logger.debug("Determined if pv rooftop systems reached their end of life.")

    return mastr_gdf.assign(end_of_life=pv_rooftop_lifetime < mastr_gdf.age)


def calculate_max_pv_cap_per_building(
    buildings_gdf: gpd.GeoDataFrame,
    mastr_gdf: gpd.GeoDataFrame,
    pv_cap_per_sq_m: float | int,
    roof_factor: float | int,
) -> gpd.GeoDataFrame:
    """
    Calculate the estimated maximum possible PV capacity per building.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    roof_factor : float, int
        Average share of the roof area that is usable for PV.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with estimated maximum PV
        capacity.
    """
    gdf = (
        buildings_gdf.reset_index()
        .merge(
            mastr_gdf[
                [
                    "capacity",
                    "end_of_life",
                    "building_id",
                    "EinheitlicheAusrichtungUndNeigungswinkel",
                    "Hauptausrichtung",
                    "HauptausrichtungNeigungswinkel",
                ]
            ],
            how="left",
            left_on="id",
            right_on="building_id",
        )
        .set_index("id")
        .drop(columns="building_id")
    )

    return gdf.assign(
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
        bus_id=gdf.bus_id.astype(int),
    )
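
# ``max_cap`` follows the simple area formula
# building_area * roof_factor * pv_cap_per_sq_m, e.g. with the illustrative
# values roof_factor = 0.5 and pv_cap_per_sq_m = 0.17 kW/m²:
#
#     max_cap = 100 * 0.5 * 0.17  # 100 m² building -> 8.5 kW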


def calculate_building_load_factor(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    rounding: int = 4,
) -> gpd.GeoDataFrame:
    """
    Calculate the roof load factor from existing PV systems.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    rounding : int
        Rounding to use for load factor.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data with calculated load factor.
    """
    gdf = mastr_gdf.merge(
        buildings_gdf[["max_cap", "building_area"]].loc[
            ~buildings_gdf["max_cap"].isna()
        ],
        how="left",
        left_on="building_id",
        right_index=True,
    )

    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))
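
# The load factor relates installed capacity to the estimated roof
# potential, e.g. a 5 kW system on a building with max_cap = 20 kW:
#
#     load_factor = round(5 / 20, 4)  # = 0.25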


def get_probability_for_property(
    mastr_gdf: gpd.GeoDataFrame,
    cap_range: tuple[int | float, int | float],
    prop: str,
) -> tuple[np.ndarray, np.ndarray]:
    """
    Calculate the probability of the different options of a property of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_range : tuple(int, int)
        Capacity range of PV plants to look at.
    prop : str
        Property to calculate probabilities for. String needs to be in columns
        of mastr_gdf.

    Returns
    -------
    tuple
        numpy.array
            Unique values of property.
        numpy.array
            Probabilities per unique value.
    """
    cap_range_gdf = mastr_gdf.loc[
        (mastr_gdf.capacity > cap_range[0])
        & (mastr_gdf.capacity <= cap_range[1])
    ]

    if prop == "load_factor":
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]

    count = Counter(
        cap_range_gdf[prop].loc[
            ~cap_range_gdf[prop].isna()
            & ~cap_range_gdf[prop].isnull()
            & ~(cap_range_gdf[prop] == "None")
        ]
    )

    values = np.array(list(count.keys()))
    probabilities = np.fromiter(count.values(), dtype=float)
    probabilities = probabilities / np.sum(probabilities)

    return values, probabilities
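
# Usage sketch for ``get_probability_for_property`` (hypothetical data):
# empirical distribution of the main orientation of plants up to 30 kW:
#
#     values, probs = get_probability_for_property(
#         mastr_gdf, (0, 30), "Hauptausrichtung"
#     )
#     assert np.isclose(probs.sum(), 1.0)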


@timer_func
def probabilities(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    properties: list[str] | None = None,
) -> dict:
    """
    Calculate the probability of the different options of properties of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    properties : list(str)
        List of properties to calculate probabilities for. Strings need to be
        in columns of mastr_gdf.

    Returns
    -------
    dict
        Dictionary with values and probabilities per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]
    if properties is None:
        properties = [
            "EinheitlicheAusrichtungUndNeigungswinkel",
            "Hauptausrichtung",
            "HauptausrichtungNeigungswinkel",
            "load_factor",
        ]

    prob_dict = {}

    for cap_range in cap_ranges:
        prob_dict[cap_range] = {
            "values": {},
            "probabilities": {},
        }

        for prop in properties:
            v, p = get_probability_for_property(
                mastr_gdf,
                cap_range,
                prop,
            )

            prob_dict[cap_range]["values"][prop] = v
            prob_dict[cap_range]["probabilities"][prop] = p

    return prob_dict
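
# The returned dictionary is nested by capacity range and property, e.g.
# (with the default ranges):
#
#     prob_dict = probabilities(mastr_gdf)
#     values = prob_dict[(0, 30)]["values"]["Hauptausrichtung"]
#     probs = prob_dict[(0, 30)]["probabilities"]["Hauptausrichtung"]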


def cap_share_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the share of PV capacity from the total PV capacity within
    capacity ranges.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with share of PV capacity from the total PV capacity within
        capacity ranges.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    cap_share_dict = {}

    total_cap = mastr_gdf.capacity.sum()

    for cap_range in cap_ranges:
        cap_share = (
            mastr_gdf.loc[
                (mastr_gdf.capacity > cap_range[0])
                & (mastr_gdf.capacity <= cap_range[1])
            ].capacity.sum()
            / total_cap
        )

        cap_share_dict[cap_range] = cap_share

    return cap_share_dict
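
# Since the default capacity ranges partition (0, inf), the shares sum to
# one:
#
#     cap_share_dict = cap_share_per_cap_range(mastr_gdf)
#     assert np.isclose(sum(cap_share_dict.values()), 1.0)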


def mean_load_factor_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the mean roof load factor per capacity range from existing PV
    plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with mean roof load factor per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    load_factor_dict = {}

    for cap_range in cap_ranges:
        load_factor = mastr_gdf.loc[
            (mastr_gdf.load_factor <= 1)
            & (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ].load_factor.mean()

        load_factor_dict[cap_range] = load_factor

    return load_factor_dict


def building_area_range_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    min_building_size: int | float = 10.0,
    upper_quantile: float = 0.95,
    lower_quantile: float = 0.05,
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
    """
    Estimate normal building area range per capacity range.

    The ranges are estimated from the building areas of existing PV plants
    within each capacity range, using the given quantiles, and are then
    smoothed so that neighboring ranges share a common boundary.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    min_building_size : int, float
        Minimal building size to consider for PV plants.
    upper_quantile : float
        Upper quantile to estimate maximum building size per capacity range.
    lower_quantile : float
        Lower quantile to estimate minimum building size per capacity range.

    Returns
    -------
    dict
        Dictionary with estimated normal building area range per capacity
        range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    building_area_range_dict = {}

    n_ranges = len(cap_ranges)

    for count, cap_range in enumerate(cap_ranges):
        cap_range_gdf = mastr_gdf.loc[
            (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ]

        if count == 0:
            building_area_range_dict[cap_range] = (
                min_building_size,
                cap_range_gdf.building_area.quantile(upper_quantile),
            )
        elif count == n_ranges - 1:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                float("inf"),
            )
        else:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                cap_range_gdf.building_area.quantile(upper_quantile),
            )

    values = list(building_area_range_dict.values())

    building_area_range_normed_dict = {}

    # Replace each inner boundary by the mean of the two adjacent quantile
    # estimates so the resulting area ranges do not overlap.
    for count, (cap_range, (min_area, max_area)) in enumerate(
        building_area_range_dict.items()
    ):
        if count == 0:
            building_area_range_normed_dict[cap_range] = (
                min_area,
                np.mean((values[count + 1][0], max_area)),
            )
        elif count == n_ranges - 1:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                max_area,
            )
        else:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                np.mean((values[count + 1][0], max_area)),
            )

    return building_area_range_normed_dict
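
# Worked example of the smoothing step in
# ``building_area_range_per_cap_range`` (illustrative numbers): raw quantile
# ranges (10, 120), (80, 400), (300, inf) become
#
#     (10, mean(80, 120))             -> (10, 100)
#     (mean(120, 80), mean(300, 400)) -> (100, 350)
#     (mean(400, 300), inf)           -> (350, inf)
#
# so adjacent ranges meet at a single boundary instead of overlapping.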


@timer_func
def desaggregate_pv_in_mv_grid(
    buildings_gdf: gpd.GeoDataFrame,
    pv_cap: float | int,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within a given grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing buildings within the grid district.
    pv_cap : float, int
        PV capacity to desaggregate.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with share of PV capacity from the total PV capacity within
        capacity ranges.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    bus_id = int(buildings_gdf.bus_id.iat[0])

    rng = default_rng(seed=kwargs["seed"])
    random_state = RandomState(seed=kwargs["seed"])

    results_df = pd.DataFrame(columns=buildings_gdf.columns)

    for cap_range, share in kwargs["cap_share_dict"].items():
        pv_cap_range = pv_cap * share

        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]

        cap_range_buildings_gdf = buildings_gdf.loc[
            ~buildings_gdf.index.isin(results_df.index)
            & (buildings_gdf.building_area > b_area_min)
            & (buildings_gdf.building_area <= b_area_max)
        ]

        mean_load_factor = kwargs["load_factor_dict"][cap_range]
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
            load_factor=np.nan,
            capacity=np.nan,
        )

        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        if total_mean_cap == 0:
            logger.warning(
                f"There are no matching roofs for capacity range {cap_range} "
                f"kW in grid {bus_id}. Using all buildings as fallback."
            )

            cap_range_buildings_gdf = buildings_gdf.loc[
                ~buildings_gdf.index.isin(results_df.index)
            ]

            if len(cap_range_buildings_gdf) == 0:
                logger.warning(
                    "There are no roofs available for capacity range "
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
                )
                cap_range_buildings_gdf = buildings_gdf.copy()

            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
                load_factor=np.nan,
                capacity=np.nan,
            )

            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        elif total_mean_cap < pv_cap_range:
            logger.warning(
                f"Average roof utilization of the roof area in grid {bus_id} "
                f"and capacity range {cap_range} kW is not sufficient. The "
                "roof utilization will be above average."
            )

        # Fraction of buildings to sample: enough that the expected capacity
        # meets the target for this range, but at least one building.
        frac = max(
            pv_cap_range / total_mean_cap,
            1 / len(cap_range_buildings_gdf),
        )

        samples_gdf = cap_range_buildings_gdf.sample(
            frac=min(1, frac),
            random_state=random_state,
        )

        cap_range_dict = kwargs["prob_dict"][cap_range]

        values_dict = cap_range_dict["values"]
        p_dict = cap_range_dict["probabilities"]

        load_factors = rng.choice(
            a=values_dict["load_factor"],
            size=len(samples_gdf),
            p=p_dict["load_factor"],
        )

        # Note: capacities are floored at 0.4 (presumably kW) via clip below.
        samples_gdf = samples_gdf.assign(
            load_factor=load_factors,
            capacity=(
                samples_gdf.building_area
                * load_factors
                * kwargs["pv_cap_per_sq_m"]
            ).clip(lower=0.4),
        )

        # Rescale so the sampled capacities sum exactly to the range target.
        missing_factor = pv_cap_range / samples_gdf.capacity.sum()

        samples_gdf = samples_gdf.assign(
            capacity=(samples_gdf.capacity * missing_factor),
            load_factor=(samples_gdf.load_factor * missing_factor),
        )

        assert np.isclose(
            samples_gdf.capacity.sum(),
            pv_cap_range,
            rtol=1e-03,
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"

        results_df = pd.concat(
            [
                results_df,
                samples_gdf,
            ],
        )

    total_missing_factor = pv_cap / results_df.capacity.sum()

    results_df = results_df.assign(
        capacity=(results_df.capacity * total_missing_factor),
    )

    assert np.isclose(
        results_df.capacity.sum(),
        pv_cap,
        rtol=1e-03,
    ), f"{results_df.capacity.sum()} != {pv_cap}"

    return gpd.GeoDataFrame(
        results_df,
        # Use buildings_gdf.crs here: samples_gdf may be undefined if the
        # loop above is never entered (flagged by static analysis).
        crs=buildings_gdf.crs,
        geometry="geom",
    )


@timer_func
def desaggregate_pv(
    buildings_gdf: gpd.GeoDataFrame,
    cap_df: pd.DataFrame,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity onto buildings within each grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    cap_df : pandas.DataFrame
        DataFrame with total rooftop capacity per mv grid.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with share of PV capacity from the total PV capacity within
        capacity ranges.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]

    building_bus_ids = set(buildings_gdf.bus_id)
    cap_bus_ids = set(cap_df.index)

    logger.debug(
        f"Bus IDs from buildings: {len(building_bus_ids)}\nBus IDs from capacity: "
        f"{len(cap_bus_ids)}"
    )

    if len(building_bus_ids) > len(cap_bus_ids):
        missing = building_bus_ids - cap_bus_ids
    else:
        missing = cap_bus_ids - building_bus_ids

    logger.debug(str(missing))

    # assert set(buildings_gdf.bus_id.unique()) == set(cap_df.index)

    for bus_id in buildings_gdf.bus_id.unique():
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]

        pv_installed_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.end_of_life
        ]

        pv_installed = pv_installed_gdf.capacity.sum()

        pot_buildings_gdf = buildings_grid_gdf.drop(
            index=pv_installed_gdf.index
        )

        if len(pot_buildings_gdf) == 0:
            logger.error(
                f"In grid {bus_id} there are no potential buildings to allocate "
                "PV capacity to. The grid is skipped. This message should only "
                "appear during test runs with few buildings."
            )

            continue

        pv_target = cap_df.at[bus_id, "capacity"] * 1000  # MW -> kW

        logger.debug(f"pv_target: {pv_target}")

        pv_missing = pv_target - pv_installed

        if pv_missing <= 0:
            logger.info(
                f"In grid {bus_id} there is more PV installed ({pv_installed: g}) in "
                f"the status quo than allocated within the scenario ({pv_target: g}). "
                "No new generators are added."
            )

            continue

        if pot_buildings_gdf.max_cap.sum() < pv_missing:
            logger.error(
                f"In grid {bus_id} there is less PV potential ("
                f"{pot_buildings_gdf.max_cap.sum():g} kW) than allocated PV capacity "
                f"({pv_missing:g} kW). The average roof utilization will be very high."
            )

        gdf = desaggregate_pv_in_mv_grid(
            buildings_gdf=pot_buildings_gdf,
            pv_cap=pv_missing,
            **kwargs,
        )

        logger.debug(f"New cap in grid {bus_id}: {gdf.capacity.sum()}")
        logger.debug(f"Installed cap in grid {bus_id}: {pv_installed}")
        logger.debug(
            f"Total cap in grid {bus_id}: {gdf.capacity.sum() + pv_installed}"
        )

        if not np.isclose(
            gdf.capacity.sum() + pv_installed, pv_target, rtol=1e-3
        ):
            logger.warning(
                f"The desired capacity and actual capacity in grid {bus_id} differ.\n"
                f"Desired cap: {pv_target}\nActual cap: "
                f"{gdf.capacity.sum() + pv_installed}"
            )

        allocated_buildings_gdf = pd.concat(
            [
                allocated_buildings_gdf,
                gdf,
            ]
        )

    logger.debug("Desaggregated scenario.")
    logger.debug(f"Scenario capacity: {cap_df.capacity.sum(): g}")
    logger.debug(
        f"Generator capacity: {allocated_buildings_gdf.capacity.sum(): g}"
    )

    return gpd.GeoDataFrame(
        allocated_buildings_gdf,
        # Use buildings_gdf.crs here: gdf may be undefined if every grid was
        # skipped above (flagged by static analysis).
        crs=buildings_gdf.crs,
        geometry="geom",
    )


@timer_func
def add_buildings_meta_data(
    buildings_gdf: gpd.GeoDataFrame,
    prob_dict: dict,
    seed: int,
) -> gpd.GeoDataFrame:
    """
    Randomly add additional metadata to desaggregated PV plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    rng = default_rng(seed=seed)
    buildings_gdf = buildings_gdf.reset_index().rename(
        columns={
            "index": "building_id",
        }
    )

    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
        cap_range_gdf = buildings_gdf.loc[
            (buildings_gdf.capacity >= min_cap)
            & (buildings_gdf.capacity < max_cap)
        ]

        for key, values in cap_range_prob_dict["values"].items():
            if key == "load_factor":
                # The load factor was already assigned during desaggregation.
                continue

            gdf = cap_range_gdf.loc[
                cap_range_gdf[key].isna()
                | cap_range_gdf[key].isnull()
                | (cap_range_gdf[key] == "None")
            ]

            key_vals = rng.choice(
                a=values,
                size=len(gdf),
                p=cap_range_prob_dict["probabilities"][key],
            )

            buildings_gdf.loc[gdf.index, key] = key_vals

    return buildings_gdf
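
# Usage sketch for ``add_buildings_meta_data``: missing orientation and tilt
# attributes of new plants are drawn from the empirical distributions of
# existing plants in the same capacity range:
#
#     gdf = add_buildings_meta_data(allocated_buildings_gdf, prob_dict, SEED)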


def add_voltage_level(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add voltage level derived from generator capacity to the power plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with voltage level per generator.
    """

    def voltage_levels(p: float) -> int:
        if p < 100:
            return 7
        elif p < 200:
            return 6
        elif p < 5500:
            return 5
        elif p < 20000:
            return 4
        elif p < 120000:
            return 3
        return 1

    return buildings_gdf.assign(
        voltage_level=buildings_gdf.capacity.apply(voltage_levels)
    )
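
# Sketch of the ``add_voltage_level`` mapping, assuming capacities in kW:
#
#     add_voltage_level(buildings_gdf)
#     # e.g. 30 kW -> level 7, 1000 kW -> level 5, 10000 kW -> level 4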


def add_start_up_date(
    buildings_gdf: gpd.GeoDataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    seed: int,
):
    """
    Randomly and uniformly add a start-up date to new pv generators.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    start : pandas.Timestamp
        Minimum Timestamp to use.
    end : pandas.Timestamp
        Maximum Timestamp to use.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with start-up date added.
    """
    rng = default_rng(seed=seed)

    date_range = pd.date_range(start=start, end=end, freq="1D")

    return buildings_gdf.assign(
        start_up_date=rng.choice(date_range, size=len(buildings_gdf))
    )
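
# Usage sketch for ``add_start_up_date`` (illustrative bounds): dates are
# drawn uniformly from the daily range between the two timestamps:
#
#     gdf = add_start_up_date(
#         buildings_gdf,
#         start=pd.Timestamp("2022-01-01"),
#         end=pd.Timestamp("2035-01-01"),
#         seed=SEED,
#     )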


@timer_func
def allocate_scenarios(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    last_scenario_gdf: gpd.GeoDataFrame,
    scenario: str,
):
    """
    Desaggregate and allocate scenario pv rooftop ramp-ups onto buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    last_scenario_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings matched with pv generators from
        the temporally preceding scenario.
    scenario : str
        Scenario to desaggregate and allocate.

    Returns
    -------
    tuple
        geopandas.GeoDataFrame
            GeoDataFrame containing OSM buildings matched with pv generators.
        pandas.DataFrame
            DataFrame containing pv rooftop capacity per grid id.
    """
    grid_districts_gdf = grid_districts(EPSG)

    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)

    grid_federal_state_gdf = overlay_grid_districts_with_counties(
        grid_districts_gdf,
        federal_state_gdf,
    )

    buildings_overlay_gdf = add_overlay_id_to_buildings(
        buildings_gdf,
        grid_federal_state_gdf,
    )

    valid_buildings_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)

    cap_per_bus_id_df = cap_per_bus_id(scenario)

    logger.debug(
        f"cap_per_bus_id_df total capacity: {cap_per_bus_id_df.capacity.sum()}"
    )

    last_scenario_gdf = determine_end_of_life_gens(
        last_scenario_gdf,
        SCENARIO_TIMESTAMP[scenario],
        PV_ROOFTOP_LIFETIME,
    )

    buildings_gdf = calculate_max_pv_cap_per_building(
        valid_buildings_gdf,
        last_scenario_gdf,
        PV_CAP_PER_SQ_M,
        ROOF_FACTOR,
    )

    mastr_gdf = calculate_building_load_factor(
        mastr_gdf,
        buildings_gdf,
    )

    probabilities_dict = probabilities(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    cap_share_dict = cap_share_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    load_factor_dict = mean_load_factor_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    building_area_range_dict = building_area_range_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
        min_building_size=MIN_BUILDING_SIZE,
        upper_quantile=UPPER_QUNATILE,
        lower_quantile=LOWER_QUANTILE,
    )

    allocated_buildings_gdf = desaggregate_pv(
        buildings_gdf=buildings_gdf,
        cap_df=cap_per_bus_id_df,
        prob_dict=probabilities_dict,
        cap_share_dict=cap_share_dict,
        building_area_range_dict=building_area_range_dict,
        load_factor_dict=load_factor_dict,
        seed=SEED,
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
    )

    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)

    meta_buildings_gdf = frame_to_numeric(
        add_buildings_meta_data(
            allocated_buildings_gdf,
            probabilities_dict,
            SEED,
        )
    )

    return (
        add_start_up_date(
            meta_buildings_gdf,
            start=last_scenario_gdf.start_up_date.max(),
            end=SCENARIO_TIMESTAMP[scenario],
            seed=SEED,
        ),
        cap_per_bus_id_df,
    )


class EgonPowerPlantPvRoofBuildingScenario(Base):
    __tablename__ = "egon_power_plants_pv_roof_building"
    __table_args__ = {"schema": "supply"}

    index = Column(Integer, primary_key=True, index=True)
    scenario = Column(String)
    building_id = Column(Integer)
    gens_id = Column(String, nullable=True)
    capacity = Column(Float)
    einheitliche_ausrichtung_und_neigungswinkel = Column(Float)
    hauptausrichtung = Column(String)
    hauptausrichtung_neigungswinkel = Column(String)
    voltage_level = Column(Integer)


def create_scenario_table(buildings_gdf):
    """Create mapping table pv_unit <-> building for scenario"""
    EgonPowerPlantPvRoofBuildingScenario.__table__.drop(
        bind=engine, checkfirst=True
    )
    EgonPowerPlantPvRoofBuildingScenario.__table__.create(
        bind=engine, checkfirst=True
    )

    buildings_gdf.rename(columns=COLS_TO_RENAME)[
        COLS_TO_EXPORT
    ].reset_index().to_sql(
        name=EgonPowerPlantPvRoofBuildingScenario.__table__.name,
        schema=EgonPowerPlantPvRoofBuildingScenario.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
    )


def geocode_mastr_data():
    """
    Read PV rooftop data from MaStR CSV.

    TODO: the source will be replaced as soon as the MaStR data is available
    in DB.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocoding_df = geocoding_data(clean_mastr_df)

    ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)

    geocode_gdf = geocode_data(geocoding_df, ratelimiter, EPSG)

    create_geocoded_table(geocode_gdf)
2611
2612
2613
def pv_rooftop_to_buildings():
2614
    """Main script, executed as task"""
2615
2616
    mastr_gdf = load_mastr_data()
2617
2618
    buildings_gdf = load_building_data()
2619
2620
    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
2621
        mastr_gdf, buildings_gdf
2622
    )
2623
2624
    all_buildings_gdf = (
2625
        desagg_mastr_gdf.assign(scenario="status_quo")
2626
        .reset_index()
2627
        .rename(columns={"geometry": "geom", "EinheitMastrNummer": "gens_id"})
2628
    )
2629
2630
    scenario_buildings_gdf = all_buildings_gdf.copy()
2631
2632
    cap_per_bus_id_df = pd.DataFrame()
2633
2634
    for scenario in SCENARIOS:
2635
        logger.debug(f"Desaggregating scenario {scenario}.")
2636
        (
2637
            scenario_buildings_gdf,
2638
            cap_per_bus_id_scenario_df,
2639
        ) = allocate_scenarios(  # noqa: F841
2640
            desagg_mastr_gdf,
2641
            desagg_buildings_gdf,
2642
            scenario_buildings_gdf,
2643
            scenario,
2644
        )
2645
2646
        all_buildings_gdf = gpd.GeoDataFrame(
2647
            pd.concat(
2648
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
2649
            ),
2650
            crs=scenario_buildings_gdf.crs,
2651
            geometry="geom",
2652
        )
2653
2654
        cap_per_bus_id_df = pd.concat(
2655
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
2656
        )
2657
2658
    # export scenario
2659
    create_scenario_table(add_voltage_level(all_buildings_gdf))
2660