Passed
Pull Request — dev (#809)
created by unknown at 01:38

data.datasets.power_plants.pv_rooftop_buildings (rating: F)

Complexity

Total Complexity 115

Size/Duplication

Total Lines 2665
Duplicated Lines 1.2 %

Importance

Changes 0
Metric                            Value
wmc (weighted method count)       115
eloc (executable lines of code)   1182
dl (duplicated lines)             32
loc (lines of code)               2665
rs                                0.8719
c                                 0
b                                 0
f                                 0

53 Functions

Rating   Name   Duplication   Size   Complexity  
A egon_building_peak_loads() 0 7 1
A synthetic_buildings() 0 25 2
A geocoding_data() 0 18 1
A geocoded_data_from_db() 0 23 2
A add_ags_to_gens() 0 24 1
A drop_buildings_outside_muns() 0 25 1
B clean_mastr_data() 0 143 2
A mastr_data() 0 60 2
A geocode_data() 0 46 3
B zip_and_municipality_from_standort() 0 50 7
A merge_geocode_with_mastr() 0 27 1
A municipality_data() 0 14 2
A osm_buildings() 0 25 2
A add_ags_to_buildings() 0 24 1
A timer_func() 0 11 1
A create_geocoded_table() 0 17 1
A load_mastr_data() 0 35 1
A drop_gens_outside_muns() 0 25 1
A geocoder() 0 22 1
A drop_invalid_entries_from_gdf() 0 25 1
A most_plausible() 0 41 4
A load_building_data() 0 65 3
A get_probability_for_property() 0 46 2
A calculate_building_load_factor() 0 30 1
A determine_end_of_life_gens() 0 28 1
A add_start_up_date() 0 30 1
A buildings_area_per_overlay_id() 0 33 1
B add_voltage_level() 0 31 6
A add_overlay_id_to_buildings() 0 31 1
A cap_per_bus_id() 0 33 1
B desaggregate_pv_in_mv_grid() 0 153 5
A calculate_max_pv_cap_per_building() 0 49 1
A scenario_data() 0 30 2
A drop_buildings_outside_grids() 0 25 1
C building_area_range_per_cap_range() 0 87 8
A add_buildings_meta_data() 0 55 4
B allocate_pv() 0 125 6
A sort_and_qcut_df() 0 29 1
A federal_state_data() 0 24 2
A geocode_mastr_data() 0 28 1
A mean_load_factor_per_cap_range() 0 38 3
B desaggregate_pv() 0 104 5
A grid_districts() 0 32 1
A frame_to_numeric() 0 20 4
A pv_rooftop_to_buildings() 0 51 2
A allocate_to_buildings() 0 32 1
A overlay_grid_districts_with_counties() 0 33 1
B allocate_scenarios() 0 124 1
A validate_output() 0 50 1
A create_scenario_table() 0 17 1
A cap_share_per_cap_range() 0 43 3
A drop_unallocated_gens() 0 27 1
B probabilities() 0 57 5


Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems are usually solved the same way: extract the repeated logic into a single shared function or class. A minimal sketch follows.
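As an illustration from this very module: drop_gens_outside_muns() and drop_buildings_outside_muns() differ only in their log wording, so both could delegate to one shared helper (the helper name and its kind parameter are suggestions, not part of the module):

    def drop_rows_outside_muns(
        gdf: gpd.GeoDataFrame, kind: str
    ) -> gpd.GeoDataFrame:
        # Keep only rows that received an AGS ID during the spatial join.
        valid_gdf = gdf.loc[~gdf.ags.isna()]

        dropped = len(gdf) - len(valid_gdf)
        logger.debug(
            f"{dropped} ({dropped / len(gdf) * 100:g}%) of {len(gdf)} {kind} "
            "are outside of the municipalities and are therefore dropped."
        )

        return valid_gdf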

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This often can reduce the size of classes significantly.

Complex classes like data.datasets.power_plants.pv_rooftop_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
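For example, the functions sharing the mastr prefix in this module (mastr_data(), clean_mastr_data(), load_mastr_data()) form such a cohesive component. A minimal, hypothetical sketch of the extraction (the class and method names are suggestions, not part of the module):

    class MastrLoader:
        """Bundles the MaStR reading and cleaning steps of this module."""

        def __init__(self, index_col, usecols, dtypes, parse_dates):
            self.index_col = index_col
            self.usecols = usecols
            self.dtypes = dtypes
            self.parse_dates = parse_dates

        def read(self) -> pd.DataFrame:
            # formerly mastr_data()
            ...

        def clean(self, mastr_df: pd.DataFrame) -> pd.DataFrame:
            # formerly clean_mastr_data()
            ...

        def load(self) -> gpd.GeoDataFrame:
            # formerly load_mastr_data(): read, clean, geocode and merge
            ...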

"""
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate new
PV rooftop generators for the scenarios eGon2035 and eGon100RE.

Data cleaning: Drop duplicates and entries with missing critical data. Determine the
most plausible capacity from the multiple capacity values given in the MaStR data.
Drop generators without any plausible capacity data (23.5 MW > P > 0.1 kW). Randomly
and weighted, add a start-up date where it is missing. Extract the zip code and
municipality from the 'Standort' given in the MaStR data. Geocode unique zip code and
municipality combinations with Nominatim (1 second delay). Drop generators for which
geocoding failed or which are located outside the municipalities of Germany. Add some
visual sanity checks for the cleaned data.

Allocation of MaStR data: Allocate each generator to an existing building from OSM.
Determine the quantile each generator and building is in, depending on the capacity of
the generator and the area of the building polygon. Randomly distribute generators
within each municipality, preferably within the same building area quantile as the
generators are in capacity-wise. If not enough buildings exist within a municipality
and quantile, additional buildings from other quantiles are chosen randomly.

Disaggregation of PV rooftop scenarios: The scenario data per federal state is linearly
distributed to the MV grid districts according to the PV rooftop potential per MV grid
district. The rooftop potential is estimated from the building area given by the OSM
buildings. Grid districts which are located in several federal states are allocated PV
capacity according to their respective roof potential in the individual federal states.
The disaggregation of PV plants within a grid district respects existing plants from
MaStR which have not reached their end of life. New PV plants are generated randomly
and weighted, using a breakdown of the MaStR data as the generator basis. Plant
metadata (e.g. plant orientation) is also added randomly and weighted, with the MaStR
data as the basis.
"""
from __future__ import annotations

from collections import Counter
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Any

from geoalchemy2 import Geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from loguru import logger
from numpy.random import RandomState, default_rng
from pyproj.crs.crs import CRS
from sqlalchemy import BigInteger, Column, Float, Integer, String
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd

from egon.data import config, db
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    OsmBuildingsSynthetic,
)
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
from egon.data.datasets.zensus_vg250 import Vg250Gem

engine = db.engine()
Base = declarative_base()
SEED = int(config.settings()["egon-data"]["--random-seed"])

# TODO: move to yml
# MaStR data
MASTR_RELEVANT_COLS = [
    "EinheitMastrNummer",
    "Bruttoleistung",
    "StatisikFlag",
    "Bruttoleistung_extended",
    "Nettonennleistung",
    "InstallierteLeistung",
    "zugeordneteWirkleistungWechselrichter",
    "EinheitBetriebsstatus",
    "Standort",
    "Bundesland",
    "Land",
    "Landkreis",
    "Gemeinde",
    "Postleitzahl",
    "Ort",
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
    "GemeinsamerWechselrichterMitSpeicher",
    "Lage",
    "Leistungsbegrenzung",
    "EinheitlicheAusrichtungUndNeigungswinkel",
    "Hauptausrichtung",
    "HauptausrichtungNeigungswinkel",
    "Nebenausrichtung",
]

MASTR_DTYPES = {
    "EinheitMastrNummer": str,
    "Bruttoleistung": float,
    "StatisikFlag": str,
    "Bruttoleistung_extended": float,
    "Nettonennleistung": float,
    "InstallierteLeistung": float,
    "zugeordneteWirkleistungWechselrichter": float,
    "EinheitBetriebsstatus": str,
    "Standort": str,
    "Bundesland": str,
    "Land": str,
    "Landkreis": str,
    "Gemeinde": str,
    # "Postleitzahl": int,  # fails because of nan values
    "Ort": str,
    "GemeinsamerWechselrichterMitSpeicher": str,
    "Lage": str,
    "Leistungsbegrenzung": str,
    # this would parse nan values as False, which is not always correct
    # "EinheitlicheAusrichtungUndNeigungswinkel": bool,
    "Hauptausrichtung": str,
    "HauptausrichtungNeigungswinkel": str,
    "Nebenausrichtung": str,
    "NebenausrichtungNeigungswinkel": str,
}

MASTR_PARSE_DATES = [
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
]

MASTR_INDEX_COL = "EinheitMastrNummer"

EPSG = 4326
SRID = 3035

# data cleaning
MAX_REALISTIC_PV_CAP = 23500
MIN_REALISTIC_PV_CAP = 0.1
ROUNDING = 1

# geopy
MIN_DELAY_SECONDS = 1
USER_AGENT = "rli_kh_geocoder"

# show additional logging information
VERBOSE = False

# Path(__name__) resolves against the current working directory, so the
# export lands in <cwd>/data.
EXPORT_DIR = Path(__name__).resolve().parent / "data"
EXPORT_FILE = "mastr_geocoded.gpkg"
EXPORT_PATH = EXPORT_DIR / EXPORT_FILE
DRIVER = "GPKG"

# Number of quantiles
Q = 5

# Scenario Data
CARRIER = "solar_rooftop"
SCENARIOS = ["eGon2035"]  # , "eGon100RE"]
SCENARIO_TIMESTAMP = {
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
}
PV_ROOFTOP_LIFETIME = pd.Timedelta(30 * 365, unit="D")

# Example module: Trina Vertex S TSM-400DE09M.08, 400 Wp
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE
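# i.e. roughly 0.208 kWp per m² of module area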

# Estimation of usable roof area
# Factor for the conversion of building area to roof area
# estimation mean roof pitch: 35°
# estimation usable roof share: 80%
# estimation that only the south side of the building is used for PV
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
# AREA_FACTOR = 1.221
# USABLE_ROOF_SHARE = 0.8
# SOUTH_SHARE = 0.5
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
ROOF_FACTOR = 0.5
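# The commented derivation gives 1.221 * 0.8 * 0.5 ≈ 0.49; 0.5 is used as a
# rounded value here.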

CAP_RANGES = [
    (0, 30),
    (30, 100),
    (100, float("inf")),
]

MIN_BUILDING_SIZE = 10.0
UPPER_QUNATILE = 0.95
LOWER_QUANTILE = 0.05

COLS_TO_RENAME = {
    "EinheitlicheAusrichtungUndNeigungswinkel": (
        "einheitliche_ausrichtung_und_neigungswinkel"
    ),
    "Hauptausrichtung": "hauptausrichtung",
    "HauptausrichtungNeigungswinkel": "hauptausrichtung_neigungswinkel",
}

COLS_TO_EXPORT = [
    "scenario",
    "building_id",
    "gens_id",
    "capacity",
    "einheitliche_ausrichtung_und_neigungswinkel",
    "hauptausrichtung",
    "hauptausrichtung_neigungswinkel",
    "voltage_level",
]

INCLUDE_SYNTHETIC_BUILDINGS = False
ONLY_BUILDINGS_WITH_DEMAND = True
ASSURE_MINIMUM_BUILDING_SIZE = True


def timer_func(func):
    """Print the wall-clock runtime of the decorated function."""

    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        end_time = perf_counter()
        total_time = end_time - start_time
        print(f"Function {func.__name__} took {total_time:.4f} seconds.")
        return result

    return timeit_wrapper
@timer_func
def mastr_data(
    index_col: str | int | list[str] | list[int],
    usecols: list[str],
    dtype: dict[str, Any] | None,
    parse_dates: list[str] | None,
) -> pd.DataFrame:
    """
    Read MaStR data from csv.

    Parameters
    ----------
    index_col : str, int or list of str or int
        Column(s) to use as the row labels of the DataFrame.
    usecols : list of str
        Return a subset of the columns.
    dtype : dict of column (str) -> type (any), optional
        Data type for data or columns.
    parse_dates : list of names (str), optional
        Try to parse given columns to datetime.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing MaStR data.
    """
    mastr_path = Path(
        config.datasets()["power_plants"]["sources"]["mastr_pv"]
    ).resolve()

    mastr_df = pd.read_csv(
        mastr_path,
        index_col=index_col,
        usecols=usecols,
        dtype=dtype,
        parse_dates=parse_dates,
    )

    mastr_df = mastr_df.loc[
        (mastr_df.StatisikFlag == "B")
        & (mastr_df.EinheitBetriebsstatus == "InBetrieb")
        & (mastr_df.Land == "Deutschland")
        & (mastr_df.Lage == "BaulicheAnlagen")
    ]

    if (
        config.settings()["egon-data"]["--dataset-boundary"]
        == "Schleswig-Holstein"
    ):
        init_len = len(mastr_df)

        mastr_df = mastr_df.loc[mastr_df.Bundesland == "SchleswigHolstein"]

        logger.info(
            f"Using only MaStR data within Schleswig-Holstein. "
            f"{init_len - len(mastr_df)} of {init_len} generators are dropped."
        )

    logger.debug("MaStR data loaded.")

    return mastr_df


@timer_func
def clean_mastr_data(
    mastr_df: pd.DataFrame,
    max_realistic_pv_cap: int | float,
    min_realistic_pv_cap: int | float,
    rounding: int,
    seed: int,
) -> pd.DataFrame:
    """
    Clean the MaStR data of implausible entries.

    * Drop MaStR ID duplicates.
    * Drop generators with implausible capacities.
    * Drop generators without any kind of start-up date.
    * Clean up the Standort column and the capacity.

    Parameters
    ----------
    mastr_df : pandas.DataFrame
        DataFrame containing MaStR data.
    max_realistic_pv_cap : int or float
        Maximum capacity which is considered to be realistic.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.
    rounding : int
        Rounding to use when cleaning up the capacity. E.g. when
        rounding is 1, a capacity of 9.93 will be rounded to 9.9.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    """
    init_len = len(mastr_df)

    # drop duplicates
    mastr_df = mastr_df.loc[~mastr_df.index.duplicated()]

    # drop entries with a missing Standort
    index_to_drop = mastr_df.loc[mastr_df.Standort.isna()].index

    mastr_df = mastr_df.loc[~mastr_df.index.isin(index_to_drop)]

    df = mastr_df[
        [
            "Bruttoleistung",
            "Bruttoleistung_extended",
            "Nettonennleistung",
            "zugeordneteWirkleistungWechselrichter",
            "InstallierteLeistung",
        ]
    ].round(rounding)

    # determine the most plausible of the multiple capacity ratings
    mastr_df = mastr_df.assign(
        capacity=[
            most_plausible(p_tub, min_realistic_pv_cap)
            for p_tub in df.itertuples(index=False)
        ]
    )

    # drop generators without any capacity info
    # or with a capacity of zero,
    # and generators with a capacity > 23.5 MW, because
    # Germany's largest rooftop PV plant is 23 MW:
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
    mastr_df = mastr_df.loc[
        (~mastr_df.capacity.isna())
        & (mastr_df.capacity <= max_realistic_pv_cap)
        & (mastr_df.capacity > min_realistic_pv_cap)
    ]

    # get zip code and municipality
    mastr_df[["zip_and_municipality", "drop_this"]] = pd.DataFrame(
        mastr_df.Standort.astype(str)
        .apply(
            zip_and_municipality_from_standort,
            args=(VERBOSE,),
        )
        .tolist(),
        index=mastr_df.index,
    )

    # drop entries where no zip code could be identified
    mastr_df = mastr_df.loc[mastr_df.drop_this].drop(columns="drop_this")

    # add ", Deutschland" just in case
    mastr_df = mastr_df.assign(
        zip_and_municipality=(mastr_df.zip_and_municipality + ", Deutschland")
    )

    # get a consistent start-up date
    mastr_df = mastr_df.assign(
        start_up_date=mastr_df.Inbetriebnahmedatum,
    )

    mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
        mastr_df.start_up_date.isna()
    ].assign(
        start_up_date=mastr_df.GeplantesInbetriebnahmedatum.loc[
            mastr_df.start_up_date.isna()
        ]
    )

    # randomly and weighted fill in missing start-up dates
    pool = mastr_df.loc[
        ~mastr_df.start_up_date.isna()
    ].start_up_date.to_numpy()

    size = len(mastr_df) - len(pool)

    if size > 0:
        np.random.seed(seed)

        choice = np.random.choice(
            pool,
            size=size,
            replace=False,
        )

        mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
            mastr_df.start_up_date.isna()
        ].assign(start_up_date=choice)

        logger.info(
            f"Randomly and weighted added a start-up date to {size} generators."
        )

    mastr_df = mastr_df.assign(
        start_up_date=pd.to_datetime(mastr_df.start_up_date, utc=True)
    )

    end_len = len(mastr_df)
    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} rows from MaStR DataFrame."
    )

    return mastr_df


def zip_and_municipality_from_standort(
    standort: str,
    verbose: bool = False,
) -> tuple[str, bool]:
    """
    Get the zip code and municipality from a Standort string.

    Parameters
    ----------
    standort : str
        Standort as given in the MaStR data.
    verbose : bool
        Logs additional info if True.

    Returns
    -------
    tuple of str and bool
        Standort reduced to the zip code and municipality,
        and a flag indicating whether a zip code was found.
    """
    if verbose:
        logger.debug(f"Uncleaned String: {standort}")

    standort_list = standort.split()

    found = False
    count = 0

    for count, elem in enumerate(standort_list):
        if len(elem) != 5:
            continue
        if not elem.isnumeric():
            continue

        found = True

        break

    if found:
        cleaned_str = " ".join(standort_list[count:])

        if verbose:
            logger.debug(f"Cleaned String:   {cleaned_str}")

        return cleaned_str, found

    logger.warning(
        "Couldn't identify zip code. This entry will be dropped."
        f" Original standort: {standort}."
    )

    return standort, found
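# Example (hypothetical input): "Musterstraße 5 12345 Musterstadt" is reduced
# to ("12345 Musterstadt", True). A Standort without a five-digit numeric zip
# code is returned unchanged, with the flag set to False.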


def most_plausible(
    p_tub: tuple,
    min_realistic_pv_cap: int | float,
) -> float:
    """
    Try to determine the most plausible capacity.

    Try to determine the most plausible capacity for a given
    generator from the MaStR data.

    Parameters
    ----------
    p_tub : tuple
        Tuple containing the different capacity values given in
        the MaStR data.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered to be realistic.

    Returns
    -------
    float
        Capacity of the generator estimated as the most realistic.
    """
    count = Counter(p_tub).most_common(3)

    if len(count) == 1:
        return count[0][0]

    val1 = count[0][0]
    val2 = count[1][0]

    if len(count) == 2:
        min_val = min(val1, val2)
        max_val = max(val1, val2)
    else:
        val3 = count[2][0]

        min_val = min(val1, val2, val3)
        max_val = max(val1, val2, val3)

    if min_val < min_realistic_pv_cap:
        return max_val

    return min_val
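# Example (hypothetical values): for p_tub = (9.9, 9.9, 0.0, 10.2, 9.9), the
# three most common values are 9.9, 0.0 and 10.2; the minimum, 0.0, is below
# min_realistic_pv_cap, so the largest candidate, 10.2, is returned.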


def geocoder(
    user_agent: str,
    min_delay_seconds: int,
) -> RateLimiter:
    """
    Set up the Nominatim geocoder.

    Parameters
    ----------
    user_agent : str
        The app name.
    min_delay_seconds : int
        Delay in seconds to use between requests to Nominatim.
        A minimum of 1 is advised.

    Returns
    -------
    geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    """
    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(
        locator.geocode,
        min_delay_seconds=min_delay_seconds,
    )
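# Illustrative usage (requires network access; the address is hypothetical):
#   geocode = geocoder(USER_AGENT, MIN_DELAY_SECONDS)
#   location = geocode("12345 Musterstadt, Deutschland")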


def geocoding_data(
    clean_mastr_df: pd.DataFrame,
) -> pd.DataFrame:
    """
    Set up the DataFrame to geocode.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    """
    return pd.DataFrame(
        data=clean_mastr_df.zip_and_municipality.unique(),
        columns=["zip_and_municipality"],
    )


@timer_func
def geocode_data(
    geocoding_df: pd.DataFrame,
    ratelimiter: RateLimiter,
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Geocode zip code and municipality.

    Extract latitude, longitude and altitude. Transform latitude and
    longitude to a shapely Point and return a geopandas GeoDataFrame.

    Parameters
    ----------
    geocoding_df : pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes with municipalities for geocoding.
    ratelimiter : geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.
    """
    logger.info(f"Geocoding {len(geocoding_df)} locations.")

    geocode_df = geocoding_df.assign(
        location=geocoding_df.zip_and_municipality.apply(ratelimiter)
    )

    geocode_df = geocode_df.assign(
        point=geocode_df.location.apply(
            lambda loc: tuple(loc.point) if loc else None
        )
    )

    geocode_df[["latitude", "longitude", "altitude"]] = pd.DataFrame(
        geocode_df.point.tolist(), index=geocode_df.index
    )

    return gpd.GeoDataFrame(
        geocode_df,
        geometry=gpd.points_from_xy(geocode_df.longitude, geocode_df.latitude),
        crs=f"EPSG:{epsg}",
    )


def merge_geocode_with_mastr(
    clean_mastr_df: pd.DataFrame, geocode_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """
    Merge the geocoded geometry onto the original MaStR data.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes with municipalities with matching geolocation.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing cleaned MaStR data with
        matching geolocation from geocoding.
    """
    return gpd.GeoDataFrame(
        clean_mastr_df.merge(
            geocode_gdf[["zip_and_municipality", "geometry"]],
            how="left",
            on="zip_and_municipality",
        ),
        crs=geocode_gdf.crs,
    ).set_index(clean_mastr_df.index)


def drop_invalid_entries_from_gdf(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop invalid entries from a geopandas GeoDataFrame.

    TODO: how to omit the logging from geos here???

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame to be checked for validity.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with rows with invalid geometries dropped.
    """
    valid_gdf = gdf.loc[gdf.is_valid]

    logger.debug(
        f"{len(gdf) - len(valid_gdf)} "
        f"({(len(gdf) - len(valid_gdf)) / len(gdf) * 100:g}%) "
        f"of {len(gdf)} values were invalid and are dropped."
    )

    return valid_gdf


@timer_func
def municipality_data() -> gpd.GeoDataFrame:
    """
    Get municipality data from the eGo^n Database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with municipality data.
    """
    with db.session_scope() as session:
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="ags"
    )


@timer_func
def add_ags_to_gens(
    valid_mastr_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add the AGS ID to the generators.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with the AGS ID added.
    """
    return valid_mastr_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_gens_outside_muns(
    valid_mastr_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all generators outside of the municipalities.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with generators without an AGS ID dropped.
    """
    gdf = valid_mastr_gdf.loc[~valid_mastr_gdf.ags.isna()]

    logger.debug(
        f"{len(valid_mastr_gdf) - len(gdf)} "
        f"({(len(valid_mastr_gdf) - len(gdf)) / len(valid_mastr_gdf) * 100:g}%) "
        f"of {len(valid_mastr_gdf)} values are outside of the municipalities"
        " and are therefore dropped."
    )

    return gdf


class EgonMastrPvRoofGeocoded(Base):
    __tablename__ = "egon_mastr_pv_roof_geocoded"
    __table_args__ = {"schema": "supply"}

    zip_and_municipality = Column(String, primary_key=True, index=True)
    location = Column(String)
    point = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    altitude = Column(Float)
    geometry = Column(Geometry(srid=EPSG))


def create_geocoded_table(geocode_gdf):
    """
    Create the geocoded MaStR PV rooftop table.

    Parameters
    ----------
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoding information on PV rooftop locations.
    """
    EgonMastrPvRoofGeocoded.__table__.drop(bind=engine, checkfirst=True)
    EgonMastrPvRoofGeocoded.__table__.create(bind=engine, checkfirst=True)

    geocode_gdf.to_postgis(
        name=EgonMastrPvRoofGeocoded.__table__.name,
        schema=EgonMastrPvRoofGeocoded.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
        # dtype={}
    )


def geocoded_data_from_db(
    epsg: str | int,
) -> gpd.GeoDataFrame:
    """
    Read the geocoded zip code and municipality data from the eGo^n Database.

    Parameters
    ----------
    epsg : str or int
        EPSG ID of the CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the geocoded zip code and municipality data.
    """
    with db.session_scope() as session:
        query = session.query(
            EgonMastrPvRoofGeocoded.zip_and_municipality,
            EgonMastrPvRoofGeocoded.geometry,
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, geom_col="geometry"
    ).to_crs(f"EPSG:{epsg}")


def load_mastr_data():
    """
    Read PV rooftop data from the MaStR CSV.

    Note: the source will be replaced as soon as the MaStR data is available
    in the DB.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocode_gdf = geocoded_data_from_db(EPSG)

    mastr_gdf = merge_geocode_with_mastr(clean_mastr_df, geocode_gdf)

    valid_mastr_gdf = drop_invalid_entries_from_gdf(mastr_gdf)

    municipalities_gdf = municipality_data()

    valid_mastr_gdf = add_ags_to_gens(valid_mastr_gdf, municipalities_gdf)

    return drop_gens_outside_muns(valid_mastr_gdf)


class OsmBuildingsFiltered(Base):
    __tablename__ = "osm_buildings_filtered"
    __table_args__ = {"schema": "openstreetmap"}

    osm_id = Column(BigInteger)
    amenity = Column(String)
    building = Column(String)
    name = Column(String)
    geom = Column(Geometry(srid=SRID), index=True)
    area = Column(Float)
    geom_point = Column(Geometry(srid=SRID), index=True)
    tags = Column(HSTORE)
    id = Column(BigInteger, primary_key=True, index=True)


@timer_func
def osm_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read OSM buildings data from the eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsFiltered.id,
            OsmBuildingsFiltered.area,
            OsmBuildingsFiltered.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def synthetic_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read synthetic buildings data from the eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing synthetic buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsSynthetic.id,
            OsmBuildingsSynthetic.area,
            OsmBuildingsSynthetic.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def add_ags_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add the AGS ID to the buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with the AGS ID added.
    """
    return buildings_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_buildings_outside_muns(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of the municipalities.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data
        with buildings without an AGS ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the municipalities "
        "and are therefore dropped."
    )

    return gdf


def egon_building_peak_loads():
    """Get the IDs of all buildings with an electricity peak load."""
    sql = """
    SELECT building_id
    FROM demand.egon_building_peak_loads
    """

    return db.select_dataframe(sql).building_id.astype(int).sort_values()


@timer_func
def load_building_data():
    """
    Read buildings from the DB.

    Tables:

    * `openstreetmap.osm_buildings_filtered` (from OSM)
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)

    Both tables share the unique column `id`, hence the datasets can be
    concatenated. If INCLUDE_SYNTHETIC_BUILDINGS is False, synthetic buildings
    will not be loaded.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.
    """
    municipalities_gdf = municipality_data()

    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)

    if INCLUDE_SYNTHETIC_BUILDINGS:
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)

        buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [
                    osm_buildings_gdf,
                    synthetic_buildings_gdf,
                ]
            ),
            geometry="geom",
            crs=osm_buildings_gdf.crs,
        ).rename(columns={"area": "building_area"})

    else:
        buildings_gdf = osm_buildings_gdf.rename(
            columns={"area": "building_area"}
        )

    if ONLY_BUILDINGS_WITH_DEMAND:
        building_ids = egon_building_peak_loads()

        init_len = len(building_ids)

        building_ids = [
            b_id for b_id in building_ids if b_id in buildings_gdf.index
        ]

        end_len = len(building_ids)

        logger.debug(
            f"{end_len / init_len * 100:g} % ({end_len} / {init_len}) of IDs"
            f" within OSM buildings."
        )

        buildings_gdf = buildings_gdf.loc[building_ids]

    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)

    logger.debug("Loaded buildings.")

    return drop_buildings_outside_muns(buildings_ags_gdf)


@timer_func
def sort_and_qcut_df(
    df: pd.DataFrame | gpd.GeoDataFrame,
    col: str,
    q: int,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Determine the quantile of a given attribute in a (Geo)DataFrame.

    Sort the (Geo)DataFrame in ascending order for the given attribute.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame
        (Geo)DataFrame to sort and qcut.
    col : str
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
    q : int
        Number of quantiles.

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
        Sorted and qcut (Geo)DataFrame.
    """
    df = df.sort_values(col, ascending=True)

    return df.assign(
        quant=pd.qcut(
            df[col],
            q=q,
            labels=range(q),
        )
    )
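# Example: with q=5 (the module-level Q), pd.qcut puts roughly the smallest
# 20 % of values into quant=0 and the largest 20 % into quant=4.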


@timer_func
def allocate_pv(
    q_mastr_gdf: gpd.GeoDataFrame,
    q_buildings_gdf: gpd.GeoDataFrame,
    seed: int,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate the MaStR PV generators to the OSM buildings.

    This determines a building for each PV generator, given that there are
    more buildings than generators within an AGS. Primarily, generators are
    assigned to buildings within the same quantile. Multiple assignment
    is excluded.

    Parameters
    ----------
    q_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded and qcut MaStR data.
    q_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing qcut OSM buildings data.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    rng = default_rng(seed=seed)

    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan)
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan)

    ags_list = q_buildings_gdf.ags.unique()

    num_ags = len(ags_list)

    t0 = perf_counter()

    for count, ags in enumerate(ags_list):

        buildings = q_buildings_gdf.loc[
            (q_buildings_gdf.ags == ags) & (q_buildings_gdf.gens_id.isna())
        ]
        gens = q_mastr_gdf.loc[
            (q_mastr_gdf.ags == ags) & (q_mastr_gdf.building_id.isna())
        ]

        len_build = len(buildings)
        len_gens = len(gens)

        if len_build < len_gens:
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
            logger.error(
                f"There are {len_gens} generators and only {len_build}"
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
                "generators were truncated to match the amount of buildings."
            )

            assert len_build == len(gens)

        for quant in gens.quant.unique():
            q_buildings = buildings.loc[
                (buildings.quant == quant) & (buildings.gens_id.isna())
            ]
            q_gens = gens.loc[gens.quant == quant]

            len_build = len(q_buildings)
            len_gens = len(q_gens)

            if len_build < len_gens:
                delta = len_gens - len_build

                logger.warning(
                    f"There are {len_gens} generators and only {len_build} "
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
                    f"buildings from AGS {ags} will be added randomly."
                )

                add_buildings = pd.Index(
                    rng.choice(
                        buildings.loc[
                            (buildings.quant != quant)
                            & (buildings.gens_id.isna())
                        ].index,
                        size=delta,
                        replace=False,
                    )
                )

                q_buildings = buildings.loc[
                    q_buildings.index.append(add_buildings)
                ]

                assert len(q_buildings) == len_gens

            chosen_buildings = pd.Index(
                rng.choice(
                    q_buildings.index,
                    size=len_gens,
                    replace=False,
                )
            )

            # q_mastr_gdf.loc[q_gens.index] = q_mastr_gdf.loc[
            #     q_gens.index
            # ].assign(building_id=chosen_buildings)

            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index

        if count % 100 == 0:
            logger.debug(
                f"Allocation of {count / num_ags * 100:g} % of AGS done. It"
                f" took {perf_counter() - t0:g} seconds."
            )

            t0 = perf_counter()

    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]

    q_mastr_gdf.loc[
        assigned_buildings.gens_id, "building_id"
    ] = assigned_buildings.index

    logger.debug("Allocated status quo generators to buildings.")

    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)


def frame_to_numeric(
    df: pd.DataFrame | gpd.GeoDataFrame,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Try to convert all columns of a DataFrame to numeric, ignoring errors.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
    """
    if str(df.index.dtype) == "object":
        df.index = pd.to_numeric(df.index, errors="ignore")

    for col in df.columns:
        if str(df[col].dtype) == "object":
            df[col] = pd.to_numeric(df[col], errors="ignore")

    return df


def validate_output(
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
) -> None:
    """
    Validate the output.

    * Validate that there are exactly as many buildings with a PV system as
      there are PV systems with a building.
    * Validate that the building IDs with a PV system are the same building
      IDs as assigned to the PV systems.
    * Validate that the PV system IDs with a building are the same PV system
      IDs as assigned to the buildings.

    Parameters
    ----------
    desagg_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.
    desagg_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    assert len(
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].building_id.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].index.unique()
        )
    ).all()
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].index.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].gens_id.unique()
        )
    ).all()

    logger.debug("Validated output.")


def drop_unallocated_gens(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop generators which did not get allocated.

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with generators dropped which did
        not get allocated.
    """
    init_len = len(gdf)
    gdf = gdf.loc[~gdf.building_id.isna()]
    end_len = len(gdf)

    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} unallocated rows from MaStR DataFrame."
    )

    return gdf


@timer_func
def allocate_to_buildings(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate status quo PV rooftop generators to buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    logger.debug("Starting allocation of status quo.")

    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
        q_mastr_gdf, q_buildings_gdf, SEED
    )

    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)

    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf


@timer_func
def grid_districts(
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Load the MV grid district geo data from the eGo^n Database as a
    geopandas.GeoDataFrame.

    Parameters
    ----------
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the MV grid district IDs and geo shapes data.
    """
    gdf = db.select_geodataframe(
        """
        SELECT bus_id, geom
        FROM grid.egon_mv_grid_district
        ORDER BY bus_id
        """,
        index_col="bus_id",
        geom_col="geom",
        epsg=epsg,
    )

    gdf.index = gdf.index.astype(int)

    logger.debug("Grid districts loaded.")

    return gdf


def scenario_data(
    carrier: str = "solar_rooftop",
    scenario: str = "eGon2035",
) -> pd.DataFrame:
    """
    Get scenario capacity data from the eGo^n Database.

    Parameters
    ----------
    carrier : str
        Carrier type to filter the table by.
    scenario : str
        Scenario to filter the table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with scenario capacity data in GW.
    """
    with db.session_scope() as session:
        query = session.query(EgonScenarioCapacities).filter(
            EgonScenarioCapacities.carrier == carrier,
            EgonScenarioCapacities.scenario_name == scenario,
        )

    df = pd.read_sql(
        query.statement, query.session.bind, index_col="index"
    ).sort_index()

    logger.debug("Scenario capacity data loaded.")

    return df


# Scrutinizer flags the following class as duplicated code elsewhere in the
# project ("View Code Duplication").
class Vg250Lan(Base):
    __tablename__ = "vg250_lan"
    __table_args__ = {"schema": "boundaries"}

    id = Column(BigInteger, primary_key=True, index=True)
    ade = Column(BigInteger)
    gf = Column(BigInteger)
    bsg = Column(BigInteger)
    ars = Column(String)
    ags = Column(String)
    sdv_ars = Column(String)
    gen = Column(String)
    bez = Column(String)
    ibz = Column(BigInteger)
    bem = Column(String)
    nbd = Column(String)
    sn_l = Column(String)
    sn_r = Column(String)
    sn_k = Column(String)
    sn_v1 = Column(String)
    sn_v2 = Column(String)
    sn_g = Column(String)
    fk_s3 = Column(String)
    nuts = Column(String)
    ars_0 = Column(String)
    ags_0 = Column(String)
    wsk = Column(String)
    debkg_id = Column(String)
    rs = Column(String)
    sdv_rs = Column(String)
    rs_0 = Column(String)
    geometry = Column(Geometry(srid=EPSG), index=True)
1432
1433
def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
1434
    """
1435
    Get feder state data from eGo^n Database.
1436
    Parameters
1437
    -----------
1438
    to_crs : pyproj.crs.crs.CRS
1439
        CRS to transform geometries to.
1440
    Returns
1441
    -------
1442
    geopandas.GeoDataFrame
1443
        GeoDataFrame with federal state data.
1444
    """
1445
    with db.session_scope() as session:
1446
        query = session.query(
1447
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
1448
        )
1449
1450
    gdf = gpd.read_postgis(
1451
        query.statement, query.session.bind, index_col="id"
1452
    ).to_crs(to_crs)
1453
1454
    logger.debug("Federal State data loaded.")
1455
1456
    return gdf
1457
1458
1459
@timer_func
1460
def overlay_grid_districts_with_counties(
1461
    mv_grid_district_gdf: gpd.GeoDataFrame,
1462
    federal_state_gdf: gpd.GeoDataFrame,
1463
) -> gpd.GeoDataFrame:
1464
    """
1465
    Calculate the intersections of mv grid districts and counties.
1466
    Parameters
1467
    -----------
1468
    mv_grid_district_gdf : gpd.GeoDataFrame
1469
        GeoDataFrame containing mv grid district ID and geo shapes data.
1470
    federal_state_gdf : gpd.GeoDataFrame
1471
        GeoDataFrame with federal state data.
1472
    Returns
1473
    -------
1474
    geopandas.GeoDataFrame
1475
        GeoDataFrame containing OSM buildings data.
1476
    """
1477
    logger.debug(
1478
        "Calculating intersection overlay between mv grid districts and "
1479
        "counties. This may take a while..."
1480
    )
1481
1482
    gdf = gpd.overlay(
1483
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
1484
        mv_grid_district_gdf.reset_index(),
1485
        how="intersection",
1486
        keep_geom_type=True,
1487
    )
1488
1489
    logger.debug("Done!")
1490
1491
    return gdf
1492
1493
1494
@timer_func
1495
def add_overlay_id_to_buildings(
1496
    buildings_gdf: gpd.GeoDataFrame,
1497
    grid_federal_state_gdf: gpd.GeoDataFrame,
1498
) -> gpd.GeoDataFrame:
1499
    """
1500
    Add information about overlay ID to buildings.
1501
    Parameters
1502
    -----------
1503
    buildings_gdf : geopandas.GeoDataFrame
1504
        GeoDataFrame containing OSM buildings data.
1505
    grid_federal_state_gdf : geopandas.GeoDataFrame
1506
        GeoDataFrame with intersection shapes between counties and grid districts.
1507
    Returns
1508
    -------
1509
    geopandas.GeoDataFrame
1510
        GeoDataFrame containing OSM buildings data with overlay ID added.
1511
    """
1512
    gdf = (
1513
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
1514
        .sjoin(
1515
            grid_federal_state_gdf,
1516
            how="left",
1517
            predicate="intersects",
1518
        )
1519
        .rename(columns={"index_right": "overlay_id"})
1520
    )
1521
1522
    logger.debug("Added overlay ID to OSM buildings.")
1523
1524
    return gdf
1525
1526
1527
def drop_buildings_outside_grids(
1528
    buildings_gdf: gpd.GeoDataFrame,
1529
) -> gpd.GeoDataFrame:
1530
    """
1531
    Drop all buildings outside of grid areas.
1532
    Parameters
1533
    -----------
1534
    buildings_gdf : geopandas.GeoDataFrame
1535
        GeoDataFrame containing OSM buildings data.
1536
    Returns
1537
    -------
1538
    gepandas.GeoDataFrame
1539
        GeoDataFrame containing OSM buildings data
1540
        with buildings without an bus ID dropped.
1541
    """
1542
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]
1543
1544
    logger.debug(
1545
        f"{len(buildings_gdf) - len(gdf)} "
1546
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
1547
        f"of {len(buildings_gdf)} values are outside of the grid areas "
1548
        "and are therefore dropped."
1549
    )
1550
1551
    return gdf
1552
1553
1554
def buildings_area_per_overlay_id(
1555
    valid_buildings_gdf: gpd.GeoDataFrame,
1556
    grid_federal_state_gdf: gpd.GeoDataFrame,
1557
) -> pd.DataFrame:
1558
    """
1559
    Calculate total area of building per overlay ID.
1560
    TODO: This is very simplified at the moment. If possible add some kind
1561
     of weights to the area per building to differentiate between single-
1562
     family and multi-family houses as well as agricultural and industrial
1563
     buildings.
1564
    Parameters
1565
    -----------
1566
    valid_buildings_gdf : geopandas.GeoDataFrame
1567
        GeoDataFrame containing OSM buildings data with overlay ID added.
1568
    grid_federal_state_gdf : geopandas.GeoDataFrame
1569
        GeoDataFrame with intersection shapes between counties and grid districts.
1570
    Returns
1571
    -------
1572
    geopandas.GeoDataFrame
1573
        GeoDataFrame with grid data and total buildings area.
1574
    """
1575
    return grid_federal_state_gdf.merge(
1576
        valid_buildings_gdf[
1577
            [
1578
                "building_area",
1579
                "overlay_id",
1580
            ]
1581
        ]
1582
        .groupby("overlay_id")
1583
        .sum(),
1584
        how="left",
1585
        left_index=True,
1586
        right_index=True,
1587
    )
1588
1589
def cap_per_bus_id(
    scenario: str,
    # overlay_gdf: gpd.GeoDataFrame,
    # scenario_df: pd.DataFrame,
    # conversion: int | float = 10**3,
) -> pd.DataFrame:
    """
    Calculate total pv rooftop capacity per grid district dependent on
    available buildings area.
    TODO: This is very simplified at the moment as it assumes an even
     distribution of generators depending on the available buildings
     area per grid district.
    Parameters
    -----------
    scenario : str
        Scenario name to read the rooftop capacity per bus from the
        database for.
    Returns
    -------
    pandas.DataFrame
        DataFrame with total rooftop capacity per mv grid.
    """
    targets = config.datasets()["solar_rooftop"]["targets"]

    sql = f"""
    SELECT bus as bus_id, p_nom as capacity
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
    WHERE carrier = 'solar_rooftop'
    AND scn_name = '{scenario}'
    """

    return db.select_dataframe(sql, index_col="bus_id")

    # overlay_gdf = overlay_gdf.assign(capacity=np.nan)
    #
    # for cap, nuts in scenario_df[["capacity", "nuts"]].itertuples(index=False):
    #     nuts_gdf = overlay_gdf.loc[overlay_gdf.nuts == nuts]
    #
    #     capacity = nuts_gdf.building_area.multiply(
    #         cap / nuts_gdf.building_area.sum()
    #     )
    #
    #     overlay_gdf.loc[nuts_gdf.index] = overlay_gdf.loc[
    #         nuts_gdf.index
    #     ].assign(capacity=capacity.multiply(conversion).to_numpy())
    #
    # return overlay_gdf[["bus_id", "capacity"]].groupby("bus_id").sum()


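# --- Illustrative sketch (editor addition, not part of the reviewed module).
# It shows the frame shape cap_per_bus_id() presumably returns and the access
# pattern desaggregate_pv() uses downstream; bus IDs, capacities and units are
# made up.
def _sketch_cap_per_bus_id_shape():
    import pandas as pd

    cap_df = pd.DataFrame(
        {"capacity": [1200.0, 850.5]},  # hypothetical target capacities
        index=pd.Index([16, 17], name="bus_id"),
    )

    # Downstream access pattern: cap_df.at[bus_id, "capacity"]
    return cap_df.at[16, "capacity"]

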
def determine_end_of_life_gens(
    mastr_gdf: gpd.GeoDataFrame,
    scenario_timestamp: pd.Timestamp,
    pv_rooftop_lifetime: pd.Timedelta,
) -> gpd.GeoDataFrame:
    """
    Determine if an old PV system has reached its end of life.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    scenario_timestamp : pandas.Timestamp
        Timestamp at which the scenario takes place.
    pv_rooftop_lifetime : pandas.Timedelta
        Average expected lifetime of PV rooftop systems.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data and info if the system
        has reached its end of life.
    """
    mastr_gdf = mastr_gdf.assign(
        age=scenario_timestamp - mastr_gdf.start_up_date
    )

    logger.debug("Determined if pv rooftop systems reached their end of life.")

    return mastr_gdf.assign(end_of_life=pv_rooftop_lifetime < mastr_gdf.age)


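# --- Illustrative sketch (editor addition, not part of the reviewed module).
# The end-of-life test above reduces to a Timedelta comparison; the dates and
# lifetime below are made up.
def _sketch_end_of_life() -> bool:
    import pandas as pd

    scenario_timestamp = pd.Timestamp("2035-01-01")
    start_up_date = pd.Timestamp("2010-06-15")  # hypothetical commissioning
    pv_rooftop_lifetime = pd.Timedelta(days=20 * 365)  # roughly 20 years

    age = scenario_timestamp - start_up_date
    return pv_rooftop_lifetime < age  # True -> unit has reached end of life

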
def calculate_max_pv_cap_per_building(
    buildings_gdf: gpd.GeoDataFrame,
    mastr_gdf: gpd.GeoDataFrame,
    pv_cap_per_sq_m: float | int,
    roof_factor: float | int,
) -> gpd.GeoDataFrame:
    """
    Calculate the estimated maximum possible PV capacity per building.
    Parameters
    -----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    roof_factor : float, int
        Average share of the roof area that is usable for PV.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with estimated maximum PV
        capacity.
    """
    gdf = (
        buildings_gdf.reset_index()
        .merge(
            mastr_gdf[
                [
                    "capacity",
                    "end_of_life",
                    "building_id",
                    "EinheitlicheAusrichtungUndNeigungswinkel",
                    "Hauptausrichtung",
                    "HauptausrichtungNeigungswinkel",
                ]
            ],
            how="left",
            left_on="id",
            right_on="building_id",
        )
        .set_index("id")
        .drop(columns="building_id")
    )

    return gdf.assign(
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
        bus_id=gdf.bus_id.astype(int),
    )


def calculate_building_load_factor(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    rounding: int = 4,
) -> gpd.GeoDataFrame:
    """
    Calculate the roof load factor from existing PV systems.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    rounding : int
        Rounding to use for load factor.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data with calculated load factor.
    """
    gdf = mastr_gdf.merge(
        buildings_gdf[["max_cap", "building_area"]].loc[
            ~buildings_gdf["max_cap"].isna()
        ],
        how="left",
        left_on="building_id",
        right_index=True,
    )

    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))


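# --- Illustrative sketch (editor addition, not part of the reviewed module).
# The load factor is simply installed capacity over estimated roof potential;
# the numbers are made up.
def _sketch_load_factor() -> float:
    capacity = 9.8  # kW, hypothetical MaStR entry
    max_cap = 25.0  # kW, i.e. building_area * roof_factor * pv_cap_per_sq_m
    return round(capacity / max_cap, 4)  # -> 0.392

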
def get_probability_for_property(
    mastr_gdf: gpd.GeoDataFrame,
    cap_range: tuple[int | float, int | float],
    prop: str,
) -> tuple[np.array, np.array]:
    """
    Calculate the probability of the different options of a property of the
    existing PV plants.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_range : tuple(int, int)
        Capacity range of PV plants to look at.
    prop : str
        Property to calculate probabilities for. String needs to be a column
        in mastr_gdf.
    Returns
    -------
    tuple
        numpy.array
            Unique values of property.
        numpy.array
            Probabilities per unique value.
    """
    cap_range_gdf = mastr_gdf.loc[
        (mastr_gdf.capacity > cap_range[0])
        & (mastr_gdf.capacity <= cap_range[1])
    ]

    if prop == "load_factor":
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]

    count = Counter(
        cap_range_gdf[prop].loc[
            ~cap_range_gdf[prop].isna()
            & ~cap_range_gdf[prop].isnull()
            & ~(cap_range_gdf[prop] == "None")
        ]
    )

    values = np.array(list(count.keys()))
    probabilities = np.fromiter(count.values(), dtype=float)
    probabilities = probabilities / np.sum(probabilities)

    return values, probabilities


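# --- Illustrative sketch (editor addition, not part of the reviewed module).
# It replays the Counter-based frequency estimate above on a toy property
# column; the values are made up.
def _sketch_property_probabilities():
    from collections import Counter

    import numpy as np
    import pandas as pd

    prop = pd.Series(["Süd", "Süd", "Ost", None, "None"])
    count = Counter(prop.loc[~prop.isna() & ~(prop == "None")])

    values = np.array(list(count.keys()))  # ['Süd', 'Ost']
    probabilities = np.fromiter(count.values(), dtype=float)
    return values, probabilities / probabilities.sum()  # [2/3, 1/3]

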
@timer_func
def probabilities(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    properties: list[str] | None = None,
) -> dict:
    """
    Calculate the probability of the different options of properties of the
    existing PV plants.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    properties : list(str)
        List of properties to calculate probabilities for. Strings need to be
        columns in mastr_gdf.
    Returns
    -------
    dict
        Dictionary with values and probabilities per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]
    if properties is None:
        properties = [
            "EinheitlicheAusrichtungUndNeigungswinkel",
            "Hauptausrichtung",
            "HauptausrichtungNeigungswinkel",
            "load_factor",
        ]

    prob_dict = {}

    for cap_range in cap_ranges:
        prob_dict[cap_range] = {
            "values": {},
            "probabilities": {},
        }

        for prop in properties:
            v, p = get_probability_for_property(
                mastr_gdf,
                cap_range,
                prop,
            )

            prob_dict[cap_range]["values"][prop] = v
            prob_dict[cap_range]["probabilities"][prop] = p

    return prob_dict


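# --- Illustrative sketch (editor addition, not part of the reviewed module).
# The nested structure probabilities() builds looks roughly like this; keys
# and numbers are made up.
def _sketch_prob_dict() -> dict:
    import numpy as np

    return {
        (0, 30): {
            "values": {
                "Hauptausrichtung": np.array(["Süd", "Ost"]),
                "load_factor": np.array([0.3, 0.5]),
            },
            "probabilities": {
                "Hauptausrichtung": np.array([0.8, 0.2]),
                "load_factor": np.array([0.6, 0.4]),
            },
        },
        # ... one entry per capacity range, e.g. (30, 100), (100, inf)
    }

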
def cap_share_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the share of PV capacity from the total PV capacity within
    capacity ranges.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    Returns
    -------
    dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    cap_share_dict = {}

    total_cap = mastr_gdf.capacity.sum()

    for cap_range in cap_ranges:
        cap_share = (
            mastr_gdf.loc[
                (mastr_gdf.capacity > cap_range[0])
                & (mastr_gdf.capacity <= cap_range[1])
            ].capacity.sum()
            / total_cap
        )

        cap_share_dict[cap_range] = cap_share

    return cap_share_dict


def mean_load_factor_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the mean roof load factor per capacity range from existing PV
    plants.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    Returns
    -------
    dict
        Dictionary with mean roof load factor per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    load_factor_dict = {}

    for cap_range in cap_ranges:
        load_factor = mastr_gdf.loc[
            (mastr_gdf.load_factor <= 1)
            & (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ].load_factor.mean()

        load_factor_dict[cap_range] = load_factor

    return load_factor_dict


def building_area_range_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    min_building_size: int | float = 10.0,
    upper_quantile: float = 0.95,
    lower_quantile: float = 0.05,
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
    """
    Estimate the typical building area range per capacity range from the
    building areas of existing PV plants, using quantiles.
    Parameters
    -----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with a zero and the last one should end with infinity.
    min_building_size : int, float
        Minimal building size to consider for PV plants.
    upper_quantile : float
        Upper quantile to estimate maximum building size per capacity range.
    lower_quantile : float
        Lower quantile to estimate minimum building size per capacity range.
    Returns
    -------
    dict
        Dictionary with estimated normal building area range per capacity
        range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    building_area_range_dict = {}

    n_ranges = len(cap_ranges)

    for count, cap_range in enumerate(cap_ranges):
        cap_range_gdf = mastr_gdf.loc[
            (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ]

        if count == 0:
            building_area_range_dict[cap_range] = (
                min_building_size,
                cap_range_gdf.building_area.quantile(upper_quantile),
            )
        elif count == n_ranges - 1:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                float("inf"),
            )
        else:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                cap_range_gdf.building_area.quantile(upper_quantile),
            )

    values = list(building_area_range_dict.values())

    building_area_range_normed_dict = {}

    for count, (cap_range, (min_area, max_area)) in enumerate(
        building_area_range_dict.items()
    ):
        if count == 0:
            building_area_range_normed_dict[cap_range] = (
                min_area,
                np.mean((values[count + 1][0], max_area)),
            )
        elif count == n_ranges - 1:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                max_area,
            )
        else:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                np.mean((values[count + 1][0], max_area)),
            )

    return building_area_range_normed_dict


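# --- Illustrative sketch (editor addition, not part of the reviewed module).
# The second loop above closes gaps between the per-range quantile estimates
# by averaging neighbouring boundaries; the quantile values are made up.
def _sketch_area_range_smoothing():
    import numpy as np

    raw = {
        (0, 30): (10.0, 300.0),  # (min_building_size, q95 of range 1)
        (30, 100): (150.0, 900.0),  # (q05, q95 of range 2)
        (100, float("inf")): (600.0, float("inf")),  # (q05 of range 3, inf)
    }

    # Ranges 1 and 2 overlap between 150 and 300; both boundaries become the
    # mean of the two, so the ranges tile the area axis without overlaps.
    upper_1 = np.mean((raw[(30, 100)][0], raw[(0, 30)][1]))  # 225.0
    lower_2 = upper_1
    return upper_1, lower_2

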
@timer_func
def desaggregate_pv_in_mv_grid(
    buildings_gdf: gpd.GeoDataFrame,
    pv_cap: float | int,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within a given grid district.
    Parameters
    -----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing buildings within the grid district.
    pv_cap : float, int
        PV capacity to desaggregate.
    Other Parameters
    -----------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    bus_id = int(buildings_gdf.bus_id.iat[0])

    rng = default_rng(seed=kwargs["seed"])
    random_state = RandomState(seed=kwargs["seed"])

    results_df = pd.DataFrame(columns=buildings_gdf.columns)

    for cap_range, share in kwargs["cap_share_dict"].items():
        pv_cap_range = pv_cap * share

        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]

        cap_range_buildings_gdf = buildings_gdf.loc[
            ~buildings_gdf.index.isin(results_df.index)
            & (buildings_gdf.building_area > b_area_min)
            & (buildings_gdf.building_area <= b_area_max)
        ]

        mean_load_factor = kwargs["load_factor_dict"][cap_range]
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
            load_factor=np.nan,
            capacity=np.nan,
        )

        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        if total_mean_cap == 0:
            logger.warning(
                f"There are no matching roofs for capacity range {cap_range} "
                f"kW in grid {bus_id}. Using all buildings as fallback."
            )

            cap_range_buildings_gdf = buildings_gdf.loc[
                ~buildings_gdf.index.isin(results_df.index)
            ]

            if len(cap_range_buildings_gdf) == 0:
                logger.warning(
                    "There are no roofs available for capacity range "
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
                )
                cap_range_buildings_gdf = buildings_gdf.copy()

            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
                load_factor=np.nan,
                capacity=np.nan,
            )

            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        elif total_mean_cap < pv_cap_range:
            logger.warning(
                f"Average roof utilization of the roof area in grid {bus_id} "
                f"and capacity range {cap_range} kW is not sufficient. The "
                "roof utilization will be above average."
            )

        frac = max(
            pv_cap_range / total_mean_cap,
            1 / len(cap_range_buildings_gdf),
        )

        samples_gdf = cap_range_buildings_gdf.sample(
            frac=min(1, frac),
            random_state=random_state,
        )

        cap_range_dict = kwargs["prob_dict"][cap_range]

        values_dict = cap_range_dict["values"]
        p_dict = cap_range_dict["probabilities"]

        load_factors = rng.choice(
            a=values_dict["load_factor"],
            size=len(samples_gdf),
            p=p_dict["load_factor"],
        )

        samples_gdf = samples_gdf.assign(
            load_factor=load_factors,
            capacity=(
                samples_gdf.building_area
                * load_factors
                * kwargs["pv_cap_per_sq_m"]
            ).clip(lower=0.4),
        )

        missing_factor = pv_cap_range / samples_gdf.capacity.sum()

        samples_gdf = samples_gdf.assign(
            capacity=(samples_gdf.capacity * missing_factor),
            load_factor=(samples_gdf.load_factor * missing_factor),
        )

        assert np.isclose(
            samples_gdf.capacity.sum(),
            pv_cap_range,
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"

        results_df = pd.concat(
            [
                results_df,
                samples_gdf,
            ],
        )

    assert np.isclose(
        results_df.capacity.sum(),
        pv_cap,
    ), f"{results_df.capacity.sum()} != {pv_cap}"

    return gpd.GeoDataFrame(
        results_df,
        crs=samples_gdf.crs,
        # Scrutinizer issue: samples_gdf does not seem to be defined in case
        # the for loop over cap_share_dict above is never entered.
        geometry="geom",
    )


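# --- Illustrative sketch (editor addition, not part of the reviewed module).
# After sampling, capacities and load factors are rescaled with a common
# factor so the sample exactly meets the target of the capacity range; the
# numbers are made up.
def _sketch_rescale_to_target():
    import numpy as np

    sampled_capacity = np.array([5.0, 10.0, 25.0])  # kW, from load factors
    pv_cap_range = 50.0  # kW target for this capacity range

    missing_factor = pv_cap_range / sampled_capacity.sum()  # 1.25
    rescaled = sampled_capacity * missing_factor  # [6.25, 12.5, 31.25]

    assert np.isclose(rescaled.sum(), pv_cap_range)
    return rescaled

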
@timer_func
def desaggregate_pv(
    buildings_gdf: gpd.GeoDataFrame,
    cap_df: pd.DataFrame,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within each grid district.
    Parameters
    -----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    cap_df : pandas.DataFrame
        DataFrame with total rooftop capacity per mv grid.
    Other Parameters
    -----------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within capacity ranges.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]

    for bus_id in buildings_gdf.bus_id.unique():
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]

        pv_installed_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.end_of_life
        ]

        pv_installed = pv_installed_gdf.capacity.sum()

        pot_buildings_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.index.isin(pv_installed_gdf.index)
        ]

        if len(pot_buildings_gdf) == 0:
            logger.error(
                f"In grid {bus_id} there are no potential buildings to "
                "allocate PV capacity to. The grid is skipped. This message "
                "should only appear during test runs with few buildings."
            )

            continue

        pv_target = cap_df.at[
            bus_id,
            "capacity",
        ]

        pv_missing = pv_target - pv_installed

        if pv_missing <= 0:
            logger.info(
                f"In grid {bus_id} there is more PV installed in the status "
                "quo than allocated within the scenario. No new generators "
                "are added."
            )

            continue

        if pot_buildings_gdf.max_cap.sum() < pv_missing:
            logger.error(
                f"In grid {bus_id} there is less PV potential ("
                f"{pot_buildings_gdf.max_cap.sum():g} kW) than allocated PV "
                f"capacity ({pv_missing:g} kW). The grid is skipped. This "
                "message should only appear during test runs with few "
                "buildings."
            )

            continue

        gdf = desaggregate_pv_in_mv_grid(
            buildings_gdf=pot_buildings_gdf,
            pv_cap=pv_missing,
            **kwargs,
        )

        allocated_buildings_gdf = pd.concat(
            [
                allocated_buildings_gdf,
                gdf,
            ]
        )

    logger.debug("Desaggregated scenario.")

    return gpd.GeoDataFrame(
        allocated_buildings_gdf,
        crs=gdf.crs,
        # Scrutinizer issue: gdf does not seem to be defined for all execution
        # paths, e.g. when every grid is skipped.
        geometry="geom",
    )


@timer_func
def add_buildings_meta_data(
    buildings_gdf: gpd.GeoDataFrame,
    prob_dict: dict,
    seed: int,
) -> gpd.GeoDataFrame:
    """
    Randomly add additional metadata to desaggregated PV plants.
    Parameters
    -----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    rng = default_rng(seed=seed)
    buildings_gdf = buildings_gdf.reset_index().rename(
        columns={
            "index": "building_id",
        }
    )

    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
        cap_range_gdf = buildings_gdf.loc[
            (buildings_gdf.capacity >= min_cap)
            & (buildings_gdf.capacity < max_cap)
        ]

        for key, values in cap_range_prob_dict["values"].items():
            if key == "load_factor":
                continue

            gdf = cap_range_gdf.loc[
                cap_range_gdf[key].isna()
                | cap_range_gdf[key].isnull()
                | (cap_range_gdf[key] == "None")
            ]

            key_vals = rng.choice(
                a=values,
                size=len(gdf),
                p=cap_range_prob_dict["probabilities"][key],
            )

            buildings_gdf.loc[gdf.index, key] = key_vals

    return buildings_gdf


def add_voltage_level(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add voltage level derived from generator capacity to the power plants.
    Parameters
    -----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with voltage level per generator.
    """

    def voltage_levels(p: float) -> int:
        if p < 100:
            return 7
        elif p < 200:
            return 6
        elif p < 5500:
            return 5
        elif p < 20000:
            return 4
        elif p < 120000:
            return 3
        return 1

    return buildings_gdf.assign(
        voltage_level=buildings_gdf.capacity.apply(voltage_levels)
    )


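# --- Illustrative sketch (editor addition, not part of the reviewed module).
# The capacity-to-voltage-level mapping above, spelled out for a few made-up
# generator sizes in kW.
def _sketch_voltage_levels() -> dict:
    return {
        30: 7,  # < 100 kW
        150: 6,  # < 200 kW
        1_000: 5,  # < 5 500 kW
        10_000: 4,  # < 20 000 kW
        50_000: 3,  # < 120 000 kW
        200_000: 1,  # everything above
    }

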
def add_start_up_date(
    buildings_gdf: gpd.GeoDataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    seed: int,
):
    """
    Randomly add a start-up date to new pv generators, drawn uniformly from
    the given date range.
    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    start : pandas.Timestamp
        Minimum Timestamp to use.
    end : pandas.Timestamp
        Maximum Timestamp to use.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with start-up date added.
    """
    rng = default_rng(seed=seed)

    date_range = pd.date_range(start=start, end=end, freq="1D")

    return buildings_gdf.assign(
        start_up_date=rng.choice(date_range, size=len(buildings_gdf))
    )


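# --- Illustrative sketch (editor addition, not part of the reviewed module).
# Start-up dates are drawn uniformly from a daily date range with a seeded
# generator, exactly like the rng.choice call above; dates are made up.
def _sketch_start_up_dates():
    import pandas as pd
    from numpy.random import default_rng

    rng = default_rng(seed=42)
    date_range = pd.date_range("2022-01-01", "2034-12-31", freq="1D")
    return rng.choice(date_range, size=3)  # three random commissioning dates

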
@timer_func
def allocate_scenarios(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    last_scenario_gdf: gpd.GeoDataFrame,
    scenario: str,
):
    """
    Desaggregate and allocate scenario pv rooftop ramp-ups onto buildings.
    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    last_scenario_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings matched with pv generators from
        the temporally preceding scenario.
    scenario : str
        Scenario to desaggregate and allocate.
    Returns
    -------
    tuple
        geopandas.GeoDataFrame
            GeoDataFrame containing OSM buildings matched with pv generators.
        pandas.DataFrame
            DataFrame containing pv rooftop capacity per grid id.
    """
    grid_districts_gdf = grid_districts(EPSG)

    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)

    grid_federal_state_gdf = overlay_grid_districts_with_counties(
        grid_districts_gdf,
        federal_state_gdf,
    )

    buildings_overlay_gdf = add_overlay_id_to_buildings(
        buildings_gdf,
        grid_federal_state_gdf,
    )

    valid_buildings_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)

    # buildings_area_per_overlay_gdf = buildings_area_per_overlay_id(
    #     valid_buildings_gdf,
    #     grid_federal_state_gdf,
    # )

    cap_per_bus_id_df = cap_per_bus_id(scenario)
    #     buildings_area_per_overlay_gdf,
    #     scenario_data(CARRIER, scenario),
    # )

    last_scenario_gdf = determine_end_of_life_gens(
        last_scenario_gdf,
        SCENARIO_TIMESTAMP[scenario],
        PV_ROOFTOP_LIFETIME,
    )

    buildings_gdf = calculate_max_pv_cap_per_building(
        valid_buildings_gdf,
        last_scenario_gdf,
        PV_CAP_PER_SQ_M,
        ROOF_FACTOR,
    )

    mastr_gdf = calculate_building_load_factor(
        mastr_gdf,
        buildings_gdf,
    )

    probabilities_dict = probabilities(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    cap_share_dict = cap_share_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    load_factor_dict = mean_load_factor_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    building_area_range_dict = building_area_range_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
        min_building_size=MIN_BUILDING_SIZE,
        upper_quantile=UPPER_QUNATILE,
        lower_quantile=LOWER_QUANTILE,
    )

    allocated_buildings_gdf = desaggregate_pv(
        buildings_gdf=buildings_gdf,
        cap_df=cap_per_bus_id_df,
        prob_dict=probabilities_dict,
        cap_share_dict=cap_share_dict,
        building_area_range_dict=building_area_range_dict,
        load_factor_dict=load_factor_dict,
        seed=SEED,
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
    )

    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)

    meta_buildings_gdf = frame_to_numeric(
        add_buildings_meta_data(
            allocated_buildings_gdf,
            probabilities_dict,
            SEED,
        )
    )

    return (
        add_start_up_date(
            meta_buildings_gdf,
            start=last_scenario_gdf.start_up_date.max(),
            end=SCENARIO_TIMESTAMP[scenario],
            seed=SEED,
        ),
        cap_per_bus_id_df,
    )


class EgonPowerPlantPvRoofBuildingScenario(Base):
    __tablename__ = "egon_power_plants_pv_roof_building"
    __table_args__ = {"schema": "supply"}

    index = Column(Integer, primary_key=True, index=True)
    scenario = Column(String)
    building_id = Column(Integer)
    gens_id = Column(String, nullable=True)
    capacity = Column(Float)
    einheitliche_ausrichtung_und_neigungswinkel = Column(Float)
    hauptausrichtung = Column(String)
    hauptausrichtung_neigungswinkel = Column(String)
    voltage_level = Column(Integer)


def create_scenario_table(buildings_gdf):
    """Create mapping table pv_unit <-> building for scenario"""
    EgonPowerPlantPvRoofBuildingScenario.__table__.drop(
        bind=engine, checkfirst=True
    )
    EgonPowerPlantPvRoofBuildingScenario.__table__.create(
        bind=engine, checkfirst=True
    )

    buildings_gdf.rename(columns=COLS_TO_RENAME)[
        COLS_TO_EXPORT
    ].reset_index().to_sql(
        name=EgonPowerPlantPvRoofBuildingScenario.__table__.name,
        schema=EgonPowerPlantPvRoofBuildingScenario.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
    )


def geocode_mastr_data():
    """
    Read PV rooftop data from MaStR CSV
    TODO: the source will be replaced as soon as the MaStR data is available
     in the DB.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocoding_df = geocoding_data(clean_mastr_df)

    ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)

    geocode_gdf = geocode_data(geocoding_df, ratelimiter, EPSG)

    create_geocoded_table(geocode_gdf)


def pv_rooftop_to_buildings():
    """Main script, executed as task"""

    mastr_gdf = load_mastr_data()

    buildings_gdf = load_building_data()

    logger.debug(f"1: {buildings_gdf.head()}")

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
        mastr_gdf, buildings_gdf
    )

    logger.debug(f"2: {desagg_buildings_gdf.head()}")

    all_buildings_gdf = (
        desagg_mastr_gdf.assign(scenario="status_quo")
        .reset_index()
        .rename(columns={"geometry": "geom", "EinheitMastrNummer": "gens_id"})
    )

    scenario_buildings_gdf = all_buildings_gdf.copy()

    cap_per_bus_id_df = pd.DataFrame()

    for scenario in SCENARIOS:
        logger.debug(f"Desaggregating scenario {scenario}.")
        (
            scenario_buildings_gdf,
            cap_per_bus_id_scenario_df,
        ) = allocate_scenarios(  # noqa: F841
            desagg_mastr_gdf,
            desagg_buildings_gdf,
            scenario_buildings_gdf,
            scenario,
        )

        all_buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
            ),
            crs=scenario_buildings_gdf.crs,
            geometry="geom",
        )

        cap_per_bus_id_df = pd.concat(
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
        )

    # export scenario
    create_scenario_table(add_voltage_level(all_buildings_gdf))