Passed
Pull Request — dev (#934)
created by unknown at 02:29

data.datasets.power_plants.pv_rooftop_buildings   F

Complexity

Total Complexity 115

Size/Duplication

Total Lines 2658
Duplicated Lines 1.2 %

Importance

Changes 0
Metric Value
wmc 115
eloc 1178
dl 32
loc 2658
rs 0.8879
c 0
b 0
f 0

52 Functions

Rating   Name   Duplication   Size   Complexity  
A synthetic_buildings() 0 25 2
A geocoding_data() 0 18 1
A geocoded_data_from_db() 0 23 2
A add_ags_to_gens() 0 24 1
A drop_buildings_outside_muns() 0 25 1
B clean_mastr_data() 0 143 2
A mastr_data() 0 60 2
A geocode_data() 0 46 3
B zip_and_municipality_from_standort() 0 50 7
A merge_geocode_with_mastr() 0 27 1
A municipality_data() 0 14 2
A osm_buildings() 0 25 2
A add_ags_to_buildings() 0 24 1
A timer_func() 0 11 1
A create_geocoded_table() 0 17 1
A load_mastr_data() 0 35 1
A drop_gens_outside_muns() 0 25 1
A geocoder() 0 22 1
A drop_invalid_entries_from_gdf() 0 25 1
A most_plausible() 0 41 4
A geocode_mastr_data() 0 28 1
A create_scenario_table() 0 17 1
A load_building_data() 0 65 3
A sort_and_qcut_df() 0 29 1
A egon_building_peak_loads() 0 8 1
A get_probability_for_property() 0 46 2
A calculate_building_load_factor() 0 30 1
A determine_end_of_life_gens() 0 28 1
A add_start_up_date() 0 30 1
B add_voltage_level() 0 31 6
A add_overlay_id_to_buildings() 0 31 1
A cap_per_bus_id() 0 25 1
B desaggregate_pv_in_mv_grid() 0 161 5
A calculate_max_pv_cap_per_building() 0 49 1
A scenario_data() 0 30 2
A drop_buildings_outside_grids() 0 25 1
C building_area_range_per_cap_range() 0 87 8
A add_buildings_meta_data() 0 55 4
B allocate_pv() 0 122 6
A federal_state_data() 0 24 2
A mean_load_factor_per_cap_range() 0 38 3
B desaggregate_pv() 0 144 6
A grid_districts() 0 32 1
A frame_to_numeric() 0 20 4
A pv_rooftop_to_buildings() 0 47 2
A allocate_to_buildings() 0 32 1
A overlay_grid_districts_with_counties() 0 33 1
B allocate_scenarios() 0 120 1
A validate_output() 0 50 1
A cap_share_per_cap_range() 0 43 3
A drop_unallocated_gens() 0 27 1
B probabilities() 0 57 5

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:

Complexity

 Tip:   Before tackling complexity, make sure to eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like data.datasets.power_plants.pv_rooftop_buildings often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

"""
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate new
PV rooftop generators for the scenarios eGon2035 and eGon100RE.

Data cleaning: Drop duplicates and entries with missing critical data. Determine the
most plausible capacity from the multiple values given in the MaStR data. Drop
generators without any plausible capacity data (23.5 MW > P > 0.1 kW). Randomly and
weighted, add a start-up date where it is missing. Extract zip code and municipality
from the 'Standort' field given in the MaStR data. Geocode unique zip code and
municipality combinations with Nominatim (1 s request delay). Drop generators for which
geocoding failed or which are located outside the municipalities of Germany. Add some
visual sanity checks for the cleaned data.

Allocation of MaStR data: Allocate each generator to an existing building from OSM.
Determine the quantile each generator and building is in, depending on the capacity of
the generator and the area of the building polygon. Randomly distribute generators
within each municipality, preferably within the same building area quantile as the
generators are capacity-wise. If not enough buildings exist within a municipality and
quantile, additional buildings are chosen randomly from other quantiles.

Disaggregation of PV rooftop scenarios: The scenario data per federal state is linearly
distributed to the MV grid districts according to the PV rooftop potential per MV grid
district. The rooftop potential is estimated from the building area given by the OSM
buildings. Grid districts which are located in several federal states are allocated PV
capacity according to their respective roof potential in the individual federal states.
The disaggregation of PV plants within a grid district respects existing plants from
MaStR which have not yet reached their end of life. New PV plants are randomly and
weighted generated using a breakdown of the MaStR data as generator basis. Plant
metadata (e.g. plant orientation) is also added randomly and weighted, with MaStR data
as basis.
"""
from __future__ import annotations

from collections import Counter
from functools import wraps
from pathlib import Path
from time import perf_counter
from typing import Any

from geoalchemy2 import Geometry
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim
from loguru import logger
from numpy.random import RandomState, default_rng
from pyproj.crs.crs import CRS
from sqlalchemy import BigInteger, Column, Float, Integer, String
from sqlalchemy.dialects.postgresql import HSTORE
from sqlalchemy.ext.declarative import declarative_base
import geopandas as gpd
import numpy as np
import pandas as pd

from egon.data import config, db
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
    OsmBuildingsSynthetic,
)
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
from egon.data.datasets.zensus_vg250 import Vg250Gem

engine = db.engine()
Base = declarative_base()
SEED = int(config.settings()["egon-data"]["--random-seed"])

# TODO: move to yml
# MaStR data
MASTR_RELEVANT_COLS = [
    "EinheitMastrNummer",
    "Bruttoleistung",
    "StatisikFlag",
    "Bruttoleistung_extended",
    "Nettonennleistung",
    "InstallierteLeistung",
    "zugeordneteWirkleistungWechselrichter",
    "EinheitBetriebsstatus",
    "Standort",
    "Bundesland",
    "Land",
    "Landkreis",
    "Gemeinde",
    "Postleitzahl",
    "Ort",
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
    "GemeinsamerWechselrichterMitSpeicher",
    "Lage",
    "Leistungsbegrenzung",
    "EinheitlicheAusrichtungUndNeigungswinkel",
    "Hauptausrichtung",
    "HauptausrichtungNeigungswinkel",
    "Nebenausrichtung",
]

MASTR_DTYPES = {
    "EinheitMastrNummer": str,
    "Bruttoleistung": float,
    "StatisikFlag": str,
    "Bruttoleistung_extended": float,
    "Nettonennleistung": float,
    "InstallierteLeistung": float,
    "zugeordneteWirkleistungWechselrichter": float,
    "EinheitBetriebsstatus": str,
    "Standort": str,
    "Bundesland": str,
    "Land": str,
    "Landkreis": str,
    "Gemeinde": str,
    # "Postleitzahl": int,  # fails because of NaN values
    "Ort": str,
    "GemeinsamerWechselrichterMitSpeicher": str,
    "Lage": str,
    "Leistungsbegrenzung": str,
    # this would parse NaN values as False, which is not always correct
    # "EinheitlicheAusrichtungUndNeigungswinkel": bool,
    "Hauptausrichtung": str,
    "HauptausrichtungNeigungswinkel": str,
    "Nebenausrichtung": str,
    "NebenausrichtungNeigungswinkel": str,
}

MASTR_PARSE_DATES = [
    "GeplantesInbetriebnahmedatum",
    "Inbetriebnahmedatum",
]

MASTR_INDEX_COL = "EinheitMastrNummer"

EPSG = 4326
SRID = 3035

# data cleaning
MAX_REALISTIC_PV_CAP = 23500  # kW
MIN_REALISTIC_PV_CAP = 0.1  # kW
ROUNDING = 1

# geopy
MIN_DELAY_SECONDS = 1
USER_AGENT = "rli_kh_geocoder"

# show additional logging information
VERBOSE = False

# __file__ (not __name__) so the path resolves relative to this module
EXPORT_DIR = Path(__file__).resolve().parent / "data"
EXPORT_FILE = "mastr_geocoded.gpkg"
EXPORT_PATH = EXPORT_DIR / EXPORT_FILE
DRIVER = "GPKG"

# number of quantiles
Q = 5

# scenario data
CARRIER = "solar_rooftop"
SCENARIOS = ["eGon2035"]  # , "eGon100RE"]
SCENARIO_TIMESTAMP = {
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
}
PV_ROOFTOP_LIFETIME = pd.Timedelta(30 * 365, unit="D")

# example module: Trina Vertex S TSM-400DE09M.08, 400 Wp
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE
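As a quick sanity check on the module constants above (0.4 kWp capacity over a 1.096 m × 1.754 m module), the resulting capacity density comes out at roughly 0.21 kWp per m² of module area:

```python
# Restating the module constants (Trina Vertex S, 400 Wp) to check
# the capacity density they imply.
MODUL_CAP = 0.4  # kWp
MODUL_SIZE = 1.096 * 1.754  # m², about 1.92 m²
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE  # ~0.208 kWp per m² of module area

print(round(PV_CAP_PER_SQ_M, 3))
```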
# Estimation of usable roof area
# Factor for the conversion of building area to roof area
# estimated mean roof pitch: 35°
# estimated usable roof share: 80%
# estimation that only the south side of the building is used for PV
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
# AREA_FACTOR = 1.221
# USABLE_ROOF_SHARE = 0.8
# SOUTH_SHARE = 0.5
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
ROOF_FACTOR = 0.5

CAP_RANGES = [
    (0, 30),
    (30, 100),
    (100, float("inf")),
]

MIN_BUILDING_SIZE = 10.0
UPPER_QUNATILE = 0.95
LOWER_QUANTILE = 0.05

COLS_TO_RENAME = {
    "EinheitlicheAusrichtungUndNeigungswinkel": (
        "einheitliche_ausrichtung_und_neigungswinkel"
    ),
    "Hauptausrichtung": "hauptausrichtung",
    "HauptausrichtungNeigungswinkel": "hauptausrichtung_neigungswinkel",
}

COLS_TO_EXPORT = [
    "scenario",
    "building_id",
    "gens_id",
    "capacity",
    "einheitliche_ausrichtung_und_neigungswinkel",
    "hauptausrichtung",
    "hauptausrichtung_neigungswinkel",
    "voltage_level",
]

INCLUDE_SYNTHETIC_BUILDINGS = False
ONLY_BUILDINGS_WITH_DEMAND = True


def timer_func(func):
    """Decorator that prints the wall-clock runtime of the wrapped function."""

    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        end_time = perf_counter()
        total_time = end_time - start_time
        print(f"Function {func.__name__} took {total_time:.4f} seconds.")
        return result

    return timeit_wrapper
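A minimal standalone sketch of how `timer_func` is used; the decorated `square_sum` function here is a hypothetical toy, not part of the module:

```python
from functools import wraps
from time import perf_counter


def timer_func(func):
    """Print the wall-clock runtime of the wrapped function."""

    @wraps(func)  # preserve the wrapped function's name and docstring
    def timeit_wrapper(*args, **kwargs):
        start_time = perf_counter()
        result = func(*args, **kwargs)
        total_time = perf_counter() - start_time
        print(f"Function {func.__name__} took {total_time:.4f} seconds.")
        return result

    return timeit_wrapper


@timer_func
def square_sum(n):  # hypothetical example function
    return sum(i * i for i in range(n))


result = square_sum(1000)  # prints the timing line, returns the sum
```

Because of `@wraps`, the decorated function keeps its original `__name__`, which is what the printed timing line relies on.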
@timer_func
def mastr_data(
    index_col: str | int | list[str] | list[int],
    usecols: list[str],
    dtype: dict[str, Any] | None,
    parse_dates: list[str] | None,
) -> pd.DataFrame:
    """
    Read MaStR data from CSV.

    Parameters
    ----------
    index_col : str, int or list of str or int
        Column(s) to use as the row labels of the DataFrame.
    usecols : list of str
        Return a subset of the columns.
    dtype : dict of column (str) -> type (any), optional
        Data type for data or columns.
    parse_dates : list of names (str), optional
        Try to parse the given columns to datetime.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing the MaStR data.
    """
    mastr_path = Path(
        config.datasets()["power_plants"]["sources"]["mastr_pv"]
    ).resolve()

    mastr_df = pd.read_csv(
        mastr_path,
        index_col=index_col,
        usecols=usecols,
        dtype=dtype,
        parse_dates=parse_dates,
    )

    mastr_df = mastr_df.loc[
        (mastr_df.StatisikFlag == "B")
        & (mastr_df.EinheitBetriebsstatus == "InBetrieb")
        & (mastr_df.Land == "Deutschland")
        & (mastr_df.Lage == "BaulicheAnlagen")
    ]

    if (
        config.settings()["egon-data"]["--dataset-boundary"]
        == "Schleswig-Holstein"
    ):
        init_len = len(mastr_df)

        mastr_df = mastr_df.loc[mastr_df.Bundesland == "SchleswigHolstein"]

        logger.info(
            f"Using only MaStR data within Schleswig-Holstein. "
            f"{init_len - len(mastr_df)} of {init_len} generators are dropped."
        )

    logger.debug("MaStR data loaded.")

    return mastr_df


@timer_func
def clean_mastr_data(
    mastr_df: pd.DataFrame,
    max_realistic_pv_cap: int | float,
    min_realistic_pv_cap: int | float,
    rounding: int,
    seed: int,
) -> pd.DataFrame:
    """
    Clean the MaStR data of implausible entries.

    * Drop MaStR ID duplicates.
    * Drop generators with implausible capacities.
    * Drop generators without any kind of start-up date.
    * Clean up the Standort column and the capacity.

    Parameters
    ----------
    mastr_df : pandas.DataFrame
        DataFrame containing MaStR data.
    max_realistic_pv_cap : int or float
        Maximum capacity which is considered realistic.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered realistic.
    rounding : int
        Number of decimals to round capacities to. E.g. with a
        rounding of 1, a capacity of 9.93 is rounded to 9.9.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing the cleaned MaStR data.
    """
    init_len = len(mastr_df)

    # drop duplicates
    mastr_df = mastr_df.loc[~mastr_df.index.duplicated()]

    # drop invalid entries in Standort
    index_to_drop = mastr_df.loc[
        (mastr_df.Standort.isna()) | (mastr_df.Standort.isnull())
    ].index

    mastr_df = mastr_df.loc[~mastr_df.index.isin(index_to_drop)]

    df = mastr_df[
        [
            "Bruttoleistung",
            "Bruttoleistung_extended",
            "Nettonennleistung",
            "zugeordneteWirkleistungWechselrichter",
            "InstallierteLeistung",
        ]
    ].round(rounding)

    # use only the most plausible capacity rating if multiple are given
    mastr_df = mastr_df.assign(
        capacity=[
            most_plausible(p_tub, min_realistic_pv_cap)
            for p_tub in df.itertuples(index=False)
        ]
    )

    # drop generators without any capacity info, with a capacity of zero,
    # or with a capacity > 23.5 MW, because Germany's largest rooftop PV
    # plant is 23 MW
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
    mastr_df = mastr_df.loc[
        (~mastr_df.capacity.isna())
        & (mastr_df.capacity <= max_realistic_pv_cap)
        & (mastr_df.capacity > min_realistic_pv_cap)
    ]

    # get zip code and municipality
    mastr_df[["zip_and_municipality", "drop_this"]] = pd.DataFrame(
        mastr_df.Standort.astype(str)
        .apply(
            zip_and_municipality_from_standort,
            args=(VERBOSE,),
        )
        .tolist(),
        index=mastr_df.index,
    )

    # drop invalid entries
    mastr_df = mastr_df.loc[mastr_df.drop_this].drop(columns="drop_this")

    # add ", Deutschland" just in case
    mastr_df = mastr_df.assign(
        zip_and_municipality=(mastr_df.zip_and_municipality + ", Deutschland")
    )

    # get a consistent start-up date
    mastr_df = mastr_df.assign(
        start_up_date=mastr_df.Inbetriebnahmedatum,
    )

    mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
        mastr_df.start_up_date.isna()
    ].assign(
        start_up_date=mastr_df.GeplantesInbetriebnahmedatum.loc[
            mastr_df.start_up_date.isna()
        ]
    )

    # randomly and weighted fill missing start-up dates
    pool = mastr_df.loc[
        ~mastr_df.start_up_date.isna()
    ].start_up_date.to_numpy()

    size = len(mastr_df) - len(pool)

    if size > 0:
        np.random.seed(seed)

        choice = np.random.choice(
            pool,
            size=size,
            replace=False,
        )

        mastr_df.loc[mastr_df.start_up_date.isna()] = mastr_df.loc[
            mastr_df.start_up_date.isna()
        ].assign(start_up_date=choice)

        logger.info(
            f"Randomly and weighted added a start-up date to {size} generators."
        )

    mastr_df = mastr_df.assign(
        start_up_date=pd.to_datetime(mastr_df.start_up_date, utc=True)
    )

    end_len = len(mastr_df)
    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} rows from the MaStR DataFrame."
    )

    return mastr_df
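The start-up-date gap filling in `clean_mastr_data` can be illustrated standalone: missing dates are replaced by dates sampled (without replacement) from the pool of known dates. This sketch uses hand-made toy data rather than real MaStR rows:

```python
import numpy as np
import pandas as pd

# Toy data: two of five generators are missing a start-up date.
df = pd.DataFrame(
    {
        "start_up_date": pd.to_datetime(
            ["2015-06-01", None, "2018-03-15", None, "2020-11-20"]
        )
    }
)

# Pool of known dates; sampling from it roughly reproduces the
# empirical distribution of observed start-up dates.
pool = df.loc[~df.start_up_date.isna()].start_up_date.to_numpy()
size = len(df) - len(pool)  # number of missing dates

np.random.seed(42)  # reproducible, as in the module (SEED from config)
choice = np.random.choice(pool, size=size, replace=False)

# Fill only the missing rows with the sampled dates.
df.loc[df.start_up_date.isna(), "start_up_date"] = choice
```

Note that `replace=False` requires the pool to be at least as large as the number of gaps, which holds here and, presumably, for the real data.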
def zip_and_municipality_from_standort(
    standort: str,
    verbose: bool = False,
) -> tuple[str, bool]:
    """
    Get the zip code and municipality from the Standort string.

    Parameters
    ----------
    standort : str
        Standort as given in the MaStR data.
    verbose : bool
        Log additional info if True.

    Returns
    -------
    tuple of str and bool
        The Standort reduced to zip code and municipality (or the
        unchanged Standort if no zip code was found) and a flag
        indicating whether a zip code was identified.
    """
    if verbose:
        logger.debug(f"Uncleaned string: {standort}")

    standort_list = standort.split()

    found = False
    count = 0

    for count, elem in enumerate(standort_list):
        if len(elem) != 5:
            continue
        if not elem.isnumeric():
            continue

        found = True

        break

    if found:
        cleaned_str = " ".join(standort_list[count:])

        if verbose:
            logger.debug(f"Cleaned string:   {cleaned_str}")

        return cleaned_str, found

    logger.warning(
        "Couldn't identify a zip code. This entry will be dropped."
        f" Original Standort: {standort}."
    )

    return standort, found
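The extraction logic can be exercised standalone. This restatement drops the logging for brevity; the sample Standort strings are made up for illustration:

```python
def zip_and_municipality(standort):
    """Return the substring starting at the first 5-digit zip code,
    plus a flag telling whether a zip code was found."""
    tokens = standort.split()
    for count, elem in enumerate(tokens):
        if len(elem) == 5 and elem.isnumeric():
            # keep everything from the zip code onwards
            return " ".join(tokens[count:]), True
    # no zip code found -> the caller will drop this entry
    return standort, False


print(zip_and_municipality("Musterstraße 5 12345 Musterstadt"))
# -> ('12345 Musterstadt', True)
print(zip_and_municipality("Flurstück 123"))
# -> ('Flurstück 123', False)
```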
def most_plausible(
    p_tub: tuple,
    min_realistic_pv_cap: int | float,
) -> float:
    """
    Try to determine the most plausible capacity for a given
    generator from the different capacity values in the MaStR data.

    Parameters
    ----------
    p_tub : tuple
        Tuple containing the different capacities given in
        the MaStR data.
    min_realistic_pv_cap : int or float
        Minimum capacity which is considered realistic.

    Returns
    -------
    float
        Capacity of the generator estimated as the most realistic.
    """
    count = Counter(p_tub).most_common(3)

    if len(count) == 1:
        return count[0][0]

    val1 = count[0][0]
    val2 = count[1][0]

    if len(count) == 2:
        min_val = min(val1, val2)
        max_val = max(val1, val2)
    else:
        val3 = count[2][0]

        min_val = min(val1, val2, val3)
        max_val = max(val1, val2, val3)

    if min_val < min_realistic_pv_cap:
        return max_val

    return min_val
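The selection logic above can be checked with a few hand-made capacity tuples (values in kW, with the 0.1 kW minimum from the module constants). This is a compact restatement of the same rule: take the smallest of the (up to three) most common values, unless it is implausibly low, in which case fall back to the largest:

```python
from collections import Counter


def most_plausible(p_tub, min_realistic_pv_cap):
    """Pick the most plausible of up to three distinct capacity ratings."""
    count = Counter(p_tub).most_common(3)
    if len(count) == 1:
        return count[0][0]
    vals = [c[0] for c in count]
    min_val, max_val = min(vals), max(vals)
    # an implausibly small value is likely a data error -> take the largest
    if min_val < min_realistic_pv_cap:
        return max_val
    return min_val


print(most_plausible((9.9, 9.9, 10.0, 9.9, 9.9), 0.1))   # -> 9.9
print(most_plausible((0.0, 30.0, 30.0, 0.0, 30.0), 0.1))  # -> 30.0
```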
def geocoder(
    user_agent: str,
    min_delay_seconds: int,
) -> RateLimiter:
    """
    Set up the Nominatim geocoding class.

    Parameters
    ----------
    user_agent : str
        The app name.
    min_delay_seconds : int
        Delay in seconds to use between requests to Nominatim.
        A minimum of 1 is advised.

    Returns
    -------
    geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    """
    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(
        locator.geocode,
        min_delay_seconds=min_delay_seconds,
    )


def geocoding_data(
    clean_mastr_df: pd.DataFrame,
) -> pd.DataFrame:
    """
    Set up the DataFrame to geocode.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.

    Returns
    -------
    pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes and municipalities for geocoding.
    """
    return pd.DataFrame(
        data=clean_mastr_df.zip_and_municipality.unique(),
        columns=["zip_and_municipality"],
    )
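geopy's `RateLimiter` wraps a callable and enforces a minimum delay between calls, which is how the module honors the Nominatim usage policy. The mechanism can be sketched with the standard library only; `SimpleRateLimiter` is a hypothetical stand-in for the geopy class, and the wrapped lambda stands in for `locator.geocode` (no network access):

```python
from time import monotonic, sleep


class SimpleRateLimiter:
    """Call a function no more often than every min_delay_seconds."""

    def __init__(self, func, min_delay_seconds=1.0):
        self.func = func
        self.min_delay = min_delay_seconds
        self._last_call = None

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            wait = self.min_delay - (monotonic() - self._last_call)
            if wait > 0:
                sleep(wait)  # enforce the minimum delay between calls
        self._last_call = monotonic()
        return self.func(*args, **kwargs)


# Stand-in for locator.geocode; a short delay keeps the demo fast.
fake_geocode = SimpleRateLimiter(lambda q: f"geocoded:{q}", min_delay_seconds=0.05)

start = monotonic()
results = [fake_geocode(q) for q in ("a", "b", "c")]
elapsed = monotonic() - start  # three calls -> at least two enforced delays
```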
@timer_func
def geocode_data(
    geocoding_df: pd.DataFrame,
    ratelimiter: RateLimiter,
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Geocode zip codes and municipalities.

    Extract latitude, longitude and altitude, transform latitude and
    longitude to a shapely Point and return a geopandas GeoDataFrame.

    Parameters
    ----------
    geocoding_df : pandas.DataFrame
        DataFrame containing all unique combinations of
        zip codes and municipalities for geocoding.
    ratelimiter : geopy.extra.rate_limiter.RateLimiter
        Nominatim RateLimiter geocoding class to use for geocoding.
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes and municipalities with their matching geolocation.
    """
    logger.info(f"Geocoding {len(geocoding_df)} locations.")

    geocode_df = geocoding_df.assign(
        location=geocoding_df.zip_and_municipality.apply(ratelimiter)
    )

    geocode_df = geocode_df.assign(
        point=geocode_df.location.apply(
            lambda loc: tuple(loc.point) if loc else None
        )
    )

    geocode_df[["latitude", "longitude", "altitude"]] = pd.DataFrame(
        geocode_df.point.tolist(), index=geocode_df.index
    )

    return gpd.GeoDataFrame(
        geocode_df,
        geometry=gpd.points_from_xy(geocode_df.longitude, geocode_df.latitude),
        crs=f"EPSG:{epsg}",
    )


def merge_geocode_with_mastr(
    clean_mastr_df: pd.DataFrame, geocode_gdf: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """
    Merge the geocoded geometry onto the original MaStR data.

    Parameters
    ----------
    clean_mastr_df : pandas.DataFrame
        DataFrame containing cleaned MaStR data.
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing all unique combinations of
        zip codes and municipalities with their matching geolocation.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing cleaned MaStR data with the
        matching geolocation from geocoding.
    """
    return gpd.GeoDataFrame(
        clean_mastr_df.merge(
            geocode_gdf[["zip_and_municipality", "geometry"]],
            how="left",
            left_on="zip_and_municipality",
            right_on="zip_and_municipality",
        ),
        crs=geocode_gdf.crs,
    ).set_index(clean_mastr_df.index)
def drop_invalid_entries_from_gdf(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop invalid entries from a geopandas GeoDataFrame.

    TODO: how to omit the logging from GEOS here?

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame to be checked for validity.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with rows with invalid geometries dropped.
    """
    valid_gdf = gdf.loc[gdf.is_valid]

    logger.debug(
        f"{len(gdf) - len(valid_gdf)} "
        f"({(len(gdf) - len(valid_gdf)) / len(gdf) * 100:g}%) "
        f"of {len(gdf)} values were invalid and are dropped."
    )

    return valid_gdf


@timer_func
def municipality_data() -> gpd.GeoDataFrame:
    """
    Get municipality data from the eGo^n database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with municipality data.
    """
    with db.session_scope() as session:
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="ags"
    )


@timer_func
def add_ags_to_gens(
    valid_mastr_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add the AGS ID to the generators.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with the AGS ID added.
    """
    return valid_mastr_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_gens_outside_muns(
    valid_mastr_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all generators outside of the municipalities.

    Parameters
    ----------
    valid_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with valid and cleaned MaStR data
        with generators without an AGS ID dropped.
    """
    gdf = valid_mastr_gdf.loc[~valid_mastr_gdf.ags.isna()]

    logger.debug(
        f"{len(valid_mastr_gdf) - len(gdf)} "
        f"({(len(valid_mastr_gdf) - len(gdf)) / len(valid_mastr_gdf) * 100:g}%) "
        f"of {len(valid_mastr_gdf)} values are outside of the municipalities"
        " and are therefore dropped."
    )

    return gdf
class EgonMastrPvRoofGeocoded(Base):
    __tablename__ = "egon_mastr_pv_roof_geocoded"
    __table_args__ = {"schema": "supply"}

    zip_and_municipality = Column(String, primary_key=True, index=True)
    location = Column(String)
    point = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    altitude = Column(Float)
    geometry = Column(Geometry(srid=EPSG))


def create_geocoded_table(geocode_gdf):
    """
    Create the geocoded MaStR PV rooftop table.

    Parameters
    ----------
    geocode_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoding information on PV rooftop locations.
    """
    EgonMastrPvRoofGeocoded.__table__.drop(bind=engine, checkfirst=True)
    EgonMastrPvRoofGeocoded.__table__.create(bind=engine, checkfirst=True)

    geocode_gdf.to_postgis(
        name=EgonMastrPvRoofGeocoded.__table__.name,
        schema=EgonMastrPvRoofGeocoded.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
        # dtype={}
    )


def geocoded_data_from_db(
    epsg: str | int,
) -> gpd.GeoDataFrame:
    """
    Read the geocoded zip code and municipality data from the eGo^n database.

    Parameters
    ----------
    epsg : str or int
        EPSG ID of the CRS to transform the geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the geocoded locations.
    """
    with db.session_scope() as session:
        query = session.query(
            EgonMastrPvRoofGeocoded.zip_and_municipality,
            EgonMastrPvRoofGeocoded.geometry,
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, geom_col="geometry"
    ).to_crs(f"EPSG:{epsg}")


def load_mastr_data():
    """
    Read PV rooftop data from the MaStR CSV.

    Note: the source will be replaced as soon as the MaStR data is available
    in the database.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocode_gdf = geocoded_data_from_db(EPSG)

    mastr_gdf = merge_geocode_with_mastr(clean_mastr_df, geocode_gdf)

    valid_mastr_gdf = drop_invalid_entries_from_gdf(mastr_gdf)

    municipalities_gdf = municipality_data()

    valid_mastr_gdf = add_ags_to_gens(valid_mastr_gdf, municipalities_gdf)

    return drop_gens_outside_muns(valid_mastr_gdf)
class OsmBuildingsFiltered(Base):
838
    __tablename__ = "osm_buildings_filtered"
839
    __table_args__ = {"schema": "openstreetmap"}
840
841
    osm_id = Column(BigInteger)
842
    amenity = Column(String)
843
    building = Column(String)
844
    name = Column(String)
845
    geom = Column(Geometry(srid=SRID), index=True)
846
    area = Column(Float)
847
    geom_point = Column(Geometry(srid=SRID), index=True)
848
    tags = Column(HSTORE)
849
    id = Column(BigInteger, primary_key=True, index=True)
850
851
852
@timer_func
853
def osm_buildings(
854
    to_crs: CRS,
855
) -> gpd.GeoDataFrame:
856
    """
857
    Read OSM buildings data from eGo^n Database.
858
    Parameters
859
    -----------
860
    to_crs : pyproj.crs.crs.CRS
861
        CRS to transform geometries to.
862
    Returns
863
    -------
864
    geopandas.GeoDataFrame
865
        GeoDataFrame containing OSM buildings data.
866
    """
867
    with db.session_scope() as session:
868
        query = session.query(
869
            OsmBuildingsFiltered.id,
870
            OsmBuildingsFiltered.area,
871
            OsmBuildingsFiltered.geom_point.label("geom"),
872
        )
873
874
    return gpd.read_postgis(
875
        query.statement, query.session.bind, index_col="id"
876
    ).to_crs(to_crs)
877
878
879
@timer_func
def synthetic_buildings(
    to_crs: CRS,
) -> gpd.GeoDataFrame:
    """
    Read synthetic buildings data from eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing synthetic buildings data.
    """
    with db.session_scope() as session:
        query = session.query(
            OsmBuildingsSynthetic.id,
            OsmBuildingsSynthetic.area,
            OsmBuildingsSynthetic.geom_point.label("geom"),
        )

    return gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)


@timer_func
def add_ags_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    municipalities_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the AGS ID to buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    municipalities_gdf : geopandas.GeoDataFrame
        GeoDataFrame with municipality data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with the AGS ID added.
    """
    return buildings_gdf.sjoin(
        municipalities_gdf,
        how="left",
        predicate="intersects",
    ).rename(columns={"index_right": "ags"})


def drop_buildings_outside_muns(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of municipalities.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the municipalities "
        "and are therefore dropped."
    )

    return gdf


def egon_building_peak_loads():
    sql = """
    SELECT building_id
    FROM demand.egon_building_electricity_peak_loads
    WHERE scenario = 'eGon2035'
    """

    return db.select_dataframe(sql).building_id.astype(int).sort_values()


@timer_func
def load_building_data():
    """
    Read buildings from the database.

    Tables:

    * `openstreetmap.osm_buildings_filtered` (from OSM)
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)

    Use column `id` for both as it is unique, hence both datasets can be
    concatenated. If INCLUDE_SYNTHETIC_BUILDINGS is False, synthetic
    buildings will not be loaded.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.
    """
    municipalities_gdf = municipality_data()

    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)

    if INCLUDE_SYNTHETIC_BUILDINGS:
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)

        buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [
                    osm_buildings_gdf,
                    synthetic_buildings_gdf,
                ]
            ),
            geometry="geom",
            crs=osm_buildings_gdf.crs,
        ).rename(columns={"area": "building_area"})

    else:
        buildings_gdf = osm_buildings_gdf.rename(
            columns={"area": "building_area"}
        )

    if ONLY_BUILDINGS_WITH_DEMAND:
        building_ids = egon_building_peak_loads()

        init_len = len(building_ids)

        building_ids = [
            b_id for b_id in building_ids if b_id in buildings_gdf.index
        ]

        end_len = len(building_ids)

        logger.debug(
            f"{end_len / init_len * 100:g} % ({end_len} / {init_len}) of IDs "
            "are within the OSM buildings."
        )

        buildings_gdf = buildings_gdf.loc[building_ids]

    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)

    logger.debug("Loaded buildings.")

    return drop_buildings_outside_muns(buildings_ags_gdf)


@timer_func
def sort_and_qcut_df(
    df: pd.DataFrame | gpd.GeoDataFrame,
    col: str,
    q: int,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Determine the quantile of a given attribute in a (Geo)DataFrame.

    Sort the (Geo)DataFrame in ascending order for the given attribute.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame
        (Geo)DataFrame to sort and qcut.
    col : str
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
    q : int
        Number of quantiles.

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
        Sorted and qcut (Geo)DataFrame.
    """
    df = df.sort_values(col, ascending=True)

    return df.assign(
        quant=pd.qcut(
            df[col],
            q=q,
            labels=range(q),
        )
    )


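A minimal, self-contained sketch of the binning pattern used by `sort_and_qcut_df` (synthetic data; the column name `capacity` is illustrative):

```python
import pandas as pd

# Ten synthetic capacity values, deliberately unsorted.
df = pd.DataFrame({"capacity": [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]})

# Same pattern as sort_and_qcut_df: sort ascending, then assign each row
# to one of q equally sized quantile bins labelled 0..q-1.
q = 5
df = df.sort_values("capacity", ascending=True)
df = df.assign(quant=pd.qcut(df["capacity"], q=q, labels=range(q)))

# With 10 values and 5 bins, the smallest values land in bin 0 and the
# largest in bin q-1, two rows per bin.
```

Binning both generators (by capacity) and buildings (by area) with the same `q` is what lets the allocation later match "large" generators to "large" buildings.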
@timer_func
def allocate_pv(
    q_mastr_gdf: gpd.GeoDataFrame,
    q_buildings_gdf: gpd.GeoDataFrame,
    seed: int,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate the MaStR PV generators to the OSM buildings.

    This will determine a building for each PV generator if there are more
    buildings than generators within a given AGS. Primarily, generators are
    distributed to buildings within the same quantile. Multiple assignment
    is excluded.

    Parameters
    ----------
    q_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded and qcut MaStR data.
    q_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing qcut OSM buildings data.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    rng = default_rng(seed=seed)

    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan)
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan)

    ags_list = q_buildings_gdf.ags.unique()

    num_ags = len(ags_list)

    t0 = perf_counter()

    for count, ags in enumerate(ags_list):

        buildings = q_buildings_gdf.loc[
            (q_buildings_gdf.ags == ags) & (q_buildings_gdf.gens_id.isna())
        ]
        gens = q_mastr_gdf.loc[
            (q_mastr_gdf.ags == ags) & (q_mastr_gdf.building_id.isna())
        ]

        len_build = len(buildings)
        len_gens = len(gens)

        if len_build < len_gens:
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
            logger.error(
                f"There are {len_gens} generators and only {len_build}"
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
                "generators were truncated to match the amount of buildings."
            )

            assert len_build == len(gens)

        for quant in gens.quant.unique():
            q_buildings = buildings.loc[
                (buildings.quant == quant) & (buildings.gens_id.isna())
            ]
            q_gens = gens.loc[gens.quant == quant]

            len_build = len(q_buildings)
            len_gens = len(q_gens)

            if len_build < len_gens:
                delta = len_gens - len_build

                logger.warning(
                    f"There are {len_gens} generators and only {len_build} "
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
                    f"buildings from AGS {ags} will be added randomly."
                )

                add_buildings = pd.Index(
                    rng.choice(
                        buildings.loc[
                            (buildings.quant != quant)
                            & (buildings.gens_id.isna())
                        ].index,
                        size=delta,
                        replace=False,
                    )
                )

                q_buildings = buildings.loc[
                    q_buildings.index.append(add_buildings)
                ]

                assert len(q_buildings) == len_gens

            chosen_buildings = pd.Index(
                rng.choice(
                    q_buildings.index,
                    size=len_gens,
                    replace=False,
                )
            )

            q_mastr_gdf.loc[q_gens.index, "building_id"] = chosen_buildings
            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index

        if count % 100 == 0:
            logger.debug(
                f"Allocation of {count / num_ags * 100:g} % of AGS done. "
                f"It took {perf_counter() - t0:g} seconds."
            )

            t0 = perf_counter()

    logger.debug("Allocated status quo generators to buildings.")

    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)


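The core step of the allocation is a seeded draw without replacement: each generator gets a distinct building. A small self-contained sketch of that pattern (the IDs are illustrative, not from the module):

```python
import pandas as pd
from numpy.random import default_rng

# Seeded generator, so repeated runs pick the same buildings.
rng = default_rng(seed=42)

buildings = pd.DataFrame(index=[10, 11, 12, 13, 14])  # candidate building IDs
gens = pd.DataFrame(index=["g1", "g2", "g3"])         # generator IDs

# Draw one distinct building per generator (replace=False forbids
# assigning the same building twice).
chosen = pd.Index(rng.choice(buildings.index, size=len(gens), replace=False))

gens = gens.assign(building_id=chosen)
```

Because `replace=False`, the draw fails loudly if there are more generators than candidate buildings, which is why `allocate_pv` truncates or tops up the candidate pool first.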
def frame_to_numeric(
    df: pd.DataFrame | gpd.GeoDataFrame,
) -> pd.DataFrame | gpd.GeoDataFrame:
    """
    Try to convert all columns of a DataFrame to numeric, ignoring errors.

    Parameters
    ----------
    df : pandas.DataFrame or geopandas.GeoDataFrame

    Returns
    -------
    pandas.DataFrame or geopandas.GeoDataFrame
    """
    if str(df.index.dtype) == "object":
        df.index = pd.to_numeric(df.index, errors="ignore")

    for col in df.columns:
        if str(df[col].dtype) == "object":
            df[col] = pd.to_numeric(df[col], errors="ignore")

    return df


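`frame_to_numeric` leans on `pd.to_numeric(errors="ignore")`, which converts object columns where possible and leaves them untouched otherwise. Since that `errors` mode is deprecated in recent pandas, the equivalent convert-if-possible behaviour can be sketched explicitly (`to_numeric_or_keep` is a hypothetical helper, not part of the module):

```python
import pandas as pd


def to_numeric_or_keep(series):
    # Equivalent of pd.to_numeric(series, errors="ignore"):
    # convert if every value parses, otherwise keep the series as-is.
    try:
        return pd.to_numeric(series)
    except (ValueError, TypeError):
        return series


df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["x", "y", "z"]})

for col in df.columns:
    if str(df[col].dtype) == "object":
        df[col] = to_numeric_or_keep(df[col])

# "a" becomes numeric, "b" stays object.
```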
def validate_output(
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
) -> None:
    """
    Validate output.

    * Validate that there are exactly as many buildings with a PV system as
      there are PV systems with a building.
    * Validate that the building IDs with a PV system are the same building
      IDs as assigned to the PV systems.
    * Validate that the PV system IDs with a building are the same PV system
      IDs as assigned to the buildings.

    Parameters
    ----------
    desagg_mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.
    desagg_buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    assert len(
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].building_id.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].index.unique()
        )
    ).all()
    assert (
        np.sort(
            desagg_mastr_gdf.loc[
                ~desagg_mastr_gdf.building_id.isna()
            ].index.unique()
        )
        == np.sort(
            desagg_buildings_gdf.loc[
                ~desagg_buildings_gdf.gens_id.isna()
            ].gens_id.unique()
        )
    ).all()

    logger.debug("Validated output.")


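The bidirectional check in `validate_output` boils down to comparing the two sorted ID sets from both sides of the allocation. A toy version with hand-made frames (IDs are illustrative):

```python
import numpy as np
import pandas as pd

# Generators carry the building they were assigned to ("g3" is unallocated)
mastr = pd.DataFrame(
    {"building_id": [101, 102, np.nan]}, index=["g1", "g2", "g3"]
)
# ... and buildings carry the generator assigned to them (103 stays empty).
buildings = pd.DataFrame(
    {"gens_id": ["g2", "g1", np.nan]}, index=[102, 101, 103]
)

allocated_gens = mastr.loc[~mastr.building_id.isna()]
allocated_buildings = buildings.loc[~buildings.gens_id.isna()]

# Counts match, and the ID sets agree in both directions.
assert len(allocated_gens) == len(allocated_buildings)
assert (
    np.sort(allocated_gens.building_id.unique())
    == np.sort(allocated_buildings.index.unique())
).all()
assert (
    np.sort(allocated_gens.index.unique())
    == np.sort(allocated_buildings.gens_id.unique())
).all()
```

Sorting before comparing makes the check independent of row order, which differs between the two frames.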
def drop_unallocated_gens(
    gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop generators which did not get allocated.

    Parameters
    ----------
    gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data allocated to building IDs.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with unallocated generators
        dropped.
    """
    init_len = len(gdf)
    gdf = gdf.loc[~gdf.building_id.isna()]
    end_len = len(gdf)

    logger.debug(
        f"Dropped {init_len - end_len} "
        f"({((init_len - end_len) / init_len) * 100:g}%)"
        f" of {init_len} unallocated rows from MaStR DataFrame."
    )

    return gdf


@timer_func
def allocate_to_buildings(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
    """
    Allocate status quo PV rooftop generators to buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing MaStR data with geocoded locations.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without an
        AGS ID dropped.

    Returns
    -------
    tuple with two geopandas.GeoDataFrame s
        GeoDataFrame containing MaStR data allocated to building IDs.
        GeoDataFrame containing building data allocated to MaStR IDs.
    """
    logger.debug("Starting allocation of status quo.")

    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
        q_mastr_gdf, q_buildings_gdf, SEED
    )

    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)

    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf


@timer_func
def grid_districts(
    epsg: int,
) -> gpd.GeoDataFrame:
    """
    Load MV grid district geo data from eGo^n Database as a
    geopandas.GeoDataFrame.

    Parameters
    ----------
    epsg : int
        EPSG ID to use as CRS.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing MV grid district IDs and geo shapes data.
    """
    gdf = db.select_geodataframe(
        """
        SELECT bus_id, geom
        FROM grid.egon_mv_grid_district
        ORDER BY bus_id
        """,
        index_col="bus_id",
        geom_col="geom",
        epsg=epsg,
    )

    gdf.index = gdf.index.astype(int)

    logger.debug("Grid districts loaded.")

    return gdf


def scenario_data(
    carrier: str = "solar_rooftop",
    scenario: str = "eGon2035",
) -> pd.DataFrame:
    """
    Get scenario capacity data from eGo^n Database.

    Parameters
    ----------
    carrier : str
        Carrier type to filter table by.
    scenario : str
        Scenario to filter table by.

    Returns
    -------
    pandas.DataFrame
        DataFrame with scenario capacity data in GW.
    """
    with db.session_scope() as session:
        query = session.query(EgonScenarioCapacities).filter(
            EgonScenarioCapacities.carrier == carrier,
            EgonScenarioCapacities.scenario_name == scenario,
        )

    df = pd.read_sql(
        query.statement, query.session.bind, index_col="index"
    ).sort_index()

    logger.debug("Scenario capacity data loaded.")

    return df


class Vg250Lan(Base):
    __tablename__ = "vg250_lan"
    __table_args__ = {"schema": "boundaries"}

    id = Column(BigInteger, primary_key=True, index=True)
    ade = Column(BigInteger)
    gf = Column(BigInteger)
    bsg = Column(BigInteger)
    ars = Column(String)
    ags = Column(String)
    sdv_ars = Column(String)
    gen = Column(String)
    bez = Column(String)
    ibz = Column(BigInteger)
    bem = Column(String)
    nbd = Column(String)
    sn_l = Column(String)
    sn_r = Column(String)
    sn_k = Column(String)
    sn_v1 = Column(String)
    sn_v2 = Column(String)
    sn_g = Column(String)
    fk_s3 = Column(String)
    nuts = Column(String)
    ars_0 = Column(String)
    ags_0 = Column(String)
    wsk = Column(String)
    debkg_id = Column(String)
    rs = Column(String)
    sdv_rs = Column(String)
    rs_0 = Column(String)
    geometry = Column(Geometry(srid=EPSG), index=True)


def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
    """
    Get federal state data from eGo^n Database.

    Parameters
    ----------
    to_crs : pyproj.crs.crs.CRS
        CRS to transform geometries to.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame with federal state data.
    """
    with db.session_scope() as session:
        query = session.query(
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
        )

    gdf = gpd.read_postgis(
        query.statement, query.session.bind, index_col="id"
    ).to_crs(to_crs)

    logger.debug("Federal State data loaded.")

    return gdf


@timer_func
def overlay_grid_districts_with_counties(
    mv_grid_district_gdf: gpd.GeoDataFrame,
    federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Calculate the intersections of MV grid districts and counties.

    Parameters
    ----------
    mv_grid_district_gdf : gpd.GeoDataFrame
        GeoDataFrame containing MV grid district IDs and geo shapes data.
    federal_state_gdf : gpd.GeoDataFrame
        GeoDataFrame with federal state data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing the intersection shapes between MV grid
        districts and counties.
    """
    logger.debug(
        "Calculating intersection overlay between mv grid districts and "
        "counties. This may take a while..."
    )

    gdf = gpd.overlay(
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
        mv_grid_district_gdf.reset_index(),
        how="intersection",
        keep_geom_type=True,
    )

    logger.debug("Done!")

    return gdf


@timer_func
def add_overlay_id_to_buildings(
    buildings_gdf: gpd.GeoDataFrame,
    grid_federal_state_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add information about the overlay ID to buildings.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    grid_federal_state_gdf : geopandas.GeoDataFrame
        GeoDataFrame with intersection shapes between counties and grid
        districts.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with overlay ID added.
    """
    gdf = (
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
        .sjoin(
            grid_federal_state_gdf,
            how="left",
            predicate="intersects",
        )
        .rename(columns={"index_right": "overlay_id"})
    )

    logger.debug("Added overlay ID to OSM buildings.")

    return gdf


def drop_buildings_outside_grids(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Drop all buildings outside of grid areas.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with buildings without a
        bus ID dropped.
    """
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]

    logger.debug(
        f"{len(buildings_gdf) - len(gdf)} "
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
        f"of {len(buildings_gdf)} values are outside of the grid areas "
        "and are therefore dropped."
    )

    return gdf


def cap_per_bus_id(
    scenario: str,
) -> pd.DataFrame:
    """
    Get table with total PV rooftop capacity per grid district.

    Parameters
    ----------
    scenario : str
        Scenario name.

    Returns
    -------
    pandas.DataFrame
        DataFrame with total rooftop capacity per MV grid.
    """
    targets = config.datasets()["solar_rooftop"]["targets"]

    sql = f"""
    SELECT bus as bus_id, p_nom as capacity
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
    WHERE carrier = 'solar_rooftop'
    AND scn_name = '{scenario}'
    """

    return db.select_dataframe(sql, index_col="bus_id")


def determine_end_of_life_gens(
    mastr_gdf: gpd.GeoDataFrame,
    scenario_timestamp: pd.Timestamp,
    pv_rooftop_lifetime: pd.Timedelta,
) -> gpd.GeoDataFrame:
    """
    Determine if an old PV system has reached its end of life.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    scenario_timestamp : pandas.Timestamp
        Timestamp at which the scenario takes place.
    pv_rooftop_lifetime : pandas.Timedelta
        Average expected lifetime of PV rooftop systems.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data and info if the system
        has reached its end of life.
    """
    mastr_gdf = mastr_gdf.assign(
        age=scenario_timestamp - mastr_gdf.start_up_date
    )

    logger.debug("Determined if pv rooftop systems reached their end of life.")

    return mastr_gdf.assign(end_of_life=pv_rooftop_lifetime < mastr_gdf.age)


def calculate_max_pv_cap_per_building(
    buildings_gdf: gpd.GeoDataFrame,
    mastr_gdf: gpd.GeoDataFrame,
    pv_cap_per_sq_m: float | int,
    roof_factor: float | int,
) -> gpd.GeoDataFrame:
    """
    Calculate the estimated maximum possible PV capacity per building.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.
    roof_factor : float, int
        Average usable roof area share for PV.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with estimated maximum PV
        capacity.
    """
    gdf = (
        buildings_gdf.reset_index()
        .merge(
            mastr_gdf[
                [
                    "capacity",
                    "end_of_life",
                    "building_id",
                    "EinheitlicheAusrichtungUndNeigungswinkel",
                    "Hauptausrichtung",
                    "HauptausrichtungNeigungswinkel",
                ]
            ],
            how="left",
            left_on="id",
            right_on="building_id",
        )
        .set_index("id")
        .drop(columns="building_id")
    )

    return gdf.assign(
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
        bus_id=gdf.bus_id.astype(int),
    )


def calculate_building_load_factor(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    rounding: int = 4,
) -> gpd.GeoDataFrame:
    """
    Calculate the roof load factor from existing PV systems.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    rounding : int
        Rounding to use for load factor.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data with calculated load
        factor.
    """
    gdf = mastr_gdf.merge(
        buildings_gdf[["max_cap", "building_area"]].loc[
            ~buildings_gdf["max_cap"].isna()
        ],
        how="left",
        left_on="building_id",
        right_index=True,
    )

    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))


def get_probability_for_property(
    mastr_gdf: gpd.GeoDataFrame,
    cap_range: tuple[int | float, int | float],
    prop: str,
) -> tuple[np.array, np.array]:
    """
    Calculate the probability of the different options of a property of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_range : tuple(int, int)
        Capacity range of PV plants to look at.
    prop : str
        Property to calculate probabilities for. String needs to be in
        columns of mastr_gdf.

    Returns
    -------
    tuple
        numpy.array
            Unique values of property.
        numpy.array
            Probabilities per unique value.
    """
    cap_range_gdf = mastr_gdf.loc[
        (mastr_gdf.capacity > cap_range[0])
        & (mastr_gdf.capacity <= cap_range[1])
    ]

    if prop == "load_factor":
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]

    count = Counter(
        cap_range_gdf[prop].loc[
            ~cap_range_gdf[prop].isna()
            & ~cap_range_gdf[prop].isnull()
            & ~(cap_range_gdf[prop] == "None")
        ]
    )

    values = np.array(list(count.keys()))
    probabilities = np.fromiter(count.values(), dtype=float)
    probabilities = probabilities / np.sum(probabilities)

    return values, probabilities


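The function above boils down to turning value counts into a normalised probability vector. A self-contained sketch of that `Counter`-based pattern (the orientation strings are illustrative):

```python
from collections import Counter

import numpy as np

# Observed orientations of existing plants (illustrative values).
observations = ["South", "South", "East", "West", "South", "East"]

count = Counter(observations)

# Unique values and their relative frequencies, same as in
# get_probability_for_property.
values = np.array(list(count.keys()))
probabilities = np.fromiter(count.values(), dtype=float)
probabilities = probabilities / probabilities.sum()
```

The resulting `values`/`probabilities` pair can be fed straight into `numpy.random.Generator.choice(values, p=probabilities)` to sample new plant properties with the observed distribution.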
@timer_func
def probabilities(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    properties: list[str] | None = None,
) -> dict:
    """
    Calculate the probability of the different options of properties of the
    existing PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple
        should start with a zero and the last one should end with infinite.
    properties : list(str)
        List of properties to calculate probabilities for. Strings need to
        be in columns of mastr_gdf.

    Returns
    -------
    dict
        Dictionary with values and probabilities per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]
    if properties is None:
        properties = [
            "EinheitlicheAusrichtungUndNeigungswinkel",
            "Hauptausrichtung",
            "HauptausrichtungNeigungswinkel",
            "load_factor",
        ]

    prob_dict = {}

    for cap_range in cap_ranges:
        prob_dict[cap_range] = {
            "values": {},
            "probabilities": {},
        }

        for prop in properties:
            v, p = get_probability_for_property(
                mastr_gdf,
                cap_range,
                prop,
            )

            prob_dict[cap_range]["values"][prop] = v
            prob_dict[cap_range]["probabilities"][prop] = p

    return prob_dict


def cap_share_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the share of PV capacity from the total PV capacity within
    capacity ranges.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with the share of PV capacity from the total PV capacity
        within each capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    cap_share_dict = {}

    total_cap = mastr_gdf.capacity.sum()

    for cap_range in cap_ranges:
        cap_share = (
            mastr_gdf.loc[
                (mastr_gdf.capacity > cap_range[0])
                & (mastr_gdf.capacity <= cap_range[1])
            ].capacity.sum()
            / total_cap
        )

        cap_share_dict[cap_range] = cap_share

    return cap_share_dict

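The share computation above can be sketched on a toy frame with plain pandas; the `capacity` column name matches the function, while the numbers are made up:

```python
import pandas as pd

# Toy capacities in kW; the ranges partition (0, inf) as in the function.
toy_df = pd.DataFrame({"capacity": [10, 20, 50, 150]})
cap_ranges = [(0, 30), (30, 100), (100, float("inf"))]

total_cap = toy_df.capacity.sum()  # 230

cap_share_dict = {
    cap_range: toy_df.loc[
        (toy_df.capacity > cap_range[0]) & (toy_df.capacity <= cap_range[1])
    ].capacity.sum()
    / total_cap
    for cap_range in cap_ranges
}

# Because the ranges are contiguous and half-open, the shares sum to 1.
assert abs(sum(cap_share_dict.values()) - 1.0) < 1e-9
```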
def mean_load_factor_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
) -> dict[tuple[int | float, int | float], float]:
    """
    Calculate the mean roof load factor per capacity range from existing PV
    plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.

    Returns
    -------
    dict
        Dictionary with mean roof load factor per capacity range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    load_factor_dict = {}

    for cap_range in cap_ranges:
        load_factor = mastr_gdf.loc[
            (mastr_gdf.load_factor <= 1)
            & (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ].load_factor.mean()

        load_factor_dict[cap_range] = load_factor

    return load_factor_dict

def building_area_range_per_cap_range(
    mastr_gdf: gpd.GeoDataFrame,
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
    min_building_size: int | float = 10.0,
    upper_quantile: float = 0.95,
    lower_quantile: float = 0.05,
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
    """
    Estimate the normal building area range per capacity range from existing
    PV plants.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    cap_ranges : list(tuple(int, int))
        List of capacity ranges to distinguish between. The first tuple should
        start with zero and the last one should end with infinity.
    min_building_size : int, float
        Minimal building size to consider for PV plants.
    upper_quantile : float
        Upper quantile to estimate maximum building size per capacity range.
    lower_quantile : float
        Lower quantile to estimate minimum building size per capacity range.

    Returns
    -------
    dict
        Dictionary with estimated normal building area range per capacity
        range.
    """
    if cap_ranges is None:
        cap_ranges = [
            (0, 30),
            (30, 100),
            (100, float("inf")),
        ]

    building_area_range_dict = {}

    n_ranges = len(cap_ranges)

    for count, cap_range in enumerate(cap_ranges):
        cap_range_gdf = mastr_gdf.loc[
            (mastr_gdf.capacity > cap_range[0])
            & (mastr_gdf.capacity <= cap_range[1])
        ]

        if count == 0:
            building_area_range_dict[cap_range] = (
                min_building_size,
                cap_range_gdf.building_area.quantile(upper_quantile),
            )
        elif count == n_ranges - 1:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                float("inf"),
            )
        else:
            building_area_range_dict[cap_range] = (
                cap_range_gdf.building_area.quantile(lower_quantile),
                cap_range_gdf.building_area.quantile(upper_quantile),
            )

    values = list(building_area_range_dict.values())

    building_area_range_normed_dict = {}

    for count, (cap_range, (min_area, max_area)) in enumerate(
        building_area_range_dict.items()
    ):
        if count == 0:
            building_area_range_normed_dict[cap_range] = (
                min_area,
                np.mean((values[count + 1][0], max_area)),
            )
        elif count == n_ranges - 1:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                max_area,
            )
        else:
            building_area_range_normed_dict[cap_range] = (
                np.mean((values[count - 1][1], min_area)),
                np.mean((values[count + 1][0], max_area)),
            )

    return building_area_range_normed_dict

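The midpoint smoothing in the second loop can be illustrated with toy quantile estimates (made-up areas in m²): adjacent range boundaries are averaged so the capacity classes tile the building-area axis without overlaps or gaps.

```python
import numpy as np

# Toy (min_area, max_area) quantile estimates per capacity class.
values = [(10.0, 100.0), (80.0, 500.0), (400.0, float("inf"))]
n_ranges = len(values)

normed = []
for count, (min_area, max_area) in enumerate(values):
    # The first class keeps its lower bound, the last keeps its upper bound;
    # all other bounds are averaged with the neighbouring class.
    lower = (
        min_area
        if count == 0
        else np.mean((values[count - 1][1], min_area))
    )
    upper = (
        max_area
        if count == n_ranges - 1
        else np.mean((values[count + 1][0], max_area))
    )
    normed.append((lower, upper))

# The overlapping raw ranges become contiguous: 90.0 and 450.0 are now
# shared boundaries between neighbouring classes.
assert normed == [(10.0, 90.0), (90.0, 450.0), (450.0, float("inf"))]
```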
@timer_func
def desaggregate_pv_in_mv_grid(
    buildings_gdf: gpd.GeoDataFrame,
    pv_cap: float | int,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within a given grid district.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing buildings within the grid district.
    pv_cap : float, int
        PV capacity to desaggregate.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within each capacity range.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    bus_id = int(buildings_gdf.bus_id.iat[0])

    rng = default_rng(seed=kwargs["seed"])
    random_state = RandomState(seed=kwargs["seed"])

    results_df = pd.DataFrame(columns=buildings_gdf.columns)

    for cap_range, share in kwargs["cap_share_dict"].items():
        pv_cap_range = pv_cap * share

        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]

        cap_range_buildings_gdf = buildings_gdf.loc[
            ~buildings_gdf.index.isin(results_df.index)
            & (buildings_gdf.building_area > b_area_min)
            & (buildings_gdf.building_area <= b_area_max)
        ]

        mean_load_factor = kwargs["load_factor_dict"][cap_range]
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
            load_factor=np.nan,
            capacity=np.nan,
        )

        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        if total_mean_cap == 0:
            logger.warning(
                f"There are no matching roofs for capacity range {cap_range} "
                f"kW in grid {bus_id}. Using all buildings as fallback."
            )

            cap_range_buildings_gdf = buildings_gdf.loc[
                ~buildings_gdf.index.isin(results_df.index)
            ]

            if len(cap_range_buildings_gdf) == 0:
                logger.warning(
                    "There are no roofs available for capacity range "
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
                )
                cap_range_buildings_gdf = buildings_gdf.copy()

            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
                load_factor=np.nan,
                capacity=np.nan,
            )

            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()

        elif total_mean_cap < pv_cap_range:
            logger.warning(
                f"Average roof utilization of the roof area in grid {bus_id} "
                f"and capacity range {cap_range} kW is not sufficient. The "
                "roof utilization will be above average."
            )

        frac = max(
            pv_cap_range / total_mean_cap,
            1 / len(cap_range_buildings_gdf),
        )

        samples_gdf = cap_range_buildings_gdf.sample(
            frac=min(1, frac),
            random_state=random_state,
        )

        cap_range_dict = kwargs["prob_dict"][cap_range]

        values_dict = cap_range_dict["values"]
        p_dict = cap_range_dict["probabilities"]

        load_factors = rng.choice(
            a=values_dict["load_factor"],
            size=len(samples_gdf),
            p=p_dict["load_factor"],
        )

        samples_gdf = samples_gdf.assign(
            load_factor=load_factors,
            capacity=(
                samples_gdf.building_area
                * load_factors
                * kwargs["pv_cap_per_sq_m"]
            ).clip(lower=0.4),
        )

        missing_factor = pv_cap_range / samples_gdf.capacity.sum()

        samples_gdf = samples_gdf.assign(
            capacity=(samples_gdf.capacity * missing_factor),
            load_factor=(samples_gdf.load_factor * missing_factor),
        )

        assert np.isclose(
            samples_gdf.capacity.sum(),
            pv_cap_range,
            rtol=1e-03,
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"

        results_df = pd.concat(
            [
                results_df,
                samples_gdf,
            ],
        )

    total_missing_factor = pv_cap / results_df.capacity.sum()

    results_df = results_df.assign(
        capacity=(results_df.capacity * total_missing_factor),
    )

    assert np.isclose(
        results_df.capacity.sum(),
        pv_cap,
        rtol=1e-03,
    ), f"{results_df.capacity.sum()} != {pv_cap}"

    return gpd.GeoDataFrame(
        results_df,
        # use the CRS of the input frame; samples_gdf is undefined if the
        # loop above was never entered
        crs=buildings_gdf.crs,
        geometry="geom",
    )

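The `missing_factor` rescaling used twice above boils down to scaling a sample by a common factor so its sum hits the target exactly; a minimal sketch with made-up capacities:

```python
import numpy as np
import pandas as pd

# Made-up sampled capacities in kW and a target for the capacity range.
samples = pd.Series([3.0, 5.0, 2.0])
pv_cap_range = 25.0

# One common factor scales every sampled capacity (and, in the function
# above, the load factors alongside them).
missing_factor = pv_cap_range / samples.sum()
scaled = samples * missing_factor

# After rescaling the sampled capacities sum exactly to the target.
assert np.isclose(scaled.sum(), pv_cap_range, rtol=1e-03)
```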
@timer_func
def desaggregate_pv(
    buildings_gdf: gpd.GeoDataFrame,
    cap_df: pd.DataFrame,
    **kwargs,
) -> gpd.GeoDataFrame:
    """
    Desaggregate PV capacity on buildings within the given grid districts.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    cap_df : pandas.DataFrame
        DataFrame with total rooftop capacity per MV grid.

    Other Parameters
    ----------------
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    cap_share_dict : dict
        Dictionary with the share of PV capacity from the total PV capacity
        within each capacity range.
    building_area_range_dict : dict
        Dictionary with estimated normal building area range per capacity
        range.
    load_factor_dict : dict
        Dictionary with mean roof load factor per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.
    pv_cap_per_sq_m : float, int
        Average expected, installable PV capacity per square meter.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]

    building_bus_ids = set(buildings_gdf.bus_id)
    cap_bus_ids = set(cap_df.index)

    logger.debug(f"{len(building_bus_ids)}, {len(cap_bus_ids)}")

    if len(building_bus_ids) > len(cap_bus_ids):
        missing = building_bus_ids - cap_bus_ids
    else:
        missing = cap_bus_ids - building_bus_ids

    logger.debug(str(missing))

    # assert set(buildings_gdf.bus_id.unique()) == set(cap_df.index)

    for bus_id in buildings_gdf.bus_id.unique():
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]

        pv_installed_gdf = buildings_grid_gdf.loc[
            ~buildings_grid_gdf.end_of_life
        ]

        pv_installed = pv_installed_gdf.capacity.sum()

        pot_buildings_gdf = buildings_grid_gdf.drop(
            index=pv_installed_gdf.index
        )

        if len(pot_buildings_gdf) == 0:
            logger.error(
                f"In grid {bus_id} there are no potential buildings to "
                "allocate PV capacity to. The grid is skipped. This message "
                "should only appear during test runs with few buildings."
            )

            continue

        pv_target = cap_df.at[bus_id, "capacity"] * 1000

        pv_missing = pv_target - pv_installed

        if pv_missing <= 0:
            logger.info(
                f"In grid {bus_id} there is more PV installed "
                f"({pv_installed:g} kW) in the status quo than allocated "
                f"within the scenario ({pv_target:g} kW). No new generators "
                "are added."
            )

            continue

        if pot_buildings_gdf.max_cap.sum() < pv_missing:
            logger.error(
                f"In grid {bus_id} there is less PV potential ("
                f"{pot_buildings_gdf.max_cap.sum():g} kW) than allocated PV "
                f"capacity ({pv_missing:g} kW). The grid is skipped. This "
                "message should only appear during test runs with few "
                "buildings."
            )

            continue

        gdf = desaggregate_pv_in_mv_grid(
            buildings_gdf=pot_buildings_gdf,
            pv_cap=pv_missing,
            **kwargs,
        )

        # init_len = len(allocated_buildings_gdf)
        #
        # init_cap = allocated_buildings_gdf.capacity.sum()

        allocated_buildings_gdf = pd.concat(
            [
                allocated_buildings_gdf,
                gdf,
            ]
        )

        # assert np.isclose(
        #     init_cap + gdf.capacity.sum(),
        #     allocated_buildings_gdf.capacity.sum(),
        # )
        #
        # assert len(allocated_buildings_gdf) == init_len + len(gdf)
        #
        # assert np.isclose(
        #     pv_missing, gdf.capacity.sum(), rtol=1e-03
        # ), f"{pv_missing} != {gdf.capacity.sum()}"

    # assert np.isclose(
    #     cap_df.loc[buildings_gdf.bus_id.unique()].capacity.sum() * 1000,
    #     allocated_buildings_gdf.capacity.sum(),
    #     rtol=1e-03,
    # ), (
    #     f"{cap_df.loc[buildings_gdf.bus_id.unique()].capacity.sum() * 1000} != "
    #     f"{allocated_buildings_gdf.capacity.sum()}"
    # )

    logger.debug("Desaggregated scenario.")
    logger.debug(f"Scenario capacity: {cap_df.capacity.sum():g}")
    logger.debug(
        f"Generator capacity: {allocated_buildings_gdf.capacity.sum():g}"
    )

    return gpd.GeoDataFrame(
        allocated_buildings_gdf,
        # use the CRS of the input frame; gdf is undefined if every grid was
        # skipped
        crs=buildings_gdf.crs,
        geometry="geom",
    )

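The per-grid gap check above hinges on a unit conversion: `cap_df` carries the scenario target in MW, while installed capacities on buildings are in kW, hence the `* 1000`. A toy sketch of the accounting (made-up numbers, no real grid data):

```python
# Scenario target for one grid in MW (as stored in cap_df), status-quo
# capacity on buildings in kW.
cap_target_mw = 2.5
pv_installed_kw = 1800.0

pv_target_kw = cap_target_mw * 1000
pv_missing_kw = pv_target_kw - pv_installed_kw

# Only a positive gap triggers desaggregation of new PV plants; otherwise
# the grid is skipped.
assert pv_missing_kw > 0
```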
@timer_func
def add_buildings_meta_data(
    buildings_gdf: gpd.GeoDataFrame,
    prob_dict: dict,
    seed: int,
) -> gpd.GeoDataFrame:
    """
    Randomly add additional metadata to desaggregated PV plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    prob_dict : dict
        Dictionary with values and probabilities per capacity range.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with desaggregated PV
        plants.
    """
    rng = default_rng(seed=seed)
    buildings_gdf = buildings_gdf.reset_index().rename(
        columns={
            "index": "building_id",
        }
    )

    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
        cap_range_gdf = buildings_gdf.loc[
            (buildings_gdf.capacity >= min_cap)
            & (buildings_gdf.capacity < max_cap)
        ]

        for key, values in cap_range_prob_dict["values"].items():
            if key == "load_factor":
                continue

            # isna already covers isnull (they are aliases in pandas)
            gdf = cap_range_gdf.loc[
                cap_range_gdf[key].isna() | (cap_range_gdf[key] == "None")
            ]

            key_vals = rng.choice(
                a=values,
                size=len(gdf),
                p=cap_range_prob_dict["probabilities"][key],
            )

            buildings_gdf.loc[gdf.index, key] = key_vals

    return buildings_gdf

def add_voltage_level(
    buildings_gdf: gpd.GeoDataFrame,
) -> gpd.GeoDataFrame:
    """
    Add voltage level derived from generator capacity to the power plants.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM building data with voltage level per
        generator.
    """

    def voltage_levels(p: float) -> int:
        if p < 100:
            return 7
        elif p < 200:
            return 6
        elif p < 5500:
            return 5
        elif p < 20000:
            return 4
        elif p < 120000:
            return 3
        return 1

    return buildings_gdf.assign(
        voltage_level=buildings_gdf.capacity.apply(voltage_levels)
    )

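The capacity thresholds of the inner `voltage_levels` helper (in kW; note that level 2 is never returned) can be exercised standalone:

```python
# Same mapping as the helper above; thresholds in kW.
def voltage_levels(p: float) -> int:
    if p < 100:
        return 7
    elif p < 200:
        return 6
    elif p < 5500:
        return 5
    elif p < 20000:
        return 4
    elif p < 120000:
        return 3
    return 1


# One representative capacity per level, from small rooftop to utility scale.
levels = [voltage_levels(p) for p in (30, 150, 1000, 10000, 50000, 200000)]
assert levels == [7, 6, 5, 4, 3, 1]
```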
def add_start_up_date(
    buildings_gdf: gpd.GeoDataFrame,
    start: pd.Timestamp,
    end: pd.Timestamp,
    seed: int,
):
    """
    Randomly and uniformly assign start-up dates to new PV generators.

    Parameters
    ----------
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with desaggregated PV
        plants.
    start : pandas.Timestamp
        Minimum Timestamp to use.
    end : pandas.Timestamp
        Maximum Timestamp to use.
    seed : int
        Seed to use for random operations with NumPy and pandas.

    Returns
    -------
    geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data with start-up date added.
    """
    rng = default_rng(seed=seed)

    date_range = pd.date_range(start=start, end=end, freq="1D")

    return buildings_gdf.assign(
        start_up_date=rng.choice(date_range, size=len(buildings_gdf))
    )

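The seeded draw of start-up dates is reproducible; a standalone sketch with an arbitrary date window (the window bounds are made up):

```python
import pandas as pd
from numpy.random import default_rng

rng = default_rng(seed=42)
date_range = pd.date_range(start="2020-01-01", end="2020-12-31", freq="1D")

# Uniform draw with replacement from the daily date range.
start_up_dates = rng.choice(date_range, size=5)

# The same seed yields the identical draw, keeping scenario runs
# deterministic.
assert (default_rng(seed=42).choice(date_range, size=5) == start_up_dates).all()
```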
@timer_func
def allocate_scenarios(
    mastr_gdf: gpd.GeoDataFrame,
    buildings_gdf: gpd.GeoDataFrame,
    last_scenario_gdf: gpd.GeoDataFrame,
    scenario: str,
):
    """
    Desaggregate and allocate scenario PV rooftop ramp-ups onto buildings.

    Parameters
    ----------
    mastr_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing geocoded MaStR data.
    buildings_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings data.
    last_scenario_gdf : geopandas.GeoDataFrame
        GeoDataFrame containing OSM buildings matched with PV generators from
        the temporally preceding scenario.
    scenario : str
        Scenario to desaggregate and allocate.

    Returns
    -------
    tuple
        geopandas.GeoDataFrame
            GeoDataFrame containing OSM buildings matched with PV generators.
        pandas.DataFrame
            DataFrame containing PV rooftop capacity per grid id.
    """
    grid_districts_gdf = grid_districts(EPSG)

    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)

    grid_federal_state_gdf = overlay_grid_districts_with_counties(
        grid_districts_gdf,
        federal_state_gdf,
    )

    buildings_overlay_gdf = add_overlay_id_to_buildings(
        buildings_gdf,
        grid_federal_state_gdf,
    )

    valid_buildings_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)

    cap_per_bus_id_df = cap_per_bus_id(scenario)

    logger.debug(
        f"cap_per_bus_id_df total capacity: {cap_per_bus_id_df.capacity.sum()}"
    )

    last_scenario_gdf = determine_end_of_life_gens(
        last_scenario_gdf,
        SCENARIO_TIMESTAMP[scenario],
        PV_ROOFTOP_LIFETIME,
    )

    buildings_gdf = calculate_max_pv_cap_per_building(
        valid_buildings_gdf,
        last_scenario_gdf,
        PV_CAP_PER_SQ_M,
        ROOF_FACTOR,
    )

    mastr_gdf = calculate_building_load_factor(
        mastr_gdf,
        buildings_gdf,
    )

    probabilities_dict = probabilities(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    cap_share_dict = cap_share_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    load_factor_dict = mean_load_factor_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
    )

    building_area_range_dict = building_area_range_per_cap_range(
        mastr_gdf,
        cap_ranges=CAP_RANGES,
        min_building_size=MIN_BUILDING_SIZE,
        upper_quantile=UPPER_QUNATILE,
        lower_quantile=LOWER_QUANTILE,
    )

    allocated_buildings_gdf = desaggregate_pv(
        buildings_gdf=buildings_gdf,
        cap_df=cap_per_bus_id_df,
        prob_dict=probabilities_dict,
        cap_share_dict=cap_share_dict,
        building_area_range_dict=building_area_range_dict,
        load_factor_dict=load_factor_dict,
        seed=SEED,
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
    )

    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)

    meta_buildings_gdf = frame_to_numeric(
        add_buildings_meta_data(
            allocated_buildings_gdf,
            probabilities_dict,
            SEED,
        )
    )

    return (
        add_start_up_date(
            meta_buildings_gdf,
            start=last_scenario_gdf.start_up_date.max(),
            end=SCENARIO_TIMESTAMP[scenario],
            seed=SEED,
        ),
        cap_per_bus_id_df,
    )

class EgonPowerPlantPvRoofBuildingScenario(Base):
    __tablename__ = "egon_power_plants_pv_roof_building"
    __table_args__ = {"schema": "supply"}

    index = Column(Integer, primary_key=True, index=True)
    scenario = Column(String)
    building_id = Column(Integer)
    gens_id = Column(String, nullable=True)
    capacity = Column(Float)
    einheitliche_ausrichtung_und_neigungswinkel = Column(Float)
    hauptausrichtung = Column(String)
    hauptausrichtung_neigungswinkel = Column(String)
    voltage_level = Column(Integer)


def create_scenario_table(buildings_gdf):
    """Create mapping table pv_unit <-> building for scenario."""
    EgonPowerPlantPvRoofBuildingScenario.__table__.drop(
        bind=engine, checkfirst=True
    )
    EgonPowerPlantPvRoofBuildingScenario.__table__.create(
        bind=engine, checkfirst=True
    )

    buildings_gdf.rename(columns=COLS_TO_RENAME)[
        COLS_TO_EXPORT
    ].reset_index().to_sql(
        name=EgonPowerPlantPvRoofBuildingScenario.__table__.name,
        schema=EgonPowerPlantPvRoofBuildingScenario.__table__.schema,
        con=db.engine(),
        if_exists="append",
        index=False,
    )

def geocode_mastr_data():
    """
    Read PV rooftop data from MaStR CSV.

    TODO: the source will be replaced as soon as the MaStR data is available
    in the DB.
    """
    mastr_df = mastr_data(
        MASTR_INDEX_COL,
        MASTR_RELEVANT_COLS,
        MASTR_DTYPES,
        MASTR_PARSE_DATES,
    )

    clean_mastr_df = clean_mastr_data(
        mastr_df,
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
        seed=SEED,
        rounding=ROUNDING,
    )

    geocoding_df = geocoding_data(clean_mastr_df)

    ratelimiter = geocoder(USER_AGENT, MIN_DELAY_SECONDS)

    geocode_gdf = geocode_data(geocoding_df, ratelimiter, EPSG)

    create_geocoded_table(geocode_gdf)


def pv_rooftop_to_buildings():
    """Main script, executed as task."""

    mastr_gdf = load_mastr_data()

    buildings_gdf = load_building_data()

    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
        mastr_gdf, buildings_gdf
    )

    all_buildings_gdf = (
        desagg_mastr_gdf.assign(scenario="status_quo")
        .reset_index()
        .rename(columns={"geometry": "geom", "EinheitMastrNummer": "gens_id"})
    )

    scenario_buildings_gdf = all_buildings_gdf.copy()

    cap_per_bus_id_df = pd.DataFrame()

    for scenario in SCENARIOS:
        logger.debug(f"Desaggregating scenario {scenario}.")
        (
            scenario_buildings_gdf,
            cap_per_bus_id_scenario_df,
        ) = allocate_scenarios(  # noqa: F841
            desagg_mastr_gdf,
            desagg_buildings_gdf,
            scenario_buildings_gdf,
            scenario,
        )

        all_buildings_gdf = gpd.GeoDataFrame(
            pd.concat(
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
            ),
            crs=scenario_buildings_gdf.crs,
            geometry="geom",
        )

        cap_per_bus_id_df = pd.concat(
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
        )

    # export scenario
    create_scenario_table(add_voltage_level(all_buildings_gdf))