Completed
Push — dev ( c38e3d...5d805f )
by
unknown
22s queued 18s
created

clean_mastr_data()   A

Complexity

Conditions 2

Size

Total Lines 83
Code Lines 34

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 34
dl 0
loc 83
rs 9.064
c 0
b 0
f 0
cc 2
nop 4

How to fix   Long Method   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

1
"""
2
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate
3
new PV rooftop generators for scenarios eGon2035 and eGon100RE.
4
5
Data cleaning and inference:
6
* Drop duplicates and entries with missing critical data.
7
* Determine most plausible capacity from multiple values given in MaStR data.
8
* Drop generators which don't have any plausible capacity data
9
  (23.5MW > P > 0.1).
10
* Randomly and weighted add a start-up date if it is missing.
11
* Extract zip and municipality from 'site' given in MaStR data.
12
* Geocode unique zip and municipality combinations with Nominatim (1 sec
13
  delay). Drop generators for which geocoding failed or which are located
14
  outside the municipalities of Germany.
15
* Add some visual sanity checks for cleaned data.
16
17
Allocation of MaStR data:
18
* Allocate each generator to an existing building from OSM.
19
* Determine the quantile each generator and building is in depending on the
20
  capacity of the generator and the area of the polygon of the building.
21
* Randomly distribute generators within each municipality preferably within
22
  the same building area quantile as the generators are capacity wise.
23
* If not enough buildings exists within a municipality and quantile additional
24
  buildings from other quantiles are chosen randomly.
25
26
Desegregation of pv rooftop scenarios:
27
* The scenario data per federal state is linearly distributed to the mv grid
28
  districts according to the pv rooftop potential per mv grid district.
29
* The rooftop potential is estimated from the building area given from the OSM
30
  buildings.
31
* Grid districts, which are located in several federal states, are allocated
32
  PV capacity according to their respective roof potential in the individual
33
  federal states.
34
* The desegregation of PV plants within a grid districts respects existing
35
  plants from MaStR, which did not reach their end of life.
36
* New PV plants are randomly and weighted generated using a breakdown of MaStR
37
  data as generator basis.
38
* Plant metadata (e.g. plant orientation) is also added random and weighted
39
  from MaStR data as basis.
40
"""
41
from __future__ import annotations
42
43
from collections import Counter
44
from functools import wraps
45
from time import perf_counter
46
47
from geoalchemy2 import Geometry
48
from loguru import logger
49
from numpy.random import RandomState, default_rng
50
from pyproj.crs.crs import CRS
51
from sqlalchemy import BigInteger, Column, Float, Integer, String
52
from sqlalchemy.dialects.postgresql import HSTORE
53
from sqlalchemy.ext.declarative import declarative_base
54
import geopandas as gpd
55
import numpy as np
56
import pandas as pd
57
58
from egon.data import config, db
59
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
60
    OsmBuildingsSynthetic,
61
)
62
from egon.data.datasets.power_plants.mastr_db_classes import EgonPowerPlantsPv
63
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
64
from egon.data.datasets.zensus_vg250 import Vg250Gem
65
66
engine = db.engine()
67
Base = declarative_base()
68
SEED = int(config.settings()["egon-data"]["--random-seed"])
69
70
# TODO: move to yml
71
MASTR_INDEX_COL = "gens_id"
72
73
EPSG = 4326
74
SRID = 3035
75
76
# data cleaning
77
MAX_REALISTIC_PV_CAP = 23500 / 10**3
78
MIN_REALISTIC_PV_CAP = 0.1 / 10**3
79
80
# show additional logging information
81
VERBOSE = False
82
83
# Number of quantiles
84
Q = 5
85
86
# Scenario Data
87
SCENARIOS = ["eGon2035", "eGon100RE"]
88
SCENARIO_TIMESTAMP = {
89
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
90
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
91
}
92
PV_ROOFTOP_LIFETIME = pd.Timedelta(20 * 365, unit="D")
93
94
# Example Modul Trina Vertex S TSM-400DE09M.08 400 Wp
95
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
96
MODUL_CAP = 0.4 / 10**3  # MWp
97
MODUL_SIZE = 1.096 * 1.754  # m²
98
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE
99
100
# Estimation of usable roof area
101
# Factor for the conversion of building area to roof area
102
# estimation mean roof pitch: 35°
103
# estimation usable roof share: 80%
104
# estimation that only the south side of the building is used for pv
105
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
106
# AREA_FACTOR = 1.221
107
# USABLE_ROOF_SHARE = 0.8
108
# SOUTH_SHARE = 0.5
109
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
110
ROOF_FACTOR = 0.5
111
112
CAP_RANGES = [
113
    (0, 30 / 10**3),
114
    (30 / 10**3, 100 / 10**3),
115
    (100 / 10**3, float("inf")),
116
]
117
118
MIN_BUILDING_SIZE = 10.0
119
UPPER_QUANTILE = 0.95
120
LOWER_QUANTILE = 0.05
121
122
COLS_TO_EXPORT = [
123
    "scenario",
124
    "bus_id",
125
    "building_id",
126
    "gens_id",
127
    "capacity",
128
    "orientation_uniform",
129
    "orientation_primary",
130
    "orientation_primary_angle",
131
    "voltage_level",
132
    "weather_cell_id",
133
]
134
135
# TODO
136
INCLUDE_SYNTHETIC_BUILDINGS = True
137
ONLY_BUILDINGS_WITH_DEMAND = True
138
TEST_RUN = False
139
140
141
def timer_func(func):
142
    @wraps(func)
143
    def timeit_wrapper(*args, **kwargs):
144
        start_time = perf_counter()
145
        result = func(*args, **kwargs)
146
        end_time = perf_counter()
147
        total_time = end_time - start_time
148
        logger.debug(
149
            f"Function {func.__name__} took {total_time:.4f} seconds."
150
        )
151
        return result
152
153
    return timeit_wrapper
154
155
156
@timer_func
157
def mastr_data(
158
    index_col: str | int | list[str] | list[int],
159
) -> gpd.GeoDataFrame:
160
    """
161
    Read MaStR data from database.
162
163
    Parameters
164
    -----------
165
    index_col : str, int or list of str or int
166
        Column(s) to use as the row labels of the DataFrame.
167
    Returns
168
    -------
169
    pandas.DataFrame
170
        DataFrame containing MaStR data.
171
    """
172
    with db.session_scope() as session:
173
        query = session.query(EgonPowerPlantsPv).filter(
174
            EgonPowerPlantsPv.status == "InBetrieb",
175
            EgonPowerPlantsPv.site_type
176
            == ("Bauliche Anlagen (Hausdach, Gebäude und Fassade)"),
177
        )
178
179
    gdf = gpd.read_postgis(
180
        query.statement, query.session.bind, index_col=index_col
181
    ).drop(columns="id")
182
183
    logger.debug("MaStR data loaded.")
184
185
    return gdf
186
187
188
@timer_func
189
def clean_mastr_data(
190
    mastr_gdf: gpd.GeoDataFrame,
191
    max_realistic_pv_cap: int | float,
192
    min_realistic_pv_cap: int | float,
193
    seed: int,
194
) -> gpd.GeoDataFrame:
195
    """
196
    Clean the MaStR data from implausible data.
197
198
    * Drop MaStR ID duplicates.
199
    * Drop generators with implausible capacities.
200
201
    Parameters
202
    -----------
203
    mastr_gdf : pandas.DataFrame
204
        DataFrame containing MaStR data.
205
    max_realistic_pv_cap : int or float
206
        Maximum capacity, which is considered to be realistic.
207
    min_realistic_pv_cap : int or float
208
        Minimum capacity, which is considered to be realistic.
209
    seed : int
210
        Seed to use for random operations with NumPy and pandas.
211
    Returns
212
    -------
213
    pandas.DataFrame
214
        DataFrame containing cleaned MaStR data.
215
    """
216
    init_len = len(mastr_gdf)
217
218
    # drop duplicates
219
    mastr_gdf = mastr_gdf.loc[~mastr_gdf.index.duplicated()]
220
221
    # drop generators without any capacity info
222
    # and capacity of zero
223
    # and if the capacity is > 23.5 MW, because
224
    # Germanies largest rooftop PV is 23 MW
225
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
226
    mastr_gdf = mastr_gdf.loc[
227
        ~mastr_gdf.capacity.isna()
228
        & (mastr_gdf.capacity <= max_realistic_pv_cap)
229
        & (mastr_gdf.capacity > min_realistic_pv_cap)
230
    ]
231
232
    # get consistent start-up date
233
    # randomly and weighted fill missing start-up dates
234
    pool = mastr_gdf.loc[
235
        ~mastr_gdf.commissioning_date.isna()
236
    ].commissioning_date.to_numpy()
237
238
    size = len(mastr_gdf) - len(pool)
239
240
    if size > 0:
241
        rng = default_rng(seed=seed)
242
243
        choice = rng.choice(
244
            pool,
245
            size=size,
246
            replace=False,
247
        )
248
249
        mastr_gdf.loc[mastr_gdf.commissioning_date.isna()] = mastr_gdf.loc[
250
            mastr_gdf.commissioning_date.isna()
251
        ].assign(commissioning_date=choice)
252
253
        logger.info(
254
            f"Randomly and weigthed added start-up date to {size} generators."
255
        )
256
257
    mastr_gdf = mastr_gdf.assign(
258
        commissioning_date=pd.to_datetime(
259
            mastr_gdf.commissioning_date, utc=True
260
        )
261
    )
262
263
    end_len = len(mastr_gdf)
264
    logger.debug(
265
        f"Dropped {init_len - end_len} "
266
        f"({((init_len - end_len) / init_len) * 100:g}%)"
267
        f" of {init_len} rows from MaStR DataFrame."
268
    )
269
270
    return mastr_gdf
271
272
273
@timer_func
274
def municipality_data() -> gpd.GeoDataFrame:
275
    """
276
    Get municipality data from eGo^n Database.
277
    Returns
278
    -------
279
    gepandas.GeoDataFrame
280
        GeoDataFrame with municipality data.
281
    """
282
    with db.session_scope() as session:
283
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))
284
285
    return gpd.read_postgis(
286
        query.statement, query.session.bind, index_col="ags"
287
    )
288
289
290
@timer_func
291
def add_ags_to_gens(
292
    mastr_gdf: gpd.GeoDataFrame,
293
    municipalities_gdf: gpd.GeoDataFrame,
294
) -> gpd.GeoDataFrame:
295
    """
296
    Add information about AGS ID to generators.
297
    Parameters
298
    -----------
299
    mastr_gdf : geopandas.GeoDataFrame
300
        GeoDataFrame with valid and cleaned MaStR data.
301
    municipalities_gdf : geopandas.GeoDataFrame
302
        GeoDataFrame with municipality data.
303
    Returns
304
    -------
305
    gepandas.GeoDataFrame
306
        GeoDataFrame with valid and cleaned MaStR data
307
        with AGS ID added.
308
    """
309
    return mastr_gdf.sjoin(
310
        municipalities_gdf,
311
        how="left",
312
        predicate="intersects",
313
    ).rename(columns={"index_right": "ags"})
314
315
316
def drop_gens_outside_muns(
317
    mastr_gdf: gpd.GeoDataFrame,
318
) -> gpd.GeoDataFrame:
319
    """
320
    Drop all generators outside of municipalities.
321
    Parameters
322
    -----------
323
    mastr_gdf : geopandas.GeoDataFrame
324
        GeoDataFrame with valid and cleaned MaStR data.
325
    Returns
326
    -------
327
    gepandas.GeoDataFrame
328
        GeoDataFrame with valid and cleaned MaStR data
329
        with generatos without an AGS ID dropped.
330
    """
331
    gdf = mastr_gdf.loc[~mastr_gdf.ags.isna()]
332
333
    logger.debug(
334
        f"{len(mastr_gdf) - len(gdf)} ("
335
        f"{(len(mastr_gdf) - len(gdf)) / len(mastr_gdf) * 100:g}%)"
336
        f" of {len(mastr_gdf)} values are outside of the municipalities"
337
        " and are therefore dropped."
338
    )
339
340
    return gdf
341
342
343
def load_mastr_data():
344
    """Read PV rooftop data from MaStR CSV
345
    Note: the source will be replaced as soon as the MaStR data is available
346
    in DB.
347
    Returns
348
    -------
349
    geopandas.GeoDataFrame
350
        GeoDataFrame containing MaStR data with geocoded locations.
351
    """
352
    mastr_gdf = mastr_data(
353
        MASTR_INDEX_COL,
354
    )
355
356
    clean_mastr_gdf = clean_mastr_data(
357
        mastr_gdf,
358
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
359
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
360
        seed=SEED,
361
    )
362
363
    municipalities_gdf = municipality_data()
364
365
    clean_mastr_gdf = add_ags_to_gens(clean_mastr_gdf, municipalities_gdf)
366
367
    return drop_gens_outside_muns(clean_mastr_gdf)
368
369
370
class OsmBuildingsFiltered(Base):
371
    __tablename__ = "osm_buildings_filtered"
372
    __table_args__ = {"schema": "openstreetmap"}
373
374
    osm_id = Column(BigInteger)
375
    amenity = Column(String)
376
    building = Column(String)
377
    name = Column(String)
378
    geom = Column(Geometry(srid=SRID), index=True)
379
    area = Column(Float)
380
    geom_point = Column(Geometry(srid=SRID), index=True)
381
    tags = Column(HSTORE)
382
    id = Column(BigInteger, primary_key=True, index=True)
383
384
385
@timer_func
386
def osm_buildings(
387
    to_crs: CRS,
388
) -> gpd.GeoDataFrame:
389
    """
390
    Read OSM buildings data from eGo^n Database.
391
    Parameters
392
    -----------
393
    to_crs : pyproj.crs.crs.CRS
394
        CRS to transform geometries to.
395
    Returns
396
    -------
397
    geopandas.GeoDataFrame
398
        GeoDataFrame containing OSM buildings data.
399
    """
400
    with db.session_scope() as session:
401
        query = session.query(
402
            OsmBuildingsFiltered.id,
403
            OsmBuildingsFiltered.area,
404
            OsmBuildingsFiltered.geom_point.label("geom"),
405
        )
406
407
    return gpd.read_postgis(
408
        query.statement, query.session.bind, index_col="id"
409
    ).to_crs(to_crs)
410
411
412
@timer_func
413
def synthetic_buildings(
414
    to_crs: CRS,
415
) -> gpd.GeoDataFrame:
416
    """
417
    Read synthetic buildings data from eGo^n Database.
418
    Parameters
419
    -----------
420
    to_crs : pyproj.crs.crs.CRS
421
        CRS to transform geometries to.
422
    Returns
423
    -------
424
    geopandas.GeoDataFrame
425
        GeoDataFrame containing OSM buildings data.
426
    """
427
    with db.session_scope() as session:
428
        query = session.query(
429
            OsmBuildingsSynthetic.id,
430
            OsmBuildingsSynthetic.area,
431
            OsmBuildingsSynthetic.geom_point.label("geom"),
432
        )
433
434
    return gpd.read_postgis(
435
        query.statement, query.session.bind, index_col="id"
436
    ).to_crs(to_crs)
437
438
439
@timer_func
440
def add_ags_to_buildings(
441
    buildings_gdf: gpd.GeoDataFrame,
442
    municipalities_gdf: gpd.GeoDataFrame,
443
) -> gpd.GeoDataFrame:
444
    """
445
    Add information about AGS ID to buildings.
446
    Parameters
447
    -----------
448
    buildings_gdf : geopandas.GeoDataFrame
449
        GeoDataFrame containing OSM buildings data.
450
    municipalities_gdf : geopandas.GeoDataFrame
451
        GeoDataFrame with municipality data.
452
    Returns
453
    -------
454
    gepandas.GeoDataFrame
455
        GeoDataFrame containing OSM buildings data
456
        with AGS ID added.
457
    """
458
    return buildings_gdf.sjoin(
459
        municipalities_gdf,
460
        how="left",
461
        predicate="intersects",
462
    ).rename(columns={"index_right": "ags"})
463
464
465
def drop_buildings_outside_muns(
466
    buildings_gdf: gpd.GeoDataFrame,
467
) -> gpd.GeoDataFrame:
468
    """
469
    Drop all buildings outside of municipalities.
470
    Parameters
471
    -----------
472
    buildings_gdf : geopandas.GeoDataFrame
473
        GeoDataFrame containing OSM buildings data.
474
    Returns
475
    -------
476
    gepandas.GeoDataFrame
477
        GeoDataFrame containing OSM buildings data
478
        with buildings without an AGS ID dropped.
479
    """
480
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]
481
482
    logger.debug(
483
        f"{len(buildings_gdf) - len(gdf)} "
484
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
485
        f"of {len(buildings_gdf)} values are outside of the municipalities "
486
        "and are therefore dropped."
487
    )
488
489
    return gdf
490
491
492
def egon_building_peak_loads():
493
    sql = """
494
    SELECT building_id
495
    FROM demand.egon_building_electricity_peak_loads
496
    WHERE scenario = 'eGon2035'
497
    """
498
499
    return (
500
        db.select_dataframe(sql).building_id.astype(int).sort_values().unique()
501
    )
502
503
504
@timer_func
505
def load_building_data():
506
    """
507
    Read buildings from DB
508
    Tables:
509
510
    * `openstreetmap.osm_buildings_filtered` (from OSM)
511
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)
512
513
    Use column `id` for both as it is unique hence you concat both datasets.
514
    If INCLUDE_SYNTHETIC_BUILDINGS is False synthetic buildings will not be
515
    loaded.
516
517
    Returns
518
    -------
519
    gepandas.GeoDataFrame
520
        GeoDataFrame containing OSM buildings data with buildings without an
521
        AGS ID dropped.
522
    """
523
524
    municipalities_gdf = municipality_data()
525
526
    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)
527
528
    if INCLUDE_SYNTHETIC_BUILDINGS:
529
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)
530
531
        buildings_gdf = gpd.GeoDataFrame(
532
            pd.concat(
533
                [
534
                    osm_buildings_gdf,
535
                    synthetic_buildings_gdf,
536
                ]
537
            ),
538
            geometry="geom",
539
            crs=osm_buildings_gdf.crs,
540
        ).rename(columns={"area": "building_area"})
541
542
        buildings_gdf.index = buildings_gdf.index.astype(int)
543
544
    else:
545
        buildings_gdf = osm_buildings_gdf.rename(
546
            columns={"area": "building_area"}
547
        )
548
549
    if ONLY_BUILDINGS_WITH_DEMAND:
550
        building_ids = egon_building_peak_loads()
551
552
        init_len = len(building_ids)
553
554
        building_ids = np.intersect1d(
555
            list(map(int, building_ids)),
556
            list(map(int, buildings_gdf.index.to_numpy())),
557
        )
558
559
        end_len = len(building_ids)
560
561
        logger.debug(
562
            f"{end_len/init_len * 100: g} % ({end_len} / {init_len}) "
563
            f"of buildings have peak load."
564
        )
565
566
        buildings_gdf = buildings_gdf.loc[building_ids]
567
568
    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)
569
570
    buildings_ags_gdf = drop_buildings_outside_muns(buildings_ags_gdf)
571
572
    grid_districts_gdf = grid_districts(EPSG)
573
574
    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)
575
576
    grid_federal_state_gdf = overlay_grid_districts_with_counties(
577
        grid_districts_gdf,
578
        federal_state_gdf,
579
    )
580
581
    buildings_overlay_gdf = add_overlay_id_to_buildings(
582
        buildings_ags_gdf,
583
        grid_federal_state_gdf,
584
    )
585
586
    logger.debug("Loaded buildings.")
587
588
    buildings_overlay_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)
589
590
    # overwrite bus_id with data from new table
591
    sql = (
592
        "SELECT building_id, bus_id FROM "
593
        "boundaries.egon_map_zensus_mvgd_buildings"
594
    )
595
    map_building_bus_df = db.select_dataframe(sql)
596
597
    building_ids = np.intersect1d(
598
        list(map(int, map_building_bus_df.building_id.unique())),
599
        list(map(int, buildings_overlay_gdf.index.to_numpy())),
600
    )
601
602
    buildings_within_gdf = buildings_overlay_gdf.loc[building_ids]
603
604
    gdf = (
605
        buildings_within_gdf.reset_index()
606
        .drop(columns=["bus_id"])
607
        .merge(
608
            how="left",
609
            right=map_building_bus_df,
610
            left_on="id",
611
            right_on="building_id",
612
        )
613
        .drop(columns=["building_id"])
614
        .set_index("id")
615
        .sort_index()
616
    )
617
618
    return gdf[~gdf.index.duplicated(keep="first")]
619
620
621
@timer_func
622
def sort_and_qcut_df(
623
    df: pd.DataFrame | gpd.GeoDataFrame,
624
    col: str,
625
    q: int,
626
) -> pd.DataFrame | gpd.GeoDataFrame:
627
    """
628
    Determine the quantile of a given attribute in a (Geo)DataFrame.
629
    Sort the (Geo)DataFrame in ascending order for the given attribute.
630
    Parameters
631
    -----------
632
    df : pandas.DataFrame or geopandas.GeoDataFrame
633
        (Geo)DataFrame to sort and qcut.
634
    col : str
635
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
636
    q : int
637
        Number of quantiles.
638
    Returns
639
    -------
640
    pandas.DataFrame or gepandas.GeoDataFrame
641
        Sorted and qcut (Geo)DataFrame.
642
    """
643
    df = df.sort_values(col, ascending=True)
644
645
    return df.assign(
646
        quant=pd.qcut(
647
            df[col],
648
            q=q,
649
            labels=range(q),
650
        )
651
    )
652
653
654
@timer_func
655
def allocate_pv(
656
    q_mastr_gdf: gpd.GeoDataFrame,
657
    q_buildings_gdf: gpd.GeoDataFrame,
658
    seed: int,
659
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
660
    """
661
    Allocate the MaStR pv generators to the OSM buildings.
662
    This will determine a building for each pv generator if there are more
663
    buildings than generators within a given AGS. Primarily generators are
664
    distributed with the same qunatile as the buildings. Multiple assignment
665
    is excluded.
666
    Parameters
667
    -----------
668
    q_mastr_gdf : geopandas.GeoDataFrame
669
        GeoDataFrame containing geocoded and qcut MaStR data.
670
    q_buildings_gdf : geopandas.GeoDataFrame
671
        GeoDataFrame containing qcut OSM buildings data.
672
    seed : int
673
        Seed to use for random operations with NumPy and pandas.
674
    Returns
675
    -------
676
    tuple with two geopandas.GeoDataFrame s
677
        GeoDataFrame containing MaStR data allocated to building IDs.
678
        GeoDataFrame containing building data allocated to MaStR IDs.
679
    """
680
    rng = default_rng(seed=seed)
681
682
    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan).sort_values(
683
        by=["ags", "quant"]
684
    )
685
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan).sort_values(
686
        by=["ags", "quant"]
687
    )
688
689
    ags_list = q_buildings_gdf.ags.unique()
690
691
    if TEST_RUN:
692
        ags_list = ags_list[:250]
693
694
    num_ags = len(ags_list)
695
696
    t0 = perf_counter()
697
698
    for count, ags in enumerate(ags_list):
699
700
        buildings = q_buildings_gdf.loc[q_buildings_gdf.ags == ags]
701
        gens = q_mastr_gdf.loc[q_mastr_gdf.ags == ags]
702
703
        len_build = len(buildings)
704
        len_gens = len(gens)
705
706
        if len_build < len_gens:
707
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
708
            logger.error(
709
                f"There are {len_gens} generators and only {len_build}"
710
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
711
                "generators were truncated to match the amount of buildings."
712
            )
713
714
            assert len_build == len(gens)
715
716
        for quant in gens.quant.unique():
717
            q_buildings = buildings.loc[buildings.quant == quant]
718
            q_gens = gens.loc[gens.quant == quant]
719
720
            len_build = len(q_buildings)
721
            len_gens = len(q_gens)
722
723
            if len_build < len_gens:
724
                delta = len_gens - len_build
725
726
                logger.warning(
727
                    f"There are {len_gens} generators and only {len_build} "
728
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
729
                    f"buildings from AGS {ags} will be added randomly."
730
                )
731
732
                add_buildings = pd.Index(
733
                    rng.choice(
734
                        list(set(buildings.index) - set(q_buildings.index)),
735
                        size=delta,
736
                        replace=False,
737
                    )
738
                )
739
740
                chosen_buildings = q_buildings.index.append(add_buildings)
741
742
            else:
743
                chosen_buildings = rng.choice(
744
                    q_buildings.index,
745
                    size=len_gens,
746
                    replace=False,
747
                )
748
749
            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index
750
            buildings = buildings.drop(chosen_buildings)
751
752
        if count % 500 == 0:
753
            logger.debug(
754
                f"Allocation of {count / num_ags * 100:g} % of AGS done. "
755
                f"It took {perf_counter() - t0:g} seconds."
756
            )
757
758
            t0 = perf_counter()
759
760
    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]
761
762
    assert len(assigned_buildings) == len(assigned_buildings.gens_id.unique())
763
764
    q_mastr_gdf.loc[
765
        assigned_buildings.gens_id, "building_id"
766
    ] = assigned_buildings.index
767
768
    assigned_gens = q_mastr_gdf.loc[~q_mastr_gdf.building_id.isna()]
769
770
    assert len(assigned_buildings) == len(assigned_gens)
771
772
    logger.debug("Allocated status quo generators to buildings.")
773
774
    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)
775
776
777
def frame_to_numeric(
778
    df: pd.DataFrame | gpd.GeoDataFrame,
779
) -> pd.DataFrame | gpd.GeoDataFrame:
780
    """
781
    Try to convert all columns of a DataFrame to numeric ignoring errors.
782
    Parameters
783
    ----------
784
    df : pandas.DataFrame or geopandas.GeoDataFrame
785
    Returns
786
    -------
787
    pandas.DataFrame or geopandas.GeoDataFrame
788
    """
789
    if str(df.index.dtype) == "object":
790
        df.index = pd.to_numeric(df.index, errors="ignore")
791
792
    for col in df.columns:
793
        if str(df[col].dtype) == "object":
794
            df[col] = pd.to_numeric(df[col], errors="ignore")
795
796
    return df
797
798
799
def validate_output(
800
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
801
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
802
) -> None:
803
    """
804
    Validate output.
805
806
    * Validate that there are exactly as many buildings with a pv system as
807
      there are pv systems with a building
808
    * Validate that the building IDs with a pv system are the same building
809
      IDs as assigned to the pv systems
810
    * Validate that the pv system IDs with a building are the same pv system
811
      IDs as assigned to the buildings
812
813
    Parameters
814
    -----------
815
    desagg_mastr_gdf : geopandas.GeoDataFrame
816
        GeoDataFrame containing MaStR data allocated to building IDs.
817
    desagg_buildings_gdf : geopandas.GeoDataFrame
818
        GeoDataFrame containing building data allocated to MaStR IDs.
819
    """
820
    assert len(
821
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
822
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
823
    assert (
824
        np.sort(
825
            desagg_mastr_gdf.loc[
826
                ~desagg_mastr_gdf.building_id.isna()
827
            ].building_id.unique()
828
        )
829
        == np.sort(
830
            desagg_buildings_gdf.loc[
831
                ~desagg_buildings_gdf.gens_id.isna()
832
            ].index.unique()
833
        )
834
    ).all()
835
    assert (
836
        np.sort(
837
            desagg_mastr_gdf.loc[
838
                ~desagg_mastr_gdf.building_id.isna()
839
            ].index.unique()
840
        )
841
        == np.sort(
842
            desagg_buildings_gdf.loc[
843
                ~desagg_buildings_gdf.gens_id.isna()
844
            ].gens_id.unique()
845
        )
846
    ).all()
847
848
    logger.debug("Validated output.")
849
850
851
def drop_unallocated_gens(
852
    gdf: gpd.GeoDataFrame,
853
) -> gpd.GeoDataFrame:
854
    """
855
    Drop generators which did not get allocated.
856
857
    Parameters
858
    -----------
859
    gdf : geopandas.GeoDataFrame
860
        GeoDataFrame containing MaStR data allocated to building IDs.
861
    Returns
862
    -------
863
    geopandas.GeoDataFrame
864
        GeoDataFrame containing MaStR data with generators dropped which did
865
        not get allocated.
866
    """
867
    init_len = len(gdf)
868
    gdf = gdf.loc[~gdf.building_id.isna()]
869
    end_len = len(gdf)
870
871
    logger.debug(
872
        f"Dropped {init_len - end_len} "
873
        f"({((init_len - end_len) / init_len) * 100:g}%)"
874
        f" of {init_len} unallocated rows from MaStR DataFrame."
875
    )
876
877
    return gdf
878
879
880
@timer_func
881
def allocate_to_buildings(
882
    mastr_gdf: gpd.GeoDataFrame,
883
    buildings_gdf: gpd.GeoDataFrame,
884
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
885
    """
886
    Allocate status quo pv rooftop generators to buildings.
887
    Parameters
888
    -----------
889
    mastr_gdf : geopandas.GeoDataFrame
890
        GeoDataFrame containing MaStR data with geocoded locations.
891
    buildings_gdf : geopandas.GeoDataFrame
892
        GeoDataFrame containing OSM buildings data with buildings without an
893
        AGS ID dropped.
894
    Returns
895
    -------
896
    tuple with two geopandas.GeoDataFrame s
897
        GeoDataFrame containing MaStR data allocated to building IDs.
898
        GeoDataFrame containing building data allocated to MaStR IDs.
899
    """
900
    logger.debug("Starting allocation of status quo.")
901
902
    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
903
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)
904
905
    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
906
        q_mastr_gdf, q_buildings_gdf, SEED
907
    )
908
909
    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)
910
911
    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf
912
913
914
@timer_func
915
def grid_districts(
916
    epsg: int,
917
) -> gpd.GeoDataFrame:
918
    """
919
    Load mv grid district geo data from eGo^n Database as
920
    geopandas.GeoDataFrame.
921
    Parameters
922
    -----------
923
    epsg : int
924
        EPSG ID to use as CRS.
925
    Returns
926
    -------
927
    geopandas.GeoDataFrame
928
        GeoDataFrame containing mv grid district ID and geo shapes data.
929
    """
930
    gdf = db.select_geodataframe(
931
        """
932
        SELECT bus_id, geom
933
        FROM grid.egon_mv_grid_district
934
        ORDER BY bus_id
935
        """,
936
        index_col="bus_id",
937
        geom_col="geom",
938
        epsg=epsg,
939
    )
940
941
    gdf.index = gdf.index.astype(int)
942
943
    logger.debug("Grid districts loaded.")
944
945
    return gdf
946
947
948
def scenario_data(
949
    carrier: str = "solar_rooftop",
950
    scenario: str = "eGon2035",
951
) -> pd.DataFrame:
952
    """
953
    Get scenario capacity data from eGo^n Database.
954
    Parameters
955
    -----------
956
    carrier : str
957
        Carrier type to filter table by.
958
    scenario : str
959
        Scenario to filter table by.
960
    Returns
961
    -------
962
    geopandas.GeoDataFrame
963
        GeoDataFrame with scenario capacity data in GW.
964
    """
965
    with db.session_scope() as session:
966
        query = session.query(EgonScenarioCapacities).filter(
967
            EgonScenarioCapacities.carrier == carrier,
968
            EgonScenarioCapacities.scenario_name == scenario,
969
        )
970
971
    df = pd.read_sql(
972
        query.statement, query.session.bind, index_col="index"
973
    ).sort_index()
974
975
    logger.debug("Scenario capacity data loaded.")
976
977
    return df
978
979
980 View Code Duplication
class Vg250Lan(Base):
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
981
    __tablename__ = "vg250_lan"
982
    __table_args__ = {"schema": "boundaries"}
983
984
    id = Column(BigInteger, primary_key=True, index=True)
985
    ade = Column(BigInteger)
986
    gf = Column(BigInteger)
987
    bsg = Column(BigInteger)
988
    ars = Column(String)
989
    ags = Column(String)
990
    sdv_ars = Column(String)
991
    gen = Column(String)
992
    bez = Column(String)
993
    ibz = Column(BigInteger)
994
    bem = Column(String)
995
    nbd = Column(String)
996
    sn_l = Column(String)
997
    sn_r = Column(String)
998
    sn_k = Column(String)
999
    sn_v1 = Column(String)
1000
    sn_v2 = Column(String)
1001
    sn_g = Column(String)
1002
    fk_s3 = Column(String)
1003
    nuts = Column(String)
1004
    ars_0 = Column(String)
1005
    ags_0 = Column(String)
1006
    wsk = Column(String)
1007
    debkg_id = Column(String)
1008
    rs = Column(String)
1009
    sdv_rs = Column(String)
1010
    rs_0 = Column(String)
1011
    geometry = Column(Geometry(srid=EPSG), index=True)
1012
1013
1014
def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
1015
    """
1016
    Get feder state data from eGo^n Database.
1017
    Parameters
1018
    -----------
1019
    to_crs : pyproj.crs.crs.CRS
1020
        CRS to transform geometries to.
1021
    Returns
1022
    -------
1023
    geopandas.GeoDataFrame
1024
        GeoDataFrame with federal state data.
1025
    """
1026
    with db.session_scope() as session:
1027
        query = session.query(
1028
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
1029
        )
1030
1031
        gdf = gpd.read_postgis(
1032
            query.statement, session.connection(), index_col="id"
1033
        ).to_crs(to_crs)
1034
1035
    logger.debug("Federal State data loaded.")
1036
1037
    return gdf
1038
1039
1040
@timer_func
1041
def overlay_grid_districts_with_counties(
1042
    mv_grid_district_gdf: gpd.GeoDataFrame,
1043
    federal_state_gdf: gpd.GeoDataFrame,
1044
) -> gpd.GeoDataFrame:
1045
    """
1046
    Calculate the intersections of mv grid districts and counties.
1047
    Parameters
1048
    -----------
1049
    mv_grid_district_gdf : gpd.GeoDataFrame
1050
        GeoDataFrame containing mv grid district ID and geo shapes data.
1051
    federal_state_gdf : gpd.GeoDataFrame
1052
        GeoDataFrame with federal state data.
1053
    Returns
1054
    -------
1055
    geopandas.GeoDataFrame
1056
        GeoDataFrame containing OSM buildings data.
1057
    """
1058
    logger.debug(
1059
        "Calculating intersection overlay between mv grid districts and "
1060
        "counties. This may take a while..."
1061
    )
1062
1063
    gdf = gpd.overlay(
1064
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
1065
        mv_grid_district_gdf.reset_index(),
1066
        how="intersection",
1067
        keep_geom_type=True,
1068
    )
1069
1070
    logger.debug("Done!")
1071
1072
    return gdf
1073
1074
1075
@timer_func
1076
def add_overlay_id_to_buildings(
1077
    buildings_gdf: gpd.GeoDataFrame,
1078
    grid_federal_state_gdf: gpd.GeoDataFrame,
1079
) -> gpd.GeoDataFrame:
1080
    """
1081
    Add information about overlay ID to buildings.
1082
    Parameters
1083
    -----------
1084
    buildings_gdf : geopandas.GeoDataFrame
1085
        GeoDataFrame containing OSM buildings data.
1086
    grid_federal_state_gdf : geopandas.GeoDataFrame
1087
        GeoDataFrame with intersection shapes between counties and grid
1088
        districts.
1089
    Returns
1090
    -------
1091
    geopandas.GeoDataFrame
1092
        GeoDataFrame containing OSM buildings data with overlay ID added.
1093
    """
1094
    gdf = (
1095
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
1096
        .sjoin(
1097
            grid_federal_state_gdf,
1098
            how="left",
1099
            predicate="intersects",
1100
        )
1101
        .rename(columns={"index_right": "overlay_id"})
1102
    )
1103
1104
    logger.debug("Added overlay ID to OSM buildings.")
1105
1106
    return gdf
1107
1108
1109
def drop_buildings_outside_grids(
1110
    buildings_gdf: gpd.GeoDataFrame,
1111
) -> gpd.GeoDataFrame:
1112
    """
1113
    Drop all buildings outside of grid areas.
1114
    Parameters
1115
    -----------
1116
    buildings_gdf : geopandas.GeoDataFrame
1117
        GeoDataFrame containing OSM buildings data.
1118
    Returns
1119
    -------
1120
    gepandas.GeoDataFrame
1121
        GeoDataFrame containing OSM buildings data
1122
        with buildings without an bus ID dropped.
1123
    """
1124
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]
1125
1126
    logger.debug(
1127
        f"{len(buildings_gdf) - len(gdf)} "
1128
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
1129
        f"of {len(buildings_gdf)} values are outside of the grid areas "
1130
        "and are therefore dropped."
1131
    )
1132
1133
    return gdf
1134
1135
1136
def cap_per_bus_id(
1137
    scenario: str,
1138
) -> pd.DataFrame:
1139
    """
1140
    Get table with total pv rooftop capacity per grid district.
1141
1142
    Parameters
1143
    -----------
1144
    scenario : str
1145
        Scenario name.
1146
    Returns
1147
    -------
1148
    pandas.DataFrame
1149
        DataFrame with total rooftop capacity per mv grid.
1150
    """
1151
    targets = config.datasets()["solar_rooftop"]["targets"]
1152
1153
    sql = f"""
1154
    SELECT bus as bus_id, control, p_nom as capacity
1155
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
1156
    WHERE carrier = 'solar_rooftop'
1157
    AND scn_name = '{scenario}'
1158
    """
1159
1160
    df = db.select_dataframe(sql, index_col="bus_id")
1161
1162
    return df.loc[df.control != "Slack"]
1163
1164
1165
def determine_end_of_life_gens(
1166
    mastr_gdf: gpd.GeoDataFrame,
1167
    scenario_timestamp: pd.Timestamp,
1168
    pv_rooftop_lifetime: pd.Timedelta,
1169
) -> gpd.GeoDataFrame:
1170
    """
1171
    Determine if an old PV system has reached its end of life.
1172
    Parameters
1173
    -----------
1174
    mastr_gdf : geopandas.GeoDataFrame
1175
        GeoDataFrame containing geocoded MaStR data.
1176
    scenario_timestamp : pandas.Timestamp
1177
        Timestamp at which the scenario takes place.
1178
    pv_rooftop_lifetime : pandas.Timedelta
1179
        Average expected lifetime of PV rooftop systems.
1180
    Returns
1181
    -------
1182
    geopandas.GeoDataFrame
1183
        GeoDataFrame containing geocoded MaStR data and info if the system
1184
        has reached its end of life.
1185
    """
1186
    before = mastr_gdf.capacity.sum()
1187
1188
    mastr_gdf = mastr_gdf.assign(
1189
        age=scenario_timestamp - mastr_gdf.commissioning_date
1190
    )
1191
1192
    mastr_gdf = mastr_gdf.assign(
1193
        end_of_life=pv_rooftop_lifetime < mastr_gdf.age
1194
    )
1195
1196
    after = mastr_gdf.loc[~mastr_gdf.end_of_life].capacity.sum()
1197
1198
    logger.debug(
1199
        f"Determined if pv rooftop systems reached their end of life.\nTotal "
1200
        f"capacity: {before}\nActive capacity: {after}"
1201
    )
1202
1203
    return mastr_gdf
1204
1205
1206
def calculate_max_pv_cap_per_building(
1207
    buildings_gdf: gpd.GeoDataFrame,
1208
    mastr_gdf: gpd.GeoDataFrame,
1209
    pv_cap_per_sq_m: float | int,
1210
    roof_factor: float | int,
1211
) -> gpd.GeoDataFrame:
1212
    """
1213
    Calculate the estimated maximum possible PV capacity per building.
1214
1215
    Parameters
1216
    -----------
1217
    buildings_gdf : geopandas.GeoDataFrame
1218
        GeoDataFrame containing OSM buildings data.
1219
    mastr_gdf : geopandas.GeoDataFrame
1220
        GeoDataFrame containing geocoded MaStR data.
1221
    pv_cap_per_sq_m : float, int
1222
        Average expected, installable PV capacity per square meter.
1223
    roof_factor : float, int
1224
        Average for PV usable roof area share.
1225
    Returns
1226
    -------
1227
    geopandas.GeoDataFrame
1228
        GeoDataFrame containing OSM buildings data with estimated maximum PV
1229
        capacity.
1230
    """
1231
    gdf = (
1232
        buildings_gdf.reset_index()
1233
        .rename(columns={"index": "id"})
1234
        .merge(
1235
            mastr_gdf[
1236
                [
1237
                    "capacity",
1238
                    "end_of_life",
1239
                    "building_id",
1240
                    "orientation_uniform",
1241
                    "orientation_primary",
1242
                    "orientation_primary_angle",
1243
                ]
1244
            ],
1245
            how="left",
1246
            left_on="id",
1247
            right_on="building_id",
1248
        )
1249
        .set_index("id")
1250
        .drop(columns="building_id")
1251
    )
1252
1253
    return gdf.assign(
1254
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
1255
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
1256
        bus_id=gdf.bus_id.astype(int),
1257
    )
1258
1259
1260
def calculate_building_load_factor(
1261
    mastr_gdf: gpd.GeoDataFrame,
1262
    buildings_gdf: gpd.GeoDataFrame,
1263
    rounding: int = 4,
1264
) -> gpd.GeoDataFrame:
1265
    """
1266
    Calculate the roof load factor from existing PV systems.
1267
    Parameters
1268
    -----------
1269
    mastr_gdf : geopandas.GeoDataFrame
1270
        GeoDataFrame containing geocoded MaStR data.
1271
    buildings_gdf : geopandas.GeoDataFrame
1272
        GeoDataFrame containing OSM buildings data.
1273
    rounding : int
1274
        Rounding to use for load factor.
1275
    Returns
1276
    -------
1277
    geopandas.GeoDataFrame
1278
        GeoDataFrame containing geocoded MaStR data with calculated load
1279
        factor.
1280
    """
1281
    gdf = mastr_gdf.merge(
1282
        buildings_gdf[["max_cap", "building_area"]]
1283
        .loc[~buildings_gdf["max_cap"].isna()]
1284
        .reset_index(),
1285
        how="left",
1286
        left_on="building_id",
1287
        right_on="id",
1288
    ).set_index("id")
1289
1290
    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))
1291
1292
1293
def get_probability_for_property(
1294
    mastr_gdf: gpd.GeoDataFrame,
1295
    cap_range: tuple[int | float, int | float],
1296
    prop: str,
1297
) -> tuple[np.array, np.array]:
1298
    """
1299
    Calculate the probability of the different options of a property of the
1300
    existing PV plants.
1301
    Parameters
1302
    -----------
1303
    mastr_gdf : geopandas.GeoDataFrame
1304
        GeoDataFrame containing geocoded MaStR data.
1305
    cap_range : tuple(int, int)
1306
        Capacity range of PV plants to look at.
1307
    prop : str
1308
        Property to calculate probabilities for. String needs to be in columns
1309
        of mastr_gdf.
1310
    Returns
1311
    -------
1312
    tuple
1313
        numpy.array
1314
            Unique values of property.
1315
        numpy.array
1316
            Probabilties per unique value.
1317
    """
1318
    cap_range_gdf = mastr_gdf.loc[
1319
        (mastr_gdf.capacity > cap_range[0])
1320
        & (mastr_gdf.capacity <= cap_range[1])
1321
    ]
1322
1323
    if prop == "load_factor":
1324
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]
1325
1326
    count = Counter(
1327
        cap_range_gdf[prop].loc[
1328
            ~cap_range_gdf[prop].isna()
1329
            & ~cap_range_gdf[prop].isnull()
1330
            & ~(cap_range_gdf[prop] == "None")
1331
        ]
1332
    )
1333
1334
    values = np.array(list(count.keys()))
1335
    probabilities = np.fromiter(count.values(), dtype=float)
1336
    probabilities = probabilities / np.sum(probabilities)
1337
1338
    return values, probabilities
1339
1340
1341
@timer_func
1342
def probabilities(
1343
    mastr_gdf: gpd.GeoDataFrame,
1344
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1345
    properties: list[str] | None = None,
1346
) -> dict:
1347
    """
1348
    Calculate the probability of the different options of properties of the
1349
    existing PV plants.
1350
    Parameters
1351
    -----------
1352
    mastr_gdf : geopandas.GeoDataFrame
1353
        GeoDataFrame containing geocoded MaStR data.
1354
    cap_ranges : list(tuple(int, int))
1355
        List of capacity ranges to distinguish between. The first tuple should
1356
        start with a zero and the last one should end with infinite.
1357
    properties : list(str)
1358
        List of properties to calculate probabilities for. Strings need to be
1359
        in columns of mastr_gdf.
1360
    Returns
1361
    -------
1362
    dict
1363
        Dictionary with values and probabilities per capacity range.
1364
    """
1365
    if cap_ranges is None:
1366
        cap_ranges = [
1367
            (0, 30 / 10**3),
1368
            (30 / 10**3, 100 / 10**3),
1369
            (100 / 10**3, float("inf")),
1370
        ]
1371
    if properties is None:
1372
        properties = [
1373
            "orientation_uniform",
1374
            "orientation_primary",
1375
            "orientation_primary_angle",
1376
            "load_factor",
1377
        ]
1378
1379
    prob_dict = {}
1380
1381
    for cap_range in cap_ranges:
1382
        prob_dict[cap_range] = {
1383
            "values": {},
1384
            "probabilities": {},
1385
        }
1386
1387
        for prop in properties:
1388
            v, p = get_probability_for_property(
1389
                mastr_gdf,
1390
                cap_range,
1391
                prop,
1392
            )
1393
1394
            prob_dict[cap_range]["values"][prop] = v
1395
            prob_dict[cap_range]["probabilities"][prop] = p
1396
1397
    return prob_dict
1398
1399
1400
def cap_share_per_cap_range(
1401
    mastr_gdf: gpd.GeoDataFrame,
1402
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1403
) -> dict[tuple[int | float, int | float], float]:
1404
    """
1405
    Calculate the share of PV capacity from the total PV capacity within
1406
    capacity ranges.
1407
1408
    Parameters
1409
    -----------
1410
    mastr_gdf : geopandas.GeoDataFrame
1411
        GeoDataFrame containing geocoded MaStR data.
1412
    cap_ranges : list(tuple(int, int))
1413
        List of capacity ranges to distinguish between. The first tuple should
1414
        start with a zero and the last one should end with infinite.
1415
    Returns
1416
    -------
1417
    dict
1418
        Dictionary with share of PV capacity from the total PV capacity within
1419
        capacity ranges.
1420
    """
1421
    if cap_ranges is None:
1422
        cap_ranges = [
1423
            (0, 30 / 10**3),
1424
            (30 / 10**3, 100 / 10**3),
1425
            (100 / 10**3, float("inf")),
1426
        ]
1427
1428
    cap_share_dict = {}
1429
1430
    total_cap = mastr_gdf.capacity.sum()
1431
1432
    for cap_range in cap_ranges:
1433
        cap_share = (
1434
            mastr_gdf.loc[
1435
                (mastr_gdf.capacity > cap_range[0])
1436
                & (mastr_gdf.capacity <= cap_range[1])
1437
            ].capacity.sum()
1438
            / total_cap
1439
        )
1440
1441
        cap_share_dict[cap_range] = cap_share
1442
1443
    return cap_share_dict
1444
1445
1446
def mean_load_factor_per_cap_range(
1447
    mastr_gdf: gpd.GeoDataFrame,
1448
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1449
) -> dict[tuple[int | float, int | float], float]:
1450
    """
1451
    Calculate the mean roof load factor per capacity range from existing PV
1452
    plants.
1453
    Parameters
1454
    -----------
1455
    mastr_gdf : geopandas.GeoDataFrame
1456
        GeoDataFrame containing geocoded MaStR data.
1457
    cap_ranges : list(tuple(int, int))
1458
        List of capacity ranges to distinguish between. The first tuple should
1459
        start with a zero and the last one should end with infinite.
1460
    Returns
1461
    -------
1462
    dict
1463
        Dictionary with mean roof load factor per capacity range.
1464
    """
1465
    if cap_ranges is None:
1466
        cap_ranges = [
1467
            (0, 30 / 10**3),
1468
            (30 / 10**3, 100 / 10**3),
1469
            (100 / 10**3, float("inf")),
1470
        ]
1471
1472
    load_factor_dict = {}
1473
1474
    for cap_range in cap_ranges:
1475
        load_factor = mastr_gdf.loc[
1476
            (mastr_gdf.load_factor <= 1)
1477
            & (mastr_gdf.capacity > cap_range[0])
1478
            & (mastr_gdf.capacity <= cap_range[1])
1479
        ].load_factor.mean()
1480
1481
        load_factor_dict[cap_range] = load_factor
1482
1483
    return load_factor_dict
1484
1485
1486
def building_area_range_per_cap_range(
1487
    mastr_gdf: gpd.GeoDataFrame,
1488
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1489
    min_building_size: int | float = 10.0,
1490
    upper_quantile: float = 0.95,
1491
    lower_quantile: float = 0.05,
1492
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
1493
    """
1494
    Estimate normal building area range per capacity range.
1495
    Calculate the mean roof load factor per capacity range from existing PV
1496
    plants.
1497
    Parameters
1498
    -----------
1499
    mastr_gdf : geopandas.GeoDataFrame
1500
        GeoDataFrame containing geocoded MaStR data.
1501
    cap_ranges : list(tuple(int, int))
1502
        List of capacity ranges to distinguish between. The first tuple should
1503
        start with a zero and the last one should end with infinite.
1504
    min_building_size : int, float
1505
        Minimal building size to consider for PV plants.
1506
    upper_quantile : float
1507
        Upper quantile to estimate maximum building size per capacity range.
1508
    lower_quantile : float
1509
        Lower quantile to estimate minimum building size per capacity range.
1510
    Returns
1511
    -------
1512
    dict
1513
        Dictionary with estimated normal building area range per capacity
1514
        range.
1515
    """
1516
    if cap_ranges is None:
1517
        cap_ranges = [
1518
            (0, 30 / 10**3),
1519
            (30 / 10**3, 100 / 10**3),
1520
            (100 / 10**3, float("inf")),
1521
        ]
1522
1523
    building_area_range_dict = {}
1524
1525
    n_ranges = len(cap_ranges)
1526
1527
    for count, cap_range in enumerate(cap_ranges):
1528
        cap_range_gdf = mastr_gdf.loc[
1529
            (mastr_gdf.capacity > cap_range[0])
1530
            & (mastr_gdf.capacity <= cap_range[1])
1531
        ]
1532
1533
        if count == 0:
1534
            building_area_range_dict[cap_range] = (
1535
                min_building_size,
1536
                cap_range_gdf.building_area.quantile(upper_quantile),
1537
            )
1538
        elif count == n_ranges - 1:
1539
            building_area_range_dict[cap_range] = (
1540
                cap_range_gdf.building_area.quantile(lower_quantile),
1541
                float("inf"),
1542
            )
1543
        else:
1544
            building_area_range_dict[cap_range] = (
1545
                cap_range_gdf.building_area.quantile(lower_quantile),
1546
                cap_range_gdf.building_area.quantile(upper_quantile),
1547
            )
1548
1549
    values = list(building_area_range_dict.values())
1550
1551
    building_area_range_normed_dict = {}
1552
1553
    for count, (cap_range, (min_area, max_area)) in enumerate(
1554
        building_area_range_dict.items()
1555
    ):
1556
        if count == 0:
1557
            building_area_range_normed_dict[cap_range] = (
1558
                min_area,
1559
                np.mean((values[count + 1][0], max_area)),
1560
            )
1561
        elif count == n_ranges - 1:
1562
            building_area_range_normed_dict[cap_range] = (
1563
                np.mean((values[count - 1][1], min_area)),
1564
                max_area,
1565
            )
1566
        else:
1567
            building_area_range_normed_dict[cap_range] = (
1568
                np.mean((values[count - 1][1], min_area)),
1569
                np.mean((values[count + 1][0], max_area)),
1570
            )
1571
1572
    return building_area_range_normed_dict
1573
1574
1575
@timer_func
1576
def desaggregate_pv_in_mv_grid(
1577
    buildings_gdf: gpd.GeoDataFrame,
1578
    pv_cap: float | int,
1579
    **kwargs,
1580
) -> gpd.GeoDataFrame:
1581
    """
1582
    Desaggregate PV capacity on buildings within a given grid district.
1583
    Parameters
1584
    -----------
1585
    buildings_gdf : geopandas.GeoDataFrame
1586
        GeoDataFrame containing buildings within the grid district.
1587
    pv_cap : float, int
1588
        PV capacity to desaggregate.
1589
    Other Parameters
1590
    -----------
1591
    prob_dict : dict
1592
        Dictionary with values and probabilities per capacity range.
1593
    cap_share_dict : dict
1594
        Dictionary with share of PV capacity from the total PV capacity within
1595
        capacity ranges.
1596
    building_area_range_dict : dict
1597
        Dictionary with estimated normal building area range per capacity
1598
        range.
1599
    load_factor_dict : dict
1600
        Dictionary with mean roof load factor per capacity range.
1601
    seed : int
1602
        Seed to use for random operations with NumPy and pandas.
1603
    pv_cap_per_sq_m : float, int
1604
        Average expected, installable PV capacity per square meter.
1605
    Returns
1606
    -------
1607
    geopandas.GeoDataFrame
1608
        GeoDataFrame containing OSM building data with desaggregated PV
1609
        plants.
1610
    """
1611
    bus_id = int(buildings_gdf.bus_id.iat[0])
1612
1613
    rng = default_rng(seed=kwargs["seed"])
1614
    random_state = RandomState(seed=kwargs["seed"])
1615
1616
    results_df = pd.DataFrame(columns=buildings_gdf.columns)
1617
1618
    for cap_range, share in kwargs["cap_share_dict"].items():
1619
        pv_cap_range = pv_cap * share
1620
1621
        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]
1622
1623
        cap_range_buildings_gdf = buildings_gdf.loc[
1624
            ~buildings_gdf.index.isin(results_df.index)
1625
            & (buildings_gdf.building_area > b_area_min)
1626
            & (buildings_gdf.building_area <= b_area_max)
1627
        ]
1628
1629
        mean_load_factor = kwargs["load_factor_dict"][cap_range]
1630
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
1631
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
1632
            load_factor=np.nan,
1633
            capacity=np.nan,
1634
        )
1635
1636
        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()
1637
1638
        if total_mean_cap == 0:
1639
            logger.warning(
1640
                f"There are no matching roof for capacity range {cap_range} "
1641
                f"kW in grid {bus_id}. Using all buildings as fallback."
1642
            )
1643
1644
            cap_range_buildings_gdf = buildings_gdf.loc[
1645
                ~buildings_gdf.index.isin(results_df.index)
1646
            ]
1647
1648
            if len(cap_range_buildings_gdf) == 0:
1649
                logger.warning(
1650
                    "There are no roofes available for capacity range "
1651
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
1652
                )
1653
                cap_range_buildings_gdf = buildings_gdf.copy()
1654
1655
            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
1656
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
1657
                load_factor=np.nan,
1658
                capacity=np.nan,
1659
            )
1660
1661
            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()
1662
1663
        elif total_mean_cap < pv_cap_range:
1664
            logger.warning(
1665
                f"Average roof utilization of the roof area in grid {bus_id} "
1666
                f"and capacity range {cap_range} kW is not sufficient. The "
1667
                "roof utilization will be above average."
1668
            )
1669
1670
        frac = max(
1671
            pv_cap_range / total_mean_cap,
1672
            1 / len(cap_range_buildings_gdf),
1673
        )
1674
1675
        samples_gdf = cap_range_buildings_gdf.sample(
1676
            frac=min(1, frac),
1677
            random_state=random_state,
1678
        )
1679
1680
        cap_range_dict = kwargs["prob_dict"][cap_range]
1681
1682
        values_dict = cap_range_dict["values"]
1683
        p_dict = cap_range_dict["probabilities"]
1684
1685
        load_factors = rng.choice(
1686
            a=values_dict["load_factor"],
1687
            size=len(samples_gdf),
1688
            p=p_dict["load_factor"],
1689
        )
1690
1691
        samples_gdf = samples_gdf.assign(
1692
            load_factor=load_factors,
1693
            capacity=(
1694
                samples_gdf.building_area
1695
                * load_factors
1696
                * kwargs["pv_cap_per_sq_m"]
1697
            ).clip(lower=0.4),
1698
        )
1699
1700
        missing_factor = pv_cap_range / samples_gdf.capacity.sum()
1701
1702
        samples_gdf = samples_gdf.assign(
1703
            capacity=(samples_gdf.capacity * missing_factor),
1704
            load_factor=(samples_gdf.load_factor * missing_factor),
1705
        )
1706
1707
        assert np.isclose(
1708
            samples_gdf.capacity.sum(),
1709
            pv_cap_range,
1710
            rtol=1e-03,
1711
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"
1712
1713
        results_df = pd.concat(
1714
            [
1715
                results_df,
1716
                samples_gdf,
1717
            ],
1718
        )
1719
1720
    total_missing_factor = pv_cap / results_df.capacity.sum()
1721
1722
    results_df = results_df.assign(
1723
        capacity=(results_df.capacity * total_missing_factor),
1724
    )
1725
1726
    assert np.isclose(
1727
        results_df.capacity.sum(),
1728
        pv_cap,
1729
        rtol=1e-03,
1730
    ), f"{results_df.capacity.sum()} != {pv_cap}"
1731
1732
    return gpd.GeoDataFrame(
1733
        results_df,
1734
        crs=samples_gdf.crs,
0 ignored issues
show
introduced by
The variable samples_gdf does not seem to be defined in case the for loop on line 1618 is not entered. Are you sure this can never be the case?
Loading history...
1735
        geometry="geom",
1736
    )
1737
1738
1739
@timer_func
1740
def desaggregate_pv(
1741
    buildings_gdf: gpd.GeoDataFrame,
1742
    cap_df: pd.DataFrame,
1743
    **kwargs,
1744
) -> gpd.GeoDataFrame:
1745
    """
1746
    Desaggregate PV capacity on buildings within a given grid district.
1747
1748
    Parameters
1749
    -----------
1750
    buildings_gdf : geopandas.GeoDataFrame
1751
        GeoDataFrame containing OSM buildings data.
1752
    cap_df : pandas.DataFrame
1753
        DataFrame with total rooftop capacity per mv grid.
1754
    Other Parameters
1755
    -----------
1756
    prob_dict : dict
1757
        Dictionary with values and probabilities per capacity range.
1758
    cap_share_dict : dict
1759
        Dictionary with share of PV capacity from the total PV capacity within
1760
        capacity ranges.
1761
    building_area_range_dict : dict
1762
        Dictionary with estimated normal building area range per capacity
1763
        range.
1764
    load_factor_dict : dict
1765
        Dictionary with mean roof load factor per capacity range.
1766
    seed : int
1767
        Seed to use for random operations with NumPy and pandas.
1768
    pv_cap_per_sq_m : float, int
1769
        Average expected, installable PV capacity per square meter.
1770
    Returns
1771
    -------
1772
    geopandas.GeoDataFrame
1773
        GeoDataFrame containing OSM building data with desaggregated PV
1774
        plants.
1775
    """
1776
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]
1777
1778
    building_bus_ids = set(buildings_gdf.bus_id)
1779
    cap_bus_ids = set(cap_df.index)
1780
1781
    logger.debug(
1782
        f"Bus IDs from buildings: {len(building_bus_ids)}\nBus IDs from "
1783
        f"capacity: {len(cap_bus_ids)}"
1784
    )
1785
1786
    if len(building_bus_ids) > len(cap_bus_ids):
1787
        missing = building_bus_ids - cap_bus_ids
1788
    else:
1789
        missing = cap_bus_ids - building_bus_ids
1790
1791
    logger.debug(str(missing))
1792
1793
    bus_ids = np.intersect1d(list(building_bus_ids), list(cap_bus_ids))
1794
1795
    # assert set(buildings_gdf.bus_id.unique()) == set(cap_df.index)
1796
1797
    for bus_id in bus_ids:
1798
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]
1799
1800
        pv_installed_gdf = buildings_grid_gdf.loc[
1801
            ~buildings_grid_gdf.end_of_life
1802
        ]
1803
1804
        pv_installed = pv_installed_gdf.capacity.sum()
1805
1806
        pot_buildings_gdf = buildings_grid_gdf.drop(
1807
            index=pv_installed_gdf.index
1808
        )
1809
1810
        if len(pot_buildings_gdf) == 0:
1811
            logger.error(
1812
                f"In grid {bus_id} there are no potential buildings to "
1813
                f"allocate PV capacity to. The grid is skipped. This message "
1814
                f"should only appear doing test runs with few buildings."
1815
            )
1816
1817
            continue
1818
1819
        pv_target = cap_df.at[bus_id, "capacity"]
1820
1821
        logger.debug(f"pv_target: {pv_target}")
1822
1823
        pv_missing = pv_target - pv_installed
1824
1825
        if pv_missing <= 0:
1826
            logger.warning(
1827
                f"In grid {bus_id} there is more PV installed "
1828
                f"({pv_installed: g} kW) in status Quo than allocated within "
1829
                f"the scenario ({pv_target: g} kW). "
1830
                f"No new generators are added."
1831
            )
1832
1833
            continue
1834
1835
        if pot_buildings_gdf.max_cap.sum() < pv_missing:
1836
            logger.error(
1837
                f"In grid {bus_id} there is less PV potential ("
1838
                f"{pot_buildings_gdf.max_cap.sum():g} MW) than allocated PV "
1839
                f"capacity ({pv_missing:g} MW). The average roof utilization "
1840
                f"will be very high."
1841
            )
1842
1843
        gdf = desaggregate_pv_in_mv_grid(
1844
            buildings_gdf=pot_buildings_gdf,
1845
            pv_cap=pv_missing,
1846
            **kwargs,
1847
        )
1848
1849
        logger.debug(f"New cap in grid {bus_id}: {gdf.capacity.sum()}")
1850
        logger.debug(f"Installed cap in grid {bus_id}: {pv_installed}")
1851
        logger.debug(
1852
            f"Total cap in grid {bus_id}: {gdf.capacity.sum() + pv_installed}"
1853
        )
1854
1855
        if not np.isclose(
1856
            gdf.capacity.sum() + pv_installed, pv_target, rtol=1e-3
1857
        ):
1858
            logger.warning(
1859
                f"The desired capacity and actual capacity in grid {bus_id} "
1860
                f"differ.\n"
1861
                f"Desired cap: {pv_target}\nActual cap: "
1862
                f"{gdf.capacity.sum() + pv_installed}"
1863
            )
1864
1865
        pre_cap = allocated_buildings_gdf.capacity.sum()
1866
        new_cap = gdf.capacity.sum()
1867
1868
        allocated_buildings_gdf = pd.concat(
1869
            [
1870
                allocated_buildings_gdf,
1871
                gdf,
1872
            ]
1873
        )
1874
1875
        total_cap = allocated_buildings_gdf.capacity.sum()
1876
1877
        assert np.isclose(pre_cap + new_cap, total_cap)
1878
1879
    logger.debug("Desaggregated scenario.")
1880
    logger.debug(f"Scenario capacity: {cap_df.capacity.sum(): g}")
1881
    logger.debug(
1882
        f"Generator capacity: " f"{allocated_buildings_gdf.capacity.sum(): g}"
1883
    )
1884
1885
    return gpd.GeoDataFrame(
1886
        allocated_buildings_gdf,
1887
        crs=gdf.crs,
0 ignored issues
show
introduced by
The variable gdf does not seem to be defined for all execution paths.
Loading history...
1888
        geometry="geom",
1889
    )
1890
1891
1892
@timer_func
1893
def add_buildings_meta_data(
1894
    buildings_gdf: gpd.GeoDataFrame,
1895
    prob_dict: dict,
1896
    seed: int,
1897
) -> gpd.GeoDataFrame:
1898
    """
1899
    Randomly add additional metadata to desaggregated PV plants.
1900
    Parameters
1901
    -----------
1902
    buildings_gdf : geopandas.GeoDataFrame
1903
        GeoDataFrame containing OSM buildings data with desaggregated PV
1904
        plants.
1905
    prob_dict : dict
1906
        Dictionary with values and probabilities per capacity range.
1907
    seed : int
1908
        Seed to use for random operations with NumPy and pandas.
1909
    Returns
1910
    -------
1911
    geopandas.GeoDataFrame
1912
        GeoDataFrame containing OSM building data with desaggregated PV
1913
        plants.
1914
    """
1915
    rng = default_rng(seed=seed)
1916
    buildings_gdf = buildings_gdf.reset_index().rename(
1917
        columns={
1918
            "index": "building_id",
1919
        }
1920
    )
1921
1922
    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
1923
        cap_range_gdf = buildings_gdf.loc[
1924
            (buildings_gdf.capacity >= min_cap)
1925
            & (buildings_gdf.capacity < max_cap)
1926
        ]
1927
1928
        for key, values in cap_range_prob_dict["values"].items():
1929
            if key == "load_factor":
1930
                continue
1931
1932
            gdf = cap_range_gdf.loc[
1933
                cap_range_gdf[key].isna()
1934
                | cap_range_gdf[key].isnull()
1935
                | (cap_range_gdf[key] == "None")
1936
            ]
1937
1938
            key_vals = rng.choice(
1939
                a=values,
1940
                size=len(gdf),
1941
                p=cap_range_prob_dict["probabilities"][key],
1942
            )
1943
1944
            buildings_gdf.loc[gdf.index, key] = key_vals
1945
1946
    return buildings_gdf
1947
1948
1949
def add_commissioning_date(
1950
    buildings_gdf: gpd.GeoDataFrame,
1951
    start: pd.Timestamp,
1952
    end: pd.Timestamp,
1953
    seed: int,
1954
):
1955
    """
1956
    Randomly and linear add start-up date to new pv generators.
1957
    Parameters
1958
    ----------
1959
    buildings_gdf : geopandas.GeoDataFrame
1960
        GeoDataFrame containing OSM buildings data with desaggregated PV
1961
        plants.
1962
    start : pandas.Timestamp
1963
        Minimum Timestamp to use.
1964
    end : pandas.Timestamp
1965
        Maximum Timestamp to use.
1966
    seed : int
1967
        Seed to use for random operations with NumPy and pandas.
1968
    Returns
1969
    -------
1970
    geopandas.GeoDataFrame
1971
        GeoDataFrame containing OSM buildings data with start-up date added.
1972
    """
1973
    rng = default_rng(seed=seed)
1974
1975
    date_range = pd.date_range(start=start, end=end, freq="1D")
1976
1977
    return buildings_gdf.assign(
1978
        commissioning_date=rng.choice(date_range, size=len(buildings_gdf))
1979
    )
1980
1981
1982
@timer_func
1983
def allocate_scenarios(
1984
    mastr_gdf: gpd.GeoDataFrame,
1985
    valid_buildings_gdf: gpd.GeoDataFrame,
1986
    last_scenario_gdf: gpd.GeoDataFrame,
1987
    scenario: str,
1988
):
1989
    """
1990
    Desaggregate and allocate scenario pv rooftop ramp-ups onto buildings.
1991
    Parameters
1992
    ----------
1993
    mastr_gdf : geopandas.GeoDataFrame
1994
        GeoDataFrame containing geocoded MaStR data.
1995
    valid_buildings_gdf : geopandas.GeoDataFrame
1996
        GeoDataFrame containing OSM buildings data.
1997
    last_scenario_gdf : geopandas.GeoDataFrame
1998
        GeoDataFrame containing OSM buildings matched with pv generators from
1999
        temporally preceding scenario.
2000
    scenario : str
2001
        Scenario to desaggrgate and allocate.
2002
    Returns
2003
    -------
2004
    tuple
2005
        geopandas.GeoDataFrame
2006
            GeoDataFrame containing OSM buildings matched with pv generators.
2007
        pandas.DataFrame
2008
            DataFrame containing pv rooftop capacity per grid id.
2009
    """
2010
    cap_per_bus_id_df = cap_per_bus_id(scenario)
2011
2012
    logger.debug(
2013
        f"cap_per_bus_id_df total capacity: {cap_per_bus_id_df.capacity.sum()}"
2014
    )
2015
2016
    last_scenario_gdf = determine_end_of_life_gens(
2017
        last_scenario_gdf,
2018
        SCENARIO_TIMESTAMP[scenario],
2019
        PV_ROOFTOP_LIFETIME,
2020
    )
2021
2022
    buildings_gdf = calculate_max_pv_cap_per_building(
2023
        valid_buildings_gdf,
2024
        last_scenario_gdf,
2025
        PV_CAP_PER_SQ_M,
2026
        ROOF_FACTOR,
2027
    )
2028
2029
    mastr_gdf = calculate_building_load_factor(
2030
        mastr_gdf,
2031
        buildings_gdf,
2032
    )
2033
2034
    probabilities_dict = probabilities(
2035
        mastr_gdf,
2036
        cap_ranges=CAP_RANGES,
2037
    )
2038
2039
    cap_share_dict = cap_share_per_cap_range(
2040
        mastr_gdf,
2041
        cap_ranges=CAP_RANGES,
2042
    )
2043
2044
    load_factor_dict = mean_load_factor_per_cap_range(
2045
        mastr_gdf,
2046
        cap_ranges=CAP_RANGES,
2047
    )
2048
2049
    building_area_range_dict = building_area_range_per_cap_range(
2050
        mastr_gdf,
2051
        cap_ranges=CAP_RANGES,
2052
        min_building_size=MIN_BUILDING_SIZE,
2053
        upper_quantile=UPPER_QUANTILE,
2054
        lower_quantile=LOWER_QUANTILE,
2055
    )
2056
2057
    allocated_buildings_gdf = desaggregate_pv(
2058
        buildings_gdf=buildings_gdf,
2059
        cap_df=cap_per_bus_id_df,
2060
        prob_dict=probabilities_dict,
2061
        cap_share_dict=cap_share_dict,
2062
        building_area_range_dict=building_area_range_dict,
2063
        load_factor_dict=load_factor_dict,
2064
        seed=SEED,
2065
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
2066
    )
2067
2068
    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)
2069
2070
    meta_buildings_gdf = frame_to_numeric(
2071
        add_buildings_meta_data(
2072
            allocated_buildings_gdf,
2073
            probabilities_dict,
2074
            SEED,
2075
        )
2076
    )
2077
2078
    return (
2079
        add_commissioning_date(
2080
            meta_buildings_gdf,
2081
            start=last_scenario_gdf.commissioning_date.max(),
2082
            end=SCENARIO_TIMESTAMP[scenario],
2083
            seed=SEED,
2084
        ),
2085
        cap_per_bus_id_df,
2086
    )
2087
2088
2089
class EgonPowerPlantPvRoofBuilding(Base):
2090
    __tablename__ = "egon_power_plants_pv_roof_building"
2091
    __table_args__ = {"schema": "supply"}
2092
2093
    index = Column(Integer, primary_key=True, index=True)
2094
    scenario = Column(String)
2095
    bus_id = Column(Integer, nullable=True)
2096
    building_id = Column(Integer)
2097
    gens_id = Column(String, nullable=True)
2098
    capacity = Column(Float)
2099
    orientation_uniform = Column(Float)
2100
    orientation_primary = Column(String)
2101
    orientation_primary_angle = Column(String)
2102
    voltage_level = Column(Integer)
2103
    weather_cell_id = Column(Integer)
2104
2105
2106
def create_scenario_table(buildings_gdf):
2107
    """Create mapping table pv_unit <-> building for scenario"""
2108
    EgonPowerPlantPvRoofBuilding.__table__.drop(bind=engine, checkfirst=True)
2109
    EgonPowerPlantPvRoofBuilding.__table__.create(bind=engine, checkfirst=True)
2110
2111
    buildings_gdf[COLS_TO_EXPORT].reset_index().to_sql(
2112
        name=EgonPowerPlantPvRoofBuilding.__table__.name,
2113
        schema=EgonPowerPlantPvRoofBuilding.__table__.schema,
2114
        con=db.engine(),
2115
        if_exists="append",
2116
        index=False,
2117
    )
2118
2119
2120
def add_weather_cell_id(buildings_gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
2121
    sql = """
2122
    SELECT building_id, zensus_population_id
2123
    FROM boundaries.egon_map_zensus_mvgd_buildings
2124
    """
2125
2126
    buildings_gdf = buildings_gdf.merge(
2127
        right=db.select_dataframe(sql).drop_duplicates(subset="building_id"),
2128
        how="left",
2129
        on="building_id",
2130
    )
2131
2132
    sql = """
2133
    SELECT zensus_population_id, w_id as weather_cell_id
2134
    FROM boundaries.egon_map_zensus_weather_cell
2135
    """
2136
2137
    buildings_gdf = buildings_gdf.merge(
2138
        right=db.select_dataframe(sql).drop_duplicates(
2139
            subset="zensus_population_id"
2140
        ),
2141
        how="left",
2142
        on="zensus_population_id",
2143
    )
2144
2145
    if buildings_gdf.weather_cell_id.isna().any():
2146
        missing = buildings_gdf.loc[
2147
            buildings_gdf.weather_cell_id.isna(), "building_id"
2148
        ].tolist()
2149
2150
        raise ValueError(
2151
            f"Following buildings don't have a weather cell id: {missing}"
2152
        )
2153
2154
    return buildings_gdf
2155
2156
2157
def add_bus_ids_sq(
2158
    buildings_gdf: gpd.GeoDataFrame,
2159
) -> gpd.GeoDataFrame:
2160
    """Add bus ids for status_quo units
2161
2162
    Parameters
2163
    -----------
2164
    buildings_gdf : geopandas.GeoDataFrame
2165
        GeoDataFrame containing OSM buildings data with desaggregated PV
2166
        plants.
2167
    Returns
2168
    -------
2169
    geopandas.GeoDataFrame
2170
        GeoDataFrame containing OSM building data with bus_id per
2171
        generator.
2172
    """
2173
    grid_districts_gdf = grid_districts(EPSG)
2174
2175
    mask = buildings_gdf.scenario == "status_quo"
2176
    buildings_gdf.loc[mask, "bus_id"] = (
2177
        buildings_gdf.loc[mask]
2178
        .sjoin(grid_districts_gdf, how="left")
2179
        .index_right
2180
    )
2181
2182
    return buildings_gdf
2183
2184
2185 View Code Duplication
def infer_voltage_level(
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
2186
    units_gdf: gpd.GeoDataFrame,
2187
) -> gpd.GeoDataFrame:
2188
    """
2189
    Infer nan values in voltage level derived from generator capacity to
2190
    the power plants.
2191
2192
    Parameters
2193
    -----------
2194
    units_gdf : geopandas.GeoDataFrame
2195
        GeoDataFrame containing units with voltage levels from MaStR
2196
    Returnsunits_gdf: gpd.GeoDataFrame
2197
    -------
2198
    geopandas.GeoDataFrame
2199
        GeoDataFrame containing units all having assigned a voltage level.
2200
    """
2201
2202
    def voltage_levels(p: float) -> int:
2203
        if p <= 0.1:
2204
            return 7
2205
        elif p <= 0.2:
2206
            return 6
2207
        elif p <= 5.5:
2208
            return 5
2209
        elif p <= 20:
2210
            return 4
2211
        elif p <= 120:
2212
            return 3
2213
        return 1
2214
2215
    units_gdf["voltage_level_inferred"] = False
2216
    mask = units_gdf.voltage_level.isna()
2217
    units_gdf.loc[mask, "voltage_level_inferred"] = True
2218
    units_gdf.loc[mask, "voltage_level"] = units_gdf.loc[mask].capacity.apply(
2219
        voltage_levels
2220
    )
2221
2222
    return units_gdf
2223
2224
2225
def pv_rooftop_to_buildings():
2226
    """Main script, executed as task"""
2227
2228
    mastr_gdf = load_mastr_data()
2229
2230
    buildings_gdf = load_building_data()
2231
2232
    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
2233
        mastr_gdf, buildings_gdf
2234
    )
2235
2236
    all_buildings_gdf = (
2237
        desagg_mastr_gdf.assign(scenario="status_quo")
2238
        .reset_index()
2239
        .rename(columns={"geometry": "geom"})
2240
    )
2241
2242
    scenario_buildings_gdf = all_buildings_gdf.copy()
2243
2244
    cap_per_bus_id_df = pd.DataFrame()
2245
2246
    for scenario in SCENARIOS:
2247
        logger.debug(f"Desaggregating scenario {scenario}.")
2248
        (
2249
            scenario_buildings_gdf,
2250
            cap_per_bus_id_scenario_df,
2251
        ) = allocate_scenarios(  # noqa: F841
2252
            desagg_mastr_gdf,
2253
            desagg_buildings_gdf,
2254
            scenario_buildings_gdf,
2255
            scenario,
2256
        )
2257
2258
        all_buildings_gdf = gpd.GeoDataFrame(
2259
            pd.concat(
2260
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
2261
            ),
2262
            crs=scenario_buildings_gdf.crs,
2263
            geometry="geom",
2264
        )
2265
2266
        cap_per_bus_id_df = pd.concat(
2267
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
2268
        )
2269
2270
    # add weather cell
2271
    all_buildings_gdf = add_weather_cell_id(all_buildings_gdf)
2272
2273
    # add bus IDs for status quo scenario
2274
    all_buildings_gdf = add_bus_ids_sq(all_buildings_gdf)
2275
2276
    # export scenario
2277
    create_scenario_table(infer_voltage_level(all_buildings_gdf))
2278