Passed
Pull Request — dev (#1122)
by
unknown
04:34
created

add_metadata()   B

Complexity

Conditions 1

Size

Total Lines 129
Code Lines 85

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
eloc 85
dl 0
loc 129
rs 7.4909
c 0
b 0
f 0
cc 1
nop 0

How to fix   Long Method   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

1
"""
2
Distribute MaStR PV rooftop capacities to OSM and synthetic buildings. Generate
3
new PV rooftop generators for scenarios eGon2035 and eGon100RE.
4
5
Data cleaning and inference:
6
* Drop duplicates and entries with missing critical data.
7
* Determine most plausible capacity from multiple values given in MaStR data.
8
* Drop generators which don't have any plausible capacity data
9
  (23.5MW > P > 0.1).
10
* Randomly and weighted add a start-up date if it is missing.
11
* Extract zip and municipality from 'site' given in MaStR data.
12
* Geocode unique zip and municipality combinations with Nominatim (1 sec
13
  delay). Drop generators for which geocoding failed or which are located
14
  outside the municipalities of Germany.
15
* Add some visual sanity checks for cleaned data.
16
17
Allocation of MaStR data:
18
* Allocate each generator to an existing building from OSM.
19
* Determine the quantile each generator and building is in depending on the
20
  capacity of the generator and the area of the polygon of the building.
21
* Randomly distribute generators within each municipality preferably within
22
  the same building area quantile as the generators are capacity wise.
23
* If not enough buildings exists within a municipality and quantile additional
24
  buildings from other quantiles are chosen randomly.
25
26
Desegregation of pv rooftop scenarios:
27
* The scenario data per federal state is linearly distributed to the mv grid
28
  districts according to the pv rooftop potential per mv grid district.
29
* The rooftop potential is estimated from the building area given from the OSM
30
  buildings.
31
* Grid districts, which are located in several federal states, are allocated
32
  PV capacity according to their respective roof potential in the individual
33
  federal states.
34
* The desegregation of PV plants within a grid districts respects existing
35
  plants from MaStR, which did not reach their end of life.
36
* New PV plants are randomly and weighted generated using a breakdown of MaStR
37
  data as generator basis.
38
* Plant metadata (e.g. plant orientation) is also added random and weighted
39
  from MaStR data as basis.
40
"""
41
from __future__ import annotations
42
43
from collections import Counter
44
from functools import wraps
45
from time import perf_counter
46
import datetime
47
import json
48
49
from geoalchemy2 import Geometry
50
from loguru import logger
51
from numpy.random import RandomState, default_rng
52
from omi.dialects import get_dialect
53
from pyproj.crs.crs import CRS
54
from sqlalchemy import BigInteger, Column, Float, Integer, String
55
from sqlalchemy.dialects.postgresql import HSTORE
56
from sqlalchemy.ext.declarative import declarative_base
57
import geopandas as gpd
58
import numpy as np
59
import pandas as pd
60
61
from egon.data import config, db
62
from egon.data.datasets.electricity_demand_timeseries.hh_buildings import (
63
    OsmBuildingsSynthetic,
64
)
65
from egon.data.datasets.power_plants.mastr_db_classes import EgonPowerPlantsPv
66
from egon.data.datasets.scenario_capacities import EgonScenarioCapacities
67
from egon.data.datasets.zensus_vg250 import Vg250Gem
68
from egon.data.metadata import (
69
    context,
70
    contributors,
71
    generate_resource_fields_from_db_table,
72
    license_dedl,
73
    license_odbl,
74
    meta_metadata,
75
    oep_metadata_version,
76
    sources,
77
)
78
79
engine = db.engine()
80
Base = declarative_base()
81
SEED = int(config.settings()["egon-data"]["--random-seed"])
82
83
# TODO: move to yml
84
MASTR_INDEX_COL = "gens_id"
85
86
EPSG = 4326
87
SRID = 3035
88
89
# data cleaning
90
MAX_REALISTIC_PV_CAP = 23500 / 10**3
91
MIN_REALISTIC_PV_CAP = 0.1 / 10**3
92
93
# show additional logging information
94
VERBOSE = False
95
96
# Number of quantiles
97
Q = 5
98
99
# Scenario Data
100
SCENARIOS = ["eGon2035", "eGon100RE"]
101
SCENARIO_TIMESTAMP = {
102
    "eGon2035": pd.Timestamp("2035-01-01", tz="UTC"),
103
    "eGon100RE": pd.Timestamp("2050-01-01", tz="UTC"),
104
}
105
PV_ROOFTOP_LIFETIME = pd.Timedelta(20 * 365, unit="D")
106
107
# Example Modul Trina Vertex S TSM-400DE09M.08 400 Wp
108
# https://www.photovoltaik4all.de/media/pdf/92/64/68/Trina_Datasheet_VertexS_DE09-08_2021_A.pdf
109
MODUL_CAP = 0.4 / 10**3  # MWp
110
MODUL_SIZE = 1.096 * 1.754  # m²
111
PV_CAP_PER_SQ_M = MODUL_CAP / MODUL_SIZE
112
113
# Estimation of usable roof area
114
# Factor for the conversion of building area to roof area
115
# estimation mean roof pitch: 35°
116
# estimation usable roof share: 80%
117
# estimation that only the south side of the building is used for pv
118
# see https://mediatum.ub.tum.de/doc/%20969497/969497.pdf
119
# AREA_FACTOR = 1.221
120
# USABLE_ROOF_SHARE = 0.8
121
# SOUTH_SHARE = 0.5
122
# ROOF_FACTOR = AREA_FACTOR * USABLE_ROOF_SHARE * SOUTH_SHARE
123
ROOF_FACTOR = 0.5
124
125
CAP_RANGES = [
126
    (0, 30 / 10**3),
127
    (30 / 10**3, 100 / 10**3),
128
    (100 / 10**3, float("inf")),
129
]
130
131
MIN_BUILDING_SIZE = 10.0
132
UPPER_QUANTILE = 0.95
133
LOWER_QUANTILE = 0.05
134
135
COLS_TO_EXPORT = [
136
    "scenario",
137
    "bus_id",
138
    "building_id",
139
    "gens_id",
140
    "capacity",
141
    "orientation_uniform",
142
    "orientation_primary",
143
    "orientation_primary_angle",
144
    "voltage_level",
145
    "weather_cell_id",
146
]
147
148
# TODO
149
INCLUDE_SYNTHETIC_BUILDINGS = True
150
ONLY_BUILDINGS_WITH_DEMAND = True
151
TEST_RUN = False
152
153
154
def timer_func(func):
155
    @wraps(func)
156
    def timeit_wrapper(*args, **kwargs):
157
        start_time = perf_counter()
158
        result = func(*args, **kwargs)
159
        end_time = perf_counter()
160
        total_time = end_time - start_time
161
        logger.debug(
162
            f"Function {func.__name__} took {total_time:.4f} seconds."
163
        )
164
        return result
165
166
    return timeit_wrapper
167
168
169
@timer_func
170
def mastr_data(
171
    index_col: str | int | list[str] | list[int],
172
) -> gpd.GeoDataFrame:
173
    """
174
    Read MaStR data from database.
175
176
    Parameters
177
    -----------
178
    index_col : str, int or list of str or int
179
        Column(s) to use as the row labels of the DataFrame.
180
    Returns
181
    -------
182
    pandas.DataFrame
183
        DataFrame containing MaStR data.
184
    """
185
    with db.session_scope() as session:
186
        query = session.query(EgonPowerPlantsPv).filter(
187
            EgonPowerPlantsPv.status == "InBetrieb",
188
            EgonPowerPlantsPv.site_type
189
            == ("Bauliche Anlagen (Hausdach, Gebäude und Fassade)"),
190
        )
191
192
    gdf = gpd.read_postgis(
193
        query.statement, query.session.bind, index_col=index_col
194
    ).drop(columns="id")
195
196
    logger.debug("MaStR data loaded.")
197
198
    return gdf
199
200
201
@timer_func
202
def clean_mastr_data(
203
    mastr_gdf: gpd.GeoDataFrame,
204
    max_realistic_pv_cap: int | float,
205
    min_realistic_pv_cap: int | float,
206
    seed: int,
207
) -> gpd.GeoDataFrame:
208
    """
209
    Clean the MaStR data from implausible data.
210
211
    * Drop MaStR ID duplicates.
212
    * Drop generators with implausible capacities.
213
214
    Parameters
215
    -----------
216
    mastr_gdf : pandas.DataFrame
217
        DataFrame containing MaStR data.
218
    max_realistic_pv_cap : int or float
219
        Maximum capacity, which is considered to be realistic.
220
    min_realistic_pv_cap : int or float
221
        Minimum capacity, which is considered to be realistic.
222
    seed : int
223
        Seed to use for random operations with NumPy and pandas.
224
    Returns
225
    -------
226
    pandas.DataFrame
227
        DataFrame containing cleaned MaStR data.
228
    """
229
    init_len = len(mastr_gdf)
230
231
    # drop duplicates
232
    mastr_gdf = mastr_gdf.loc[~mastr_gdf.index.duplicated()]
233
234
    # drop generators without any capacity info
235
    # and capacity of zero
236
    # and if the capacity is > 23.5 MW, because
237
    # Germanies largest rooftop PV is 23 MW
238
    # https://www.iwr.de/news/groesste-pv-dachanlage-europas-wird-in-sachsen-anhalt-gebaut-news37379
239
    mastr_gdf = mastr_gdf.loc[
240
        ~mastr_gdf.capacity.isna()
241
        & (mastr_gdf.capacity <= max_realistic_pv_cap)
242
        & (mastr_gdf.capacity > min_realistic_pv_cap)
243
    ]
244
245
    # get consistent start-up date
246
    # randomly and weighted fill missing start-up dates
247
    pool = mastr_gdf.loc[
248
        ~mastr_gdf.commissioning_date.isna()
249
    ].commissioning_date.to_numpy()
250
251
    size = len(mastr_gdf) - len(pool)
252
253
    if size > 0:
254
        rng = default_rng(seed=seed)
255
256
        choice = rng.choice(
257
            pool,
258
            size=size,
259
            replace=False,
260
        )
261
262
        mastr_gdf.loc[mastr_gdf.commissioning_date.isna()] = mastr_gdf.loc[
263
            mastr_gdf.commissioning_date.isna()
264
        ].assign(commissioning_date=choice)
265
266
        logger.info(
267
            f"Randomly and weigthed added start-up date to {size} generators."
268
        )
269
270
    mastr_gdf = mastr_gdf.assign(
271
        commissioning_date=pd.to_datetime(
272
            mastr_gdf.commissioning_date, utc=True
273
        )
274
    )
275
276
    end_len = len(mastr_gdf)
277
    logger.debug(
278
        f"Dropped {init_len - end_len} "
279
        f"({((init_len - end_len) / init_len) * 100:g}%)"
280
        f" of {init_len} rows from MaStR DataFrame."
281
    )
282
283
    return mastr_gdf
284
285
286
@timer_func
287
def municipality_data() -> gpd.GeoDataFrame:
288
    """
289
    Get municipality data from eGo^n Database.
290
    Returns
291
    -------
292
    gepandas.GeoDataFrame
293
        GeoDataFrame with municipality data.
294
    """
295
    with db.session_scope() as session:
296
        query = session.query(Vg250Gem.ags, Vg250Gem.geometry.label("geom"))
297
298
    return gpd.read_postgis(
299
        query.statement, query.session.bind, index_col="ags"
300
    )
301
302
303
@timer_func
304
def add_ags_to_gens(
305
    mastr_gdf: gpd.GeoDataFrame,
306
    municipalities_gdf: gpd.GeoDataFrame,
307
) -> gpd.GeoDataFrame:
308
    """
309
    Add information about AGS ID to generators.
310
    Parameters
311
    -----------
312
    mastr_gdf : geopandas.GeoDataFrame
313
        GeoDataFrame with valid and cleaned MaStR data.
314
    municipalities_gdf : geopandas.GeoDataFrame
315
        GeoDataFrame with municipality data.
316
    Returns
317
    -------
318
    gepandas.GeoDataFrame
319
        GeoDataFrame with valid and cleaned MaStR data
320
        with AGS ID added.
321
    """
322
    return mastr_gdf.sjoin(
323
        municipalities_gdf,
324
        how="left",
325
        predicate="intersects",
326
    ).rename(columns={"index_right": "ags"})
327
328
329
def drop_gens_outside_muns(
330
    mastr_gdf: gpd.GeoDataFrame,
331
) -> gpd.GeoDataFrame:
332
    """
333
    Drop all generators outside of municipalities.
334
    Parameters
335
    -----------
336
    mastr_gdf : geopandas.GeoDataFrame
337
        GeoDataFrame with valid and cleaned MaStR data.
338
    Returns
339
    -------
340
    gepandas.GeoDataFrame
341
        GeoDataFrame with valid and cleaned MaStR data
342
        with generatos without an AGS ID dropped.
343
    """
344
    gdf = mastr_gdf.loc[~mastr_gdf.ags.isna()]
345
346
    logger.debug(
347
        f"{len(mastr_gdf) - len(gdf)} ("
348
        f"{(len(mastr_gdf) - len(gdf)) / len(mastr_gdf) * 100:g}%)"
349
        f" of {len(mastr_gdf)} values are outside of the municipalities"
350
        " and are therefore dropped."
351
    )
352
353
    return gdf
354
355
356
def load_mastr_data():
357
    """Read PV rooftop data from MaStR CSV
358
    Note: the source will be replaced as soon as the MaStR data is available
359
    in DB.
360
    Returns
361
    -------
362
    geopandas.GeoDataFrame
363
        GeoDataFrame containing MaStR data with geocoded locations.
364
    """
365
    mastr_gdf = mastr_data(
366
        MASTR_INDEX_COL,
367
    )
368
369
    clean_mastr_gdf = clean_mastr_data(
370
        mastr_gdf,
371
        max_realistic_pv_cap=MAX_REALISTIC_PV_CAP,
372
        min_realistic_pv_cap=MIN_REALISTIC_PV_CAP,
373
        seed=SEED,
374
    )
375
376
    municipalities_gdf = municipality_data()
377
378
    clean_mastr_gdf = add_ags_to_gens(clean_mastr_gdf, municipalities_gdf)
379
380
    return drop_gens_outside_muns(clean_mastr_gdf)
381
382
383
class OsmBuildingsFiltered(Base):
384
    __tablename__ = "osm_buildings_filtered"
385
    __table_args__ = {"schema": "openstreetmap"}
386
387
    osm_id = Column(BigInteger)
388
    amenity = Column(String)
389
    building = Column(String)
390
    name = Column(String)
391
    geom = Column(Geometry(srid=SRID), index=True)
392
    area = Column(Float)
393
    geom_point = Column(Geometry(srid=SRID), index=True)
394
    tags = Column(HSTORE)
395
    id = Column(BigInteger, primary_key=True, index=True)
396
397
398
@timer_func
399
def osm_buildings(
400
    to_crs: CRS,
401
) -> gpd.GeoDataFrame:
402
    """
403
    Read OSM buildings data from eGo^n Database.
404
    Parameters
405
    -----------
406
    to_crs : pyproj.crs.crs.CRS
407
        CRS to transform geometries to.
408
    Returns
409
    -------
410
    geopandas.GeoDataFrame
411
        GeoDataFrame containing OSM buildings data.
412
    """
413
    with db.session_scope() as session:
414
        query = session.query(
415
            OsmBuildingsFiltered.id,
416
            OsmBuildingsFiltered.area,
417
            OsmBuildingsFiltered.geom_point.label("geom"),
418
        )
419
420
    return gpd.read_postgis(
421
        query.statement, query.session.bind, index_col="id"
422
    ).to_crs(to_crs)
423
424
425
@timer_func
426
def synthetic_buildings(
427
    to_crs: CRS,
428
) -> gpd.GeoDataFrame:
429
    """
430
    Read synthetic buildings data from eGo^n Database.
431
    Parameters
432
    -----------
433
    to_crs : pyproj.crs.crs.CRS
434
        CRS to transform geometries to.
435
    Returns
436
    -------
437
    geopandas.GeoDataFrame
438
        GeoDataFrame containing OSM buildings data.
439
    """
440
    with db.session_scope() as session:
441
        query = session.query(
442
            OsmBuildingsSynthetic.id,
443
            OsmBuildingsSynthetic.area,
444
            OsmBuildingsSynthetic.geom_point.label("geom"),
445
        )
446
447
    return gpd.read_postgis(
448
        query.statement, query.session.bind, index_col="id"
449
    ).to_crs(to_crs)
450
451
452
@timer_func
453
def add_ags_to_buildings(
454
    buildings_gdf: gpd.GeoDataFrame,
455
    municipalities_gdf: gpd.GeoDataFrame,
456
) -> gpd.GeoDataFrame:
457
    """
458
    Add information about AGS ID to buildings.
459
    Parameters
460
    -----------
461
    buildings_gdf : geopandas.GeoDataFrame
462
        GeoDataFrame containing OSM buildings data.
463
    municipalities_gdf : geopandas.GeoDataFrame
464
        GeoDataFrame with municipality data.
465
    Returns
466
    -------
467
    gepandas.GeoDataFrame
468
        GeoDataFrame containing OSM buildings data
469
        with AGS ID added.
470
    """
471
    return buildings_gdf.sjoin(
472
        municipalities_gdf,
473
        how="left",
474
        predicate="intersects",
475
    ).rename(columns={"index_right": "ags"})
476
477
478
def drop_buildings_outside_muns(
479
    buildings_gdf: gpd.GeoDataFrame,
480
) -> gpd.GeoDataFrame:
481
    """
482
    Drop all buildings outside of municipalities.
483
    Parameters
484
    -----------
485
    buildings_gdf : geopandas.GeoDataFrame
486
        GeoDataFrame containing OSM buildings data.
487
    Returns
488
    -------
489
    gepandas.GeoDataFrame
490
        GeoDataFrame containing OSM buildings data
491
        with buildings without an AGS ID dropped.
492
    """
493
    gdf = buildings_gdf.loc[~buildings_gdf.ags.isna()]
494
495
    logger.debug(
496
        f"{len(buildings_gdf) - len(gdf)} "
497
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
498
        f"of {len(buildings_gdf)} values are outside of the municipalities "
499
        "and are therefore dropped."
500
    )
501
502
    return gdf
503
504
505
def egon_building_peak_loads():
506
    sql = """
507
    SELECT building_id
508
    FROM demand.egon_building_electricity_peak_loads
509
    WHERE scenario = 'eGon2035'
510
    """
511
512
    return (
513
        db.select_dataframe(sql).building_id.astype(int).sort_values().unique()
514
    )
515
516
517
@timer_func
518
def load_building_data():
519
    """
520
    Read buildings from DB
521
    Tables:
522
523
    * `openstreetmap.osm_buildings_filtered` (from OSM)
524
    * `openstreetmap.osm_buildings_synthetic` (synthetic, created by us)
525
526
    Use column `id` for both as it is unique hence you concat both datasets.
527
    If INCLUDE_SYNTHETIC_BUILDINGS is False synthetic buildings will not be
528
    loaded.
529
530
    Returns
531
    -------
532
    gepandas.GeoDataFrame
533
        GeoDataFrame containing OSM buildings data with buildings without an
534
        AGS ID dropped.
535
    """
536
537
    municipalities_gdf = municipality_data()
538
539
    osm_buildings_gdf = osm_buildings(municipalities_gdf.crs)
540
541
    if INCLUDE_SYNTHETIC_BUILDINGS:
542
        synthetic_buildings_gdf = synthetic_buildings(municipalities_gdf.crs)
543
544
        buildings_gdf = gpd.GeoDataFrame(
545
            pd.concat(
546
                [
547
                    osm_buildings_gdf,
548
                    synthetic_buildings_gdf,
549
                ]
550
            ),
551
            geometry="geom",
552
            crs=osm_buildings_gdf.crs,
553
        ).rename(columns={"area": "building_area"})
554
555
        buildings_gdf.index = buildings_gdf.index.astype(int)
556
557
    else:
558
        buildings_gdf = osm_buildings_gdf.rename(
559
            columns={"area": "building_area"}
560
        )
561
562
    if ONLY_BUILDINGS_WITH_DEMAND:
563
        building_ids = egon_building_peak_loads()
564
565
        init_len = len(building_ids)
566
567
        building_ids = np.intersect1d(
568
            list(map(int, building_ids)),
569
            list(map(int, buildings_gdf.index.to_numpy())),
570
        )
571
572
        end_len = len(building_ids)
573
574
        logger.debug(
575
            f"{end_len/init_len * 100: g} % ({end_len} / {init_len}) "
576
            f"of buildings have peak load."
577
        )
578
579
        buildings_gdf = buildings_gdf.loc[building_ids]
580
581
    buildings_ags_gdf = add_ags_to_buildings(buildings_gdf, municipalities_gdf)
582
583
    buildings_ags_gdf = drop_buildings_outside_muns(buildings_ags_gdf)
584
585
    grid_districts_gdf = grid_districts(EPSG)
586
587
    federal_state_gdf = federal_state_data(grid_districts_gdf.crs)
588
589
    grid_federal_state_gdf = overlay_grid_districts_with_counties(
590
        grid_districts_gdf,
591
        federal_state_gdf,
592
    )
593
594
    buildings_overlay_gdf = add_overlay_id_to_buildings(
595
        buildings_ags_gdf,
596
        grid_federal_state_gdf,
597
    )
598
599
    logger.debug("Loaded buildings.")
600
601
    buildings_overlay_gdf = drop_buildings_outside_grids(buildings_overlay_gdf)
602
603
    # overwrite bus_id with data from new table
604
    sql = (
605
        "SELECT building_id, bus_id FROM "
606
        "boundaries.egon_map_zensus_mvgd_buildings"
607
    )
608
    map_building_bus_df = db.select_dataframe(sql)
609
610
    building_ids = np.intersect1d(
611
        list(map(int, map_building_bus_df.building_id.unique())),
612
        list(map(int, buildings_overlay_gdf.index.to_numpy())),
613
    )
614
615
    buildings_within_gdf = buildings_overlay_gdf.loc[building_ids]
616
617
    gdf = (
618
        buildings_within_gdf.reset_index()
619
        .drop(columns=["bus_id"])
620
        .merge(
621
            how="left",
622
            right=map_building_bus_df,
623
            left_on="id",
624
            right_on="building_id",
625
        )
626
        .drop(columns=["building_id"])
627
        .set_index("id")
628
        .sort_index()
629
    )
630
631
    return gdf[~gdf.index.duplicated(keep="first")]
632
633
634
@timer_func
635
def sort_and_qcut_df(
636
    df: pd.DataFrame | gpd.GeoDataFrame,
637
    col: str,
638
    q: int,
639
) -> pd.DataFrame | gpd.GeoDataFrame:
640
    """
641
    Determine the quantile of a given attribute in a (Geo)DataFrame.
642
    Sort the (Geo)DataFrame in ascending order for the given attribute.
643
    Parameters
644
    -----------
645
    df : pandas.DataFrame or geopandas.GeoDataFrame
646
        (Geo)DataFrame to sort and qcut.
647
    col : str
648
        Name of the attribute to sort and qcut the (Geo)DataFrame on.
649
    q : int
650
        Number of quantiles.
651
    Returns
652
    -------
653
    pandas.DataFrame or gepandas.GeoDataFrame
654
        Sorted and qcut (Geo)DataFrame.
655
    """
656
    df = df.sort_values(col, ascending=True)
657
658
    return df.assign(
659
        quant=pd.qcut(
660
            df[col],
661
            q=q,
662
            labels=range(q),
663
        )
664
    )
665
666
667
@timer_func
668
def allocate_pv(
669
    q_mastr_gdf: gpd.GeoDataFrame,
670
    q_buildings_gdf: gpd.GeoDataFrame,
671
    seed: int,
672
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
673
    """
674
    Allocate the MaStR pv generators to the OSM buildings.
675
    This will determine a building for each pv generator if there are more
676
    buildings than generators within a given AGS. Primarily generators are
677
    distributed with the same qunatile as the buildings. Multiple assignment
678
    is excluded.
679
    Parameters
680
    -----------
681
    q_mastr_gdf : geopandas.GeoDataFrame
682
        GeoDataFrame containing geocoded and qcut MaStR data.
683
    q_buildings_gdf : geopandas.GeoDataFrame
684
        GeoDataFrame containing qcut OSM buildings data.
685
    seed : int
686
        Seed to use for random operations with NumPy and pandas.
687
    Returns
688
    -------
689
    tuple with two geopandas.GeoDataFrame s
690
        GeoDataFrame containing MaStR data allocated to building IDs.
691
        GeoDataFrame containing building data allocated to MaStR IDs.
692
    """
693
    rng = default_rng(seed=seed)
694
695
    q_buildings_gdf = q_buildings_gdf.assign(gens_id=np.nan).sort_values(
696
        by=["ags", "quant"]
697
    )
698
    q_mastr_gdf = q_mastr_gdf.assign(building_id=np.nan).sort_values(
699
        by=["ags", "quant"]
700
    )
701
702
    ags_list = q_buildings_gdf.ags.unique()
703
704
    if TEST_RUN:
705
        ags_list = ags_list[:250]
706
707
    num_ags = len(ags_list)
708
709
    t0 = perf_counter()
710
711
    for count, ags in enumerate(ags_list):
712
713
        buildings = q_buildings_gdf.loc[q_buildings_gdf.ags == ags]
714
        gens = q_mastr_gdf.loc[q_mastr_gdf.ags == ags]
715
716
        len_build = len(buildings)
717
        len_gens = len(gens)
718
719
        if len_build < len_gens:
720
            gens = gens.sample(len_build, random_state=RandomState(seed=seed))
721
            logger.error(
722
                f"There are {len_gens} generators and only {len_build}"
723
                f" buildings in AGS {ags}. {len_gens - len(gens)} "
724
                "generators were truncated to match the amount of buildings."
725
            )
726
727
            assert len_build == len(gens)
728
729
        for quant in gens.quant.unique():
730
            q_buildings = buildings.loc[buildings.quant == quant]
731
            q_gens = gens.loc[gens.quant == quant]
732
733
            len_build = len(q_buildings)
734
            len_gens = len(q_gens)
735
736
            if len_build < len_gens:
737
                delta = len_gens - len_build
738
739
                logger.warning(
740
                    f"There are {len_gens} generators and only {len_build} "
741
                    f"buildings in AGS {ags} and quantile {quant}. {delta} "
742
                    f"buildings from AGS {ags} will be added randomly."
743
                )
744
745
                add_buildings = pd.Index(
746
                    rng.choice(
747
                        list(set(buildings.index) - set(q_buildings.index)),
748
                        size=delta,
749
                        replace=False,
750
                    )
751
                )
752
753
                chosen_buildings = q_buildings.index.append(add_buildings)
754
755
            else:
756
                chosen_buildings = rng.choice(
757
                    q_buildings.index,
758
                    size=len_gens,
759
                    replace=False,
760
                )
761
762
            q_buildings_gdf.loc[chosen_buildings, "gens_id"] = q_gens.index
763
            buildings = buildings.drop(chosen_buildings)
764
765
        if count % 500 == 0:
766
            logger.debug(
767
                f"Allocation of {count / num_ags * 100:g} % of AGS done. "
768
                f"It took {perf_counter() - t0:g} seconds."
769
            )
770
771
            t0 = perf_counter()
772
773
    assigned_buildings = q_buildings_gdf.loc[~q_buildings_gdf.gens_id.isna()]
774
775
    assert len(assigned_buildings) == len(assigned_buildings.gens_id.unique())
776
777
    q_mastr_gdf.loc[
778
        assigned_buildings.gens_id, "building_id"
779
    ] = assigned_buildings.index
780
781
    assigned_gens = q_mastr_gdf.loc[~q_mastr_gdf.building_id.isna()]
782
783
    assert len(assigned_buildings) == len(assigned_gens)
784
785
    logger.debug("Allocated status quo generators to buildings.")
786
787
    return frame_to_numeric(q_mastr_gdf), frame_to_numeric(q_buildings_gdf)
788
789
790
def frame_to_numeric(
791
    df: pd.DataFrame | gpd.GeoDataFrame,
792
) -> pd.DataFrame | gpd.GeoDataFrame:
793
    """
794
    Try to convert all columns of a DataFrame to numeric ignoring errors.
795
    Parameters
796
    ----------
797
    df : pandas.DataFrame or geopandas.GeoDataFrame
798
    Returns
799
    -------
800
    pandas.DataFrame or geopandas.GeoDataFrame
801
    """
802
    if str(df.index.dtype) == "object":
803
        df.index = pd.to_numeric(df.index, errors="ignore")
804
805
    for col in df.columns:
806
        if str(df[col].dtype) == "object":
807
            df[col] = pd.to_numeric(df[col], errors="ignore")
808
809
    return df
810
811
812
def validate_output(
813
    desagg_mastr_gdf: pd.DataFrame | gpd.GeoDataFrame,
814
    desagg_buildings_gdf: pd.DataFrame | gpd.GeoDataFrame,
815
) -> None:
816
    """
817
    Validate output.
818
819
    * Validate that there are exactly as many buildings with a pv system as
820
      there are pv systems with a building
821
    * Validate that the building IDs with a pv system are the same building
822
      IDs as assigned to the pv systems
823
    * Validate that the pv system IDs with a building are the same pv system
824
      IDs as assigned to the buildings
825
826
    Parameters
827
    -----------
828
    desagg_mastr_gdf : geopandas.GeoDataFrame
829
        GeoDataFrame containing MaStR data allocated to building IDs.
830
    desagg_buildings_gdf : geopandas.GeoDataFrame
831
        GeoDataFrame containing building data allocated to MaStR IDs.
832
    """
833
    assert len(
834
        desagg_mastr_gdf.loc[~desagg_mastr_gdf.building_id.isna()]
835
    ) == len(desagg_buildings_gdf.loc[~desagg_buildings_gdf.gens_id.isna()])
836
    assert (
837
        np.sort(
838
            desagg_mastr_gdf.loc[
839
                ~desagg_mastr_gdf.building_id.isna()
840
            ].building_id.unique()
841
        )
842
        == np.sort(
843
            desagg_buildings_gdf.loc[
844
                ~desagg_buildings_gdf.gens_id.isna()
845
            ].index.unique()
846
        )
847
    ).all()
848
    assert (
849
        np.sort(
850
            desagg_mastr_gdf.loc[
851
                ~desagg_mastr_gdf.building_id.isna()
852
            ].index.unique()
853
        )
854
        == np.sort(
855
            desagg_buildings_gdf.loc[
856
                ~desagg_buildings_gdf.gens_id.isna()
857
            ].gens_id.unique()
858
        )
859
    ).all()
860
861
    logger.debug("Validated output.")
862
863
864
def drop_unallocated_gens(
865
    gdf: gpd.GeoDataFrame,
866
) -> gpd.GeoDataFrame:
867
    """
868
    Drop generators which did not get allocated.
869
870
    Parameters
871
    -----------
872
    gdf : geopandas.GeoDataFrame
873
        GeoDataFrame containing MaStR data allocated to building IDs.
874
    Returns
875
    -------
876
    geopandas.GeoDataFrame
877
        GeoDataFrame containing MaStR data with generators dropped which did
878
        not get allocated.
879
    """
880
    init_len = len(gdf)
881
    gdf = gdf.loc[~gdf.building_id.isna()]
882
    end_len = len(gdf)
883
884
    logger.debug(
885
        f"Dropped {init_len - end_len} "
886
        f"({((init_len - end_len) / init_len) * 100:g}%)"
887
        f" of {init_len} unallocated rows from MaStR DataFrame."
888
    )
889
890
    return gdf
891
892
893
@timer_func
894
def allocate_to_buildings(
895
    mastr_gdf: gpd.GeoDataFrame,
896
    buildings_gdf: gpd.GeoDataFrame,
897
) -> tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
898
    """
899
    Allocate status quo pv rooftop generators to buildings.
900
    Parameters
901
    -----------
902
    mastr_gdf : geopandas.GeoDataFrame
903
        GeoDataFrame containing MaStR data with geocoded locations.
904
    buildings_gdf : geopandas.GeoDataFrame
905
        GeoDataFrame containing OSM buildings data with buildings without an
906
        AGS ID dropped.
907
    Returns
908
    -------
909
    tuple with two geopandas.GeoDataFrame s
910
        GeoDataFrame containing MaStR data allocated to building IDs.
911
        GeoDataFrame containing building data allocated to MaStR IDs.
912
    """
913
    logger.debug("Starting allocation of status quo.")
914
915
    q_mastr_gdf = sort_and_qcut_df(mastr_gdf, col="capacity", q=Q)
916
    q_buildings_gdf = sort_and_qcut_df(buildings_gdf, col="building_area", q=Q)
917
918
    desagg_mastr_gdf, desagg_buildings_gdf = allocate_pv(
919
        q_mastr_gdf, q_buildings_gdf, SEED
920
    )
921
922
    validate_output(desagg_mastr_gdf, desagg_buildings_gdf)
923
924
    return drop_unallocated_gens(desagg_mastr_gdf), desagg_buildings_gdf
925
926
927
@timer_func
928
def grid_districts(
929
    epsg: int,
930
) -> gpd.GeoDataFrame:
931
    """
932
    Load mv grid district geo data from eGo^n Database as
933
    geopandas.GeoDataFrame.
934
    Parameters
935
    -----------
936
    epsg : int
937
        EPSG ID to use as CRS.
938
    Returns
939
    -------
940
    geopandas.GeoDataFrame
941
        GeoDataFrame containing mv grid district ID and geo shapes data.
942
    """
943
    gdf = db.select_geodataframe(
944
        """
945
        SELECT bus_id, geom
946
        FROM grid.egon_mv_grid_district
947
        ORDER BY bus_id
948
        """,
949
        index_col="bus_id",
950
        geom_col="geom",
951
        epsg=epsg,
952
    )
953
954
    gdf.index = gdf.index.astype(int)
955
956
    logger.debug("Grid districts loaded.")
957
958
    return gdf
959
960
961
def scenario_data(
962
    carrier: str = "solar_rooftop",
963
    scenario: str = "eGon2035",
964
) -> pd.DataFrame:
965
    """
966
    Get scenario capacity data from eGo^n Database.
967
    Parameters
968
    -----------
969
    carrier : str
970
        Carrier type to filter table by.
971
    scenario : str
972
        Scenario to filter table by.
973
    Returns
974
    -------
975
    geopandas.GeoDataFrame
976
        GeoDataFrame with scenario capacity data in GW.
977
    """
978
    with db.session_scope() as session:
979
        query = session.query(EgonScenarioCapacities).filter(
980
            EgonScenarioCapacities.carrier == carrier,
981
            EgonScenarioCapacities.scenario_name == scenario,
982
        )
983
984
    df = pd.read_sql(
985
        query.statement, query.session.bind, index_col="index"
986
    ).sort_index()
987
988
    logger.debug("Scenario capacity data loaded.")
989
990
    return df
991
992
993 View Code Duplication
class Vg250Lan(Base):
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
994
    __tablename__ = "vg250_lan"
995
    __table_args__ = {"schema": "boundaries"}
996
997
    id = Column(BigInteger, primary_key=True, index=True)
998
    ade = Column(BigInteger)
999
    gf = Column(BigInteger)
1000
    bsg = Column(BigInteger)
1001
    ars = Column(String)
1002
    ags = Column(String)
1003
    sdv_ars = Column(String)
1004
    gen = Column(String)
1005
    bez = Column(String)
1006
    ibz = Column(BigInteger)
1007
    bem = Column(String)
1008
    nbd = Column(String)
1009
    sn_l = Column(String)
1010
    sn_r = Column(String)
1011
    sn_k = Column(String)
1012
    sn_v1 = Column(String)
1013
    sn_v2 = Column(String)
1014
    sn_g = Column(String)
1015
    fk_s3 = Column(String)
1016
    nuts = Column(String)
1017
    ars_0 = Column(String)
1018
    ags_0 = Column(String)
1019
    wsk = Column(String)
1020
    debkg_id = Column(String)
1021
    rs = Column(String)
1022
    sdv_rs = Column(String)
1023
    rs_0 = Column(String)
1024
    geometry = Column(Geometry(srid=EPSG), index=True)
1025
1026
1027
def federal_state_data(to_crs: CRS) -> gpd.GeoDataFrame:
1028
    """
1029
    Get feder state data from eGo^n Database.
1030
    Parameters
1031
    -----------
1032
    to_crs : pyproj.crs.crs.CRS
1033
        CRS to transform geometries to.
1034
    Returns
1035
    -------
1036
    geopandas.GeoDataFrame
1037
        GeoDataFrame with federal state data.
1038
    """
1039
    with db.session_scope() as session:
1040
        query = session.query(
1041
            Vg250Lan.id, Vg250Lan.nuts, Vg250Lan.geometry.label("geom")
1042
        )
1043
1044
        gdf = gpd.read_postgis(
1045
            query.statement, session.connection(), index_col="id"
1046
        ).to_crs(to_crs)
1047
1048
    logger.debug("Federal State data loaded.")
1049
1050
    return gdf
1051
1052
1053
@timer_func
1054
def overlay_grid_districts_with_counties(
1055
    mv_grid_district_gdf: gpd.GeoDataFrame,
1056
    federal_state_gdf: gpd.GeoDataFrame,
1057
) -> gpd.GeoDataFrame:
1058
    """
1059
    Calculate the intersections of mv grid districts and counties.
1060
    Parameters
1061
    -----------
1062
    mv_grid_district_gdf : gpd.GeoDataFrame
1063
        GeoDataFrame containing mv grid district ID and geo shapes data.
1064
    federal_state_gdf : gpd.GeoDataFrame
1065
        GeoDataFrame with federal state data.
1066
    Returns
1067
    -------
1068
    geopandas.GeoDataFrame
1069
        GeoDataFrame containing OSM buildings data.
1070
    """
1071
    logger.debug(
1072
        "Calculating intersection overlay between mv grid districts and "
1073
        "counties. This may take a while..."
1074
    )
1075
1076
    gdf = gpd.overlay(
1077
        federal_state_gdf.to_crs(mv_grid_district_gdf.crs),
1078
        mv_grid_district_gdf.reset_index(),
1079
        how="intersection",
1080
        keep_geom_type=True,
1081
    )
1082
1083
    logger.debug("Done!")
1084
1085
    return gdf
1086
1087
1088
@timer_func
1089
def add_overlay_id_to_buildings(
1090
    buildings_gdf: gpd.GeoDataFrame,
1091
    grid_federal_state_gdf: gpd.GeoDataFrame,
1092
) -> gpd.GeoDataFrame:
1093
    """
1094
    Add information about overlay ID to buildings.
1095
    Parameters
1096
    -----------
1097
    buildings_gdf : geopandas.GeoDataFrame
1098
        GeoDataFrame containing OSM buildings data.
1099
    grid_federal_state_gdf : geopandas.GeoDataFrame
1100
        GeoDataFrame with intersection shapes between counties and grid
1101
        districts.
1102
    Returns
1103
    -------
1104
    geopandas.GeoDataFrame
1105
        GeoDataFrame containing OSM buildings data with overlay ID added.
1106
    """
1107
    gdf = (
1108
        buildings_gdf.to_crs(grid_federal_state_gdf.crs)
1109
        .sjoin(
1110
            grid_federal_state_gdf,
1111
            how="left",
1112
            predicate="intersects",
1113
        )
1114
        .rename(columns={"index_right": "overlay_id"})
1115
    )
1116
1117
    logger.debug("Added overlay ID to OSM buildings.")
1118
1119
    return gdf
1120
1121
1122
def drop_buildings_outside_grids(
1123
    buildings_gdf: gpd.GeoDataFrame,
1124
) -> gpd.GeoDataFrame:
1125
    """
1126
    Drop all buildings outside of grid areas.
1127
    Parameters
1128
    -----------
1129
    buildings_gdf : geopandas.GeoDataFrame
1130
        GeoDataFrame containing OSM buildings data.
1131
    Returns
1132
    -------
1133
    gepandas.GeoDataFrame
1134
        GeoDataFrame containing OSM buildings data
1135
        with buildings without an bus ID dropped.
1136
    """
1137
    gdf = buildings_gdf.loc[~buildings_gdf.bus_id.isna()]
1138
1139
    logger.debug(
1140
        f"{len(buildings_gdf) - len(gdf)} "
1141
        f"({(len(buildings_gdf) - len(gdf)) / len(buildings_gdf) * 100:g}%) "
1142
        f"of {len(buildings_gdf)} values are outside of the grid areas "
1143
        "and are therefore dropped."
1144
    )
1145
1146
    return gdf
1147
1148
1149
def cap_per_bus_id(
1150
    scenario: str,
1151
) -> pd.DataFrame:
1152
    """
1153
    Get table with total pv rooftop capacity per grid district.
1154
1155
    Parameters
1156
    -----------
1157
    scenario : str
1158
        Scenario name.
1159
    Returns
1160
    -------
1161
    pandas.DataFrame
1162
        DataFrame with total rooftop capacity per mv grid.
1163
    """
1164
    targets = config.datasets()["solar_rooftop"]["targets"]
1165
1166
    sql = f"""
1167
    SELECT bus as bus_id, control, p_nom as capacity
1168
    FROM {targets['generators']['schema']}.{targets['generators']['table']}
1169
    WHERE carrier = 'solar_rooftop'
1170
    AND scn_name = '{scenario}'
1171
    """
1172
1173
    df = db.select_dataframe(sql, index_col="bus_id")
1174
1175
    return df.loc[df.control != "Slack"]
1176
1177
1178
def determine_end_of_life_gens(
1179
    mastr_gdf: gpd.GeoDataFrame,
1180
    scenario_timestamp: pd.Timestamp,
1181
    pv_rooftop_lifetime: pd.Timedelta,
1182
) -> gpd.GeoDataFrame:
1183
    """
1184
    Determine if an old PV system has reached its end of life.
1185
    Parameters
1186
    -----------
1187
    mastr_gdf : geopandas.GeoDataFrame
1188
        GeoDataFrame containing geocoded MaStR data.
1189
    scenario_timestamp : pandas.Timestamp
1190
        Timestamp at which the scenario takes place.
1191
    pv_rooftop_lifetime : pandas.Timedelta
1192
        Average expected lifetime of PV rooftop systems.
1193
    Returns
1194
    -------
1195
    geopandas.GeoDataFrame
1196
        GeoDataFrame containing geocoded MaStR data and info if the system
1197
        has reached its end of life.
1198
    """
1199
    before = mastr_gdf.capacity.sum()
1200
1201
    mastr_gdf = mastr_gdf.assign(
1202
        age=scenario_timestamp - mastr_gdf.commissioning_date
1203
    )
1204
1205
    mastr_gdf = mastr_gdf.assign(
1206
        end_of_life=pv_rooftop_lifetime < mastr_gdf.age
1207
    )
1208
1209
    after = mastr_gdf.loc[~mastr_gdf.end_of_life].capacity.sum()
1210
1211
    logger.debug(
1212
        f"Determined if pv rooftop systems reached their end of life.\nTotal "
1213
        f"capacity: {before}\nActive capacity: {after}"
1214
    )
1215
1216
    return mastr_gdf
1217
1218
1219
def calculate_max_pv_cap_per_building(
1220
    buildings_gdf: gpd.GeoDataFrame,
1221
    mastr_gdf: gpd.GeoDataFrame,
1222
    pv_cap_per_sq_m: float | int,
1223
    roof_factor: float | int,
1224
) -> gpd.GeoDataFrame:
1225
    """
1226
    Calculate the estimated maximum possible PV capacity per building.
1227
1228
    Parameters
1229
    -----------
1230
    buildings_gdf : geopandas.GeoDataFrame
1231
        GeoDataFrame containing OSM buildings data.
1232
    mastr_gdf : geopandas.GeoDataFrame
1233
        GeoDataFrame containing geocoded MaStR data.
1234
    pv_cap_per_sq_m : float, int
1235
        Average expected, installable PV capacity per square meter.
1236
    roof_factor : float, int
1237
        Average for PV usable roof area share.
1238
    Returns
1239
    -------
1240
    geopandas.GeoDataFrame
1241
        GeoDataFrame containing OSM buildings data with estimated maximum PV
1242
        capacity.
1243
    """
1244
    gdf = (
1245
        buildings_gdf.reset_index()
1246
        .rename(columns={"index": "id"})
1247
        .merge(
1248
            mastr_gdf[
1249
                [
1250
                    "capacity",
1251
                    "end_of_life",
1252
                    "building_id",
1253
                    "orientation_uniform",
1254
                    "orientation_primary",
1255
                    "orientation_primary_angle",
1256
                ]
1257
            ],
1258
            how="left",
1259
            left_on="id",
1260
            right_on="building_id",
1261
        )
1262
        .set_index("id")
1263
        .drop(columns="building_id")
1264
    )
1265
1266
    return gdf.assign(
1267
        max_cap=gdf.building_area.multiply(roof_factor * pv_cap_per_sq_m),
1268
        end_of_life=gdf.end_of_life.fillna(True).astype(bool),
1269
        bus_id=gdf.bus_id.astype(int),
1270
    )
1271
1272
1273
def calculate_building_load_factor(
1274
    mastr_gdf: gpd.GeoDataFrame,
1275
    buildings_gdf: gpd.GeoDataFrame,
1276
    rounding: int = 4,
1277
) -> gpd.GeoDataFrame:
1278
    """
1279
    Calculate the roof load factor from existing PV systems.
1280
    Parameters
1281
    -----------
1282
    mastr_gdf : geopandas.GeoDataFrame
1283
        GeoDataFrame containing geocoded MaStR data.
1284
    buildings_gdf : geopandas.GeoDataFrame
1285
        GeoDataFrame containing OSM buildings data.
1286
    rounding : int
1287
        Rounding to use for load factor.
1288
    Returns
1289
    -------
1290
    geopandas.GeoDataFrame
1291
        GeoDataFrame containing geocoded MaStR data with calculated load
1292
        factor.
1293
    """
1294
    gdf = mastr_gdf.merge(
1295
        buildings_gdf[["max_cap", "building_area"]]
1296
        .loc[~buildings_gdf["max_cap"].isna()]
1297
        .reset_index(),
1298
        how="left",
1299
        left_on="building_id",
1300
        right_on="id",
1301
    ).set_index("id")
1302
1303
    return gdf.assign(load_factor=(gdf.capacity / gdf.max_cap).round(rounding))
1304
1305
1306
def get_probability_for_property(
1307
    mastr_gdf: gpd.GeoDataFrame,
1308
    cap_range: tuple[int | float, int | float],
1309
    prop: str,
1310
) -> tuple[np.array, np.array]:
1311
    """
1312
    Calculate the probability of the different options of a property of the
1313
    existing PV plants.
1314
    Parameters
1315
    -----------
1316
    mastr_gdf : geopandas.GeoDataFrame
1317
        GeoDataFrame containing geocoded MaStR data.
1318
    cap_range : tuple(int, int)
1319
        Capacity range of PV plants to look at.
1320
    prop : str
1321
        Property to calculate probabilities for. String needs to be in columns
1322
        of mastr_gdf.
1323
    Returns
1324
    -------
1325
    tuple
1326
        numpy.array
1327
            Unique values of property.
1328
        numpy.array
1329
            Probabilties per unique value.
1330
    """
1331
    cap_range_gdf = mastr_gdf.loc[
1332
        (mastr_gdf.capacity > cap_range[0])
1333
        & (mastr_gdf.capacity <= cap_range[1])
1334
    ]
1335
1336
    if prop == "load_factor":
1337
        cap_range_gdf = cap_range_gdf.loc[cap_range_gdf[prop] <= 1]
1338
1339
    count = Counter(
1340
        cap_range_gdf[prop].loc[
1341
            ~cap_range_gdf[prop].isna()
1342
            & ~cap_range_gdf[prop].isnull()
1343
            & ~(cap_range_gdf[prop] == "None")
1344
        ]
1345
    )
1346
1347
    values = np.array(list(count.keys()))
1348
    probabilities = np.fromiter(count.values(), dtype=float)
1349
    probabilities = probabilities / np.sum(probabilities)
1350
1351
    return values, probabilities
1352
1353
1354
@timer_func
1355
def probabilities(
1356
    mastr_gdf: gpd.GeoDataFrame,
1357
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1358
    properties: list[str] | None = None,
1359
) -> dict:
1360
    """
1361
    Calculate the probability of the different options of properties of the
1362
    existing PV plants.
1363
    Parameters
1364
    -----------
1365
    mastr_gdf : geopandas.GeoDataFrame
1366
        GeoDataFrame containing geocoded MaStR data.
1367
    cap_ranges : list(tuple(int, int))
1368
        List of capacity ranges to distinguish between. The first tuple should
1369
        start with a zero and the last one should end with infinite.
1370
    properties : list(str)
1371
        List of properties to calculate probabilities for. Strings need to be
1372
        in columns of mastr_gdf.
1373
    Returns
1374
    -------
1375
    dict
1376
        Dictionary with values and probabilities per capacity range.
1377
    """
1378
    if cap_ranges is None:
1379
        cap_ranges = [
1380
            (0, 30 / 10**3),
1381
            (30 / 10**3, 100 / 10**3),
1382
            (100 / 10**3, float("inf")),
1383
        ]
1384
    if properties is None:
1385
        properties = [
1386
            "orientation_uniform",
1387
            "orientation_primary",
1388
            "orientation_primary_angle",
1389
            "load_factor",
1390
        ]
1391
1392
    prob_dict = {}
1393
1394
    for cap_range in cap_ranges:
1395
        prob_dict[cap_range] = {
1396
            "values": {},
1397
            "probabilities": {},
1398
        }
1399
1400
        for prop in properties:
1401
            v, p = get_probability_for_property(
1402
                mastr_gdf,
1403
                cap_range,
1404
                prop,
1405
            )
1406
1407
            prob_dict[cap_range]["values"][prop] = v
1408
            prob_dict[cap_range]["probabilities"][prop] = p
1409
1410
    return prob_dict
1411
1412
1413
def cap_share_per_cap_range(
1414
    mastr_gdf: gpd.GeoDataFrame,
1415
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1416
) -> dict[tuple[int | float, int | float], float]:
1417
    """
1418
    Calculate the share of PV capacity from the total PV capacity within
1419
    capacity ranges.
1420
1421
    Parameters
1422
    -----------
1423
    mastr_gdf : geopandas.GeoDataFrame
1424
        GeoDataFrame containing geocoded MaStR data.
1425
    cap_ranges : list(tuple(int, int))
1426
        List of capacity ranges to distinguish between. The first tuple should
1427
        start with a zero and the last one should end with infinite.
1428
    Returns
1429
    -------
1430
    dict
1431
        Dictionary with share of PV capacity from the total PV capacity within
1432
        capacity ranges.
1433
    """
1434
    if cap_ranges is None:
1435
        cap_ranges = [
1436
            (0, 30 / 10**3),
1437
            (30 / 10**3, 100 / 10**3),
1438
            (100 / 10**3, float("inf")),
1439
        ]
1440
1441
    cap_share_dict = {}
1442
1443
    total_cap = mastr_gdf.capacity.sum()
1444
1445
    for cap_range in cap_ranges:
1446
        cap_share = (
1447
            mastr_gdf.loc[
1448
                (mastr_gdf.capacity > cap_range[0])
1449
                & (mastr_gdf.capacity <= cap_range[1])
1450
            ].capacity.sum()
1451
            / total_cap
1452
        )
1453
1454
        cap_share_dict[cap_range] = cap_share
1455
1456
    return cap_share_dict
1457
1458
1459
def mean_load_factor_per_cap_range(
1460
    mastr_gdf: gpd.GeoDataFrame,
1461
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1462
) -> dict[tuple[int | float, int | float], float]:
1463
    """
1464
    Calculate the mean roof load factor per capacity range from existing PV
1465
    plants.
1466
    Parameters
1467
    -----------
1468
    mastr_gdf : geopandas.GeoDataFrame
1469
        GeoDataFrame containing geocoded MaStR data.
1470
    cap_ranges : list(tuple(int, int))
1471
        List of capacity ranges to distinguish between. The first tuple should
1472
        start with a zero and the last one should end with infinite.
1473
    Returns
1474
    -------
1475
    dict
1476
        Dictionary with mean roof load factor per capacity range.
1477
    """
1478
    if cap_ranges is None:
1479
        cap_ranges = [
1480
            (0, 30 / 10**3),
1481
            (30 / 10**3, 100 / 10**3),
1482
            (100 / 10**3, float("inf")),
1483
        ]
1484
1485
    load_factor_dict = {}
1486
1487
    for cap_range in cap_ranges:
1488
        load_factor = mastr_gdf.loc[
1489
            (mastr_gdf.load_factor <= 1)
1490
            & (mastr_gdf.capacity > cap_range[0])
1491
            & (mastr_gdf.capacity <= cap_range[1])
1492
        ].load_factor.mean()
1493
1494
        load_factor_dict[cap_range] = load_factor
1495
1496
    return load_factor_dict
1497
1498
1499
def building_area_range_per_cap_range(
1500
    mastr_gdf: gpd.GeoDataFrame,
1501
    cap_ranges: list[tuple[int | float, int | float]] | None = None,
1502
    min_building_size: int | float = 10.0,
1503
    upper_quantile: float = 0.95,
1504
    lower_quantile: float = 0.05,
1505
) -> dict[tuple[int | float, int | float], tuple[int | float, int | float]]:
1506
    """
1507
    Estimate normal building area range per capacity range.
1508
    Calculate the mean roof load factor per capacity range from existing PV
1509
    plants.
1510
    Parameters
1511
    -----------
1512
    mastr_gdf : geopandas.GeoDataFrame
1513
        GeoDataFrame containing geocoded MaStR data.
1514
    cap_ranges : list(tuple(int, int))
1515
        List of capacity ranges to distinguish between. The first tuple should
1516
        start with a zero and the last one should end with infinite.
1517
    min_building_size : int, float
1518
        Minimal building size to consider for PV plants.
1519
    upper_quantile : float
1520
        Upper quantile to estimate maximum building size per capacity range.
1521
    lower_quantile : float
1522
        Lower quantile to estimate minimum building size per capacity range.
1523
    Returns
1524
    -------
1525
    dict
1526
        Dictionary with estimated normal building area range per capacity
1527
        range.
1528
    """
1529
    if cap_ranges is None:
1530
        cap_ranges = [
1531
            (0, 30 / 10**3),
1532
            (30 / 10**3, 100 / 10**3),
1533
            (100 / 10**3, float("inf")),
1534
        ]
1535
1536
    building_area_range_dict = {}
1537
1538
    n_ranges = len(cap_ranges)
1539
1540
    for count, cap_range in enumerate(cap_ranges):
1541
        cap_range_gdf = mastr_gdf.loc[
1542
            (mastr_gdf.capacity > cap_range[0])
1543
            & (mastr_gdf.capacity <= cap_range[1])
1544
        ]
1545
1546
        if count == 0:
1547
            building_area_range_dict[cap_range] = (
1548
                min_building_size,
1549
                cap_range_gdf.building_area.quantile(upper_quantile),
1550
            )
1551
        elif count == n_ranges - 1:
1552
            building_area_range_dict[cap_range] = (
1553
                cap_range_gdf.building_area.quantile(lower_quantile),
1554
                float("inf"),
1555
            )
1556
        else:
1557
            building_area_range_dict[cap_range] = (
1558
                cap_range_gdf.building_area.quantile(lower_quantile),
1559
                cap_range_gdf.building_area.quantile(upper_quantile),
1560
            )
1561
1562
    values = list(building_area_range_dict.values())
1563
1564
    building_area_range_normed_dict = {}
1565
1566
    for count, (cap_range, (min_area, max_area)) in enumerate(
1567
        building_area_range_dict.items()
1568
    ):
1569
        if count == 0:
1570
            building_area_range_normed_dict[cap_range] = (
1571
                min_area,
1572
                np.mean((values[count + 1][0], max_area)),
1573
            )
1574
        elif count == n_ranges - 1:
1575
            building_area_range_normed_dict[cap_range] = (
1576
                np.mean((values[count - 1][1], min_area)),
1577
                max_area,
1578
            )
1579
        else:
1580
            building_area_range_normed_dict[cap_range] = (
1581
                np.mean((values[count - 1][1], min_area)),
1582
                np.mean((values[count + 1][0], max_area)),
1583
            )
1584
1585
    return building_area_range_normed_dict
1586
1587
1588
@timer_func
1589
def desaggregate_pv_in_mv_grid(
1590
    buildings_gdf: gpd.GeoDataFrame,
1591
    pv_cap: float | int,
1592
    **kwargs,
1593
) -> gpd.GeoDataFrame:
1594
    """
1595
    Desaggregate PV capacity on buildings within a given grid district.
1596
    Parameters
1597
    -----------
1598
    buildings_gdf : geopandas.GeoDataFrame
1599
        GeoDataFrame containing buildings within the grid district.
1600
    pv_cap : float, int
1601
        PV capacity to desaggregate.
1602
    Other Parameters
1603
    -----------
1604
    prob_dict : dict
1605
        Dictionary with values and probabilities per capacity range.
1606
    cap_share_dict : dict
1607
        Dictionary with share of PV capacity from the total PV capacity within
1608
        capacity ranges.
1609
    building_area_range_dict : dict
1610
        Dictionary with estimated normal building area range per capacity
1611
        range.
1612
    load_factor_dict : dict
1613
        Dictionary with mean roof load factor per capacity range.
1614
    seed : int
1615
        Seed to use for random operations with NumPy and pandas.
1616
    pv_cap_per_sq_m : float, int
1617
        Average expected, installable PV capacity per square meter.
1618
    Returns
1619
    -------
1620
    geopandas.GeoDataFrame
1621
        GeoDataFrame containing OSM building data with desaggregated PV
1622
        plants.
1623
    """
1624
    bus_id = int(buildings_gdf.bus_id.iat[0])
1625
1626
    rng = default_rng(seed=kwargs["seed"])
1627
    random_state = RandomState(seed=kwargs["seed"])
1628
1629
    results_df = pd.DataFrame(columns=buildings_gdf.columns)
1630
1631
    for cap_range, share in kwargs["cap_share_dict"].items():
1632
        pv_cap_range = pv_cap * share
1633
1634
        b_area_min, b_area_max = kwargs["building_area_range_dict"][cap_range]
1635
1636
        cap_range_buildings_gdf = buildings_gdf.loc[
1637
            ~buildings_gdf.index.isin(results_df.index)
1638
            & (buildings_gdf.building_area > b_area_min)
1639
            & (buildings_gdf.building_area <= b_area_max)
1640
        ]
1641
1642
        mean_load_factor = kwargs["load_factor_dict"][cap_range]
1643
        cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
1644
            mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
1645
            load_factor=np.nan,
1646
            capacity=np.nan,
1647
        )
1648
1649
        total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()
1650
1651
        if total_mean_cap == 0:
1652
            logger.warning(
1653
                f"There are no matching roof for capacity range {cap_range} "
1654
                f"kW in grid {bus_id}. Using all buildings as fallback."
1655
            )
1656
1657
            cap_range_buildings_gdf = buildings_gdf.loc[
1658
                ~buildings_gdf.index.isin(results_df.index)
1659
            ]
1660
1661
            if len(cap_range_buildings_gdf) == 0:
1662
                logger.warning(
1663
                    "There are no roofes available for capacity range "
1664
                    f"{cap_range} kW in grid {bus_id}. Allowing dual use."
1665
                )
1666
                cap_range_buildings_gdf = buildings_gdf.copy()
1667
1668
            cap_range_buildings_gdf = cap_range_buildings_gdf.assign(
1669
                mean_cap=cap_range_buildings_gdf.max_cap * mean_load_factor,
1670
                load_factor=np.nan,
1671
                capacity=np.nan,
1672
            )
1673
1674
            total_mean_cap = cap_range_buildings_gdf.mean_cap.sum()
1675
1676
        elif total_mean_cap < pv_cap_range:
1677
            logger.warning(
1678
                f"Average roof utilization of the roof area in grid {bus_id} "
1679
                f"and capacity range {cap_range} kW is not sufficient. The "
1680
                "roof utilization will be above average."
1681
            )
1682
1683
        frac = max(
1684
            pv_cap_range / total_mean_cap,
1685
            1 / len(cap_range_buildings_gdf),
1686
        )
1687
1688
        samples_gdf = cap_range_buildings_gdf.sample(
1689
            frac=min(1, frac),
1690
            random_state=random_state,
1691
        )
1692
1693
        cap_range_dict = kwargs["prob_dict"][cap_range]
1694
1695
        values_dict = cap_range_dict["values"]
1696
        p_dict = cap_range_dict["probabilities"]
1697
1698
        load_factors = rng.choice(
1699
            a=values_dict["load_factor"],
1700
            size=len(samples_gdf),
1701
            p=p_dict["load_factor"],
1702
        )
1703
1704
        samples_gdf = samples_gdf.assign(
1705
            load_factor=load_factors,
1706
            capacity=(
1707
                samples_gdf.building_area
1708
                * load_factors
1709
                * kwargs["pv_cap_per_sq_m"]
1710
            ).clip(lower=0.4),
1711
        )
1712
1713
        missing_factor = pv_cap_range / samples_gdf.capacity.sum()
1714
1715
        samples_gdf = samples_gdf.assign(
1716
            capacity=(samples_gdf.capacity * missing_factor),
1717
            load_factor=(samples_gdf.load_factor * missing_factor),
1718
        )
1719
1720
        assert np.isclose(
1721
            samples_gdf.capacity.sum(),
1722
            pv_cap_range,
1723
            rtol=1e-03,
1724
        ), f"{samples_gdf.capacity.sum()} != {pv_cap_range}"
1725
1726
        results_df = pd.concat(
1727
            [
1728
                results_df,
1729
                samples_gdf,
1730
            ],
1731
        )
1732
1733
    total_missing_factor = pv_cap / results_df.capacity.sum()
1734
1735
    results_df = results_df.assign(
1736
        capacity=(results_df.capacity * total_missing_factor),
1737
    )
1738
1739
    assert np.isclose(
1740
        results_df.capacity.sum(),
1741
        pv_cap,
1742
        rtol=1e-03,
1743
    ), f"{results_df.capacity.sum()} != {pv_cap}"
1744
1745
    return gpd.GeoDataFrame(
1746
        results_df,
1747
        crs=samples_gdf.crs,
0 ignored issues
show
introduced by
The variable samples_gdf does not seem to be defined in case the for loop on line 1631 is not entered. Are you sure this can never be the case?
Loading history...
1748
        geometry="geom",
1749
    )
1750
1751
1752
@timer_func
1753
def desaggregate_pv(
1754
    buildings_gdf: gpd.GeoDataFrame,
1755
    cap_df: pd.DataFrame,
1756
    **kwargs,
1757
) -> gpd.GeoDataFrame:
1758
    """
1759
    Desaggregate PV capacity on buildings within a given grid district.
1760
1761
    Parameters
1762
    -----------
1763
    buildings_gdf : geopandas.GeoDataFrame
1764
        GeoDataFrame containing OSM buildings data.
1765
    cap_df : pandas.DataFrame
1766
        DataFrame with total rooftop capacity per mv grid.
1767
    Other Parameters
1768
    -----------
1769
    prob_dict : dict
1770
        Dictionary with values and probabilities per capacity range.
1771
    cap_share_dict : dict
1772
        Dictionary with share of PV capacity from the total PV capacity within
1773
        capacity ranges.
1774
    building_area_range_dict : dict
1775
        Dictionary with estimated normal building area range per capacity
1776
        range.
1777
    load_factor_dict : dict
1778
        Dictionary with mean roof load factor per capacity range.
1779
    seed : int
1780
        Seed to use for random operations with NumPy and pandas.
1781
    pv_cap_per_sq_m : float, int
1782
        Average expected, installable PV capacity per square meter.
1783
    Returns
1784
    -------
1785
    geopandas.GeoDataFrame
1786
        GeoDataFrame containing OSM building data with desaggregated PV
1787
        plants.
1788
    """
1789
    allocated_buildings_gdf = buildings_gdf.loc[~buildings_gdf.end_of_life]
1790
1791
    building_bus_ids = set(buildings_gdf.bus_id)
1792
    cap_bus_ids = set(cap_df.index)
1793
1794
    logger.debug(
1795
        f"Bus IDs from buildings: {len(building_bus_ids)}\nBus IDs from "
1796
        f"capacity: {len(cap_bus_ids)}"
1797
    )
1798
1799
    if len(building_bus_ids) > len(cap_bus_ids):
1800
        missing = building_bus_ids - cap_bus_ids
1801
    else:
1802
        missing = cap_bus_ids - building_bus_ids
1803
1804
    logger.debug(str(missing))
1805
1806
    bus_ids = np.intersect1d(list(building_bus_ids), list(cap_bus_ids))
1807
1808
    # assert set(buildings_gdf.bus_id.unique()) == set(cap_df.index)
1809
1810
    for bus_id in bus_ids:
1811
        buildings_grid_gdf = buildings_gdf.loc[buildings_gdf.bus_id == bus_id]
1812
1813
        pv_installed_gdf = buildings_grid_gdf.loc[
1814
            ~buildings_grid_gdf.end_of_life
1815
        ]
1816
1817
        pv_installed = pv_installed_gdf.capacity.sum()
1818
1819
        pot_buildings_gdf = buildings_grid_gdf.drop(
1820
            index=pv_installed_gdf.index
1821
        )
1822
1823
        if len(pot_buildings_gdf) == 0:
1824
            logger.error(
1825
                f"In grid {bus_id} there are no potential buildings to "
1826
                f"allocate PV capacity to. The grid is skipped. This message "
1827
                f"should only appear doing test runs with few buildings."
1828
            )
1829
1830
            continue
1831
1832
        pv_target = cap_df.at[bus_id, "capacity"]
1833
1834
        logger.debug(f"pv_target: {pv_target}")
1835
1836
        pv_missing = pv_target - pv_installed
1837
1838
        if pv_missing <= 0:
1839
            logger.warning(
1840
                f"In grid {bus_id} there is more PV installed "
1841
                f"({pv_installed: g} kW) in status Quo than allocated within "
1842
                f"the scenario ({pv_target: g} kW). "
1843
                f"No new generators are added."
1844
            )
1845
1846
            continue
1847
1848
        if pot_buildings_gdf.max_cap.sum() < pv_missing:
1849
            logger.error(
1850
                f"In grid {bus_id} there is less PV potential ("
1851
                f"{pot_buildings_gdf.max_cap.sum():g} MW) than allocated PV "
1852
                f"capacity ({pv_missing:g} MW). The average roof utilization "
1853
                f"will be very high."
1854
            )
1855
1856
        gdf = desaggregate_pv_in_mv_grid(
1857
            buildings_gdf=pot_buildings_gdf,
1858
            pv_cap=pv_missing,
1859
            **kwargs,
1860
        )
1861
1862
        logger.debug(f"New cap in grid {bus_id}: {gdf.capacity.sum()}")
1863
        logger.debug(f"Installed cap in grid {bus_id}: {pv_installed}")
1864
        logger.debug(
1865
            f"Total cap in grid {bus_id}: {gdf.capacity.sum() + pv_installed}"
1866
        )
1867
1868
        if not np.isclose(
1869
            gdf.capacity.sum() + pv_installed, pv_target, rtol=1e-3
1870
        ):
1871
            logger.warning(
1872
                f"The desired capacity and actual capacity in grid {bus_id} "
1873
                f"differ.\n"
1874
                f"Desired cap: {pv_target}\nActual cap: "
1875
                f"{gdf.capacity.sum() + pv_installed}"
1876
            )
1877
1878
        pre_cap = allocated_buildings_gdf.capacity.sum()
1879
        new_cap = gdf.capacity.sum()
1880
1881
        allocated_buildings_gdf = pd.concat(
1882
            [
1883
                allocated_buildings_gdf,
1884
                gdf,
1885
            ]
1886
        )
1887
1888
        total_cap = allocated_buildings_gdf.capacity.sum()
1889
1890
        assert np.isclose(pre_cap + new_cap, total_cap)
1891
1892
    logger.debug("Desaggregated scenario.")
1893
    logger.debug(f"Scenario capacity: {cap_df.capacity.sum(): g}")
1894
    logger.debug(
1895
        f"Generator capacity: " f"{allocated_buildings_gdf.capacity.sum(): g}"
1896
    )
1897
1898
    return gpd.GeoDataFrame(
1899
        allocated_buildings_gdf,
1900
        crs=gdf.crs,
0 ignored issues
show
introduced by
The variable gdf does not seem to be defined for all execution paths.
Loading history...
1901
        geometry="geom",
1902
    )
1903
1904
1905
@timer_func
1906
def add_buildings_meta_data(
1907
    buildings_gdf: gpd.GeoDataFrame,
1908
    prob_dict: dict,
1909
    seed: int,
1910
) -> gpd.GeoDataFrame:
1911
    """
1912
    Randomly add additional metadata to desaggregated PV plants.
1913
    Parameters
1914
    -----------
1915
    buildings_gdf : geopandas.GeoDataFrame
1916
        GeoDataFrame containing OSM buildings data with desaggregated PV
1917
        plants.
1918
    prob_dict : dict
1919
        Dictionary with values and probabilities per capacity range.
1920
    seed : int
1921
        Seed to use for random operations with NumPy and pandas.
1922
    Returns
1923
    -------
1924
    geopandas.GeoDataFrame
1925
        GeoDataFrame containing OSM building data with desaggregated PV
1926
        plants.
1927
    """
1928
    rng = default_rng(seed=seed)
1929
    buildings_gdf = buildings_gdf.reset_index().rename(
1930
        columns={
1931
            "index": "building_id",
1932
        }
1933
    )
1934
1935
    for (min_cap, max_cap), cap_range_prob_dict in prob_dict.items():
1936
        cap_range_gdf = buildings_gdf.loc[
1937
            (buildings_gdf.capacity >= min_cap)
1938
            & (buildings_gdf.capacity < max_cap)
1939
        ]
1940
1941
        for key, values in cap_range_prob_dict["values"].items():
1942
            if key == "load_factor":
1943
                continue
1944
1945
            gdf = cap_range_gdf.loc[
1946
                cap_range_gdf[key].isna()
1947
                | cap_range_gdf[key].isnull()
1948
                | (cap_range_gdf[key] == "None")
1949
            ]
1950
1951
            key_vals = rng.choice(
1952
                a=values,
1953
                size=len(gdf),
1954
                p=cap_range_prob_dict["probabilities"][key],
1955
            )
1956
1957
            buildings_gdf.loc[gdf.index, key] = key_vals
1958
1959
    return buildings_gdf
1960
1961
1962
def add_commissioning_date(
1963
    buildings_gdf: gpd.GeoDataFrame,
1964
    start: pd.Timestamp,
1965
    end: pd.Timestamp,
1966
    seed: int,
1967
):
1968
    """
1969
    Randomly and linear add start-up date to new pv generators.
1970
    Parameters
1971
    ----------
1972
    buildings_gdf : geopandas.GeoDataFrame
1973
        GeoDataFrame containing OSM buildings data with desaggregated PV
1974
        plants.
1975
    start : pandas.Timestamp
1976
        Minimum Timestamp to use.
1977
    end : pandas.Timestamp
1978
        Maximum Timestamp to use.
1979
    seed : int
1980
        Seed to use for random operations with NumPy and pandas.
1981
    Returns
1982
    -------
1983
    geopandas.GeoDataFrame
1984
        GeoDataFrame containing OSM buildings data with start-up date added.
1985
    """
1986
    rng = default_rng(seed=seed)
1987
1988
    date_range = pd.date_range(start=start, end=end, freq="1D")
1989
1990
    return buildings_gdf.assign(
1991
        commissioning_date=rng.choice(date_range, size=len(buildings_gdf))
1992
    )
1993
1994
1995
@timer_func
1996
def allocate_scenarios(
1997
    mastr_gdf: gpd.GeoDataFrame,
1998
    valid_buildings_gdf: gpd.GeoDataFrame,
1999
    last_scenario_gdf: gpd.GeoDataFrame,
2000
    scenario: str,
2001
):
2002
    """
2003
    Desaggregate and allocate scenario pv rooftop ramp-ups onto buildings.
2004
    Parameters
2005
    ----------
2006
    mastr_gdf : geopandas.GeoDataFrame
2007
        GeoDataFrame containing geocoded MaStR data.
2008
    valid_buildings_gdf : geopandas.GeoDataFrame
2009
        GeoDataFrame containing OSM buildings data.
2010
    last_scenario_gdf : geopandas.GeoDataFrame
2011
        GeoDataFrame containing OSM buildings matched with pv generators from
2012
        temporally preceding scenario.
2013
    scenario : str
2014
        Scenario to desaggrgate and allocate.
2015
    Returns
2016
    -------
2017
    tuple
2018
        geopandas.GeoDataFrame
2019
            GeoDataFrame containing OSM buildings matched with pv generators.
2020
        pandas.DataFrame
2021
            DataFrame containing pv rooftop capacity per grid id.
2022
    """
2023
    cap_per_bus_id_df = cap_per_bus_id(scenario)
2024
2025
    logger.debug(
2026
        f"cap_per_bus_id_df total capacity: {cap_per_bus_id_df.capacity.sum()}"
2027
    )
2028
2029
    last_scenario_gdf = determine_end_of_life_gens(
2030
        last_scenario_gdf,
2031
        SCENARIO_TIMESTAMP[scenario],
2032
        PV_ROOFTOP_LIFETIME,
2033
    )
2034
2035
    buildings_gdf = calculate_max_pv_cap_per_building(
2036
        valid_buildings_gdf,
2037
        last_scenario_gdf,
2038
        PV_CAP_PER_SQ_M,
2039
        ROOF_FACTOR,
2040
    )
2041
2042
    mastr_gdf = calculate_building_load_factor(
2043
        mastr_gdf,
2044
        buildings_gdf,
2045
    )
2046
2047
    probabilities_dict = probabilities(
2048
        mastr_gdf,
2049
        cap_ranges=CAP_RANGES,
2050
    )
2051
2052
    cap_share_dict = cap_share_per_cap_range(
2053
        mastr_gdf,
2054
        cap_ranges=CAP_RANGES,
2055
    )
2056
2057
    load_factor_dict = mean_load_factor_per_cap_range(
2058
        mastr_gdf,
2059
        cap_ranges=CAP_RANGES,
2060
    )
2061
2062
    building_area_range_dict = building_area_range_per_cap_range(
2063
        mastr_gdf,
2064
        cap_ranges=CAP_RANGES,
2065
        min_building_size=MIN_BUILDING_SIZE,
2066
        upper_quantile=UPPER_QUANTILE,
2067
        lower_quantile=LOWER_QUANTILE,
2068
    )
2069
2070
    allocated_buildings_gdf = desaggregate_pv(
2071
        buildings_gdf=buildings_gdf,
2072
        cap_df=cap_per_bus_id_df,
2073
        prob_dict=probabilities_dict,
2074
        cap_share_dict=cap_share_dict,
2075
        building_area_range_dict=building_area_range_dict,
2076
        load_factor_dict=load_factor_dict,
2077
        seed=SEED,
2078
        pv_cap_per_sq_m=PV_CAP_PER_SQ_M,
2079
    )
2080
2081
    allocated_buildings_gdf = allocated_buildings_gdf.assign(scenario=scenario)
2082
2083
    meta_buildings_gdf = frame_to_numeric(
2084
        add_buildings_meta_data(
2085
            allocated_buildings_gdf,
2086
            probabilities_dict,
2087
            SEED,
2088
        )
2089
    )
2090
2091
    return (
2092
        add_commissioning_date(
2093
            meta_buildings_gdf,
2094
            start=last_scenario_gdf.commissioning_date.max(),
2095
            end=SCENARIO_TIMESTAMP[scenario],
2096
            seed=SEED,
2097
        ),
2098
        cap_per_bus_id_df,
2099
    )
2100
2101
2102
class EgonPowerPlantPvRoofBuilding(Base):
2103
    __tablename__ = "egon_power_plants_pv_roof_building"
2104
    __table_args__ = {"schema": "supply"}
2105
2106
    index = Column(Integer, primary_key=True, index=True)
2107
    scenario = Column(String)
2108
    bus_id = Column(Integer, nullable=True)
2109
    building_id = Column(Integer)
2110
    gens_id = Column(String, nullable=True)
2111
    capacity = Column(Float)
2112
    orientation_uniform = Column(Float)
2113
    orientation_primary = Column(String)
2114
    orientation_primary_angle = Column(String)
2115
    voltage_level = Column(Integer)
2116
    weather_cell_id = Column(Integer)
2117
2118
2119
def add_metadata():
2120
    schema = "supply"
2121
    table = "egon_power_plants_pv_roof_building"
2122
    name = f"{schema}.{table}"
2123
    deposit_id_mastr = config.datasets()["mastr_new"]["deposit_id"]
2124
    deposit_id_data_bundle = config.datasets()["data-bundle"]["sources"][
2125
        "zenodo"
2126
    ]["deposit_id"]
2127
2128
    contris = contributors(["kh", "kh"])
2129
2130
    contris[0]["date"] = "2023-03-16"
2131
2132
    contris[0]["object"] = "metadata"
2133
    contris[1]["object"] = "dataset"
2134
2135
    contris[0]["comment"] = "Add metadata to dataset."
2136
    contris[1]["comment"] = "Add workflow to generate dataset."
2137
2138
    meta = {
2139
        "name": name,
2140
        "title": "eGon power plants rooftop solar",
2141
        "id": "WILL_BE_SET_AT_PUBLICATION",
2142
        "description": (
2143
            "eGon power plants rooftop solar systems allocated to buildings"
2144
        ),
2145
        "language": "en-US",
2146
        "keywords": ["photovoltaik", "solar", "pv", "mastr", "status quo"],
2147
        "publicationDate": datetime.date.today().isoformat(),
2148
        "context": context(),
2149
        "spatial": {
2150
            "location": "none",
2151
            "extent": "Germany",
2152
            "resolution": "building",
2153
        },
2154
        "temporal": {
2155
            "referenceDate": (
2156
                config.datasets()["mastr_new"]["egon2021_date_max"].split(" ")[
2157
                    0
2158
                ]
2159
            ),
2160
            "timeseries": {},
2161
        },
2162
        "sources": [
2163
            {
2164
                "title": "Data bundle for egon-data",
2165
                "description": (
2166
                    "Data bundle for egon-data: A transparent and "
2167
                    "reproducible data processing pipeline for energy "
2168
                    "system modeling"
2169
                ),
2170
                "path": (
2171
                    "https://sandbox.zenodo.org/record/"
2172
                    f"{deposit_id_data_bundle}#.Y_dWM4CZMVM"
2173
                ),
2174
                "licenses": [license_dedl(attribution="© Cußmann, Ilka")],
2175
            },
2176
            {
2177
                "title": ("open-MaStR power unit registry for eGo^n project"),
2178
                "description": (
2179
                    "Data from Marktstammdatenregister (MaStR) data using "
2180
                    "the data dump from 2022-11-17 for eGon-data."
2181
                ),
2182
                "path": (
2183
                    f"https://sandbox.zenodo.org/record/{deposit_id_mastr}"
2184
                ),
2185
                "licenses": [license_dedl(attribution="© Amme, Jonathan")],
2186
            },
2187
            sources()["openstreetmap"],
2188
            sources()["era5"],
2189
            sources()["vg250"],
2190
            sources()["egon-data"],
2191
        ],
2192
        "licenses": [license_odbl("© eGon development team")],
2193
        "contributors": contris,
2194
        "resources": [
2195
            {
2196
                "profile": "tabular-data-resource",
2197
                "name": name,
2198
                "path": "None",
2199
                "format": "PostgreSQL",
2200
                "encoding": "UTF-8",
2201
                "schema": {
2202
                    "fields": generate_resource_fields_from_db_table(
2203
                        schema,
2204
                        table,
2205
                    ),
2206
                    "primaryKey": "index",
2207
                },
2208
                "dialect": {"delimiter": "", "decimalSeparator": ""},
2209
            }
2210
        ],
2211
        "review": {"path": "", "badge": ""},
2212
        "metaMetadata": meta_metadata(),
2213
        "_comment": {
2214
            "metadata": (
2215
                "Metadata documentation and explanation (https://github."
2216
                "com/OpenEnergyPlatform/oemetadata/blob/master/metadata/"
2217
                "v141/metadata_key_description.md)"
2218
            ),
2219
            "dates": (
2220
                "Dates and time must follow the ISO8601 including time "
2221
                "zone (YYYY-MM-DD or YYYY-MM-DDThh:mm:ss±hh)"
2222
            ),
2223
            "units": "Use a space between numbers and units (100 m)",
2224
            "languages": (
2225
                "Languages must follow the IETF (BCP47) format (en-GB, "
2226
                "en-US, de-DE)"
2227
            ),
2228
            "licenses": (
2229
                "License name must follow the SPDX License List "
2230
                "(https://spdx.org/licenses/)"
2231
            ),
2232
            "review": (
2233
                "Following the OEP Data Review (https://github.com/"
2234
                "OpenEnergyPlatform/data-preprocessing/wiki)"
2235
            ),
2236
            "none": "If not applicable use (none)",
2237
        },
2238
    }
2239
2240
    dialect = get_dialect(oep_metadata_version())()
2241
2242
    meta = dialect.compile_and_render(dialect.parse(json.dumps(meta)))
2243
2244
    db.submit_comment(
2245
        f"'{json.dumps(meta)}'",
2246
        schema,
2247
        table,
2248
    )
2249
2250
2251
def create_scenario_table(buildings_gdf):
2252
    """Create mapping table pv_unit <-> building for scenario"""
2253
    EgonPowerPlantPvRoofBuilding.__table__.drop(bind=engine, checkfirst=True)
2254
    EgonPowerPlantPvRoofBuilding.__table__.create(bind=engine, checkfirst=True)
2255
2256
    buildings_gdf[COLS_TO_EXPORT].reset_index().to_sql(
2257
        name=EgonPowerPlantPvRoofBuilding.__table__.name,
2258
        schema=EgonPowerPlantPvRoofBuilding.__table__.schema,
2259
        con=db.engine(),
2260
        if_exists="append",
2261
        index=False,
2262
    )
2263
2264
    add_metadata()
2265
2266
2267
def add_weather_cell_id(buildings_gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
2268
    sql = """
2269
    SELECT building_id, zensus_population_id
2270
    FROM boundaries.egon_map_zensus_mvgd_buildings
2271
    """
2272
2273
    buildings_gdf = buildings_gdf.merge(
2274
        right=db.select_dataframe(sql).drop_duplicates(subset="building_id"),
2275
        how="left",
2276
        on="building_id",
2277
    )
2278
2279
    sql = """
2280
    SELECT zensus_population_id, w_id as weather_cell_id
2281
    FROM boundaries.egon_map_zensus_weather_cell
2282
    """
2283
2284
    buildings_gdf = buildings_gdf.merge(
2285
        right=db.select_dataframe(sql).drop_duplicates(
2286
            subset="zensus_population_id"
2287
        ),
2288
        how="left",
2289
        on="zensus_population_id",
2290
    )
2291
2292
    if buildings_gdf.weather_cell_id.isna().any():
2293
        missing = buildings_gdf.loc[
2294
            buildings_gdf.weather_cell_id.isna(), "building_id"
2295
        ].tolist()
2296
2297
        raise ValueError(
2298
            f"Following buildings don't have a weather cell id: {missing}"
2299
        )
2300
2301
    return buildings_gdf
2302
2303
2304
def add_bus_ids_sq(
2305
    buildings_gdf: gpd.GeoDataFrame,
2306
) -> gpd.GeoDataFrame:
2307
    """Add bus ids for status_quo units
2308
2309
    Parameters
2310
    -----------
2311
    buildings_gdf : geopandas.GeoDataFrame
2312
        GeoDataFrame containing OSM buildings data with desaggregated PV
2313
        plants.
2314
    Returns
2315
    -------
2316
    geopandas.GeoDataFrame
2317
        GeoDataFrame containing OSM building data with bus_id per
2318
        generator.
2319
    """
2320
    grid_districts_gdf = grid_districts(EPSG)
2321
2322
    mask = buildings_gdf.scenario == "status_quo"
2323
    buildings_gdf.loc[mask, "bus_id"] = (
2324
        buildings_gdf.loc[mask]
2325
        .sjoin(grid_districts_gdf, how="left")
2326
        .index_right
2327
    )
2328
2329
    return buildings_gdf
2330
2331
2332 View Code Duplication
def infer_voltage_level(
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
2333
    units_gdf: gpd.GeoDataFrame,
2334
) -> gpd.GeoDataFrame:
2335
    """
2336
    Infer nan values in voltage level derived from generator capacity to
2337
    the power plants.
2338
2339
    Parameters
2340
    -----------
2341
    units_gdf : geopandas.GeoDataFrame
2342
        GeoDataFrame containing units with voltage levels from MaStR
2343
    Returnsunits_gdf: gpd.GeoDataFrame
2344
    -------
2345
    geopandas.GeoDataFrame
2346
        GeoDataFrame containing units all having assigned a voltage level.
2347
    """
2348
2349
    def voltage_levels(p: float) -> int:
2350
        if p <= 0.1:
2351
            return 7
2352
        elif p <= 0.2:
2353
            return 6
2354
        elif p <= 5.5:
2355
            return 5
2356
        elif p <= 20:
2357
            return 4
2358
        elif p <= 120:
2359
            return 3
2360
        return 1
2361
2362
    units_gdf["voltage_level_inferred"] = False
2363
    mask = units_gdf.voltage_level.isna()
2364
    units_gdf.loc[mask, "voltage_level_inferred"] = True
2365
    units_gdf.loc[mask, "voltage_level"] = units_gdf.loc[mask].capacity.apply(
2366
        voltage_levels
2367
    )
2368
2369
    return units_gdf
2370
2371
2372
def pv_rooftop_to_buildings():
2373
    """Main script, executed as task"""
2374
2375
    mastr_gdf = load_mastr_data()
2376
2377
    buildings_gdf = load_building_data()
2378
2379
    desagg_mastr_gdf, desagg_buildings_gdf = allocate_to_buildings(
2380
        mastr_gdf, buildings_gdf
2381
    )
2382
2383
    all_buildings_gdf = (
2384
        desagg_mastr_gdf.assign(scenario="status_quo")
2385
        .reset_index()
2386
        .rename(columns={"geometry": "geom"})
2387
    )
2388
2389
    scenario_buildings_gdf = all_buildings_gdf.copy()
2390
2391
    cap_per_bus_id_df = pd.DataFrame()
2392
2393
    for scenario in SCENARIOS:
2394
        logger.debug(f"Desaggregating scenario {scenario}.")
2395
        (
2396
            scenario_buildings_gdf,
2397
            cap_per_bus_id_scenario_df,
2398
        ) = allocate_scenarios(  # noqa: F841
2399
            desagg_mastr_gdf,
2400
            desagg_buildings_gdf,
2401
            scenario_buildings_gdf,
2402
            scenario,
2403
        )
2404
2405
        all_buildings_gdf = gpd.GeoDataFrame(
2406
            pd.concat(
2407
                [all_buildings_gdf, scenario_buildings_gdf], ignore_index=True
2408
            ),
2409
            crs=scenario_buildings_gdf.crs,
2410
            geometry="geom",
2411
        )
2412
2413
        cap_per_bus_id_df = pd.concat(
2414
            [cap_per_bus_id_df, cap_per_bus_id_scenario_df]
2415
        )
2416
2417
    # add weather cell
2418
    all_buildings_gdf = add_weather_cell_id(all_buildings_gdf)
2419
2420
    # add bus IDs for status quo scenario
2421
    all_buildings_gdf = add_bus_ids_sq(all_buildings_gdf)
2422
2423
    # export scenario
2424
    create_scenario_table(infer_voltage_level(all_buildings_gdf))
2425