# ---
# jupyter:
#   jupytext:
#     cell_metadata_json: true
#     formats: ipynb,py:percent
#     notebook_metadata_filter: language_info
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.5.2
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
#   language_info:
#     codemirror_mode:
#       name: ipython
#       version: 3
#     file_extension: .py
#     mimetype: text/x-python
#     name: python
#     nbconvert_exporter: python
#     pygments_lexer: ipython3
#     version: 3.8.4
# ---

# %% [markdown]
# # A papermill example: Fitting a model
#

# %% [markdown]
# ### Specify default parameters
#
# The cell below is a "parameters" cell, which defines default values for the notebook's parameters.

# %% {"tags": ["parameters"]}
# Our default parameters
# This cell has a "parameters" tag, which means that it defines the parameters for use in the notebook
start_date = "2001-08-05"
stop_date = "2016-01-01"
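
# %% [markdown]
# When papermill runs this notebook with new values, it doesn't edit the cell above. Instead it
# inserts a new cell tagged `injected-parameters` directly after it, and because that cell runs
# later, its assignments win. The next cell is a minimal simulation of that ordering in plain
# Python; `simulate_injection` is a hypothetical helper for illustration, not part of papermill's
# API. It works in its own namespace, so the notebook's real `start_date` and `stop_date` are untouched.

# %%
def simulate_injection(defaults_source, overrides):
    """Hypothetical sketch: run the defaults, then injected overrides, in a fresh namespace."""
    namespace = {}
    exec(defaults_source, namespace)  # plays the role of the "parameters" cell
    # The injected cell is a series of plain assignments; repr() keeps strings quoted.
    injected_source = "\n".join(f"{name} = {value!r}" for name, value in overrides.items())
    exec(injected_source, namespace)  # plays the role of the "injected-parameters" cell
    return {name: value for name, value in namespace.items() if name != "__builtins__"}


_demo_params = simulate_injection(
    'start_date = "2001-08-05"\nstop_date = "2016-01-01"',
    {"start_date": "2012-01-01"},
)
print(_demo_params)  # start_date is overridden; stop_date keeps its default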

# %% [markdown]
# ## Set up our packages and create the data
#
# We'll run `plt.ioff()` so that we don't get double plots in the notebook

# %%
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scrapbook as sb

plt.ioff()
np.random.seed(1337)

# %%
# Generate some fake data by date
dates = pd.date_range("2010-01-01", "2020-01-01")
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['mydata'])
data = data.rolling(100).mean()  # Smooth it so it looks purdy
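
# %% [markdown]
# About the smoothing step: `rolling(100).mean()` needs a full window of 100 observations before it
# produces a value, so the first 99 rows of `data` are `NaN` (matplotlib leaves a gap there when
# plotting). The next cell shows the same effect with a window of 3 on a hand-made series; the
# `_demo_` names are illustrative only and don't affect `data`.

# %%
import numpy as np
import pandas as pd

_demo_raw = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=pd.date_range("2010-01-01", periods=5))
# With the default min_periods (equal to the window size), the first window - 1 values are NaN.
_demo_smooth = _demo_raw.rolling(3).mean()
print(_demo_smooth)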

# %% [markdown]
# ## Choose a subset of data to highlight
#
# Here we use the **start_date** and **stop_date** parameters, which have default values defined above
# but can be overridden at runtime by papermill.

# %%
data_highlight = data.loc[start_date: stop_date]
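
# %% [markdown]
# `data.loc[start_date:stop_date]` is label-based slicing on a sorted `DatetimeIndex`: pandas parses
# the date strings, and both endpoints are included. A bound outside the index is not an error;
# the default `start_date` (2001) falls before the fake data begins in 2010, so the slice simply
# starts at the first row. The next cell checks both behaviors on a five-day toy frame (the
# `_demo_` names are illustrative only).

# %%
import numpy as np
import pandas as pd

_demo_frame = pd.DataFrame({"mydata": np.arange(5.0)},
                           index=pd.date_range("2010-01-01", periods=5))
_demo_inclusive = _demo_frame.loc["2010-01-02":"2010-01-04"]  # keeps Jan 2, 3 and 4
_demo_early = _demo_frame.loc["2001-08-05":"2010-01-02"]      # starts at Jan 1, the first row
print(len(_demo_inclusive), len(_demo_early))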

# %% [markdown]
# We use scrapbook's `sb.glue()` function to record how many rows were included in the
# highlighted section. This lets us read the value back (for example with `sb.read_notebook()`)
# after papermill has run the notebook.
#
# We also raise a ValueError if there's a bug in the start/stop dates; if it's triggered,
# papermill will capture the error and display it in the output notebook.

# %%
num_records = len(data_highlight)
sb.glue('num_records', num_records, display=True)
if num_records == 0:
    raise ValueError("I have no data to highlight! Check that your dates are correct!")
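
# %% [markdown]
# The guard above fires whenever the slice is empty: for example when `start_date` is later than
# `stop_date`, or when both dates fall before the fake data begins (2010) or both after it ends
# (2020). The next cell wraps the same check in a hypothetical helper, `count_highlighted`, so it
# can be exercised on a toy frame without stopping this notebook.

# %%
import numpy as np
import pandas as pd


def count_highlighted(frame, start, stop):
    """Hypothetical helper: count rows between start and stop, raising ValueError if there are none."""
    count = len(frame.loc[start:stop])
    if count == 0:
        raise ValueError("I have no data to highlight! Check that your dates are correct!")
    return count


_demo_days = pd.DataFrame({"mydata": np.zeros(10)},
                          index=pd.date_range("2010-01-01", periods=10))
print(count_highlighted(_demo_days, "2010-01-03", "2010-01-05"))  # a valid three-day window

try:
    count_highlighted(_demo_days, "2010-01-05", "2010-01-03")  # start after stop gives an empty slice
except ValueError as err:
    _demo_error = str(err)
    print("Caught:", _demo_error)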

# %% [markdown]
# ## Make our plot
#
# Below we'll generate a matplotlib figure with our highlighted dates. By calling `sb.glue()` with
# `display=True`, scrapbook saves the figure's displayed output in the notebook under the key we've
# specified (`highlight_dates_fig`). This will let us inspect the output later on.

# %%
fig, ax = plt.subplots()
ax.plot(data.index, data['mydata'], c='k', alpha=.5)
ax.plot(data_highlight.index, data_highlight['mydata'], c='r', lw=3)
ax.set(title="Start: {}\nStop: {}".format(start_date, stop_date))
sb.glue('highlight_dates_fig', fig, display=True)
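
# %% [markdown]
# The plot overlays two lines on one set of axes: the full series in translucent black and the
# highlighted slice in thick red. The next cell sketches the same overlay on toy integer data using
# matplotlib's `Figure` class directly, which keeps it independent of pyplot's global state, and
# renders it to PNG bytes in memory to confirm it draws without a display. The `_demo_` names are
# illustrative only.

# %%
import io

import numpy as np
from matplotlib.figure import Figure

_demo_x = np.arange(100)
_demo_y = np.sin(_demo_x / 10.0)

_demo_fig = Figure()
_demo_ax = _demo_fig.subplots()
_demo_ax.plot(_demo_x, _demo_y, c='k', alpha=.5)              # full series, faded
_demo_ax.plot(_demo_x[20:40], _demo_y[20:40], c='r', lw=3)    # highlighted slice drawn on top
_demo_ax.set(title="Start: {}\nStop: {}".format(20, 39))

_demo_png = io.BytesIO()
_demo_fig.savefig(_demo_png, format="png")  # rendered off-screen; nothing is shown in the notebook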

# %%