# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.1'
#       jupytext_version: 0.8.5
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
#   language_info:
#     codemirror_mode:
#       name: ipython
#       version: 3
#     file_extension: .py
#     mimetype: text/x-python
#     name: python
#     nbconvert_exporter: python
#     pygments_lexer: ipython3
#     version: 3.6.6
# ---

# %% [markdown]
# # A papermill example: Fitting a model
#

# %% [markdown]
# ## Specify default parameters
#
# The next cell is a "parameters" cell: it defines default values for the notebook's
# parameters, which papermill can override at runtime.

# %% {"tags": ["parameters"]}
# Our default parameters.
# This cell has the "parameters" tag, which tells papermill that it defines the notebook's parameters.
start_date = "2001-08-05"
stop_date = "2016-01-01"

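# %% [markdown]
# A toy illustration (a sketch, not papermill's actual code) of why the defaults above can be
# overridden: papermill leaves the "parameters" cell as-is and inserts a new cell tagged
# "injected-parameters" right after it, so the injected assignments run later and win.
# `simulate_injection` is a hypothetical helper written only for this illustration.

# %%
def simulate_injection(default_source, overrides):
    """Run the default assignments, then an 'injected' cell built from `overrides`."""
    namespace = {}
    exec(default_source, namespace)  # stands in for the "parameters" cell
    # Build assignment lines such as: start_date = '2012-01-01'
    injected_source = "\n".join(f"{name} = {value!r}" for name, value in overrides.items())
    exec(injected_source, namespace)  # stands in for the "injected-parameters" cell
    # Drop __builtins__, which exec() adds to the namespace
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


_defaults = 'start_date = "2001-08-05"\nstop_date = "2016-01-01"'
_demo_params = simulate_injection(_defaults, {"start_date": "2012-01-01"})
print(_demo_params)  # {'start_date': '2012-01-01', 'stop_date': '2016-01-01'}
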
# %% [markdown]
# ## Set up our packages and create the data
#
# We call `plt.ioff()` to turn off matplotlib's interactive mode, so that we don't get
# double plots in the notebook.

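# %%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# pm.record() and pm.display() exist only in papermill releases before 1.0;
# later releases moved this functionality to the scrapbook package.
import papermill as pm

plt.ioff()  # turn off interactive mode to avoid double plots
np.random.seed(1337)  # fix the seed so the fake data is reproducible
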
# %%
# Generate some fake data by date
dates = pd.date_range("2010-01-01", "2020-01-01")
data = pd.DataFrame(np.random.randn(len(dates)), index=dates, columns=['mydata'])
# Smooth with a 100-day rolling mean so it looks purdy; the first 99 rows become NaN
data = data.rolling(100).mean()

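# %% [markdown]
# A tiny check, on a hypothetical five-value series rather than `data`, of what
# `rolling(window).mean()` does: each value becomes the mean of the last `window` values, and
# the first `window - 1` entries are NaN because the window is not yet full (for an integer
# window, the default `min_periods` equals the window size).

# %%
import pandas as pd

_toy_series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
_toy_smoothed = _toy_series.rolling(3).mean()  # window of 3 values
print(_toy_smoothed.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
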
# %% [markdown]
# ## Choose a subset of data to highlight
#
# Here we use the **start_date** and **stop_date** parameters, whose default values are defined
# above but can be overridden at runtime by papermill.

# %%
# .loc slicing on a DatetimeIndex is label-based and includes both endpoints
data_highlight = data.loc[start_date:stop_date]

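# %% [markdown]
# A small check, on hypothetical toy data rather than `data`, of the slicing behaviour used
# above: `.loc` with date strings includes both endpoints, clips a range that starts before the
# index (as the default `start_date` does here), and returns an empty frame when the range
# misses the index entirely, which is the case the `num_records == 0` check guards against.

# %%
import pandas as pd

_toy = pd.DataFrame({"mydata": range(5)}, index=pd.date_range("2010-01-01", periods=5))
_inside = _toy.loc["2010-01-02":"2010-01-04"]   # both endpoints included
_clipped = _toy.loc["2001-08-05":"2010-01-02"]  # starts before the index
_outside = _toy.loc["2001-01-01":"2005-01-01"]  # entirely before the index
print(len(_inside), len(_clipped), len(_outside))  # 3 2 0
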
# %% [markdown]
# We use the `pm.record()` function to keep track of how many records were included in the
# highlighted section. This lets us inspect this value after running the notebook with papermill.
#
# We also raise a ValueError if the start/stop dates select no data (for example, because of a
# bug in the dates); papermill will capture and display the error if it's triggered.

# %%
num_records = len(data_highlight)
pm.record('num_records', num_records)
if num_records == 0:
    raise ValueError("I have no data to highlight! Check that your dates are correct!")

# %% [markdown]
# ## Make our plot
#
# Below we generate a matplotlib figure with our highlighted dates. When we call `pm.display()`,
# papermill stores the figure under the key we've specified (`highlight_dates_fig`), so we can
# inspect the output later on.

# %%
fig, ax = plt.subplots()
ax.plot(data.index, data['mydata'], c='k', alpha=.5)  # full series in translucent black
ax.plot(data_highlight.index, data_highlight['mydata'], c='r', lw=3)  # highlighted range in thick red
ax.set(title="Start: {}\nStop: {}".format(start_date, stop_date))
pm.display('highlight_dates_fig', fig)