.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto/integrations/flytekit_plugins/whylogs_examples/whylogs_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_integrations_flytekit_plugins_whylogs_examples_whylogs_example.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_integrations_flytekit_plugins_whylogs_examples_whylogs_example.py:

whylogs Example
---------------

This example shows how to profile pandas DataFrames with whylogs, pass the
resulting profiles between tasks, and use the plugin's renderers to create a
SummaryDriftReport and a ConstraintsReport that lists failed and passed
constraints.

.. GENERATED FROM PYTHON SOURCE LINES 11-12

First, import everything the example needs.

.. GENERATED FROM PYTHON SOURCE LINES 12-32

.. code-block:: default

    import os

    import flytekit
    import numpy as np
    import pandas as pd
    import whylogs as why
    from flytekit import conditional, task, workflow
    from flytekitplugins.whylogs.renderer import WhylogsConstraintsRenderer, WhylogsSummaryDriftRenderer
    from flytekitplugins.whylogs.schema import WhylogsDatasetProfileTransformer
    from sklearn.datasets import load_diabetes
    from whylogs.core import DatasetProfileView
    from whylogs.core.constraints import ConstraintsBuilder
    from whylogs.core.constraints.factories import (
        greater_than_number,
        mean_between_range,
        null_percentage_below_number,
        smaller_than_number,
    )

.. GENERATED FROM PYTHON SOURCE LINES 33-35

Next, define a task that reads the reference dataset.
Here we use scikit-learn's example Diabetes dataset in full.

.. GENERATED FROM PYTHON SOURCE LINES 35-43

.. code-block:: default

    @task
    def get_reference_data() -> pd.DataFrame:
        diabetes = load_diabetes()
        df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
        df["target"] = pd.DataFrame(diabetes.target)
        return df

.. GENERATED FROM PYTHON SOURCE LINES 44-47

To show some drift in this example and mimic how real-world data behaves,
the target data is an arbitrary subset of the reference dataset: only the
rows whose ``age`` value is not negative.

.. GENERATED FROM PYTHON SOURCE LINES 47-55

.. code-block:: default

    @task
    def get_target_data() -> pd.DataFrame:
        diabetes = load_diabetes()
        df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
        df["target"] = pd.DataFrame(diabetes.target)
        # Blank out rows with a negative "age" value, then drop them
        return df.mask(df["age"] < 0.0).dropna(axis=0)

.. GENERATED FROM PYTHON SOURCE LINES 56-60

Now define a task that takes any pandas DataFrame and returns a
``DatasetProfileView``, which is our data profile.
With it, users can visualize and check overall statistics or
run a constraint suite on top of it.

.. GENERATED FROM PYTHON SOURCE LINES 60-66

.. code-block:: default

    @task
    def create_profile_view(df: pd.DataFrame) -> DatasetProfileView:
        result = why.log(df)
        return result.view()

.. GENERATED FROM PYTHON SOURCE LINES 67-69

We also define a constraints report task that runs some checks
against the existing profile.

.. GENERATED FROM PYTHON SOURCE LINES 69-85

.. code-block:: default

    @task
    def constraints_report(profile_view: DatasetProfileView) -> bool:
        builder = ConstraintsBuilder(dataset_profile_view=profile_view)
        builder.add_constraint(greater_than_number(column_name="age", number=-11.0))
        builder.add_constraint(smaller_than_number(column_name="bp", number=20.0))
        builder.add_constraint(mean_between_range(column_name="s3", lower=-1.5, upper=1.5))
        builder.add_constraint(null_percentage_below_number(column_name="sex", number=0.0))

        constraints = builder.build()

        # Render the constraints report into a Flyte Deck
        renderer = WhylogsConstraintsRenderer()
        flytekit.Deck("constraints", renderer.to_html(constraints=constraints))

        # True only if every constraint passes
        return constraints.validate()

.. GENERATED FROM PYTHON SOURCE LINES 86-91

This task stands in for a prediction step.
To keep the demonstration simple, the model's predictions are just
random numbers generated with numpy.
The task runs only if the constraint suite passes.

.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: default

    @task
    def make_predictions(input_data: pd.DataFrame, output_path: str) -> str:
        input_data["predictions"] = np.random.random(size=len(input_data))
        # Create the output directory if it does not exist yet
        os.makedirs(output_path, exist_ok=True)
        input_data.to_csv(os.path.join(output_path, "predictions.csv"))
        return f"wrote predictions successfully to {output_path}"

.. GENERATED FROM PYTHON SOURCE LINES 101-104

Lastly, if the constraint checks fail, we create a FlyteDeck with the
Summary Drift Report, which can give further intuition into whether
data drift contributed to the failed constraint checks.

.. GENERATED FROM PYTHON SOURCE LINES 104-112

.. code-block:: default

    @task
    def summary_drift_report(new_data: pd.DataFrame, reference_data: pd.DataFrame) -> str:
        renderer = WhylogsSummaryDriftRenderer()
        report = renderer.to_html(target_data=new_data, reference_data=reference_data)
        flytekit.Deck("summary drift", report)
        return f"reported summary drift for target dataset with n={len(new_data)}"

.. GENERATED FROM PYTHON SOURCE LINES 113-115

Finally, create a Flyte workflow that chains the tasks together
into our example data pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 115-139

.. code-block:: default

    @workflow
    def wf() -> str:
        # 1. Read data
        target_df = get_target_data()

        # 2. Profile data and validate it
        profile_view = create_profile_view(df=target_df)
        validated = constraints_report(profile_view=profile_view)

        # 3. Conditional actions if data is valid or not
        return (
            conditional("stop_if_fails")
            .if_(validated.is_false())
            .then(
                summary_drift_report(
                    new_data=target_df,
                    reference_data=get_reference_data(),
                )
            )
            .else_()
            .then(make_predictions(input_data=target_df, output_path="./data"))
        )

.. GENERATED FROM PYTHON SOURCE LINES 140-142

.. code-block:: default

    if __name__ == "__main__":
        wf()

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.000 seconds)

.. _sphx_glr_download_auto_integrations_flytekit_plugins_whylogs_examples_whylogs_example.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: whylogs_example.py <whylogs_example.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: whylogs_example.ipynb <whylogs_example.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_