.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto/integrations/flytekit_plugins/modin_examples/knn_classifier.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_integrations_flytekit_plugins_modin_examples_knn_classifier.py:

KNN Classifier
--------------

This example shows how Modin DataFrames can be passed between Flyte tasks and workflows in a simple classification pipeline.

Modin uses Ray or Dask as its compute engine. We will use Ray in this example.

To install Modin with Ray as the backend, run:

.. code:: bash

   pip install "modin[ray]"

.. note::

   To install Modin with Dask as the backend, run:

   .. code:: bash

      pip install "modin[dask]"

.. GENERATED FROM PYTHON SOURCE LINES 25-26

First, we import the necessary dependencies, start a local Ray instance, and declare the named tuple that carries the train/test split between tasks.

.. GENERATED FROM PYTHON SOURCE LINES 26-50

.. code-block:: default

    from typing import List, NamedTuple

    # Imported for its side effect: it registers Modin DataFrame support with flytekit.
    import flytekitplugins.modin  # noqa: F401
    import modin.pandas
    import ray
    from flytekit import task, workflow
    from sklearn.datasets import load_wine
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    ray.shutdown()  # close previous instance of ray (if any)
    ray.init(num_cpus=2)  # open a new instance of ray

    # The (name, type) list form is used because the keyword-argument form of
    # NamedTuple is deprecated in recent Python versions.
    split_data = NamedTuple(
        "split_data",
        [
            ("train_features", modin.pandas.DataFrame),
            ("test_features", modin.pandas.DataFrame),
            ("train_labels", modin.pandas.DataFrame),
            ("test_labels", modin.pandas.DataFrame),
        ],
    )

.. GENERATED FROM PYTHON SOURCE LINES 51-52

We define a task that loads the wine dataset, converts it into Modin DataFrames, and splits it into training and test sets.

.. GENERATED FROM PYTHON SOURCE LINES 52-73

.. code-block:: default

    @task
    def preprocess_data() -> split_data:
        wine = load_wine(as_frame=True)

        # convert features and target (pandas objects, since as_frame=True)
        # into Modin DataFrames
        wine_features = modin.pandas.DataFrame(data=wine.data, columns=wine.feature_names)
        wine_target = modin.pandas.DataFrame(data=wine.target, columns=["target"])

        # split the dataset: 67% for training, 33% for testing
        X_train, X_test, y_train, y_test = train_test_split(
            wine_features, wine_target, test_size=0.33, random_state=101
        )

        return split_data(
            train_features=X_train,
            test_features=X_test,
            train_labels=y_train,
            test_labels=y_test,
        )

.. GENERATED FROM PYTHON SOURCE LINES 74-79

Next, we define a task that:

1. creates a KNeighborsClassifier model,
2. fits the model to the training data, and
3. predicts the labels for the test dataset.

.. GENERATED FROM PYTHON SOURCE LINES 79-91

.. code-block:: default

    @task
    def fit_and_predict(
        X_train: modin.pandas.DataFrame,
        X_test: modin.pandas.DataFrame,
        y_train: modin.pandas.DataFrame,
    ) -> List[int]:
        knn = KNeighborsClassifier()  # create a KNeighborsClassifier model
        knn.fit(X_train, y_train)  # fit the model to the training data
        predicted_vals = knn.predict(X_test)  # predict labels for the test data
        return predicted_vals.tolist()

.. GENERATED FROM PYTHON SOURCE LINES 92-93

We then compute the accuracy of the model on the test set. The result is a fraction between 0 and 1.

.. GENERATED FROM PYTHON SOURCE LINES 93-100

.. code-block:: default

    @task
    def calc_accuracy(
        y_test: modin.pandas.DataFrame, predicted_vals_list: List[int]
    ) -> float:
        # accuracy_score returns the fraction of correctly classified samples
        return float(accuracy_score(y_test, predicted_vals_list))

.. GENERATED FROM PYTHON SOURCE LINES 101-102

Lastly, we define a workflow that chains the three tasks together.

.. GENERATED FROM PYTHON SOURCE LINES 102-117

.. code-block:: default

    @workflow
    def pipeline() -> float:
        split_data_vals = preprocess_data()
        predicted_vals_output = fit_and_predict(
            X_train=split_data_vals.train_features,
            X_test=split_data_vals.test_features,
            y_train=split_data_vals.train_labels,
        )
        return calc_accuracy(
            y_test=split_data_vals.test_labels, predicted_vals_list=predicted_vals_output
        )


    if __name__ == "__main__":
        # pipeline() returns a fraction in [0, 1]; scale it to a percentage for display
        print(f"Accuracy of the model is {pipeline() * 100:.2f}%")
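
To make the prediction and scoring steps more concrete, the following is a minimal pure-Python sketch of what a k-nearest-neighbours prediction and an accuracy calculation do. It is an illustration only, not scikit-learn's implementation: ``knn_predict``, ``accuracy``, and the toy data are hypothetical names and values introduced here. The sketch assumes Euclidean distance and an unweighted majority vote, which correspond to the default settings of ``KNeighborsClassifier``.

.. code-block:: python

    from collections import Counter
    from math import dist
    from typing import Sequence


    def knn_predict(
        train_X: Sequence[Sequence[float]],
        train_y: Sequence[int],
        query: Sequence[float],
        k: int = 5,
    ) -> int:
        """Return the majority label among the k training points closest to query."""
        # Sort training indices by Euclidean distance to the query point.
        nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
        # Unweighted majority vote over the labels of the k nearest neighbours.
        votes = Counter(train_y[i] for i in nearest)
        return votes.most_common(1)[0][0]


    def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
        """Fraction of predictions equal to the true label, in [0, 1]."""
        matches = sum(t == p for t, p in zip(y_true, y_pred))
        return matches / len(y_true)


    # Toy data (made up for illustration): two well-separated clusters.
    train_X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    train_y = [0, 0, 0, 1, 1, 1]
    test_X = [[0.05, 0.1], [5.0, 5.1]]
    test_y = [0, 1]

    predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
    print(f"Accuracy: {accuracy(test_y, predictions) * 100:.2f}%")

Because accuracy is a fraction between 0 and 1, the sketch multiplies it by 100 before printing it as a percentage.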