:orphan:

NLP Processing
--------------

This tutorial demonstrates how to process text data, generate word embeddings, and produce visualizations as part of a Flyte workflow. It is an adaptation of the official Gensim `Word2Vec tutorial `__.

About Gensim
============

Gensim is a popular open-source natural language processing (NLP) library for processing large corpora, including corpora larger than RAM. It has efficient multicore implementations of a number of algorithms, such as `Latent Semantic Analysis `__, `Latent Dirichlet Allocation (LDA) `__, and `Word2Vec deep learning `__. These support tasks including understanding document relationships, topic modeling, and learning word embeddings.

You can read more about Gensim `here `__.

Data
====

The dataset used for this tutorial is the open-source `Lee Background Corpus `__ that comes with the Gensim library.

Step-by-Step Process
====================

The workflow performs the following steps:

- Return a preprocessed (tokenized, stop words excluded, lemmatized) corpus from the custom iterator.
- Train the Word2Vec model on the preprocessed corpus.
- Generate a bag of words from the corpus and train the LDA model.
- Save the LDA and Word2Vec models to disk.
- Deserialize the Word2Vec model, run word similarity queries, and compute the Word Mover's Distance.
- Reduce the dimensionality of the word embeddings (using t-SNE) and plot them.

Let's dive into the code!
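As a rough illustration of two of the steps outlined above (returning a preprocessed corpus from a custom iterator and generating a bag of words), here is a minimal sketch in plain Python. It is not the tutorial's code and does not use Gensim: the names ``STOP_WORDS``, ``CorpusIterator``, ``build_vocabulary``, and ``to_bag_of_words`` are hypothetical, the stop-word list and documents are toy data, and lemmatization is left out for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on"}


class CorpusIterator:
    """Yield one preprocessed document (a list of tokens) at a time."""

    def __init__(self, documents):
        self.documents = documents

    def __iter__(self):
        for text in self.documents:
            # Lowercase, keep alphabetic runs only, then drop stop words.
            tokens = re.findall(r"[a-z]+", text.lower())
            yield [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(corpus):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in corpus:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bag_of_words(tokens, vocab):
    """Return sorted (token_id, count) pairs for one document."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


docs = ["The cat sat on the mat.", "The dog chased the cat."]  # toy data
corpus = CorpusIterator(docs)
vocab = build_vocabulary(corpus)
bow = [to_bag_of_words(tokens, vocab) for tokens in corpus]
```

Because ``CorpusIterator`` defines ``__iter__`` as a generator method rather than being a one-shot generator itself, the corpus can be traversed more than once. The sketch relies on this by iterating once to build the vocabulary and again to build the bags of words.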
.. only:: html

  .. image:: /auto/case_studies/ml_training/nlp_processing/images/thumb/sphx_glr_word2vec_and_lda_thumb.png
    :alt: Word Embeddings and Topic Modelling with Gensim

  :ref:`sphx_glr_auto_case_studies_ml_training_nlp_processing_word2vec_and_lda.py`

Word Embeddings and Topic Modelling with Gensim

.. toctree::
   :hidden:

   /auto/case_studies/ml_training/nlp_processing/word2vec_and_lda

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-gallery

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download all examples in Python source code: nlp_processing_python.zip `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download all examples in Jupyter notebooks: nlp_processing_jupyter.zip `

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_