Spark with Jupyter

Apache Spark is a must for Big Data lovers. In a few words, Spark is a fast and powerful framework that provides an API to perform massive distributed processing over resilient sets of data.

Jupyter Notebook is a popular application that enables you to edit, run and share Python code in a web view. It allows you to modify and re-execute parts of your code in a very flexible way. That's why Jupyter is a great tool to test and prototype programs. I wrote this article for Linux users but I am sure Mac OS users can benefit from it too.

When using Spark, most data engineers recommend developing either in Scala (which is the "native" Spark language) or in Python through the complete PySpark API. Python for Spark is generally slower than Scala. However, like many developers, I love Python because it's flexible, robust, easy to learn, and benefits from all my favorite libraries. In my opinion, Python is the perfect language for prototyping in the Big Data and Machine Learning fields.

If you prefer to develop in Scala, you will find many alternatives in the following GitHub repository: alexarchambault/jupyter-scala. For the pros and cons of Python vs. Scala in a Spark context, please refer to this interesting article: Scala vs. Python for Apache Spark.

Before installing PySpark, you must have Python and Spark installed. I am using Python 3 in the following examples but you can easily adapt them to Python 2.
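The prerequisite mentioned above (Python and Spark installed before PySpark) can be checked with a short script. This is only a minimal sketch and not part of the original setup: `check_prerequisites` is a hypothetical helper name, and it assumes Spark can be found either through the `SPARK_HOME` environment variable or through a `spark-submit` executable on your `PATH`. Your installation may expose Spark differently.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Report which Python is running and whether Spark looks installed.

    Hypothetical helper for illustration. Assumes Spark is discoverable
    via SPARK_HOME or a spark-submit executable on PATH.
    Uses shutil.which, so it needs Python 3.
    """
    env = os.environ if env is None else env
    # Look for spark-submit only in the PATH of the given environment.
    spark_submit = shutil.which("spark-submit", path=env.get("PATH", ""))
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "spark_home": env.get("SPARK_HOME"),
        "spark_submit_on_path": spark_submit is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print("{}: {}".format(name, value))
```

If `spark_home` is empty and `spark_submit_on_path` is false, Spark is probably not installed yet, or not exposed to your shell.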