Saturday 3 March 2018

SQL with Apache Spark, easy!

Reading about cluster computing developments like Apache Spark and SQL I decided to find out.

What I was after was to see how easy is to write SQL in Spark-SQL. In this micro-post I will show you how easy is to SQL a JSON file.

For my experiment I will use my chrome_history.json file which you can download from your chrome browser using the extension To run the SQL query on PySpark on my laptop I will use the PyCharm IDE. After little bit of configuration on PyCharm, setting up environments (SPARK_HOME), there it is: It only takes 3 lines to be able SQL query a JSON document in Spark-SQL.

(click image to enlarge)

Think of the possibilities with SQL, the 'cluster' partitioning and parallelisation you can achieve


Apache Spark:


No comments: