How to read a CSV file in spark-shell using Spark SQL

In this post, we will go through the steps to read a CSV file in Spark SQL using spark-shell.


Step 1: Spark 1.6.0 has no built-in CSV reader, so we need a third-party package, the Databricks spark-csv library. We can pull in the dependency when launching spark-shell by passing it with the --packages option.
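The exact launch command is not reproduced in this post, so here is a sketch under assumptions: the coordinates `com.databricks:spark-csv_2.10:1.5.0` assume a Spark build on Scala 2.10 (the default for Spark 1.6.0) and the 1.5.0 release of spark-csv, so adjust both to your setup. The launch command is shown as a comment because it runs in the OS shell; the single Scala line is an optional check from inside the REPL, and the name `csvSource` is only illustrative.

```scala
// Launch command, run from the OS shell (not inside the REPL):
//   spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

// Optional check once the REPL is up: this throws ClassNotFoundException
// if the spark-csv package was not put on the classpath.
val csvSource = Class.forName("com.databricks.spark.csv.DefaultSource")
```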


Step 2: Import the required classes before using them.
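Which classes are needed depends on the columns in your file. As a minimal sketch, assuming the CSV holds only string and integer columns, the import could look like this (spark-shell already provides `sc` and `sqlContext`, so nothing else is required):

```scala
// Schema-building classes plus the two column types assumed for this example.
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
```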


Step 3: Specify the schema of the CSV file records using StructType/StructField classes imported in Step 2.
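The layout of the CSV file is not shown in this post, so the schema below is purely illustrative. It assumes one row per student and subject, with hypothetical columns `name`, `subject` and `marks`; replace them with the columns of your own file.

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Hypothetical layout: name,subject,marks (one row per student and subject).
val schema = StructType(Array(
  StructField("name", StringType, nullable = true),
  StructField("subject", StringType, nullable = true),
  StructField("marks", IntegerType, nullable = true)
))
```

Declaring the schema up front means `marks` is read as an integer rather than a string, so it can be summed directly in the later steps.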


Step 4: Load the CSV file through sqlContext, applying the schema defined in Step 3.
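A sketch of the load, assuming `schema` is the StructType defined in Step 3, that the file has a header row, and that it lives at the hypothetical path `students.csv`. The trailing-dot style keeps the chained call in one statement when typed line by line into the REPL.

```scala
// Assumes `schema` from Step 3; "students.csv" is a placeholder path.
val df = sqlContext.read.
  format("com.databricks.spark.csv").   // data source provided by spark-csv
  option("header", "true").             // treat the first line as column names
  schema(schema).                       // use our schema instead of all-string columns
  load("students.csv")
```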


Step 5: Display the records read from the CSV file.
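For example, assuming the DataFrame loaded in Step 4 is bound to a variable named `df` (the name is our choice, not fixed by the API):

```scala
// Assumes `df` is the DataFrame loaded in Step 4.
df.printSchema()   // column names and types
df.show()          // prints the first 20 rows as a table
```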


Step 6: Display the total marks of each student.
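One way to do this with Spark SQL is to register the DataFrame as a temporary table and group by student. This is a sketch, assuming `df` is the DataFrame from Step 4 and that it has the hypothetical columns `name` and `marks`; the table name `students` and the alias `total_marks` are also illustrative.

```scala
// Assumes `df` from Step 4 with hypothetical columns `name` and `marks`.
df.registerTempTable("students")   // make the DataFrame queryable from SQL

val totals = sqlContext.sql(
  "SELECT name, SUM(marks) AS total_marks FROM students GROUP BY name")
totals.show()
```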


Step 7: Display the sum of the total marks of all students.
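The grand total can be sketched as a single aggregate over the whole table. As before, this assumes `df` is the DataFrame from Step 4 with a hypothetical integer column `marks`; the table name `students` and the alias `sum_of_total_marks` are illustrative.

```scala
// Assumes `df` from Step 4 with a hypothetical integer column `marks`.
df.registerTempTable("students")   // re-registering simply replaces the temp table

val grandTotal = sqlContext.sql(
  "SELECT SUM(marks) AS sum_of_total_marks FROM students")
grandTotal.show()                  // a single row holding the overall sum
```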


The same steps can also be performed in a standalone Scala or Java program. We will cover that in upcoming posts.

