How to read a CSV file in spark-shell using Spark SQL

Hey there!


In this post, we will go through the steps to read a CSV file in Spark SQL using spark-shell.


Step 1: In Spark 1.6.0, reading a CSV file requires a third-party package (the Databricks spark-csv library). We can pull in the dependency when launching spark-shell, as shown below.
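A minimal launch command might look like this. The package version shown is only an example; pick an artifact matching your Scala build (e.g. `_2.10` or `_2.11`).

```shell
# Launch spark-shell with the Databricks spark-csv package on the classpath.
# Version 1.5.0 for Scala 2.10 is an example; adjust to your environment.
spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
```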


Step 2: Import the required classes before using them.
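Inside spark-shell, the schema classes live in `org.apache.spark.sql.types`, and the aggregate functions used in the later steps come from `org.apache.spark.sql.functions`:

```scala
// Classes for defining a CSV schema explicitly
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
// Aggregate functions such as sum, used when totalling marks later
import org.apache.spark.sql.functions.sum
```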


Step 3: Specify the schema of the CSV file records using StructType/StructField classes imported in Step 2.
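As a sketch, assume the CSV holds student marks with columns `name`, `subject`, and `marks` (this layout is an assumption based on the later steps, not taken from the original file):

```scala
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// Hypothetical schema for a student-marks CSV: name,subject,marks
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("subject", StringType, nullable = true),
  StructField("marks", IntegerType, nullable = true)
))
```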


Step 4: Load the CSV file using sqlContext as below:
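With the schema defined, the load call goes through the spark-csv data source. The file path below is a placeholder, and the `header` option should match whether your file has a header row:

```scala
// Read the CSV via the Databricks spark-csv data source.
// "/path/to/students.csv" is a placeholder; "schema" is the StructType
// defined in the previous step.
val studentsDF = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")   // set to "false" if the file has no header row
  .schema(schema)
  .load("/path/to/students.csv")
```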


Step 5: Display the records read from the CSV file
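Assuming the DataFrame from the previous step is named `studentsDF`, the records can be printed with `show`:

```scala
// Prints the first 20 rows in a tabular form; pass a count to show more
studentsDF.show()
```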


Step 6: Display the total marks of each student
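One way to compute this, assuming a DataFrame `studentsDF` with `name` and `marks` columns (a hypothetical layout for this sketch), is a `groupBy` with a `sum` aggregate:

```scala
import org.apache.spark.sql.functions.sum

// Total marks per student across all subjects
val totalsDF = studentsDF.groupBy("name").agg(sum("marks").alias("total_marks"))
totalsDF.show()
```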


Step 7: Display the sum of total marks of all students
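Again assuming a `studentsDF` DataFrame with a `marks` column, an aggregate over the whole DataFrame gives the grand total:

```scala
import org.apache.spark.sql.functions.sum

// Grand total of marks across all students
studentsDF.agg(sum("marks").alias("grand_total")).show()
```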


These steps can also be performed programmatically in Scala or Java; we will cover that in upcoming posts.

Happy Learning :).

Please leave a reply in case of any queries.
