How to add a new column to a DataFrame in Apache Spark

Hey there!

Welcome to ClearUrDoubt.com.

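In this post, we will look at the withColumn() function in the Apache Spark SQL API.

withColumn(String colName, Column col) adds a new column to a DataFrame. It does not modify the original DataFrame; it returns a new one, and if a column named colName already exists, that column is replaced.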
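
To make that behaviour concrete, here is a minimal sketch for spark-shell. The sample rows, the "country" column and the variable names are made up for illustration, and it assumes Spark 2.x or later, where SparkSession is available.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// spark-shell already provides `spark`; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("withColumnDemo").getOrCreate()
import spark.implicits._

// Hypothetical sample data, not from the original post.
val df = Seq(("John", "Doe"), ("Jane", "Roe")).toDF("first_name", "last_name")

// Adds a constant-valued column; df itself still has only two columns.
val withCountry = df.withColumn("country", lit("US"))

// Reusing an existing column name replaces that column instead of adding another.
val replaced = withCountry.withColumn("country", lit("IN"))

withCountry.printSchema()
replaced.show()
```

The second argument must be a Column, which is why the constant is wrapped in lit() rather than passed as a plain String.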

Let’s walk through a short demo in spark-shell:

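spark.read.json() – reads a JSON file (by default, one JSON object per line) and creates a DataFrame from that data. In Spark 1.x the same reader is reached through sqlContext.read.json(); the SparkContext (sc) itself has no read method.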

df.withColumn() – creates a new column by concatenating the value in the first_name column, a single space, and the value in the last_name column.
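
The two steps described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact original snippet: the temporary people JSON file, its two records and the full_name column name are hypothetical, and it assumes Spark 2.x or later.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, lit}

// spark-shell already provides `spark`; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("withColumnDemo").getOrCreate()

// Write a small hypothetical JSON file (one object per line) so the sketch runs on its own.
val path = Files.createTempFile("people", ".json")
val records = Seq(
  """{"first_name": "John", "last_name": "Doe"}""",
  """{"first_name": "Jane", "last_name": "Roe"}"""
)
Files.write(path, records.mkString("\n").getBytes("UTF-8"))

// Read the JSON file into a DataFrame; the schema is inferred from the data.
val df = spark.read.json(path.toString)

// Add full_name = first_name + " " + last_name ("full_name" is an assumed column name).
val withFullName = df.withColumn(
  "full_name",
  concat(col("first_name"), lit(" "), col("last_name"))
)

withFullName.show()
```

Note that concat() yields null for a row if any of its inputs is null, so rows with a missing first_name or last_name would get a null full_name.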

Happy Learning :).

Please leave a reply in case of any queries.
