Hey there!
Welcome to ClearUrDoubt.com.
In this post, we will look at the withColumn() function in the Apache Spark SQL API.
withColumn(colName: String, col: Column) returns a new DataFrame by adding a column, or by replacing an existing column that has the same name.
Let’s walk through the following spark-shell session as a demo:
scala> val df = sqlContext.read.json("/clearurdoubt/practice/students.json")
df: org.apache.spark.sql.DataFrame = [first_name: string, id: string, last_name: string]

scala> df.show
+-------------+---+----------+
|   first_name| id| last_name|
+-------------+---+----------+
|  Sai Gowtham|101|   Badvity|
|Krishna Reddy|102|    Padala|
|Ramanjaneyulu|103|        BL|
|  Adi Narayan|104|     Vajra|
|   Sai Charan|105| Mangipudi|
| Pramod Reddy|106|Savitikati|
|  Ramakrishna|107|   Mutyala|
+-------------+---+----------+

scala> val df_new = df.withColumn("full_name", concat(df("first_name"), lit(" "), df("last_name")))
df_new: org.apache.spark.sql.DataFrame = [first_name: string, id: string, last_name: string, full_name: string]

scala> df_new.show
+-------------+---+----------+--------------------+
|   first_name| id| last_name|           full_name|
+-------------+---+----------+--------------------+
|  Sai Gowtham|101|   Badvity| Sai Gowtham Badvity|
|Krishna Reddy|102|    Padala|Krishna Reddy Padala|
|Ramanjaneyulu|103|        BL|    Ramanjaneyulu BL|
|  Adi Narayan|104|     Vajra|   Adi Narayan Vajra|
|   Sai Charan|105| Mangipudi|Sai Charan Mangipudi|
| Pramod Reddy|106|Savitikati|Pramod Reddy Savi...|
|  Ramakrishna|107|   Mutyala| Ramakrishna Mutyala|
+-------------+---+----------+--------------------+
sqlContext.read.json() – reads a JSON-format file and creates a DataFrame from that JSON data.
df.withColumn() – creates a new full_name column by concatenating the data in the first_name column, a single space, and the data in the last_name column.
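As a side note, withColumn() can do a bit more than add one derived column: calls can be chained, and passing the name of an existing column replaces that column instead of adding a new one. The sketch below is not from the post above; it assumes a running SparkSession named spark (the Spark 2.x+ entry point) and the same students.json schema:

```scala
// A minimal sketch, assuming a SparkSession `spark` and the
// students.json file used earlier in the post.
import org.apache.spark.sql.functions.{col, concat, lit, upper}

val df = spark.read.json("/clearurdoubt/practice/students.json")

val df_new = df
  // adds a new column, as in the demo above
  .withColumn("full_name", concat(col("first_name"), lit(" "), col("last_name")))
  // "id" already exists, so this REPLACES it (string -> int)
  .withColumn("id", col("id").cast("int"))
  // columns added earlier in the chain can be used by later calls
  .withColumn("full_name_upper", upper(col("full_name")))

df_new.printSchema()
```

Each withColumn() call returns a new DataFrame, so chaining them is the idiomatic way to build up several derived columns.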
Happy Learning :).
Please leave a reply in case of any queries.