Jun
09
How to add a new column and update its value based on another column in a DataFrame in Spark
Hey there! Welcome to ClearUrDoubt.com. In this post, we will look at adding a new column to a DataFrame and setting its value based on another column's value, using the when() utility function in Spark.
Spark context available as 'sc' (master = local[*], app id = local-1560028311690).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 12.0.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = spark.read.json("/clearurdoubt/output.json");
df: org.apache.spark.sql.DataFrame = [first_name: string, id: bigint ... 2 more fields]

scala> df.show(false);
+----------+---+---------+------------+
|first_name|id |last_name|student_type|
+----------+---+---------+------------+
|Rahul     |101|Kumar    |1           |
|Antony    |102|James    |0           |
|Ashok     |103|Kedar    |0           |
|Ravi      |104|Aswin    |1           |
+----------+---+---------+------------+

scala> val new_df = df.withColumn("student_type_description", when(col("student_type").equalTo("1"), lit("Day-Scholar")).otherwise("Residential"));
new_df: org.apache.spark.sql.DataFrame = [first_name: string, id: bigint ... 3 more fields]

scala> new_df.show(false);
+----------+---+---------+------------+------------------------+
|first_name|id |last_name|student_type|student_type_description|
+----------+---+---------+------------+------------------------+
|Rahul     |101|Kumar    |1           |Day-Scholar             |
|Antony    |102|James    |0           |Residential             |
|Ashok     |103|Kedar    |0           |Residential             |
|Ravi      |104|Aswin    |1           |Day-Scholar             |
+----------+---+---------+------------+------------------------+

scala>
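The shell session above relies on spark-shell importing org.apache.spark.sql.functions._ automatically, which is why when(), col() and lit() work without any import. Outside the shell those imports have to be written out. Below is a minimal sketch of the same step as a standalone program. The object name StudentTypeExample, the helper addDescription, and the inline sample rows (used in place of /clearurdoubt/output.json) are illustrative assumptions, and student_type is modelled as a string here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical object name, used only for this illustration.
object StudentTypeExample {

  // Adds student_type_description: "Day-Scholar" when student_type is "1",
  // "Residential" for every other value.
  def addDescription(df: DataFrame): DataFrame =
    df.withColumn(
      "student_type_description",
      when(col("student_type").equalTo("1"), lit("Day-Scholar"))
        .otherwise(lit("Residential"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StudentTypeExample")
      .master("local[*]") // assumption: run locally, as in the shell session
      .getOrCreate()
    import spark.implicits._

    // Inline rows standing in for the JSON file (assumed sample data).
    val df = Seq(
      ("Rahul", 101L, "Kumar", "1"),
      ("Antony", 102L, "James", "0"),
      ("Ashok", 103L, "Kedar", "0"),
      ("Ravi", 104L, "Aswin", "1")
    ).toDF("first_name", "id", "last_name", "student_type")

    addDescription(df).show(false)
    spark.stop()
  }
}
```

Keeping the column logic in a small function that takes and returns a DataFrame makes it easy to reuse on data loaded from any source, including spark.read.json.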
when() function documentation: static Column when(Column condition, Object value) Evaluates a list …
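Since when() works on a list of conditions, more when() calls can be chained on the Column it returns, and they are checked in order. The following is a minimal sketch of that pattern. The object name ChainedWhenExample, the helper describe, and the extra code "2" mapped to "Exchange" are hypothetical and do not appear in the original data. It also leaves out otherwise(), in which case rows that match none of the conditions get null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical object name, used only for this illustration.
object ChainedWhenExample {

  // Conditions are evaluated top to bottom; the first match wins.
  // The "2" -> "Exchange" mapping is an assumed example value.
  def describe(df: DataFrame): DataFrame =
    df.withColumn(
      "student_type_description",
      when(col("student_type") === "1", "Day-Scholar")
        .when(col("student_type") === "0", "Residential")
        .when(col("student_type") === "2", "Exchange")
      // No otherwise(): unmatched rows receive null.
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ChainedWhenExample")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // "9" matches none of the conditions above.
    val df = Seq("1", "0", "2", "9").toDF("student_type")
    describe(df).show(false)
    spark.stop()
  }
}
```

Adding a final otherwise(...) is usually the safer choice when every row should get a description, since it makes the fallback explicit instead of leaving nulls in the new column.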