Hey there!
Welcome to ClearUrDoubt.com.
In this post, we will look at a WordCount program for Spark, written in Scala.
Here is the Scala program:
package com.sample

import org.apache.spark._

object ScalaWordCounts {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Scala Word Count").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the input file; the path is passed as the first argument
    val lines = sc.textFile(args(0))

    // Split each line into words, pair each word with a count of 1,
    // then sum the counts per word
    val words = lines.flatMap(line => line.split(" "))
    val pairs = words.map(word => (word, 1))
    val counts = pairs.reduceByKey((count1, count2) => count1 + count2)

    // Write the (word,count) pairs to the output path (second argument)
    counts.saveAsTextFile(args(1))

    sc.stop()
  }
}
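Before submitting to a cluster, you can sanity-check the transformation chain against an in-memory collection instead of HDFS. The sketch below is illustrative only: the object name WordCountSmokeTest and the local[*] master are my assumptions, not part of the original program, and the sample lines are taken from the test file used later in this post.

import org.apache.spark.{SparkConf, SparkContext}

object WordCountSmokeTest {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process using all available cores (assumption, for testing only)
    val sc = new SparkContext(new SparkConf().setAppName("WordCount smoke test").setMaster("local[*]"))

    // Parallelize a couple of sample lines instead of reading from HDFS
    val lines = sc.parallelize(Seq("This is a test file.", "it is created for the practice purpose."))

    // Same pipeline as the main program
    val counts = lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    // Bring the (word, count) pairs back to the driver and print them
    counts.collect().foreach(println)

    sc.stop()
  }
}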
Input (HDFS) file:
[admin@myworkstation ~]$ hadoop fs -cat /user/workspace/clearurdoubt/test_folder1/testfile_1.txt
This is a test file.
it is created for the practice purpose.
[admin@myworkstation ~]$
Spark-submit command:
spark-submit \
  --master local \
  --class com.sample.ScalaWordCounts \
  /user/workspace/clearurdoubt/lib/ScalaWordCount-0.0.1-SNAPSHOT.jar \
  /user/workspace/clearurdoubt/test_folder1/testfile_1.txt \
  /user/workspace/clearurdoubt/word_count_output/
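The last two arguments become args(0) (the input file) and args(1) (the output directory). One thing to note: properties set directly on the SparkConf take precedence over spark-submit flags, so the hard-coded setMaster("local") in the program overrides the --master option above. If you want the command line to control where the job runs, a minimal variant is to leave the master out of the code:

// Let spark-submit's --master flag decide where the job runs
val conf = new SparkConf().setAppName("Scala Word Count")
val sc = new SparkContext(conf)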
Output:
[admin@myworkstation ~]$ hdfs dfs -cat /user/workspace/clearurdoubt/word_count_output/*
(created,1)
(it,1)
(is,2)
(practice,1)
(a,1)
(,1)
(This,1)
(for,1)
(file.,1)
(purpose.,1)
(the,1)
(test,1)
[admin@myworkstation ~]$
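Two things stand out in this output. First, the pairs are not sorted: reduceByKey hash-partitions the data, so the order is arbitrary. Second, the (,1) entry is an empty token produced by split(" ") when the input contains consecutive or leading spaces. A small variant of the pipeline addresses both; the whitespace regex and the descending sort are my suggestions, not part of the original program:

// Split on runs of whitespace and drop empty tokens,
// then sort by count (descending) before saving
val counts = lines
  .flatMap(line => line.split("\\s+").filter(_.nonEmpty))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)

counts.saveAsTextFile(args(1))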
Happy Learning! 🙂
Please leave a reply if you have any queries.