Aug 24
Word Count program using Spark and Scala
Hey there! Welcome to ClearUrDoubt.com. In this post, we will look at the WordCount program in Spark using Scala. Here is the Scala program:
package com.sample

import org.apache.spark._

object ScalaWordCounts {
  def main(args: Array[String]): Unit = {
    // Configure and start Spark (local mode here, for testing)
    val conf = new SparkConf().setAppName("Scala Word Count").setMaster("local")
    val sc = new SparkContext(conf)

    // Read the input file passed as the first argument
    val lines = sc.textFile(args(0))

    // Split each line into words
    val words = lines.flatMap(line => line.split(" "))

    // Pair every word with an initial count of 1
    val pairs = words.map(word => (word, 1))

    // Add up the counts for each distinct word
    val counts = pairs.reduceByKey((count1, count2) => count1 + count2)

    // Write the (word, count) pairs to the output path (second argument)
    counts.saveAsTextFile(args(1))

    sc.stop()
  }
}
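The heavy lifting is done by reduceByKey, which merges the counts for each word within every partition before shuffling, so only partial sums travel across the network. Here is a minimal sketch of that step in isolation, using a small in-memory RDD (illustrative only; it assumes a SparkContext named sc, as in the program above):

// Build a tiny RDD of (word, 1) pairs in memory
val demo = sc.parallelize(Seq(("is", 1), ("is", 1), ("test", 1)))

// Merge the values per key; ("is",1) and ("is",1) become ("is",2)
val summed = demo.reduceByKey(_ + _)

summed.collect().foreach(println)   // e.g. (test,1) and (is,2), in no particular order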
Input file (HDFS):
[admin@myworkstation ~]$ hadoop fs -cat /user/workspace/clearurdoubt/test_folder1/testfile_1.txt
This is a test file.
it is created for the practice purpose.
[admin@myworkstation ~]$
Spark-submit command:
spark-submit --master local --class com.sample.ScalaWordCounts /user/workspace/clearurdoubt/lib/ScalaWordCount-0.0.1-SNAPSHOT.jar /user/workspace/clearurdoubt/test_folder1/testfile_1.txt /user/workspace/clearurdoubt/word_count_output/ |
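If you want to experiment before packaging a jar, the same pipeline can be pasted into spark-shell, which provides a ready-made SparkContext named sc. A quick sketch, assuming the same HDFS input path as above:

// Run inside spark-shell; sc is supplied by the shell
val counts = sc.textFile("/user/workspace/clearurdoubt/test_folder1/testfile_1.txt")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.collect().foreach(println)   // print the (word, count) pairs to the console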
Output:
[admin@myworkstation ~]$ hdfs dfs -cat /user/workspace/clearurdoubt/word_count_output/*
(created,1)
(it,1)
(is,2)
(practice,1)
(a,1)
(,1)
(This,1)
(for,1)
(file.,1)
(purpose.,1)
(the,1)
(test,1)
[admin@myworkstation ~]$
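Note the (,1) entry, which comes from an empty token (most likely an extra space somewhere in the input), and that "file." and "purpose." keep their trailing periods because we split on single spaces only. If you want normalized counts, a variant like the sketch below could replace the words/pairs steps above; the regex and lowercasing are one reasonable choice, not the only one:

// Split on runs of non-word characters, lowercase, and drop empty tokens
val cleaned = sc.textFile(args(0))
  .flatMap(line => line.split("\\W+"))
  .map(_.toLowerCase)
  .filter(_.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)

With this, "This" and "this" would count together, and punctuation-only tokens disappear.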
Happy Learning! 🙂