Oct 27
Finding word counts in a file using Spark
Hey there! Welcome to ClearUrDoubt.com. In this post, we will walk through the steps to count the words in a file using Spark 2.4.3.

Input file:
Hey there! Welcome to clearurdoubt.com. Happy Learning. Hope you find these articles useful! Happy Learning.
Spark shell statements:
val rdd = spark.sparkContext.textFile("/clearurdoubt/test_file.txt") // read the input file into an RDD of lines
val words = rdd.flatMap(line => line.split(" ")).filter(word => !word.isEmpty) // split each line into words and drop the empty ones
words.cache // cache the RDD for reuse
val totalWords = words.count // total number of words in the input file
val pairs = words.map(word => (word, 1)) // map each word to a (word, 1) pair
val wordCounts = pairs.reduceByKey(_ + _).collect // sum the counts per word and collect the results into an Array
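The same pipeline can be sketched without a Spark cluster, using plain Scala collections in place of the RDD (a minimal sketch; `groupBy` plus a per-group sum stands in for `reduceByKey`, and the single line of input text is taken from the example file above):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the lines read by textFile
    val lines = List("Hey there! Welcome to clearurdoubt.com. Happy Learning. Hope you find these articles useful! Happy Learning.")

    // flatMap + filter: split each line into words and drop empty tokens
    val words = lines.flatMap(_.split(" ")).filter(_.nonEmpty)
    val totalWords = words.length

    // map to (word, 1) pairs, then reduce by key locally
    val wordCounts = words.map(w => (w, 1))
      .groupBy { case (word, _) => word }
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    println(s"totalWords: $totalWords")           // 15 for the example line
    println(wordCounts.toList.sortBy(-_._2))      // (Happy, 2) and (Learning., 2) first
  }
}
```

This mirrors the shape of the RDD version one-to-one, which is why spark-shell code often translates so directly from ordinary collection code.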
Output:
totalWords: Long = 15
wordCounts: Array[(String, Int)] = Array((Learning.,2), (Hey,1), (Welcome,1), (Happy,2), (Hope,1), (find,1), (you,1), (there!,1), (to,1), (clearurdoubt.com.,1), (articles,1), (these,1), (useful!,1))
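Since collect returns a plain Array on the driver, the result can be post-processed with ordinary Scala collection methods. A common follow-up (not part of the original shell session, just a hypothetical extension) is sorting the pairs by descending count:

```scala
// The word counts as collected in the example output above
val wordCounts = Array(("Learning.", 2), ("Hey", 1), ("Welcome", 1), ("Happy", 2),
  ("Hope", 1), ("find", 1), ("you", 1), ("there!", 1), ("to", 1),
  ("clearurdoubt.com.", 1), ("articles", 1), ("these", 1), ("useful!", 1))

// Sort by count, highest first; sortBy is stable, so ties keep their order
val sorted = wordCounts.sortBy { case (_, count) => -count }
sorted.foreach(println)
```

For large results it is better to sort on the cluster before collecting, e.g. with `pairs.reduceByKey(_ + _).sortBy(_._2, ascending = false)`, so the driver never holds unsorted data it does not need.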
Happy Learning!