In this article, we will count and print the number of repeated word occurrences in a text file.
Counting & printing duplicate word occurrences:
- Using Java 8 Stream and java.util.AbstractMap.SimpleEntry
- Using Java 8 Stream and Collectors.toMap() method
- Using Pattern.compile("\\W+").splitAsStream() method
Note: The same example is implemented for Java versions below 1.8 (without Streams) in Java – Count and print number of repeated word occurrences in a String
Sample text file (Words.txt):
1. Using Java 8 Stream and SimpleEntry :
- First, read the file lines in parallel using Files.lines().parallel()
- Split every line into words, using a single space as the delimiter, with the Stream.flatMap() method
- Remove all non-alphabetic characters (punctuation, digits, white-spaces) from each word using the Stream.map() method
- Keep only words whose length is greater than zero (i.e., drop empty strings) using the Stream.filter() method
- Using the Stream.map() method again, wrap every word in a SimpleEntry
- Finally, collect the words and their counts using Collectors.groupingBy() with Collectors.counting()
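To see what the per-line steps above actually do, here is a minimal sketch that applies them to a single in-memory line instead of a file. The class name LineTokenizerDemo and the sample line are hypothetical. The split(" "), replaceAll("[^a-zA-Z]", "") and empty-word filter mirror the listing that follows. The sketch shows that a double space or a lone "-" leaves an empty token, which is why the filter step is needed. It also shows that digits are stripped ("Maths1" becomes "Maths").

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LineTokenizerDemo {

    // Applies the same per-line steps as the listing: split on a single space,
    // strip every non-letter character, then drop words that became empty.
    public static List<String> tokens(String line) {
        return Arrays.stream(line.trim().split(" "))
                .map(word -> word.replaceAll("[^a-zA-Z]", ""))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical sample line with punctuation, a double space and a digit
        String line = "Tamil,  Hindi - English Maths1";

        // raw split keeps the empty token between the two spaces and the lone "-"
        System.out.println(Arrays.toString(line.trim().split(" "))); // [Tamil,, , Hindi, -, English, Maths1]

        // after removing non-letters and filtering empties
        System.out.println(tokens(line)); // [Tamil, Hindi, English, Maths]
    }
}
```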
CountRepeatedWordsUsingJava8.java
package in.bench.resources.count.lines.words;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountRepeatedWordsUsingJava8 {

    public static void main(String[] args) throws IOException {

        // Words.txt is resolved relative to the current working directory
        Path path = Paths.get("Words.txt");

        Map<String, Long> wordCountMap;

        // try-with-resources closes the file handle held by Files.lines()
        try (Stream<String> lines = Files.lines(path)) {
            wordCountMap = lines
                    .parallel() // process lines in parallel
                    .flatMap(line -> Arrays.stream(line.trim().split(" "))) // split words on space
                    .map(word -> word.replaceAll("[^a-zA-Z]", "")) // remove all non-alphabetic characters
                    .filter(word -> word.length() > 0) // keep only non-empty words
                    .map(word -> new SimpleEntry<>(word, 1)) // put each word in a temporary entry
                    .collect(Collectors.groupingBy(SimpleEntry::getKey, Collectors.counting()));
        }

        // print to the console
        System.out.println("1. Words and its Count in Random-order :- \n");
        wordCountMap
            .entrySet()
            .forEach(System.out::println);

        // print to the console
        System.out.println("\n\n2. Words and its Count in Descending-order :- \n");
        wordCountMap
            .entrySet()
            .stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .forEach(System.out::println);
    }
}
Output:
1. Words and its Count in Random-order :-
Social=1
Telugu=1
English=2
Maths=2
blank=15
Kannda=1
Science=1
Hindi=2
Civics=2
History=1
Tamil=3
Physics=1
2. Words and its Count in Descending-order :-
blank=15
Tamil=3
English=2
Maths=2
Hindi=2
Civics=2
Social=1
Telugu=1
Kannda=1
Science=1
History=1
Physics=1
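In the descending-order output above, words with the same count (for example, the four words with count 2) appear in whatever order the map happens to iterate them. Collectors.groupingBy() makes no guarantee about that order. If a repeatable order is wanted, one option is to break ties alphabetically with thenComparing(). This is a minimal sketch on a small hypothetical map rather than the file. The class name SortByCountThenWord is hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SortByCountThenWord {

    // Orders entries by count (highest first); equal counts fall back to the
    // alphabetical order of the word, so ties no longer depend on map iteration order.
    public static List<String> sortedEntries(Map<String, Long> counts) {
        return counts.entrySet()
                .stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical counts, a small subset resembling the output above
        Map<String, Long> counts = new HashMap<>();
        counts.put("Maths", 2L);
        counts.put("Tamil", 3L);
        counts.put("English", 2L);
        counts.put("Civics", 2L);
        counts.put("Physics", 1L);

        // prints Tamil=3, Civics=2, English=2, Maths=2, Physics=1 (one per line)
        sortedEntries(counts).forEach(System.out::println);
    }
}
```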
2. Using Java 8 Stream and Collectors.toMap() method :
- First, read the file lines in parallel using Files.lines().parallel()
- Split every line into words, using a single space as the delimiter, with the Stream.flatMap() method
- Remove all non-alphabetic characters (punctuation, digits, white-spaces) from each word using the Stream.map() method
- Keep only non-empty words using the Stream.filter() method
- Finally, collect the words and their counts into a Map using Java 8 Collectors
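The core of this approach is the merge-function form of Collectors.toMap(). Each word is mapped to a count of 1, and when the same key appears again, the merge function adds the counts together. This is a minimal sketch on an in-memory list of words. The class name ToMapWordCount and the sample words are hypothetical. The four-argument overload with TreeMap::new is used here only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapWordCount {

    // Counts words with Collectors.toMap(): each word maps to 1L, and when the
    // same key is seen again the merge function Long::sum adds the two counts.
    public static Map<String, Long> count(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(
                        Function.identity(), // key: the word itself
                        word -> 1L,          // value: one occurrence
                        Long::sum,           // merge: add counts for duplicate keys
                        TreeMap::new));      // sorted map, for predictable printing only
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Tamil", "Hindi", "Tamil", "English", "Tamil", "Hindi");
        System.out.println(count(words)); // {English=1, Hindi=2, Tamil=3}
    }
}
```

Without the merge function (Long::sum here), Collectors.toMap() throws an IllegalStateException as soon as it meets a duplicate key. That is why the merge function is essential when counting.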
CountRepeatedWordsUsingJava8CollectorsToMap.java
package in.bench.resources.count.lines.words;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CountRepeatedWordsUsingJava8CollectorsToMap {

    public static void main(String[] args) throws IOException {

        // Words.txt is resolved relative to the current working directory
        Path path = Paths.get("Words.txt");

        Map<String, Long> wordCountMap;

        // try-with-resources closes the file handle held by Files.lines()
        try (Stream<String> lines = Files.lines(path)) {
            wordCountMap = lines
                    .parallel() // process lines in parallel
                    .flatMap(line -> Arrays.stream(line.trim().split(" "))) // split words on space
                    .map(word -> word.replaceAll("[^a-zA-Z]", "")) // remove all non-alphabetic characters
                    .filter(word -> !word.isEmpty()) // keep only non-empty words
                    // word -> count of 1; Long::sum merges the counts of duplicate words
                    .collect(Collectors.toMap(Function.identity(), word -> 1L, Long::sum));
        }

        // print to the console
        System.out.println("1. Words and its Count in Random-order :- \n");
        wordCountMap
            .entrySet()
            .forEach(System.out::println);

        // print to the console
        System.out.println("\n\n2. Words and its Count in Ascending-order :- \n");
        wordCountMap
            .entrySet()
            .stream()
            .sorted(Map.Entry.comparingByValue())
            .forEach(System.out::println);
    }
}
Output:
1. Words and its Count in Random-order :-
Social=1
Telugu=1
English=2
Maths=2
blank=15
Kannda=1
Science=1
Hindi=2
Civics=2
History=1
Tamil=3
Physics=1
2. Words and its Count in Ascending-order :-
Social=1
Telugu=1
Kannda=1
Science=1
History=1
Physics=1
English=2
Maths=2
Hindi=2
Civics=2
Tamil=3
blank=15
3. Using Pattern.compile().splitAsStream() method :
- First, read all lines of the file into a List<String> using Files.readAllLines(), then convert that list into a single String using its toString() method
- Compile the regex \W+ (one or more non-word characters) using Pattern.compile("\\W+"), then call splitAsStream() on the compiled pattern, passing the string read from the file, to get a Stream of words
- Keep only non-empty words using the Stream.filter() method (List.toString() wraps the lines in square brackets, so the split yields an empty leading token)
- Finally, convert each word to lowercase and collect the words and their counts using Collectors.groupingBy() with Collectors.summingInt()
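The reason for the empty-word filter is easiest to see on a small example. The sketch below uses a hypothetical class name and hypothetical sample lines in place of Files.readAllLines(). It prints the string produced by List.toString(), the raw tokens from splitAsStream(), and the filtered words. The opening "[" is a match at the very start of the input, so splitAsStream() emits an empty first token. The trailing "]" produces no token, because trailing empty strings are discarded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitAsStreamDemo {

    // Same regex as the listing: one or more non-word characters act as the delimiter.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    public static List<String> words(List<String> lines) {
        // List.toString() joins the lines as "[line1, line2]"; the opening bracket
        // yields an empty first token, which the filter removes.
        String input = lines.toString();
        return NON_WORD.splitAsStream(input)
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // hypothetical lines standing in for Files.readAllLines(path)
        List<String> lines = Arrays.asList("Tamil Hindi", "tamil, English");

        System.out.println(lines.toString()); // [Tamil Hindi, tamil, English]
        System.out.println(NON_WORD.splitAsStream(lines.toString())
                .collect(Collectors.toList())); // [, Tamil, Hindi, tamil, English]
        System.out.println(words(lines));      // [Tamil, Hindi, tamil, English]
    }
}
```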
CountRepeatedWordsUsingJava8PatternSplitAsStream.java
package in.bench.resources.count.lines.words;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountRepeatedWordsUsingJava8PatternSplitAsStream {

    public static void main(String[] args) throws IOException {

        // Words.txt is resolved relative to the current working directory
        Path path = Paths.get("Words.txt");

        // read all lines and convert the List to a single String to process
        String input = Files.readAllLines(path).toString();

        // count repeated words, ignoring case
        Map<String, Integer> wordCountMap = Pattern.compile("\\W+")
                .splitAsStream(input)
                .filter(word -> !word.isEmpty()) // keep only non-empty words
                .collect(Collectors.groupingBy(String::toLowerCase,
                        Collectors.summingInt(s -> 1))); // add 1 per occurrence

        // print to the console
        System.out.println("1. Words and its Count in Random-order :- \n");
        wordCountMap
            .entrySet()
            .forEach(System.out::println);

        // print to the console
        System.out.println("\n\n2. Words and its Count in Descending-order :- \n");
        wordCountMap
            .entrySet()
            .stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
            .forEach(System.out::println);
    }
}
Output:
1. Words and its Count in Random-order :-
kannda=1
tamil=3
blank=15
social=1
maths=2
civics=2
physics=1
science=1
hindi=2
english=2
history=1
telugu=1
2. Words and its Count in Descending-order :-
blank=15
tamil=3
maths=2
civics=2
hindi=2
english=2
kannda=1
social=1
physics=1
science=1
history=1
telugu=1
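All three programs print every word, including words that occur only once. If only the words that actually repeat are of interest, one option is to filter the counted map on its values. This is a minimal sketch, assuming the words have already been extracted into a list. The class name RepeatedWordsOnly and the sample words are hypothetical. It counts case-insensitively, like the splitAsStream() version, and collects into a TreeMap only to make the printed order predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RepeatedWordsOnly {

    // Counts words case-insensitively, then keeps only the entries whose
    // count is greater than 1, i.e. words that occur more than once.
    public static Map<String, Long> repeated(List<String> words) {
        Map<String, Long> counts = words.stream()
                .collect(Collectors.groupingBy(String::toLowerCase, Collectors.counting()));

        return counts.entrySet()
                .stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, TreeMap::new)); // keys are unique; TreeMap for sorted output
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Tamil", "hindi", "tamil", "Physics", "Hindi", "TAMIL");
        System.out.println(repeated(words)); // {hindi=2, tamil=3}
    }
}
```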
Related Articles:
- Java 8 – Count and print number of lines and words in a text file
- Java 8 – Count and print number of repeated character occurrences in a String
- Java – Count and print number of words and lines in a text file
- Java – Count and print number of repeated word occurrences in a String
- Java – Count and print number of repeated character occurrences in a String
References:
- AbstractMap.SimpleEntry (Java Platform SE 8 ) (oracle.com)
- Pattern (Java Platform SE 8 ) (oracle.com)
- Collectors (Java Platform SE 8 ) (oracle.com)
- Java – Count and print number of repeated word occurrences in a String
Happy Coding !!
Happy Learning !!