How can I eliminate duplicate words from a String in Java?

Solution 1

Assuming each String is repeated exactly twice, with a single space in between as in your examples, the following code removes the repetition:

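for (int i = 0; i < myList.size(); i++) {
    String s = myList.get(i);
    if (s.length() < 3) {
        continue; // too short to be "X X"; also keeps substring() in range
    }
    int mid = s.length() / 2;
    String fs = s.substring(0, mid);   // first half
    String ls = s.substring(mid + 1);  // second half, skipping the separating character
    if (fs.equals(ls)) {
        myList.set(i, fs);             // keep a single copy
    }
}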

The code splits each entry of the list into two substrings at the midpoint. If the two halves are equal, it replaces the original element with one half, removing the repetition.

I was testing the code and did not see @Brendan Robert's answer. This code follows the same logic as his.
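The halving idea above can be packaged as a small helper and applied to the whole list. This is only a sketch: the class name `HalfSplitDedup` and the method `collapse` are made up for illustration, and the extra check that the middle character is a space is an added assumption, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HalfSplitDedup {
    // Returns X when s has the form "X X" (same text twice, one space between);
    // otherwise returns s unchanged.
    static String collapse(String s) {
        // A doubled string has odd length: len(X) + 1 + len(X), and at least 3 chars.
        if (s.length() < 3 || s.length() % 2 == 0) {
            return s;
        }
        int mid = s.length() / 2;
        String first = s.substring(0, mid);
        String second = s.substring(mid + 1);
        // Assumption: the two halves must be separated by a space.
        return s.charAt(mid) == ' ' && first.equals(second) ? first : s;
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList(
                "this is a first sentence",
                "what's up man what's up man",
                "hey hey"));
        myList.replaceAll(HalfSplitDedup::collapse); // rewrite each element in place
        myList.forEach(System.out::println);
        // prints:
        // this is a first sentence
        // what's up man
        // hey
    }
}
```

`List.replaceAll` (Java 8+) avoids the index loop, and the length checks mean empty or one-character entries pass through untouched.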

Solution 2

I would suggest using regular expressions. I was able to remove duplicates using this pattern: \b([\w\s']+) \1\b

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String [] phrases = {
            "this is a first sentence",
            "hello my name is Chris",
            "what's up man what's up man",
            "today is tuesday",
            "this is a very long sentence this is a very long sentence",
            "single word single word",
            "hey hey"
    };
    public static void main(String[] args) throws Exception {
        String duplicatePattern = "\\b([\\w\\s']+) \\1\\b";
        Pattern p = Pattern.compile(duplicatePattern);
        for (String phrase : phrases) {
            Matcher m = p.matcher(phrase);
            if (m.matches()) {
                System.out.println(m.group(1));
            } else {
                System.out.println(phrase);
            }
        }
    }
}

Results:

this is a first sentence
hello my name is Chris
what's up man
today is tuesday
this is a very long sentence
single word
hey
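Since the question is about an ArrayList, the same backreference idea can be applied to the list in place. The sketch below is an illustration under assumptions: the class name `RegexListDedup` is invented, and it uses the looser pattern `^(.+) \1$` (any characters, whole string) rather than the exact pattern from this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexListDedup {
    // The whole string must be "X X"; group 1 captures X.
    private static final Pattern DOUBLED = Pattern.compile("^(.+) \\1$");

    // Returns X for a doubled string, or the input unchanged when there is no match.
    static String collapse(String s) {
        return DOUBLED.matcher(s).replaceFirst("$1");
    }

    public static void main(String[] args) {
        List<String> myList = new ArrayList<>(Arrays.asList(
                "hello my name is Chris",
                "single word single word",
                "today is tuesday"));
        myList.replaceAll(RegexListDedup::collapse);
        myList.forEach(System.out::println);
        // prints:
        // hello my name is Chris
        // single word
        // today is tuesday
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex for every list element.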

Solution 3

Assumptions:

  1. Uppercase words are equal to lowercase counterparts.

// Requires java.util.HashSet and java.util.Set.
String fullString = "lol lol";
String[] words = fullString.split("\\W+");
StringBuilder stringBuilder = new StringBuilder();
Set<String> wordsHashSet = new HashSet<>();

for (String word : words) {
    // Check for duplicates
    if (wordsHashSet.contains(word.toLowerCase())) continue;

    wordsHashSet.add(word.toLowerCase());
    stringBuilder.append(word).append(" ");
}
String nonDuplicateString = stringBuilder.toString().trim();
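Note that this approach drops every later occurrence of a word, not only a doubled sentence, and that splitting on `\W+` breaks `what's` into `what` and `s`. A variant that splits on whitespace instead might look like the sketch below; the class and method names are hypothetical, and treating words as equal regardless of case follows the assumption stated above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UniqueWords {
    // Keeps the first occurrence of each word (compared case-insensitively),
    // preserving the original order and spelling of the words that remain.
    static String keepFirstOccurrences(String text) {
        Set<String> seen = new HashSet<>();
        StringBuilder out = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Set.add returns false when the lowercased word was already present.
            if (!word.isEmpty() && seen.add(word.toLowerCase(Locale.ROOT))) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(word);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepFirstOccurrences("what's up man what's up man")); // what's up man
        System.out.println(keepFirstOccurrences("the cat saw the dog"));         // the cat saw dog
    }
}
```

The second call shows the trade-off: a legitimate repeated word inside a single sentence is removed as well.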

Solution 4

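Simple logic: split the string on the space character " ", add each word to a LinkedHashSet (which drops repeated words and keeps their first-seen order), then join the remaining words back into one string.

String s = "I want to walk my dog I want to walk my dog";
// LinkedHashSet drops repeated words and keeps first-seen order
Set<String> temp = new LinkedHashSet<>();
String[] arr = s.split(" ");

for (String ss : arr) {
    temp.add(ss);
}

// Join directly rather than stripping "[", "]" and "," from temp.toString(),
// which would also delete commas and brackets that belong to the words
String newl = String.join(" ", temp);

System.out.println(newl);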

Output: I want to walk my dog
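The same split-and-deduplicate idea can also be written with the Stream API. This is a sketch rather than part of the original answer; the class name `DistinctWords` is made up, and it relies on `distinct()` preserving encounter order for an ordered stream such as one created from an array.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class DistinctWords {
    // Splits on single spaces, keeps the first occurrence of each word,
    // and joins the survivors with single spaces.
    static String distinctWords(String s) {
        return Arrays.stream(s.split(" "))
                .distinct()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(distinctWords("I want to walk my dog I want to walk my dog"));
        // prints: I want to walk my dog
    }
}
```

As with the set-based version, this removes any repeated word in the string, so it suits inputs where a word is not expected to recur within one sentence.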

Author: user3766930

Updated on June 09, 2022

Comments

  • user3766930 almost 2 years

    I have an ArrayList of Strings and it contains records such as:

    this is a first sentence
    hello my name is Chris 
    what's up man what's up man
    today is tuesday
    

    I need to clean up this list so that the output does not contain repeated content. In the case above, the output should be:

    this is a first sentence
    hello my name is Chris 
    what's up man
    today is tuesday
    

    As you can see, the third String has been modified and now contains a single what's up man instead of two. In my list, a String is sometimes correct and sometimes doubled, as shown above.

    I want to get rid of it, so I thought about iterating through this list:

    for (String s: myList) {
    

    but I cannot find a way to eliminate the duplicates, especially since the length of each string is not fixed. There might be a long record such as:

    this is a very long sentence this is a very long sentence
    

    or sometimes short ones:

    single word single word
    

    Is there a native Java function for that?

  • HopefullyHelpful about 7 years
    You need a toLowerCase(), assuming that words which differ only in case count as the same.
  • Veneet Reddy about 7 years
    Added assumption.