Large string split into lines with maximum length in Java

56,864

Solution 1

Iterate through the string word by word and start a new line whenever the next word would push the current line past the limit.

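import java.util.StringTokenizer;

public String addLinebreaks(String input, int maxLineLength) {
    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Start a new line if the word (plus a separating space) would not fit.
        if (lineLen > 0 && lineLen + 1 + word.length() > maxLineLength) {
            output.append("\n");
            lineLen = 0;
        }
        // Separate words on the same line with a single space.
        if (lineLen > 0) {
            output.append(' ');
            lineLen++;
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}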

I just typed that in freehand, so you may have to push and prod a bit to make it compile.

Limitation: a word longer than maxLineLength is not split; it ends up on an over-long line of its own. I assume your line length is something like 80 or 120 characters, in which case this is unlikely to be a problem.

Solution 2

Best: use Apache Commons Lang:

org.apache.commons.lang.WordUtils

/**
 * <p>Wraps a single line of text, identifying words by <code>' '</code>.</p>
 * 
 * <p>New lines will be separated by the system property line separator.
 * Very long words, such as URLs will <i>not</i> be wrapped.</p>
 * 
 * <p>Leading spaces on a new line are stripped.
 * Trailing spaces are not stripped.</p>
 *
 * <pre>
 * WordUtils.wrap(null, *) = null
 * WordUtils.wrap("", *) = ""
 * </pre>
 *
 * @param str  the String to be word wrapped, may be null
 * @param wrapLength  the column to wrap the words at, less than 1 is treated as 1
 * @return a line with newlines inserted, <code>null</code> if null input
 */
public static String wrap(String str, int wrapLength) {
    return wrap(str, wrapLength, null, false);
}
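
If the library is not an option, the behaviour described in the Javadoc above can be approximated with the standard library alone. What follows is a minimal sketch, not the Commons Lang implementation: the class name SimpleWrap and its wrap method are hypothetical, and it assumes that any run of whitespace separates words (WordUtils itself identifies words by ' '). Like the documented default, it leaves over-long words such as URLs unbroken.

```java
// Hypothetical helper, not part of Apache Commons Lang.
class SimpleWrap {

    /** Greedy word wrap; words longer than wrapLength are kept whole. */
    static String wrap(String str, int wrapLength) {
        if (str == null) {
            return null; // null in, null out, as in the documented contract
        }
        String trimmed = str.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        int width = Math.max(1, wrapLength); // less than 1 is treated as 1
        StringBuilder out = new StringBuilder(str.length());
        int lineLen = 0;
        for (String word : trimmed.split("\\s+")) {
            if (lineLen > 0) {
                if (lineLen + 1 + word.length() > width) {
                    // The word does not fit: end the line.
                    out.append(System.lineSeparator());
                    lineLen = 0;
                } else {
                    // The word fits: separate it with one space.
                    out.append(' ');
                    lineLen++;
                }
            }
            out.append(word);
            lineLen += word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("THESE TERMS AND CONDITIONS OF SERVICE (the Terms)", 20));
    }
}
```

Because it collapses whitespace, this sketch will not reproduce WordUtils output exactly for input that contains repeated spaces.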

Solution 3

You can use the WordUtils.wrap method of Apache Commons Lang:

import java.util.Arrays;
import org.apache.commons.lang3.text.WordUtils;

public class Test3 {

    public static void main(String[] args) {
        String text = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
        // Wrap at 20 columns, then split on the line separator that wrap() inserted.
        String wrapped = WordUtils.wrap(text, 20);
        String[] lines = wrapped.split(System.lineSeparator());
        System.out.println(Arrays.toString(lines));
    }
}

Output

   [THESE TERMS AND, CONDITIONS OF, SERVICE (the Terms), ARE A LEGAL AND, BINDING AGREEMENT, BETWEEN YOU AND, NATIONAL GEOGRAPHIC, governing your use, of this site,, www.nationalgeographic.com,, which includes but, is not limited to, products, software, and services offered, by way of the, website such as the, Video Player,, Uploader, and other, applications that, link to these Terms, (the Site). Please, review the Terms, fully before you, continue to use the, Site. By using the, Site, you agree to, be bound by the, Terms. You shall, also be subject to, any additional terms, posted with respect, to individual, sections of the, Site. Please review, our Privacy Policy,, which also governs, your use of the, Site, to understand, our practices. If, you do not agree,, please discontinue, using the Site., National Geographic, reserves the right, to change the Terms, at any time without, prior notice. Your, continued access or, use of the Site, after such changes, indicates your, acceptance of the, Terms as modified., It is your, responsibility to, review the Terms, regularly. The Terms, were last updated on, 18 July 2011.]

Solution 4

Thanks, Barend Garvelink, for your answer. I have modified the code above to handle the case where a word in the input is longer than maxCharInLine.

import java.util.StringTokenizer;

public String[] splitIntoLine(String input, int maxCharInLine) {
    StringTokenizer tok = new StringTokenizer(input, " ");
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    while (tok.hasMoreTokens()) {
        String word = tok.nextToken();

        // Split a word that is longer than a whole line into chunks.
        while (word.length() > maxCharInLine) {
            // Room left on the current line. Clamp at 0: lineLen can reach
            // maxCharInLine + 1 because of the trailing space added below.
            int room = Math.max(0, maxCharInLine - lineLen);
            output.append(word, 0, room).append("\n");
            word = word.substring(room);
            lineLen = 0;
        }

        if (lineLen + word.length() > maxCharInLine) {
            output.append("\n");
            lineLen = 0;
        }
        // Each word is followed by a space, so lines end with a trailing space.
        output.append(word).append(" ");
        lineLen += word.length() + 1;
    }
    return output.toString().split("\n");
}
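
The same idea can also be written so that it returns the lines directly and never leaves a trailing space. This is a rough sketch under stated assumptions rather than the code above: HardWrap and splitIntoLines are hypothetical names, words are separated by whitespace, and a word longer than maxLen is cut into maxLen-sized chunks that start on a fresh line instead of filling the rest of the current one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the approach; names are not from the answer.
class HardWrap {

    static List<String> splitIntoLines(String input, int maxLen) {
        if (maxLen < 1) {
            throw new IllegalArgumentException("maxLen must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens for blank input
            }
            // Close the current line if the word (plus a space) does not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxLen) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Cut an over-long word into full-width chunks; keep the remainder.
            while (word.length() > maxLen) {
                lines.add(word.substring(0, maxLen));
                word = word.substring(maxLen);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String l : splitIntoLines("of this site, www.nationalgeographic.com, which", 17)) {
            System.out.println(l);
        }
    }
}
```

Returning a List avoids the round trip through a joined string and split("\n"), and no line can exceed maxLen because the separating space is counted before a word is appended.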

Solution 5

Starting from @Barend's suggestion, here is my final version with minor modifications:

private static final char NEWLINE = '\n';
private static final String SPACE_SEPARATOR = " ";
//if text has \n, \r or \t symbols it's better to split by \s+
private static final String SPLIT_REGEXP = "\\s+";

public static String breakLines(String input, int maxLineLength) {
    String[] tokens = input.trim().split(SPLIT_REGEXP);
    StringBuilder output = new StringBuilder(input.length());
    int lineLen = 0;
    for (String word : tokens) {
        if (lineLen > 0) {
            if (lineLen + SPACE_SEPARATOR.length() + word.length() > maxLineLength) {
                // The word does not fit: start a new line.
                output.append(NEWLINE);
                lineLen = 0;
            } else {
                // The word fits: separate it from the previous one.
                output.append(SPACE_SEPARATOR);
                lineLen += SPACE_SEPARATOR.length();
            }
        }
        output.append(word);
        lineLen += word.length();
    }
    return output.toString();
}

System.out.println(breakLines("THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A     LEGAL AND BINDING " +
        "AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing     your use of this site, " +
        "www.nationalgeographic.com, which includes but is not limited to products, " +
        "software and services offered by way of the website such as the Video Player.", 20));

Output:

THESE TERMS AND
CONDITIONS OF
SERVICE (the Terms)
ARE A LEGAL AND
BINDING AGREEMENT
BETWEEN YOU AND
NATIONAL GEOGRAPHIC
governing your use
of this site,
www.nationalgeographic.com,
which includes but
is not limited to
products, software
and services offered
by way of the
website such as the
Video Player.
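
The remark about splitting on \s+ deserves a quick illustration. The sketch below (WhitespaceTokens and tokens are hypothetical names) shows that String.split("\\s+") treats tabs, newlines and runs of spaces as a single separator, and why it helps to trim first: leading whitespace would otherwise produce an empty first token.

```java
import java.util.Arrays;

// Hypothetical demo class; only String.trim and String.split are used.
class WhitespaceTokens {

    static String[] tokens(String input) {
        String trimmed = input.trim();
        // "".split(...) yields one empty token, so handle blank input explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Tabs, newlines and runs of spaces all act as one separator.
        System.out.println(Arrays.toString(tokens("ARE A     LEGAL\tAND\nBINDING")));
        // prints [ARE, A, LEGAL, AND, BINDING]

        // Without trim(), leading whitespace produces an empty first token.
        System.out.println("  ARE A".split("\\s+").length); // prints 3
        System.out.println(tokens("  ARE A").length);       // prints 2
    }
}
```

A StringTokenizer with " " as its only delimiter would instead leave tabs and newlines inside the words.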

Author: Abhishek

Updated on July 09, 2022

Comments

  • Abhishek
    Abhishek almost 2 years
    String input = "THESE TERMS AND CONDITIONS OF SERVICE (the Terms) ARE A LEGAL AND BINDING AGREEMENT BETWEEN YOU AND NATIONAL GEOGRAPHIC governing your use of this site, www.nationalgeographic.com, which includes but is not limited to products, software and services offered by way of the website such as the Video Player, Uploader, and other applications that link to these Terms (the Site). Please review the Terms fully before you continue to use the Site. By using the Site, you agree to be bound by the Terms. You shall also be subject to any additional terms posted with respect to individual sections of the Site. Please review our Privacy Policy, which also governs your use of the Site, to understand our practices. If you do not agree, please discontinue using the Site. National Geographic reserves the right to change the Terms at any time without prior notice. Your continued access or use of the Site after such changes indicates your acceptance of the Terms as modified. It is your responsibility to review the Terms regularly. The Terms were last updated on 18 July 2011.";
    
    //text copied from http://www.nationalgeographic.com/community/terms/
    

    I want to split this large string into lines, and no line should contain more than MAX_LINE_LENGTH characters.

    What I tried so far

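    int MAX_LINE_LENGTH = 20; // maximum line length: 20 characters
    // The constant must be concatenated into the regex; inside the string literal it would be taken literally.
    System.out.print(Arrays.toString(input.split("(?<=\\G.{" + MAX_LINE_LENGTH + "})")));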
    

    Output :

    [THESE TERMS AND COND, ITIONS OF SERVICE (t, he Terms) ARE A LEGA, L AND B ...
    

    This breaks words in the middle, which I don't want. Instead, I want output like this:

    [THESE TERMS AND , CONDITIONS OF , SERVICE (the Terms) , ARE A LEGAL AND B ...
    

    One more condition: if a word is longer than MAX_LINE_LENGTH, the word itself should be split.

    The solution should not depend on external JARs.

    • hammar
      hammar over 12 years
    • Abhishek
      Abhishek over 12 years
      @hammar - my client doesn't want me to use any external JAR files. I couldn't find a solution in the thread you mentioned that works without them.
  • Abhishek
    Abhishek over 12 years
    No, we need to fix this bug as well, because my max_line_length is 30 and a line may contain a filename that is longer than 30 characters. In that case we need to break the word.
  • Abhishek
    Abhishek over 12 years
    I just confirmed that the filename will not be longer than 15 characters. So cheers, friend!!! \m/
  • Abhishek
    Abhishek over 12 years
    I just changed one part of your code: String word = tok.nextToken() + " ";
  • clic
    clic over 7 years
    You should use output.append(word).append(" ");
  • stacky
    stacky about 7 years
    Awesome solution, works like a charm. Can the text be centered?