Split Java String by New Line
Solution 1
This should cover you:
String lines[] = string.split("\\r?\\n");
There are really only two newline conventions (UNIX and Windows) that you need to worry about.
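As a minimal sketch of the pattern above (the class and method names here are hypothetical, chosen only for illustration), this is how the split behaves on text that mixes both conventions:

```java
import java.util.Arrays;

public class SplitOnNewline {
    // Splits on "\n" or "\r\n"; a lone "\r" is NOT treated as a line break.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    public static void main(String[] args) {
        String mixed = "first\nsecond\r\nthird";
        System.out.println(Arrays.toString(splitLines(mixed))); // prints [first, second, third]
    }
}
```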
Solution 2
The String#split(String regex) method takes a regex (regular expression). Since Java 8, regex supports \R, which represents (from the documentation of the Pattern class):
Linebreak matcher
\R Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]
So we can use it to match:
- \u000D\u000A -> \r\n pair
- \u000A -> line feed (\n)
- \u000B -> line tabulation (DO NOT confuse with character tabulation \t, which is \u0009)
- \u000C -> form feed (\f)
- \u000D -> carriage return (\r)
- \u0085 -> next line (NEL)
- \u2028 -> line separator
- \u2029 -> paragraph separator
As you can see, \r\n is placed at the start of the regex, which ensures that the regex will try to match this pair first, and only if that match fails will it try to match single-character line separators.
So if you want to split on a line separator, use split("\\R").
If you don't want trailing empty strings "" removed from the resulting array, use split(regex, limit) with a negative limit parameter, like split("\\R", -1).
If you want to treat one or more consecutive line separators (that is, runs of empty lines) as a single delimiter, use split("\\R+").
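The three variants can be compared side by side. The sketch below is only an illustration (the class name, method names, and sample text are made up), and it assumes Java 8 or later so that \R is available:

```java
import java.util.Arrays;

public class SplitOnLinebreak {
    // Default split: trailing empty strings are dropped.
    static String[] splitDefault(String text) {
        return text.split("\\R");
    }

    // Negative limit: trailing empty strings are kept.
    static String[] splitKeepTrailing(String text) {
        return text.split("\\R", -1);
    }

    // One or more consecutive line breaks act as a single delimiter.
    static String[] splitCollapsing(String text) {
        return text.split("\\R+");
    }

    public static void main(String[] args) {
        // Mixes \r\n, the Unicode line separator U+2028, and plain \n.
        String text = "a\r\n\r\nb\u2028c\n\n";
        System.out.println(splitDefault(text).length);                 // prints 4
        System.out.println(splitKeepTrailing(text).length);            // prints 6
        System.out.println(Arrays.toString(splitCollapsing(text)));    // prints [a, b, c]
    }
}
```

The default split keeps the empty string between the two \r\n pairs but drops the two empty strings at the end, which is why the first two counts differ.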
Solution 3
If you don’t want empty lines:
string.split("[\\r\\n]+")
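A small sketch of this pattern follows (the names are hypothetical). One detail worth knowing: runs of \r and \n collapse into a single delimiter, but if the text begins with a line break, the result still starts with one empty string.

```java
import java.util.Arrays;

public class SplitSkippingEmptyLines {
    // Any run of \r and \n characters is treated as one delimiter,
    // so empty lines in the middle and at the end disappear.
    static String[] splitNonEmpty(String text) {
        return text.split("[\\r\\n]+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitNonEmpty("a\r\n\r\nb\n\nc\n"))); // prints [a, b, c]
        // A leading line break still yields an empty first element.
        System.out.println(splitNonEmpty("\n\na\nb").length);                    // prints 3
    }
}
```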
Solution 4
string.split(System.lineSeparator());
This should be system independent, provided the text uses the line separator of the platform the code runs on.
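To see where this approach can go wrong, here is a hedged sketch (class and method names are made up for illustration). The output depends on the platform running the code, so no fixed output is shown. Pattern.quote is not part of the original answer; it is added only as a precaution so the separator is always treated as literal text.

```java
import java.util.regex.Pattern;

public class PlatformSeparatorDemo {
    // Splits only on the separator of the platform running this code.
    static String[] splitOnPlatformSeparator(String text) {
        return text.split(Pattern.quote(System.lineSeparator()));
    }

    public static void main(String[] args) {
        String windowsText = "one\r\ntwo";
        String unixText = "one\ntwo";
        // On Linux/macOS the first result keeps a stray "\r" at the end of "one";
        // on Windows the second input is not split at all.
        System.out.println(splitOnPlatformSeparator(windowsText).length);
        System.out.println(splitOnPlatformSeparator(unixText).length);
    }
}
```

If the text may come from another system (a file, a database, a network peer), a pattern such as "\\r?\\n" or "\\R" does not depend on the runtime platform.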
Solution 5
A new method, lines, was introduced to the String class in Java 11; it returns a Stream<String>. From its documentation:
Returns a stream of substrings extracted from this string partitioned by line terminators.
Line terminators recognized are line feed "\n" (U+000A), carriage return "\r" (U+000D) and a carriage return followed immediately by a line feed "\r\n" (U+000D U+000A).
Here are a few examples:
jshell> "lorem \n ipusm \n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r sit".lines().forEach(System.out::println)
lorem
ipusm
sit
jshell> "lorem \n ipusm \r\n sit".lines().forEach(System.out::println)
lorem
ipusm
sit
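If you need a list or an array rather than a stream, the result can be collected. This is a minimal sketch assuming Java 11 or later; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LinesDemo {
    // Collects the lines of the text into a List.
    static List<String> toLineList(String text) {
        return text.lines().collect(Collectors.toList());
    }

    // Collects the lines of the text into an array.
    static String[] toLineArray(String text) {
        return text.lines().toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(toLineList("a\nb\r\nc\rd"));                  // prints [a, b, c, d]
        // Empty lines in the middle are kept; a final line terminator
        // does not produce an extra empty line.
        System.out.println(Arrays.toString(toLineArray("a\n\nb\n")));    // prints [a, , b]
    }
}
```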
dr.manhattan
Updated on October 19, 2021
Comments
-
dr.manhattan over 2 years
I'm trying to split text in a JTextArea using a regex to split the String by \n. However, this does not work, and I also tried \r\n|\r|n and many other combinations of regexes. Code:
public void insertUpdate(DocumentEvent e) {
    String split[], docStr = null;
    Document textAreaDoc = (Document)e.getDocument();
    try {
        docStr = textAreaDoc.getText(textAreaDoc.getStartPosition().getOffset(), textAreaDoc.getEndPosition().getOffset());
    } catch (BadLocationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    split = docStr.split("\\n");
}
-
Alan Moore over 15 years
insertUpdate() is a DocumentListener method. Assuming the OP is using it right, trying to modify the document from within the listener method will generate an exception. But you're right: the code in that question doesn't actually do anything.
-
Alan Moore over 15 years
A JTextArea document SHOULD use only '\n'; its Views completely ignore '\r'. But if you're going to look for more than one kind of separator, you might as well look for all three: "\r?\n|\r".
-
Alan Moore over 15 years
Not really. When you write a regex in the form of a Java String literal, you can use "\n" to pass the regex compiler a linefeed symbol, or "\\n" to pass it the escape sequence for a linefeed. The same goes for all the other whitespace escapes except \v, which isn't supported in Java literals.
-
angryITguy over 12 years
Double backslashes are unnecessary; see the section "Backslashes, escapes, and quoting": docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/…
-
angryITguy over 12 years
@Yuval. Sorry, that is incorrect; you don't need it at all. See "Backslashes, escapes, and quoting": docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/…
-
Maarten Bodewes almost 12 years
It's an interesting idea, but you should take care that the text actually uses the system's line separator. I've got many, many text files under Unix (e.g. XML) that use "Windows" separators and quite a few under Windows that use Unix separators.
-
Raekye about 11 years
Mac OS 9 uses \r. OS X 10 uses \n.
-
ruX over 10 years
Works even on Android.
-
Makoto about 10 years
This pales in comparison to the other answers, which are more explanatory and less code-heavy. Could you explain what it is you're accomplishing with this code, and why it would make a suitable answer?
-
Admin about 10 years
${fn:length(fn:split(data, '\\r?\\n'))} is not working in JSTL.
-
bvdb almost 10 years
Files created in a Windows OS and transferred to a Unix OS will still contain \r\n separators. I think it's better to play safe and take both separators into account.
-
FeinesFabi over 9 years
Isn't it: 'String[] lines = String.split("\\r?\\n");' ?
-
John over 9 years
This worked on Mac OSX when the above answer did not.
-
Martin over 9 years
This is a very problematic approach! The file may not originate from the system running the code. I strongly discourage these kinds of "system independent" designs that actually depend on a specific system, the runtime system.
-
Martin over 9 years
This has nothing to do with splitting a file into lines. Consider removing your answer.
-
Shervin Asgari over 9 years
@Martin if you have control over the deployed system, this is fine. However, if you are deploying your code to the cloud and have no control, then it's not the best way to do it.
-
Martin over 9 years
@Shervin It is never the best way to do it. It is in fact very bad practice. Consider some other programmer calling System.setProperty("line.separator", "you have no point"); Your code is broken. It might even be called similarly by a dependency you have no knowledge about.
-
logixplayer almost 9 years
This also worked for me. Excellent solution. It worked for the following 2 cases: 1) i woke up at 3 o clock.\r\n\r\nI hope 2) this is real life\r\nso I
-
Greg almost 9 years
This did not work, as the file originated on Unix and was being split on Windows.
-
greyseal96 about 8 years
This answer is exactly correct. One little suggestion would be that it might be helpful to add why it gets rid of the empty lines, for people who might not be as familiar with regex and how it behaves. For anybody who might be wondering, it's because the "+" is a greedy operator and will match at least one but will continue to match the '\r\n' characters until it can no longer match them. See here: regular-expressions.info/repeat.html#greedy
-
Alan Moore about 8 years
Yes, you do. If they need double-escaping anywhere, they need it everywhere. Whitespace escapes like \r and \n can have one or two backslashes; they work either way.
-
Pshemo almost 8 years
@antak yes, split by default removes trailing empty strings if they were the result of the split. To turn this mechanism off you need to use the overloaded version split(regex, limit) with a negative limit, like text.split("\\r?\\n", -1). More info: Java String split removed empty values
-
nurchi almost 8 years
The double backslash '\\' in code becomes a '\' character and is then passed to the RegEx engine, so "[\\r\\n]" in code becomes [\r\n] in memory and RegEx will process that. I don't know how exactly Java handles RegEx, but it is good practice to pass a "pure" ASCII string pattern to the RegEx engine and let it process that, rather than passing binary characters. "[\r\n]" becomes (hex) 0D0A in memory, and one RegEx engine might accept it while another will choke. So the bottom line is that even if Java's flavour of RegEx doesn't need them, keep double slashes for compatibility.
-
ibai over 7 years
String[] lines = string.split(System.getProperty("line.separator")); This will work fine while you use strings generated in your same OS/app, but if, for example, you are running your Java application under Linux and you retrieve text from a database that was stored as Windows text, then it could fail.
-
James McLaughlin about 7 years
The comment by @stivlo is misinformation, and it is unfortunate that it has so many upvotes. As @Raekye pointed out, OS X (now known as macOS) has used \n as its line separator since it was released in 2001. Mac OS 9 was released in 1999, and I have never seen a Mac OS 9 or below machine used in production. There is not a single modern operating system that uses \r as a line separator. NEVER write code that expects \r to be the line separator on Mac, unless a) you're into retro computing, b) have an OS 9 machine spun up, and c) can reliably determine that the machine is actually OS 9.
-
Rop almost 7 years
@Martin -- "some other programmer calling System.setProperty("line.separator", "you have no point"); " --- Just wondering, wouldn't such idiocy/sabotage break a lot of expected behaviours in the JDK libraries, too?
-
Lealo almost 7 years
And what does it mean?
-
Martin almost 7 years
@Rop I can't think of any cases right away, but there might exist dependencies on system properties that actually break code. I would strongly encourage configuration without use of system properties whenever possible.
-
Maykel Llanes Garcia over 6 years
This answer did not work for me. I just use "String pieces[] = text.split("\n")" or "String pieces[] = text.split(System.getProperty("line.separator"))" on Java 8.
-
Danilo Piazzalunga over 6 years
I know this may be an overkill solution.
-
Ted Hopp about 6 years
Or String[] lines = new BufferedReader(...).lines().toArray(String[]::new); for an array instead of a list. The nice thing about this solution is that BufferedReader knows about all kinds of line terminators, so it can handle text in all sorts of formats. (Most of the regex-based solutions posted here fall short in this regard.)
-
leventov almost 6 years
This solution is obsolete since Java 11 and the introduction of the String.lines() method.
-
john ktejik over 5 years
What about Unicode? A next-line character ('\u0085'), a line-separator character ('\u2028'), or a paragraph-separator character ('\u2029').
-
tresf over 5 years
Why not [\\r?\\n]+ ?
-
Dawood ibn Kareem over 4 years
Yes, it's the best answer. Unfortunate that the question was asked six years too early for this answer.
-
Breina over 4 years
@tresf You can't use quantifiers in square brackets.
-
Ubeogesh over 4 years
How about this: \v+ (one or more vertical whitespace characters)
-
SeverityOne over 4 years
I ended up splitting on \\R+, to avoid any end-of-line characters that were not covered by \\R alone.
-
MichaelMoser over 3 years
This works for Java 8 and splits the string into a stream of line strings: Arrays.stream(str.split("\\n"))
-
Pshemo over 3 years
JAVA 9 PROBLEM with find, matches. Java 9 incorrectly allows a regex like \R\R to match the sequence \r\n, which represents a single separation sequence. To solve this problem we can write a regex like (?>\u000D\u000A)|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029] which, thanks to the atomic group (?>\u000D\u000A), will prevent a regex that has already matched \r\n from backtracking and trying to match \r and \n separately.
-
Leponzo over 2 years
Are double backslashes necessary?
-
Diablo about 2 years
Not sure if it worked with previous versions of Java, but it will not work anymore. The split() method expects a regex, and the valid regex would be \\\\n
-
vault about 2 years
Why is there no such method on Android...