Normalize End Of Line Character

1. Introduction

An end-of-line (EOL) character is a special character, or character sequence, that marks the end of a line in a text file or a string. Historically, different operating systems have used different EOL characters. For example, UNIX systems use "\n" (line feed, LF), classic Mac OS (before Mac OS X) used "\r" (carriage return, CR), and Microsoft Windows uses "\r\n" (CRLF). Modern macOS follows the UNIX convention. Of the three, "\n" is the most widely used. When Java programs process text data that originates from operating systems with different EOL characters, normalizing the EOL characters keeps data processing consistent and avoids unexpected behavior. In this example, I will demonstrate how to normalize the end-of-line character to the value returned by the java.lang.System.lineSeparator method, which is the system-dependent line separator string.

2. Maven Setup

In this step, I will set up a Maven project to print out three text lines.

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>org.zheng.demo</groupId>
	<artifactId>demoEOL</artifactId>
	<version>0.0.1-SNAPSHOT</version>
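	<dependencies>
		<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-lang3 -->
		<dependency>
			<groupId>org.apache.commons</groupId>
			<artifactId>commons-lang3</artifactId>
			<version>3.14.0</version>
		</dependency>
		<!-- Assumed addition: the test classes below import org.junit.jupiter.api.Test,
		     so a JUnit 5 dependency is needed; the version shown is an assumption. -->
		<dependency>
			<groupId>org.junit.jupiter</groupId>
			<artifactId>junit-jupiter</artifactId>
			<version>5.10.2</version>
			<scope>test</scope>
		</dependency>
	</dependencies>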
</project>

I will create a TestData class whose buildAStringWithEol method builds three text lines, each terminated by the given EOL character.

TestData.java

package demoEOL;

public class TestData {
	public static final String EOL_MAC = "\r";
	public static final String EOL_UNIX = "\n";
	public static final String EOL_WIN = "\r\n";

	public static String buildAStringWithEol(String eolChar) {
		StringBuilder sb = new StringBuilder();

		sb.append("Line 1Mary");
		sb.append(eolChar);
		sb.append("Line 2Zheng");
		sb.append(eolChar);
		sb.append("Line 3Joe");
		sb.append(eolChar);

		return sb.toString();
	}
	
	public static void printout(String replacedString) {
		System.out.print(System.lineSeparator());
		System.out.println("*** Replaced String should have 3 lines:");
		System.out.print(replacedString);
	}
}
  • Line 4: the EOL character for classic Mac OS.
  • Line 5: the EOL character for UNIX.
  • Line 6: the EOL character for Windows.
  • Line 22: System.lineSeparator() returns the built-in line separator string. Its value depends on the underlying operating system.
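
Because EOL characters are invisible when printed, it can help to display them in escaped form. The following is a minimal sketch; the class name EolInspector and the helper escapeEol are hypothetical names and are not part of the example project.

```java
final class EolInspector {

    // Hypothetical helper: turns CR and LF into the visible two-character
    // sequences \r and \n so a string can be inspected on a single line.
    static String escapeEol(String text) {
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Shows \n on Linux and macOS, and \r\n on Windows.
        System.out.println("Platform separator: " + escapeEol(System.lineSeparator()));

        // The same helper reveals what a test string really contains.
        System.out.println(escapeEol("Line 1Mary\r\nLine 2Zheng\r\n"));
    }
}
```

Passing the result of TestData.buildAStringWithEol to such a helper makes it easy to confirm which EOL variant a string carries before and after normalization.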

3. Show the Issue

In this step, I will demonstrate the potential issue caused by unnormalized EOL characters.

DemoEOLIssue.java

package demoEOL;

import org.junit.jupiter.api.Test;

class DemoEOLIssue {

	@Test
	void test_eol_macs() {
		System.out.println("Mac \\r with 3 lines:");
		System.out.print(TestData.buildAStringWithEol(TestData.EOL_MAC));
	}

	@Test
	void test_eol_macs_console() {
		System.out.print("Line1LongSentenceWillbeTruncated\r");
		System.out.print("Line2MaryZheng");
	}

}
  • Line 10: constructs three text lines with the classic Mac OS EOL character.
  • Line 15: prints a line ending with the "\r" character, which on a Windows console moves the cursor back to the beginning of the line instead of starting a new line.

Execute the test_eol_macs test and capture the output here.

test_eol_macs output

Mac \r with 3 lines:
Line 3Joeng

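Note: as you can see in the output, the first two lines are missing and the third line appears as "Line 3Joeng". Each "\r" moves the cursor back to the start of the line, so every line overwrites the previous one, and the trailing "ng" is left over from the longer "Line 2Zheng".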
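
The overwriting effect can be reproduced without a console. The sketch below is a simplified model, assuming a console that treats "\r" as "return to column 0" and ignoring every other control character; the class and method names are hypothetical.

```java
final class CarriageReturnDemo {

    // Hypothetical helper: renders text the way a console that honors '\r'
    // would show a single line, with later characters overwriting earlier ones.
    static String renderOnOneLine(String text) {
        StringBuilder screen = new StringBuilder();
        int cursor = 0;
        for (char c : text.toCharArray()) {
            if (c == '\r') {
                cursor = 0;                      // carriage return: back to column 0
            } else if (cursor < screen.length()) {
                screen.setCharAt(cursor++, c);   // overwrite the existing character
            } else {
                screen.append(c);                // extend the line
                cursor++;
            }
        }
        return screen.toString();
    }

    public static void main(String[] args) {
        // Same content as TestData.buildAStringWithEol(TestData.EOL_MAC).
        System.out.println(renderOnOneLine("Line 1Mary\rLine 2Zheng\rLine 3Joe\r"));
        // prints "Line 3Joeng"
    }
}
```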

Run the test_eol_macs_console test and capture the output:

test_eol_macs_console output

Line2MaryZhengnceWillbeTruncated

As you can see, the printed line is not what was expected. The issue arises because "\r" on its own is not treated as a new line on Windows, so the second string overwrites the beginning of the first.

Figure 1. Unexpected Data

Note: the Eclipse IDE console setting "Interpret Carriage Return (\r) as control character" should be checked to reproduce this output.
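
The console rendering is only the visible symptom. The same mismatch can also affect data processing, for instance when code assumes that "\n" is the only separator. A minimal sketch, with hypothetical class and method names:

```java
final class SplitOnNewlineDemo {

    // Counts "lines" the way code that only knows about '\n' would.
    static int countBySplittingOnLf(String text) {
        return text.split("\n").length;
    }

    public static void main(String[] args) {
        String unixStyle = "Line 1Mary\nLine 2Zheng\nLine 3Joe\n";
        String macStyle = "Line 1Mary\rLine 2Zheng\rLine 3Joe\r";

        System.out.println(countBySplittingOnLf(unixStyle)); // 3
        System.out.println(countBySplittingOnLf(macStyle));  // 1, because no '\n' is found
    }
}
```

The second string still contains three logical lines, but code written against a single EOL convention sees only one.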

4. Normalize EOL Character via Java Built-in Library

In this step, I will normalize the EOL characters to System.lineSeparator with two built-in Java APIs:

  • String.replaceAll – replaces each substring of this string that matches the given regular expression with the given replacement.
  • Stream API – uses String.lines (available since Java 11) together with the map method to rewrite the line separator.

NormalizeEOL.java

package demoEOL;

import java.util.stream.Collectors;

import org.junit.jupiter.api.Test;

class NormalizeEOL {

	private static final String NEW_LINE_REG = "\r\n|\r|\n";

	@Test
	void test_eol_mac_stream() {
		String replacedString = TestData.buildAStringWithEol(TestData.EOL_MAC).lines()
				.map(line -> line + System.lineSeparator()).collect(Collectors.joining());
		TestData.printout(replacedString);
	}

	@Test
	void test_eol_macs() {

		String replacedString = TestData.buildAStringWithEol(TestData.EOL_MAC).replaceAll(NEW_LINE_REG,
				System.lineSeparator());
		TestData.printout(replacedString);
	}

	@Test
	void test_eol_unix() {

		String replacedString = TestData.buildAStringWithEol(TestData.EOL_UNIX).replaceAll(NEW_LINE_REG,
				System.lineSeparator());
		TestData.printout(replacedString);
	}

	@Test
	void test_eol_unix_stream() {
		String replacedString = TestData.buildAStringWithEol(TestData.EOL_UNIX).lines()
				.map(line -> line + System.lineSeparator()).collect(Collectors.joining());
		TestData.printout(replacedString);
	}

	@Test
	void test_eol_windows() {

		String replacedString = TestData.buildAStringWithEol(TestData.EOL_WIN).replaceAll(NEW_LINE_REG,
				System.lineSeparator());
		TestData.printout(replacedString);
	}

	@Test
	void test_eol_windows_stream() {
		String replacedString = TestData.buildAStringWithEol(TestData.EOL_WIN).lines()
				.map(line -> line + System.lineSeparator()).collect(Collectors.joining());
		TestData.printout(replacedString);
	}

}
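  • Line 9: creates a regular expression for the three EOL characters used by UNIX, classic Mac OS, and Windows. "\r\n" is listed first so that a Windows line ending is matched as a single unit.
  • Lines 14, 37, 52: use the Stream map method to change the EOL to System.lineSeparator.
  • Lines 21, 29, 44: use replaceAll to change the EOL to System.lineSeparator.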
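
As an alternative to spelling out the three variants, Java regular expressions (since Java 8) offer the linebreak matcher \R. The sketch below is one possible variation, not the approach used in NormalizeEOL; the class and method names are hypothetical. Note that \R is broader than the pattern above: besides "\r\n", "\r" and "\n" it also matches a few other Unicode line terminators, such as form feed and U+2028.

```java
import java.util.regex.Matcher;

final class LinebreakMatcherDemo {

    // "\\R" matches "\r\n" as one unit, as well as a lone "\r" or "\n".
    static String normalize(String text, String eol) {
        // quoteReplacement is not strictly needed for "\n" or "\r\n", but it
        // guards against '$' or '\' in a caller-supplied separator.
        return text.replaceAll("\\R", Matcher.quoteReplacement(eol));
    }

    public static void main(String[] args) {
        String mixed = "Line 1Mary\r\nLine 2Zheng\rLine 3Joe\n";
        System.out.print(normalize(mixed, System.lineSeparator()));
    }
}
```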

Execute the NormalizeEOL JUnit tests and capture the output. All of them print three text lines as expected.

JUnit Output

*** Replaced String should have 3 lines:
Line 1Mary
Line 2Zheng
Line 3Joe
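
Both approaches give the same output here because every test string ends with an EOL character. They can differ when the last line is not terminated: the Stream version appends a separator after every line, including the last one. The following sketch illustrates this under that assumption; the class and method names are hypothetical, and "\n" is passed explicitly so the result does not depend on the platform.

```java
import java.util.stream.Collectors;

final class TrailingEolDemo {

    static String viaReplaceAll(String text, String eol) {
        return text.replaceAll("\r\n|\r|\n", eol);
    }

    // Requires Java 11+ for String.lines().
    static String viaLines(String text, String eol) {
        return text.lines().map(line -> line + eol).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String noTrailingEol = "Line 1Mary\rLine 2Zheng";

        // replaceAll leaves the unterminated last line as it is ...
        System.out.println(viaReplaceAll(noTrailingEol, "\n").endsWith("\n")); // false
        // ... while the Stream version always terminates the last line.
        System.out.println(viaLines(noTrailingEol, "\n").endsWith("\n"));      // true
    }
}
```

Which behavior is preferable depends on whether a trailing separator matters to the consumer of the text.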

5. Normalize EOL Character via Apache Library

In this step, I will normalize the EOL characters to System.lineSeparator with the Apache Commons Lang library:

  • StringUtils.replaceEach: replaces all occurrences of the given search Strings within another String.

NormalizeEOLViaApache.java

package demoEOL;

import org.apache.commons.lang3.StringUtils;
import org.junit.jupiter.api.Test;

public class NormalizeEOLViaApache {

	@Test
	void test_eol_mac_apacheCommon() {
		String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_MAC),
				new String[] { TestData.EOL_WIN, TestData.EOL_MAC, TestData.EOL_UNIX },
				new String[] { System.lineSeparator(), System.lineSeparator(), System.lineSeparator() });

		TestData.printout(replacedString);
	}

	@Test
	void test_eol_unix_apacheCommon() {
		String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_UNIX),
				new String[] { TestData.EOL_WIN, TestData.EOL_MAC, TestData.EOL_UNIX },
				new String[] { System.lineSeparator(), System.lineSeparator(), System.lineSeparator() });

		TestData.printout(replacedString);
	}

	@Test
	void test_eol_windows_apacheCommon() {
		String replacedString = StringUtils.replaceEach(TestData.buildAStringWithEol(TestData.EOL_WIN),
				new String[] { TestData.EOL_WIN, TestData.EOL_MAC, TestData.EOL_UNIX },
				new String[] { System.lineSeparator(), System.lineSeparator(), System.lineSeparator() });

		TestData.printout(replacedString);
	}

}

Execute the NormalizeEOLViaApache JUnit tests and capture the output. All of them print three lines as expected.

JUnit Output

*** Replaced String should have 3 lines:
Line 1Mary
Line 2Zheng
Line 3Joe
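
The search array above lists "\r\n" before "\r", and the Windows test still prints exactly three lines, so a Windows line ending is replaced as one unit. The same ordering concern applies when no library is available. The sketch below is a plain-Java variation using only String.replace, with hypothetical class and method names. It first collapses everything to "\n" and only then expands to the target separator; replacing directly with "\r\n" in a chain would re-replace the "\r" that an earlier step had just inserted.

```java
final class ChainedReplaceDemo {

    // Two-step normalization with String.replace only:
    // 1) collapse "\r\n" and lone "\r" to "\n"; 2) expand "\n" to the target.
    static String normalize(String text, String eol) {
        String lfOnly = text.replace("\r\n", "\n").replace("\r", "\n");
        return lfOnly.replace("\n", eol);
    }

    public static void main(String[] args) {
        String mixed = "Line 1Mary\r\nLine 2Zheng\rLine 3Joe\n";
        System.out.print(normalize(mixed, System.lineSeparator()));
    }
}
```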

6. Conclusion

In this example, I created four Java classes to demonstrate why normalizing EOL characters matters and how to normalize them in Java.

  • The TestData class sets up the three basic text lines.
  • The DemoEOLIssue class demonstrates the issue that arises when the EOL character used in a string is not the one the underlying operating system expects.
  • The NormalizeEOL class normalizes the EOL characters to System.lineSeparator using the built-in Java library.
  • The NormalizeEOLViaApache class normalizes the EOL characters to System.lineSeparator using the Apache Commons Lang library.

7. Download

This was an example of a Java Maven project that normalizes the EOL character.

Download
You can download the full source code of this example here: Normalize the EOL Character

Mary Zheng

Mary graduated from the Mechanical Engineering department at ShangHai JiaoTong University. She also holds a Master degree in Computer Science from Webster University. During her studies she has been involved with a large number of projects ranging from programming and software engineering. She worked as a lead Software Engineer in the telecommunications sector where she led and worked with others to design, implement, and monitor the software solution.