For a full list of BASHing data blog posts, see the index page.


The trouble with Windows CRLF

In 2018 there was hope for Windows end-of-line (CRLF) sufferers. Microsoft announced that the latest version of its text editor "Notepad" would correctly open text files with plain linefeed (LF) line endings:

For many years, Windows Notepad only supported text documents containing Windows End of Line (EOL) characters - Carriage Return (CR) & Line Feed (LF). This means that Notepad was unable to correctly display the contents of text files created in Unix, Linux and macOS...
 
...This has been a major annoyance for developers, IT Pros, administrators, and end users throughout the community.
 
Today, we’re excited to announce that we have fixed this issue!

Then Microsoft acquired GitHub, and people began to wonder if 2018 was the beginning of the end for CRLF in Windows programs, and an end to the Great Newline Schism. I doubt it, just as I doubt that Microsoft will join the rest of us in the 21st century and move to UTF-8 encoding anytime soon.

A Data Cleaner's Cookbook has a page devoted to finding and removing CR on the command line. Here I give examples of how a Windows carriage return at the end of a line can stuff up some basic command-line operations.
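
As a rough orientation, here is a minimal sketch of finding and removing CRs with standard tools. It assumes a POSIX shell; the sample file, its contents and the variable names are made up for illustration, and the Cookbook page may well use different commands. A literal CR is stored in $cr so the grep and sed patterns don't rely on those tools understanding \r:

```shell
# Made-up sample file: two CRLF lines and one LF line
tmp=$(mktemp)
printf 'apple\r\nbanana\r\ncherry\n' > "$tmp"
cr=$(printf '\r')   # a literal carriage return

# Find: count lines containing a CR (grep -c prints 0 but exits 1 on no match, hence || true)
cr_lines=$(grep -c "$cr" "$tmp" || true)

# Remove: tr deletes every CR in the file; sed deletes only a CR at the end of a line
tr -d '\r' < "$tmp" > "$tmp.tr"
sed "s/${cr}\$//" "$tmp" > "$tmp.sed"

# Check: neither cleaned copy should still contain a CR
tr_left=$(grep -c "$cr" "$tmp.tr" || true)
sed_left=$(grep -c "$cr" "$tmp.sed" || true)

rm -f "$tmp" "$tmp.tr" "$tmp.sed"
printf 'CR lines before: %s, after tr: %s, after sed: %s\n' "$cr_lines" "$tr_left" "$sed_left"
# prints: CR lines before: 2, after tr: 0, after sed: 0
```

The tr version is the blunter tool: it also deletes any CR in the middle of a line, which may or may not be what you want.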


An operation that targets the last character on a line will give the wrong result. Because CR is non-printing, the line looks OK, but shell tools treat CR (\r) as the last character on the line, just before the linefeed (LF, \n).
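
A minimal sketch of the problem, using a made-up line ("apple") with each kind of ending and assuming a POSIX shell with awk and od. AWK's substr() picks out the last character of each line, and od -c shows which character that really is:

```shell
cr=$(printf '\r')   # a literal carriage return, for comparison

# The same made-up line with a Unix (LF) ending and a Windows (CRLF) ending
unix_last=$(printf 'apple\n' | awk '{ print substr($0, length($0), 1) }')
win_last=$(printf 'apple\r\n' | awk '{ print substr($0, length($0), 1) }')

# od -c makes the non-printing CR visible as \r
printf '%s\n' "$unix_last" "$win_last" | od -c

if [ "$win_last" = "$cr" ]; then
  echo "The last character of the CRLF line is CR, not e"
fi
```

Command substitution strips trailing LFs but not CRs, which is why $win_last still holds the CR.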

The rev command will put the CR at the beginning of the reversed line.
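
A minimal sketch, assuming the rev utility is installed (it is not part of POSIX) and using a made-up word:

```shell
cr=$(printf '\r')   # a literal carriage return, for comparison

unix_rev=$(printf 'apple\n' | rev)
win_rev=$(printf 'apple\r\n' | rev)

# Both print as "elppa" on a terminal, but od -c shows the CRLF version starts with \r
printf '%s\n' "$unix_rev" "$win_rev"
printf '%s\n' "$unix_rev" "$win_rev" | od -c
```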

echo with the -n option to omit a newline will fail, leaving the cursor hanging at the start of the line.
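
A sketch of what goes wrong, assuming bash or dash (whose echo builtins accept -n) and made-up data. Command substitution removes the LF at the end of the first line but keeps the CR, so echo -n still finishes with a carriage return:

```shell
cr=$(printf '\r')   # a literal carriage return, for comparison

# First line of made-up CRLF data: "apple" followed by CR
first=$(printf 'apple\r\nbanana\r\n' | head -n 1)

# echo -n suppresses the newline, but the CR inside the string is still printed
out=$(echo -n "$first"; echo -n "pear")

printf '%s\n' "$out"          # on a terminal this shows "peare"
printf '%s\n' "$out" | od -c  # the bytes are a p p l e \r p e a r \n
```

The CR returns the cursor to the start of the line, so "pear" overwrites "appl". Run echo -n "$first" on its own at a bash prompt and the cursor is typically left at the start of the line, where the next prompt is printed over the word.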

tr deleting newlines will fail in the same way, leaving the cursor hanging at the start of the line.
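
A minimal sketch with made-up CRLF data, assuming a POSIX tr (which understands the \r and \n escapes):

```shell
cr=$(printf '\r')   # a literal carriage return, for comparison

# Deleting only the LFs leaves every CR in place
joined=$(printf 'apple\r\nbanana\r\ncherry\r\n' | tr -d '\n')

# Deleting CR and LF together gives the intended single line
fixed=$(printf 'apple\r\nbanana\r\ncherry\r\n' | tr -d '\r\n')

printf '%s\n' "$joined" | od -c   # a p p l e \r b a n a n a \r c h e r r y \r \n
printf '%s\n' "$fixed"            # applebananacherry
```

On a terminal, $joined looks like just "cherry", because each CR sends the cursor back to the start of the line and each word overwrites the one before it. When tr writes straight to the terminal, the output ends in CR with no newline, so the cursor is left at the start of the line and the shell prompt is printed over the top.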

Using grep to match a full line will fail.
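
A sketch with made-up CRLF data. Both the -x option and ^...$ anchors fail, because the line is really "banana" plus CR. One workaround, offered here as an assumption rather than the only fix, is to allow trailing CRs in the pattern; the CR is passed in as a literal character so it doesn't depend on grep understanding \r:

```shell
cr=$(printf '\r')   # a literal carriage return
sample() { printf 'apple\r\nbanana\r\ncherry\r\n'; }   # made-up CRLF data

# grep -c prints 0 and exits 1 when nothing matches, hence || true
x_count=$(sample | grep -cx 'banana' || true)
anchored_count=$(sample | grep -c '^banana$' || true)

# Allow zero or more CRs before the end of the line
cr_count=$(sample | grep -c "^banana${cr}*\$" || true)

echo "grep -x: $x_count; ^banana\$: $anchored_count; CR-tolerant: $cr_count"
# prints: grep -x: 0; ^banana$: 0; CR-tolerant: 1
```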

Reading lines in a while loop will fail.
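
A minimal sketch, assuming a POSIX shell and a made-up CRLF file. read strips the LF but keeps the CR, so a string comparison on $line never succeeds; removing one trailing CR with the ${line%"$cr"} parameter expansion is one way round it:

```shell
cr=$(printf '\r')   # a literal carriage return
tmp=$(mktemp)
printf 'apple\r\nbanana\r\ncherry\r\n' > "$tmp"   # made-up CRLF file

# read removes the LF, but each $line still ends in CR, so this never matches
matches=0
while IFS= read -r line; do
  if [ "$line" = "banana" ]; then matches=$((matches + 1)); fi
done < "$tmp"

# Stripping one trailing CR inside the loop restores the expected behaviour
fixed_matches=0
while IFS= read -r line; do
  line=${line%"$cr"}
  if [ "$line" = "banana" ]; then fixed_matches=$((fixed_matches + 1)); fi
done < "$tmp"

rm -f "$tmp"
echo "matches without fix: $matches, with fix: $fixed_matches"
# prints: matches without fix: 0, with fix: 1
```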

Concatenation of successive lines with paste or sed will fail.
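
A minimal sketch with a made-up four-line CRLF file, joining each pair of successive lines with a space, first with paste and then with the common sed N idiom. An even number of lines is used because GNU and POSIX sed differ in what N does on the last line of an odd-length file:

```shell
tmp=$(mktemp)
printf 'red\r\ngreen\r\nblue\r\nwhite\r\n' > "$tmp"   # made-up CRLF file with 4 lines

# Join each pair of successive lines with a space, first with paste...
paste_pairs=$(paste -d ' ' - - < "$tmp")
# ...then with the sed N idiom (append the next line, replace the embedded LF)
sed_pairs=$(sed 'N;s/\n/ /' "$tmp")

rm -f "$tmp"
printf '%s\n' "$paste_pairs" "$sed_pairs"   # on a terminal: " green" and " white", twice
printf '%s\n' "$paste_pairs" | od -c        # the CRs are still inside each joined line
```

On a terminal the first word of each pair seems to vanish: the CR after "red" (and "blue") returns the cursor to the start of the line, and the rest of the joined line overwrites it.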

AWK will fail on records and fields.
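
A minimal sketch, assuming a made-up tab-separated table with CRLF endings. The CR becomes part of the last field of each record (and of the record itself), so equality tests fail and length() counts one character too many. Deleting a trailing CR from $0 with sub() makes AWK re-split the fields, which fixes the comparison:

```shell
tmp=$(mktemp)
# Made-up tab-separated table with CRLF line endings
printf 'name\tcolour\r\nBob\tred\r\nAnn\tblue\r\n' > "$tmp"

# Field test: $2 on the Bob line is "red" plus CR, so it never equals "red"
red_count=$(awk -F '\t' '$2 == "red" { n++ } END { print n + 0 }' "$tmp")
# length() counts the CR too: 4 characters, not 3
red_length=$(awk -F '\t' 'NR == 2 { print length($2) }' "$tmp")
# Record test: the whole line is "Bob<TAB>red" plus CR
bob_count=$(awk '$0 == "Bob\tred" { n++ } END { print n + 0 }' "$tmp")

# Deleting a trailing CR from $0 makes awk re-split the fields, so the test now works
fixed_count=$(awk -F '\t' '{ sub(/\r$/, "") } $2 == "red" { n++ } END { print n + 0 }' "$tmp")

rm -f "$tmp"
echo "red: $red_count, length: $red_length, Bob record: $bob_count, after fix: $fixed_count"
# prints: red: 0, length: 4, Bob record: 0, after fix: 1
```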

diff and comm treat CR as part of the line and will give unexpected results.
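
A minimal sketch with two made-up files that look identical on screen, one with LF and one with CRLF endings. comm needs sorted input and its comparisons depend on the locale, so LC_ALL=C is set here (an assumption of this sketch) to get a plain byte-by-byte comparison:

```shell
unix_file=$(mktemp)
win_file=$(mktemp)
# Made-up files that print identically on a terminal
printf 'apple\nbanana\ncherry\n' > "$unix_file"
printf 'apple\r\nbanana\r\ncherry\r\n' > "$win_file"

# diff exits 0 only if the files are identical byte for byte
if diff "$unix_file" "$win_file" > /dev/null; then diff_result=same; else diff_result=different; fi

# comm -12 keeps only lines common to both (sorted) files; LC_ALL=C compares bytes
common=$(LC_ALL=C comm -12 "$unix_file" "$win_file")

# Removing the CRs first; "-" makes comm read its second file from standard input
common_fixed=$(tr -d '\r' < "$win_file" | LC_ALL=C comm -12 "$unix_file" -)

rm -f "$unix_file" "$win_file"
echo "diff says the files are $diff_result"
echo "common lines before the fix: ${common:-none}"
echo "common lines after the fix:"
echo "$common_fixed"
```

GNU diff also has a --strip-trailing-cr option that ignores a CR before each LF when comparing; it is not part of POSIX diff.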

join will give unexpected results.
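
A minimal sketch with made-up, space-separated files sorted on the first field; LC_ALL=C is set so that join compares bytes. If the CRLF file's join field is not its last field, lines still pair up but the CR lands in the middle of the output; if the join field is the last field, the CR becomes part of the key and nothing pairs at all:

```shell
colours=$(mktemp); counts=$(mktemp); keys=$(mktemp)
# Made-up space-separated files, each sorted on field 1
printf 'apple red\r\nbanana yellow\r\n' > "$colours"   # CRLF, join field is not last
printf 'apple 3\nbanana 5\n' > "$counts"               # LF
printf 'apple\r\nbanana\r\n' > "$keys"                 # CRLF, join field is the only (last) field

# Keys match, but the CR from $colours ends up in the middle of each output line
joined=$(LC_ALL=C join "$colours" "$counts")

# Here the CR is part of the key ("apple" + CR), so no lines pair up
no_match=$(LC_ALL=C join "$keys" "$counts")

rm -f "$colours" "$counts" "$keys"
printf '%s\n' "$joined" | od -c
echo "pairs found with CR in the key: ${no_match:-none}"
```

On a terminal, $joined prints as " 3ple red" and " 5nana yellow", because the text after each CR overwrites the start of the line.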

Last update: 2019-03-17
The blog posts on this website are licensed under a
Creative Commons Attribution-NonCommercial 4.0 International License