Velvet Star Monitor

Bash alternative to trim newline \n and extra white-space

Writer Mia Lopez

I am trying to parse a multiline sentence:

You have to go tomorrow by
    car.

As you can see, there is a newline followed by whitespace before "car."

I used this regex:

You.have.to.go.tomorrow.by.\n.+

It works on regex101, but in bash it only matches the first line:

parser='You.have.to.go.tomorrow.by.\n.+'

Result:

You have to go tomorrow by

In bash, I want the full sentence:

"You have to go tomorrow by car."

I am using:

sed -e 's/<[^>]\+>/ /g' | grep -oP "$parser"

to strip all HTML tags and then grep for the pattern.

2 Answers

With -z (--null-data), grep treats its input as lines terminated by a NUL character instead of a newline. A file without NUL bytes is therefore read as a single line, and the pattern can match across newlines.

grep -Pzo \
'You have to go tomorrow by\n\s+car.' text | tr -s '\n ' ' '
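Here is a minimal, self-contained sketch of the grep approach. It assumes GNU grep built with PCRE support, since not every grep has -P. The function name join_match and the sample text are made up for illustration. There is one addition to the pipeline above. With -z, grep ends each match with a NUL byte instead of a newline, so a final tr converts that byte back into a newline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the two-line sentence from FILE as one line.
# Assumes GNU grep with PCRE support (-P).
join_match() {
  # -z: NUL-separated records, so the whole file is one record
  # -o: print only the match; -P: Perl-compatible regex (\n, \s)
  # first tr: newline -> space, and squeeze runs of spaces
  # second tr: grep -z ends each match with NUL; turn it into a newline
  grep -Pzo 'You have to go tomorrow by\n\s+car\.' "$1" | tr -s '\n ' ' ' | tr '\0' '\n'
}

# Usage with a made-up sample file
sample=$(mktemp)
printf 'You have to go tomorrow by\n    car.\n' > "$sample"
join_match "$sample"   # prints: You have to go tomorrow by car.
rm -f "$sample"
```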

To do it in pure bash, use ANSI-C quoting ($'...') for the pattern so that \n represents a real newline:

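#!/bin/bash
# $'...' turns \n into a real newline; \s is passed through to the regex (a GNU extension)
pattern=$'You have to go tomorrow by\n\s+car.'
# $(<text) reads the whole file, so the match can span both lines
[[ $(<text) =~ ($pattern) ]] &&
  echo ${BASH_REMATCH}  # unquoted on purpose: word splitting collapses the newline and spaces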
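The same idea can be wrapped in a function. This sketch assumes a reasonably recent bash, and the function name match_sentence is made up. It makes two deliberate changes to the snippet above. First, the POSIX class [[:space:]] replaces \s, which only works where the system regex library supports it as an extension. Second, an extglob substitution collapses the whitespace. The snippet above relies on unquoted echo instead, which would also expand glob characters such as * if they appeared in the match.

```shell
#!/usr/bin/env bash
# Enable extended globs before the function is parsed; +(...) relies on it.
shopt -s extglob

# Hypothetical helper: print the matched sentence from FILE on one line.
match_sentence() {
  # $'...' turns \n into a real newline; \\. becomes \. (a literal dot in the regex)
  local pattern=$'You have to go tomorrow by\n[[:space:]]+car\\.'
  local text
  text=$(<"$1")   # read the whole file, not one line at a time
  if [[ $text =~ $pattern ]]; then
    # Replace every run of whitespace (newline included) with a single space
    printf '%s\n' "${BASH_REMATCH[0]//+([[:space:]])/ }"
  else
    return 1
  fi
}

# Usage with a made-up sample file
sample=$(mktemp)
printf 'You have to go tomorrow by\n    car.\n' > "$sample"
match_sentence "$sample"   # prints: You have to go tomorrow by car.
rm -f "$sample"
```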

Assuming you only want to clean up the line in question, you can combine substitutions. Match the line containing 'You have to go tomorrow by', then use braces {...} to group several commands, separated by semicolons, that run on that match:

sed -rn '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g;p}' text
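  • -r use extended regular expressions, so + and {2,} need no backslashes.
  • -n do not print lines automatically; only p prints.
  • N append the next line of input to the pattern space.
  • s substitute text.
  • g global: replace every occurrence on the line, not just the first.
  • p print the pattern space.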
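To try the one-liner, here is a minimal sketch that feeds it a made-up sample containing HTML tags and an unrelated line. The function name join_with_sed is hypothetical, and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join the matched line with the next one and strip tags.
# -r is GNU sed's extended-regex flag (GNU sed also accepts -E, the BSD spelling).
join_with_sed() {
  sed -rn '/You have to go tomorrow by/{N; s/\n//; s/ {2,}/ /; s/<[^>]+>//g; p}' "$1"
}

# Usage with a made-up sample: the first line does not match, so -n hides it
sample=$(mktemp)
printf '<p>Intro</p>\nYou have to go tomorrow by\n    <b>car</b>.\n' > "$sample"
join_with_sed "$sample"   # prints: You have to go tomorrow by car.
rm -f "$sample"
```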

With tr

<FileName tr -s '\n' ' '

With xargs

<FileName xargs

Notes

Replace FileName with the name of the file that contains the two lines.

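  • <FileName redirects the file to the command's standard input, much like cat FileName | but without the extra process.

  • tr -s '\n' ' ' translates each newline into a space and, because of -s, squeezes every run of spaces into one, so the two lines become a single line.

  • xargs splits its input on blanks and newlines and, when no command is given, passes the words as arguments to echo, which prints them separated by single spaces. That is why the newline and the extra spaces disappear. Note that xargs also treats quotes and backslashes in the input specially, so text containing an apostrophe can make it fail.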
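This small sketch compares the two commands on a made-up copy of the question's text. It shows one practical difference between them. tr also converts the final newline, which leaves a trailing space and no newline at the end. xargs, through echo, ends the output with a newline and no trailing space.

```shell
#!/usr/bin/env bash
# Made-up sample: the two-line sentence from the question
sample=$(mktemp)
printf 'You have to go tomorrow by\n    car.\n' > "$sample"

# tr: newline -> space, then runs of spaces squeezed to one.
# The final newline also becomes a space, so the result ends in " ".
via_tr=$(<"$sample" tr -s '\n' ' ')

# xargs: splits on blanks/newlines and runs echo with the words as arguments,
# so the words come out separated by single spaces.
via_xargs=$(<"$sample" xargs)

printf '[%s]\n' "$via_tr"      # prints: [You have to go tomorrow by car. ]
printf '[%s]\n' "$via_xargs"   # prints: [You have to go tomorrow by car.]
rm -f "$sample"
```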

You can also pipe the output from sed to tr like so:

<FileName sed -e 's/<[^>]\+>//g' | tr -s '\n' ' '

or from sed to xargs like so:

<FileName sed -e 's/<[^>]\+>//g' | xargs
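Here is a sketch of the sed-then-tr pipeline on a made-up HTML fragment. The name strip_and_join is hypothetical. GNU sed is assumed, because \+ in a basic regular expression is a GNU extension. sed works one line at a time, so a tag that is itself split across two lines would not be removed by this substitution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip HTML tags from FILE, then flatten it to one line.
# \+ in a basic regex is a GNU sed extension (POSIX BRE would use \{1,\}).
strip_and_join() {
  sed -e 's/<[^>]\+>//g' "$1" | tr -s '\n' ' '
}

# Usage with a made-up sample file
sample=$(mktemp)
printf '<p>You have to go tomorrow by\n    <b>car</b>.</p>\n' > "$sample"
strip_and_join "$sample"; echo   # prints: You have to go tomorrow by car. (with a trailing space)
rm -f "$sample"
```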
