1

My previous question was flagged "duplicate" and I was pointed to this and this. The solutions provided in those threads do not solve this at all.

Content of file.txt:

Some line of text 0
Some line of text 1
Some line of text 2
PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2
Some line of text 6
Some line of text 7
Some line of text 8
PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2
Some line of text 12
Some line of text 13
Some line of text 14

I need to extract "PATTERN1" and "PATTERN2" + lines in between, and the following command does this perfectly:

awk '/PATTERN1 /,/PATTERN2/' ./file.txt

Output:

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

But now I am trying to create a bash script that:

  1. uses awk to find the lines between PATTERN1 and PATTERN2
  2. stores each occurrence of PATTERN1 + lines in between + PATTERN2 in an array
  3. repeats 1 & 2 until the end of the file.

To clarify, this means storing the following lines (the text inside the quotes):

"PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2"

to array[0]

and store the following lines inside the quotes:

"PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2"

to array[1]

and so on, if there are more occurrences of PATTERN1 and PATTERN2.

What I currently have:

#!/bin/bash
var0=`cat ./file.txt`
mapfile -t thearray < <(echo "$var0" | awk '/PATTERN1 /,/PATTERN2/')

The above does not work.
Also, as far as possible I would like to avoid mapfile, because the script might be executed on a system that does not support it.

Based on this link provided:

myvar=$(cat ./file.txt)
myarray=($(echo "$myvar" | awk '/PATTERN1 /,/PATTERN2/'))

But when I do echo ${myarray[1]}

I get a blank response.

And when I do echo ${myarray[0]}

I get:

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

What I expect when I do echo ${myarray[0]}

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2

What I expect when I do echo ${myarray[1]}

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2
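
For illustration, how bash fills an array from a command substitution depends on quoting. The sketch below uses made-up text in place of the awk output; the names `text`, `words` and `whole` are hypothetical. Unquoted, the output is split on every space, tab and newline; quoted, it is not split at all, which leaves everything in element 0 and nothing in element 1.

```shell
#!/bin/bash
# Made-up two-block text standing in for the awk output.
text=$'PATTERN1\nSome line\nPATTERN2\nPATTERN1\nOther line\nPATTERN2'

# Unquoted: split on IFS (space, tab, newline), so each word is an element.
# Unquoted expansions are also subject to filename globbing.
words=($(printf '%s\n' "$text"))

# Quoted: no splitting, so the whole text becomes a single element.
whole=("$(printf '%s\n' "$text")")

echo "unquoted: ${#words[@]} elements, first is '${words[0]}'"
echo "quoted:   ${#whole[@]} element(s)"
```

Neither form splits per block, which is why a per-block delimiter or an explicit loop is needed.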

Any help will be great.

ZYX Rhythm
  • 4
    Make your awk insert a literal NUL after each segment, and then you can use `readarray -d '' arrayname < <(awk ...)` to populate an array segmented by those NULs. – Charles Duffy Aug 19 '20 at 16:44
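
A minimal sketch of the approach this comment describes, under two assumptions: bash 4.4 or later (needed for `readarray -d`), and an awk whose `printf "%c"` with a numeric 0 emits a NUL byte (gawk and mawk are expected to; other awks may not). The sample data and the names `tmpfile` and `blocks` are illustrative only.

```shell
#!/bin/bash
# Hypothetical sample input written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 'line 0' PATTERN1 'line 3' PATTERN2 'line 6' \
              PATTERN1 'line 9' PATTERN2 'line 12' > "$tmpfile"

# awk prints every line in the range; the closing line is followed by a
# NUL byte instead of a newline, so each block ends exactly at PATTERN2.
# readarray -d '' then splits the stream on those NUL bytes.
readarray -t -d '' blocks < <(awk '
  /PATTERN1/,/PATTERN2/ {
    if ($0 ~ /PATTERN2/) printf "%s%c", $0, 0
    else print
  }' "$tmpfile")
rm -f "$tmpfile"

printf 'number of blocks: %d\n' "${#blocks[@]}"
printf '[%s]\n' "${blocks[@]}"
```

NUL is a convenient delimiter because it cannot occur inside a line of text, so it never collides with the data.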

3 Answers

3

An implementation in plain bash could look something like this:

#!/bin/bash

beginpat='PATTERN1'
endpat='PATTERN2'

array=()
n=-1
inpatterns=
while read -r; do
    if [[ ! $inpatterns && $REPLY = $beginpat ]]; then
        array[++n]=$REPLY
        inpatterns=1
    elif [[ $inpatterns ]]; then
        array[n]+=$'\n'$REPLY
        if [[ $REPLY = $endpat ]]; then
            inpatterns=
        fi
    fi
done

# Report captured lines
for ((i = 0; i <= n; ++i)); do
    printf "=== array[%d] ===\n%s\n\n" $i "${array[i]}"
done

Run as ./script < file. The use of awk isn't required but the script will work correctly on the awk output as well.
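
If the awk filter and the loop have to live in the same script, a sketch along the following lines may help. It assumes bash (for process substitution) and uses a made-up sample file; `tmpfile` and `line` are hypothetical names. Because awk has already discarded everything outside the blocks, the loop only has to start a new element at each PATTERN1 line. Feeding the loop with `done < <(awk ...)` rather than `awk ... | while ...` matters: by default a pipeline runs the loop in a subshell, and the array would be empty afterwards.

```shell
#!/bin/bash
# Hypothetical sample input written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 'text 0' PATTERN1 'text 3' PATTERN2 'text 6' \
              PATTERN1 'text 9' PATTERN2 > "$tmpfile"

array=()
n=-1
while IFS= read -r line; do
    if [[ $line = PATTERN1 ]]; then
        array[++n]=$line          # a PATTERN1 line starts a new element
    else
        array[n]+=$'\n'$line      # every other line extends the current one
    fi
done < <(awk '/PATTERN1/,/PATTERN2/' "$tmpfile")   # loop stays in this shell
rm -f "$tmpfile"

printf 'captured %d blocks\n' "${#array[@]}"
```

This version needs neither mapfile nor NUL bytes, at the cost of assuming that a block always starts with a line that is exactly PATTERN1.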

M. Nejat Aydin
2

As Charles suggested...

Edited to strip the newline off the end of the block (not every record)

while IFS= read -r -d '' x; do array+=("$x"); done < <(awk '
  /PATTERN1/,/PATTERN2/ { if ( $0 ~ "PATTERN2" ) { x=$0; printf "%s%c",x,0; next }
                          print }' ./file.txt)

I reformatted it. It was getting kinda busy and hard to read.

And to test it -

$: echo "[${array[1]}]"
[PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2]

As an aside, it seems very odd to me to include the redundant sentinel values in the data elements, so if you want to strip those:

$: while IFS= read -r -d '' x; do array+=("$x"); done < <(
    awk '/PATTERN1/,/PATTERN2/{ if ( $0 ~ "PATTERN1" ) { next }
      if ( $0 ~ "PATTERN2" ) { len--;
        # walk the indices in order; "for (l in ary)" has no guaranteed order
        for (l = 0; l <= len; l++) { printf "%s%c", ary[l], l<len ? "\n" : 0; }
        delete ary; len=0; next }
      ary[len++]=$0;
    }' ./file.txt )

$: echo "[${array[1]}]"
[Some line of text 9
Some line of text 10
Some line of text 11]
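
If the stored elements do end with a stray newline (for example, when awk emits the closing line with `print` rather than `printf`), a suffix-removal expansion can trim it afterwards. The sketch below is a hedged illustration with hypothetical array contents; `nl` is just a helper variable holding a newline.

```shell
#!/bin/bash
# Hypothetical elements, each ending in one unwanted newline.
array=($'PATTERN1\nline 3\nPATTERN2\n' $'PATTERN1\nline 9\nPATTERN2\n')

nl=$'\n'
# ${var%pattern} removes the shortest matching suffix; applied to
# "${array[@]}" the removal is performed on every element in turn.
array=("${array[@]%"$nl"}")

printf '[%s]\n' "${array[@]}"
```

Only a single trailing newline is removed per element; newlines inside a block are left alone.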
Paul Hodges
  • @PaulHodges wow. just wow. 3 lines is all it took to solve my problem for 2 days now. This WORKED PERFECTLY. You just made my day very very good man. Thank you so much! =) – ZYX Rhythm Aug 20 '20 at 01:05
  • 1
    Thanks, Ed. The both of you are kind of SO gods in my mind, and I get you confused. Still hoping to drive up and take you both to lunch one of these days, lol... Sorry Charles. And you were right, as usual. – Paul Hodges Aug 20 '20 at 13:12
  • I was not allowed to post the actual data, or even the actual lines with the sensitive information removed. IDK why. So I created file.txt as a dummy. But PATTERN1 and PATTERN2 are essential parts as well, so they should be kept. The needed chunk of data always starts and ends with those patterns. BTW thanks again. – ZYX Rhythm Aug 20 '20 at 20:47
0

Paul's answer does what I want, so I flagged it as the accepted answer. His solution does produce an extra blank line at the end of every value stored in the array, but that is easy to remove, so I did not mind. I also posted this same question on another site and, although Paul's answer was good, I found a better solution:

IFS=$'\r' read -r -d '' -a ARR < <(awk '/PATTERN1/,/PATTERN2/ {if ($0 ~ /PATTERN2/) printf "%s\r", $0; else print}' file.txt)

The above does the job, does not produce an extra blank line, and it's a one-liner.

echo "${ARR[1]}"
echo "${ARR[0]}"

Output:

PATTERN1
Some line of text 9
Some line of text 10
Some line of text 11
PATTERN2

PATTERN1
Some line of text 3
Some line of text 4
Some line of text 5
PATTERN2
ZYX Rhythm
  • LOL! I used a carriage return while testing, but switched it to a NUL byte ("As Charles suggested", lol) because you might end up with CRLF's in a file if it were edited in Notepad, etc. XD – Paul Hodges Aug 20 '20 at 13:15
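
To illustrate the concern raised in this comment, here is a small sketch with made-up CRLF text; `crlf_text`, `parts` and `clean` are hypothetical names. When the input itself contains carriage returns, splitting on CR cuts a block apart at every line ending. Deleting the CRs first with `tr` is one possible precaution, and using NUL as the delimiter avoids the collision altogether.

```shell
#!/bin/bash
# Made-up block with Windows (CRLF) line endings.
crlf_text=$'PATTERN1\r\nline 3\r\nPATTERN2\r\n'

# Splitting on CR: the block no longer arrives as a single element.
# read returns non-zero at end of input, hence the "|| true".
IFS=$'\r' read -r -d '' -a parts < <(printf '%s' "$crlf_text") || true
echo "split on CR: ${#parts[@]} elements"

# Possible precaution: remove every CR before further processing.
clean=$(printf '%s' "$crlf_text" | tr -d '\r')
printf '[%s]\n' "$clean"
```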