I'm trying to write a small shell script that fetches fortune cookie quotes. When I run this in my terminal, it (almost) works fine:

curl -s http://www.fortunecookiemessage.com | grep -oP '(<div class=\"quote\").*(</div>)' | sed 's/.*link\">\(.*\)<\/a>.*/\1/'

But when I put the same command into my bash script (run.sh) like this:

sentence=$(curl -s http://www.fortunecookiemessage.com | grep -oP '(<div class=\"quote\").*(</div>)' | sed 's/.*link\">\(.*\)<\/a>.*/\1/')

it gives me the following error:

bash-3.2# sh run.sh 
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]

I need help resolving this. Also, the extracted quote sometimes (but not always) comes in the format <p>QUOTE</p>. What must I change in the regular expression of the sed command to strip the <p></p> tags when they occur?

My output with set -x:

bash-3.2# sh run.sh 
++ curl -s http://www.fortunecookiemessage.com
++ grep -oP '(<div class=\"quote\").*(</div>)'
++ sed 's/.*link\">\(.*\)<\/a>.*/\1/'
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]
(23) Failed writing body
+ sentence=
+ echo
Saifur Rahman Mohsin
  • Remove spaces around `=`. So use: `sentence=$(curl -s http://www.fortunecookiemessage.com | grep -oP '(<div class=\"quote\").*(</div>)' | sed 's/.*link\">\(.*\)<\/a>.*/\1/')` – anubhava Aug 11 '14 at 20:51
  • ...that said, in general, the better approach for extracting content from XML is to use an actual XML-aware query tool -- `xmlstarlet sel`, `xmllint --xpath`, etc. – Charles Duffy Aug 11 '14 at 20:52
  • @CharlesDuffy unfortunately, fortunecookiemessage dot com does not return valid XML, so `xmllint` can't handle it. Nevertheless, [yOu caN't paRsE hTmL wIth reGulAr expReSsioNs](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454), so your overall point is still right. – kojiro Aug 11 '14 at 20:57
  • @kojiro, that's one of the places where xmlstarlet's `-H` (`--html`) comes in handy. – Charles Duffy Aug 11 '14 at 20:57
  • I tried xmllint, but it gave me an error. Apparently the fortune cookie site's HTML contains an error, so xmllint's HTML parser fails on it! Check the page source if you're curious: it's line #8 of the source code, where an is written instead of @anubhava Sorry, the spaces were not actually in my script, and it still gave the error; I added them when I typed the question here. I have updated my question with that correction as well as the set -x output. – Saifur Rahman Mohsin Aug 11 '14 at 21:01
  • Instead of `grep -oP '(<div class=\"quote\").*(</div>)'` try: `grep -Eo '<div class="quote".*</div>'` – anubhava Aug 11 '14 at 21:11
  • @anubhava If you make that an answer then OP can accept it and get their +2 instead of just posting a *thank you comment* that [will probably be deleted](http://meta.stackoverflow.com/a/258032/418413). – kojiro Aug 12 '14 at 13:38
  • Fair point @kojiro, I will post it as answer. – anubhava Aug 12 '14 at 13:39

4 Answers

It appears that your grep doesn't support the -P option. Change your grep command to:

grep -Eo '<div class="quote".*</div>'
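A minimal sketch of the whole pipeline with -E in place of -P, wrapped in a function that reads HTML on stdin so it can be tried without fetching the site. The function name extract_quote and the sample markup are hypothetical; like the question's sed, it assumes the quote text sits between `link">` and `</a>` on a single line.

```shell
# Extract the quote text from page HTML read on stdin.
# -E (POSIX extended regex) is supported by both GNU and BSD grep,
# whereas -P (Perl-compatible regex) is not available in every grep build.
extract_quote() {
  grep -Eo '<div class="quote".*</div>' |
    sed 's/.*link">\(.*\)<\/a>.*/\1/'
}

# Hypothetical sample markup standing in for the real page.
printf '%s\n' '<div class="quote"><a href="/x" class="link">You will be lucky.</a></div>' | extract_quote
# prints: You will be lucky.
```

In run.sh this would be used as `sentence=$(curl -s http://www.fortunecookiemessage.com | extract_quote)`.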
anubhava

You can do it with a Perl one-liner like:

perl -Mojo -E 'say g(q(http://www.fortunecookiemessage.com))->dom(q(div[class=quote]))->all_text'

but you need to have the Mojolicious module suite installed (the -Mojo switch loads its ojo one-liner module).

clt60

Thanks a lot, guys. anubhava's comment helped me solve this. The final command is:

sentence=$(curl -s http://www.fortunecookiemessage.com | grep -Eo '(<div class=\"quote\").*(</div>)' | sed 's/.*link\">\(.*\)<\/a>.*/\1/' | sed 's/<[^>]*>//g')
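Two hedged sketches of the tag-stripping step, using hypothetical function names and sample input. Variant A folds the final command's two sed calls into one sed with two -e expressions. Variant B instead answers the question about the optional <p></p> wrapper directly in the regex, using the POSIX BRE interval `\{0,1\}` (portable to both GNU and BSD sed); it assumes the quote text itself contains no other `<` characters, while variant A removes any tag.

```shell
# Variant A: extract the link text, then delete every remaining tag.
strip_tags_after() {
  sed -e 's/.*link">\(.*\)<\/a>.*/\1/' -e 's/<[^>]*>//g'
}

# Variant B: make <p> and </p> optional groups; \2 is the quote text.
# Assumes the quote contains no other "<" characters.
strip_optional_p() {
  sed 's/.*link">\(<p>\)\{0,1\}\([^<]*\)\(<\/p>\)\{0,1\}<\/a>.*/\2/'
}

printf '%s\n' '<a href="/x" class="link"><p>Be bold.</p></a>' | strip_tags_after
# prints: Be bold.
printf '%s\n' '<a href="/x" class="link">Be bold.</a>' | strip_optional_p
# prints: Be bold.
```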
Saifur Rahman Mohsin

You might have a different environment set up when running Bash in interactive vs non-interactive mode, and it could be that you are not launching the same grep in both cases.

Compare the output of which grep when run in your terminal with its output when added to your script. If they differ, use the path printed in your terminal to call grep by its full path in your script.

Or, as @anubhava suggests, change your grep options to drop the offending -P option.
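A sketch combining both suggestions, under the assumption that the script may run under either GNU or BSD grep: it prints which grep the script resolves (using the POSIX builtin command -v, swapped in for which), then probes whether that grep accepts -P and falls back to -E otherwise. The helper grep_supports_pcre and the variable QUOTE_FLAGS are hypothetical names.

```shell
# Show which grep this non-interactive shell actually resolves.
echo "grep used by this script: $(command -v grep)" >&2

# Succeeds only if this grep accepts -P (Perl-compatible regex).
grep_supports_pcre() {
  printf 'abc\n' | grep -qP 'a\wc' 2>/dev/null
}

# Prefer -oP when available, otherwise fall back to -oE.
# The quote pattern below means the same thing under both flags.
if grep_supports_pcre; then
  QUOTE_FLAGS='-oP'
else
  QUOTE_FLAGS='-oE'
fi
```

The flags are then expanded unquoted on purpose, e.g. `curl -s http://www.fortunecookiemessage.com | grep $QUOTE_FLAGS '<div class="quote".*</div>'`.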

damienfrancois