
How do I do a one way diff in Linux?

Normal behavior of diff:

Normally, diff will tell you all the differences between two files. It reports everything that is in file A but not in file B, and also everything that is in file B but not in file A. For example:

File A contains:

cat
good dog
one
two

File B contains:

cat
some garbage
one
a whole bunch of garbage
something I don't want to know

If I do a regular diff as follows:

diff A B

the output would be something like:

2c2
< good dog
---
> some garbage
4c4,5
< two
---
> a whole bunch of garbage
> something I don't want to know

What I am looking for:

What I want is just the first part: I want to know everything that is in file A but not in file B, and I want to ignore everything that is in file B but not in file A.

What I want is the command, or series of commands:

???? A B

that produces the output:

2c2
< good dog
4c4,5
< two

I believe a solution could be achieved by piping the output of diff into sed or awk, but I am not familiar enough with those tools to come up with a solution. I basically want to remove all lines that begin with --- and >.

Edit: I edited the example to account for multiple words on a line.

Note: This is a "sub-question" of: Determine list of non-OS packages installed on a RedHat Linux machine

Note: This is similar to, but not the same as the question asked here (e.g. not a dupe): One-way diff file

Jonathan

4 Answers


An alternative, if your files consist of single-line entities only, and the output order doesn't matter (the question as worded is unclear on this), would be:

comm -23 <(sort A) <(sort B)

comm requires its inputs to be sorted, and the -2 means "don't show me the lines that are unique to the second file", while -3 means "don't show me the lines that are common between the two files".
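As a rough sketch of the above on the example files from the question, assuming a plain POSIX shell without process substitution (so the sorted copies go to temporary files; the scratch directory and variable names are only illustrative):

```shell
# Recreate the example files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'cat' 'good dog' 'one' 'two' > "$workdir/A"
printf '%s\n' 'cat' 'some garbage' 'one' 'a whole bunch of garbage' \
    "something I don't want to know" > "$workdir/B"

# comm needs both inputs sorted with the same collation, so pin the locale.
LC_ALL=C sort "$workdir/A" > "$workdir/A.sorted"
LC_ALL=C sort "$workdir/B" > "$workdir/B.sorted"

# -2 drops lines unique to B, -3 drops lines common to both,
# leaving only the lines unique to A.
only_in_a=$(LC_ALL=C comm -23 "$workdir/A.sorted" "$workdir/B.sorted")
printf '%s\n' "$only_in_a"

rm -rf "$workdir"
```

For the example files this should print "good dog" and "two", in sorted order rather than file order.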

If you need the "differences" to be presented in the order they occur, though, the diff / awk solution in the other answer is OK, although the grep bit isn't really necessary. It could be diff A B | awk '/^</ { $1 = ""; print }'.
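One thing to watch with that awk one-liner: assigning to $1 makes awk rebuild the line, which leaves a leading space and collapses runs of whitespace. A possible variant, sketched here on the example files (the scratch directory and variable names are illustrative), strips the "< " marker with sub() instead:

```shell
# Recreate the example files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'cat' 'good dog' 'one' 'two' > "$workdir/A"
printf '%s\n' 'cat' 'some garbage' 'one' 'a whole bunch of garbage' \
    "something I don't want to know" > "$workdir/B"

# Keep only the lines diff marks as coming from A, then remove the marker.
# sub() edits the line in place, so the rest of the text is left untouched.
a_only=$(diff "$workdir/A" "$workdir/B" | awk '/^</ { sub(/^< ?/, ""); print }')
printf '%s\n' "$a_only"

rm -rf "$workdir"
```

This keeps the lines in the order they appear in A, so "good dog" comes out with both words and no leading space.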

EDIT: fixed which set of lines to report - I read it backwards originally...

twalberg

As stated in the comments, one mostly correct answer is

diff A B | grep '^<'

although this would give the output

< good dog
< two

rather than

2c2
< good dog
4c4,5
< two
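If the change-command lines (2c2 and 4c4,5) should be kept as in the question, one possible sketch is to invert the filter and drop the "---" separators and the ">" lines instead. This assumes diff's default "normal" output format. A hunk that only adds lines from B (for example 5a6) would still leave its header line behind. The scratch directory and variable name are illustrative:

```shell
# Recreate the example files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'cat' 'good dog' 'one' 'two' > "$workdir/A"
printf '%s\n' 'cat' 'some garbage' 'one' 'a whole bunch of garbage' \
    "something I don't want to know" > "$workdir/B"

# -v inverts the match: discard lines that come from B ("> ...")
# and the "---" separator between the two halves of a change hunk.
one_way=$(diff "$workdir/A" "$workdir/B" | grep -v -e '^>' -e '^---$')
printf '%s\n' "$one_way"

rm -rf "$workdir"
```

Lines from A always carry the "< " prefix, so a line in A that consists only of "---" is not mistaken for a separator.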
1''

diff A B|grep '^<'|awk '{print $2}'

grep '^<' selects the lines that start with <

awk '{print $2}' prints the second whitespace-separated field of each of those lines

leo108
    Thank you so much, that put me on the right track. The problem with print $2 is that it ignores any words that come later (e.g. if I put "good dog" in file A vs. "dog"). It turns out that the first part of the command achieves what I want, though, i.e. the following command: diff A B | grep '^<' – Jonathan Jun 24 '14 at 16:32
    @Jonathan try this: diff A B|grep '^<'|cut -c 3- – leo108 Jun 25 '14 at 05:44

If you want to also see the files in question, in case of diffing folders, you can use

diff public_html temp_public_html/ | grep '^[^>]'

to match all but lines starting with >
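A small sketch of that filter on two throwaway directories, assuming GNU diff's default output when given directories (the directory names, file names, and variable name here are made up for illustration):

```shell
# Build two small directories to compare.
workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new"
printf '%s\n' 'cat' 'good dog' > "$workdir/old/pets.txt"
printf '%s\n' 'cat' 'some garbage' > "$workdir/new/pets.txt"
printf '%s\n' 'two' > "$workdir/old/gone.txt"   # exists only in old/

# cd first so the file names diff prints stay short.
# '^[^>]' keeps every non-empty line whose first character is not '>'.
filtered=$(cd "$workdir" && diff old new | grep '^[^>]')
printf '%s\n' "$filtered"

rm -rf "$workdir"
```

The lines naming the files being compared, the hunk headers, the "<" lines and the "---" separators pass through; only the ">" lines (and any empty lines) are dropped.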

Pasi Matalamäki