Strip directory from filename in checksum output

Question

I have the output of a checksum used in a unix shell script, and I need only the checksum value and the filename to be displayed.

$ Cksum path/path2/f1.txt | awk '{print $1,$2}'
1237668 path/path2/f1.txt

However I want the filename without the directory:

1237668 f1.txt

I tried sed, but that gives me only the filename and drops the checksum:

$ Cksum path/path2/f1.txt | sed 's/.*path2//'
/f1.txt

You need to replace `$2` by `$3`, as `$2` contains the filesize, not the filename. — Dominique, Aug 17 '21 at 11:57

anubhava · Answer 1 · 2021-08-18T09:58:02.687

2

Assuming your filenames don't contain spaces, here are sed and awk solutions:

A simpler sed:

cksum path/to/f.txt | sed 's/ .*[/ ]/ /'

878395353 f.txt

This sed starts matching at the first space and, because .* is greedy, continues up to the last / or space on the line. The matched text is then replaced with a single space.

Or a simpler awk using / or space as input field separator:

cksum path/to/f.txt | awk -F '[ /]' '{print $1, $NF}'

878395353 f.txt
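If filenames may contain spaces, one possible variation is to anchor on the two leading numeric fields instead of on the first space. This is only a sketch, assuming the three-field "checksum size filename" layout; strip_dir is a hypothetical helper name, and the input lines are simulated with printf rather than produced by cksum.

```shell
# Hypothetical helper: keep the checksum, drop the size and any directory part.
# Assumes every input line looks like "<checksum> <size> <path>".
strip_dir() {
  sed -E 's#^([0-9]+) [0-9]+ (.*/)?#\1 #'
}

# Simulated cksum lines (not real checksums); the second name contains a space.
printf '%s\n' '878395353 12 path/to/f.txt' '878395353 12 path/to/f 1.txt' | strip_dir
# prints:
# 878395353 f.txt
# 878395353 f 1.txt
```

Using # as the sed delimiter keeps the / in the pattern free of backslashes.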

edited Aug 18 '21 at 09:58

answered Aug 18 '21 at 06:16

anubhava


As an aside; with sed and the substitution command, normally the delimiter used is the forward slash `/` but can be changed to another character if it makes the regex/replacement easier (as you have chosen `~`). However, in this case since the `/` is inside the bracket expression, there is no need to change the delimiter, so `sed 's/ .*[/ ]/ /' file` works as well and perhaps is easier on the eye? – potong Aug 18 '21 at 08:41
Thank you. I used this: `Cksum path1/path2/filename.txt | awk 'BEGIN{FS="/";} { print $1, $NF}'` and I now get `checksumvalue size filename.txt`. I want to get rid of the size, but I am unable to. – melony_r Sep 06 '21 at 12:14
I have used cksum path/path2/f1.txt | awk 'BEGIN {print $1,$NF}| awk 'print$1, $3}' AND get the required output..works – melony_r Sep 06 '21 at 12:36
You don't need to use 2 awk actually. It can be done in a single awk – anubhava Sep 06 '21 at 12:51

The fourth bird · Answer 2 · 2021-08-17T13:05:57.973

0

Based on man cksum

The cksum utility writes to the standard output three whitespace separated fields for each input file. These fields are a checksum CRC, the total number of octets in the file and the file name.

Using sed you could use 3 capture groups and use group 1 and 3 in the replacement.

([^[:space:]]+) [^[:space:]]+ (.*/)?([^[:space:]]+)

Explanation

([^[:space:]]+) Group 1, match 1+ non whitespace chars
[^[:space:]]+ Match 1+ chars other than a whitespace char between spaces
(.*/)? Optionally match group 2 matching until the last occurrence of /
([^[:space:]]+) Group 3, match 1+ non whitespace chars

For example:

cksum ./file.txt # --> 3777026118 8 ./file.txt

Using sed

cksum ./file.txt | sed -E 's~([^[:space:]]+) [^[:space:]]+ (.*/)?([^[:space:]]+)~\1 \3~'

Output

3777026118 file.txt
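To see what the optional group 2 buys you, here is a small sketch that runs the same substitution over simulated cksum lines with and without a directory part. The helper name crc_and_name is hypothetical, and the input is produced with printf, not by cksum.

```shell
# Hypothetical wrapper around the substitution shown above.
crc_and_name() {
  sed -E 's~([^[:space:]]+) [^[:space:]]+ (.*/)?([^[:space:]]+)~\1 \3~'
}

# Simulated lines: one path with a directory, one bare filename.
printf '%s\n' '3777026118 8 ./file.txt' '3777026118 8 file.txt' | crc_and_name
# prints:
# 3777026118 file.txt
# 3777026118 file.txt
```

Because group 2 is optional, a line whose third field has no / still matches and is rewritten the same way.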

Using awk printing the first field and the last item from the result of splitting the 3rd field:

cksum ./file.txt | awk '
{
  n=split($3,a,"/")
  print $1, a[n]
}'

Output

3777026118 file.txt

edited Aug 17 '21 at 13:05

answered Aug 17 '21 at 11:39

The fourth bird


The Perl extension `\S` might not be available in `sed` even with `-E`. Anyway, simply `'s%[^[:space:]]+/%%'` should work portably, assuming you don't have any file names with spaces in them (which this answer already seems to assume). – tripleee Aug 17 '21 at 12:42
@tripleee I see, thanks for pointing that out. I tested this on my Mac and it did not work with the `\S`. Now it does on Ubuntu and Mac using `[^[:space:]]+` – The fourth bird Aug 17 '21 at 12:50
Still a bit complex for my taste, but thanks for the fix. The `/g` is still superfluous; you don't expect (and could not have) more than one match per line. – tripleee Aug 17 '21 at 13:02

score 0 · Answer 3 · answered Aug 17 '21 at 11:45

Awk can do this. For example:

awk '{ printf("%d", $1); n=split($2,a,"/"); print(" ", a[n])}'

Tested with:

echo "1237668 path/path2/f1.txt"  | awk  '{ printf("%d", $1); n=split($2,a,"/"); print(" ", a[n])}'

1237668  f1.txt

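The printf prints the checksum from $1. The split breaks the path on / and the final print outputs the last element; the two spaces in the output come from print inserting the output field separator after the literal " ". See also: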

How to split a delimited string into an array in awk?

how to access last index of array from split function inside awk?
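A possible variation, sketched under the assumption that the path is always the last field: splitting $NF instead of $2 copes with both the two-field test line above and the three-field output of a real cksum. Printing the checksum with %s rather than %d passes it through as text, which sidesteps any integer-range limits an awk implementation might have. The helper name last_component is hypothetical and the input is simulated.

```shell
# Hypothetical helper: print the first field and the basename of the last field.
# Assumes the path is the last whitespace-separated field (no spaces in names).
last_component() {
  awk '{ n = split($NF, a, "/"); printf "%s %s\n", $1, a[n] }'
}

# Simulated lines in the two-field and three-field layouts.
printf '%s\n' '1237668 path/path2/f1.txt' '4294967295 8 path/path2/f1.txt' | last_component
# prints:
# 1237668 f1.txt
# 4294967295 f1.txt
```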

Renaud Pacalet · Answer 4 · 2021-08-17T12:13:54.100

0

Note: I do not know the Cksum command you use. My cksum outputs 3 fields: checksum, size and filename. Adapt the indexes in the following if yours behaves differently.

If your shell is bash, you could use a bash array and basename. If you don't have spaces in your filenames:

$ a=($(cksum path/path2/f1.txt))
$ printf '%s %s\n' "${a[0]}" "$(basename "${a[2]}")"
857691210 f1.txt

If you have spaces in your filenames, adapt the printf parameters:

$ a=($(cksum "path/path2/f 1.txt"))
$ printf '%s %s\n' "${a[0]}" "$(basename "${a[*]:2}")"
857691210 f 1.txt

And if you prefer quoting the filename:

$ printf '%s "%s"\n' "${a[0]}" "$(basename "${a[*]:2}")"
857691210 "f 1.txt"
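The same idea can be sketched without an array or basename, using read and the ${var##*/} parameter expansion, which should also work in a plain POSIX sh. This assumes the three-field output described in the note above; cksum_base is a hypothetical helper name. Since read assigns the remainder of the line to its last variable, spaces inside the filename are kept.

```shell
# Hypothetical helper: print "<checksum> <basename>" for one file.
# Assumes cksum prints "<checksum> <size> <path>" on a single line.
cksum_base() {
  cksum "$1" | {
    # crc and size take the first two fields; path takes the rest of the line.
    read -r crc size path &&
      printf '%s %s\n' "$crc" "${path##*/}"
  }
}

# Usage: cksum_base "path/path2/f 1.txt"
```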

edited Aug 17 '21 at 12:13

answered Aug 17 '21 at 11:46

Renaud Pacalet


I didn't test this, but `a[0]` and `a[2]`, no `a[1]`? – Dominique Aug 17 '21 at 11:48
The OP apparently wants only the checksum and the file's basename, not the size. – Renaud Pacalet Aug 17 '21 at 11:51
Oh, the size, I forgot about that. So you say that the original question should contain `$3` instead of `$2`: `awk '{print $1, $3}'`? – Dominique Aug 17 '21 at 11:56
Yeah, probably. Unless this `Cksum` (uppercase `C`) the OP uses is different from my `cksum`... – Renaud Pacalet Aug 17 '21 at 11:57

score 0 · Answer 5 · answered Aug 17 '21 at 12:03

You can also use:

cd "$(dirname path/path2/f1.txt)" && cksum "$(basename path/path2/f1.txt)"

or, to keep the current directory the same:

a=$(pwd); cd "$(dirname path/path2/f1.txt)" && cksum "$(basename path/path2/f1.txt)"; cd "$a"
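Another way to leave the current directory untouched is to run the cd inside a subshell. The sketch below also pipes the result through awk to drop the size field, which cksum still prints in this approach. It assumes the three-field output and a basename without spaces; cksum_in_dir is a hypothetical helper name.

```shell
# Hypothetical helper: checksum a file from inside its own directory.
# The parentheses start a subshell, so the caller's working directory is unchanged.
cksum_in_dir() {
  (cd "$(dirname "$1")" && cksum "$(basename "$1")") | awk '{ print $1, $3 }'
}

# Usage: cksum_in_dir path/path2/f1.txt
```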

answered Aug 17 '21 at 12:03

Luuk


agc · Answer 6 · 2021-08-18T03:17:35.813

0

Assuming the / character only occurs in field #2, delete everything after field #2; then remove the directory name:

 Cksum path/path2/f1.txt | 
 sed 's# [^/]*$##;s# .*/# #'
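To illustrate how the two substitutions interact, here is a sketch that feeds simulated lines (not real cksum output) through the same sed script. The helper name two_step is hypothetical. The last line shows what happens when the assumption fails and the path contains no / at all.

```shell
# Hypothetical wrapper around the two-step sed script above.
two_step() {
  sed 's# [^/]*$##;s# .*/# #'
}

printf '%s\n' \
  '1237668 path/path2/f1.txt 8' \
  '1237668 8 path/path2/f1.txt' \
  '1237668 8 f1.txt' | two_step
# prints:
# 1237668 f1.txt
# 1237668 f1.txt
# 1237668
```

In the third case the first substitution removes everything after the checksum, so this approach should only be used when every path is known to contain a directory part.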

edited Aug 18 '21 at 03:17

answered Aug 18 '21 at 02:41

agc

