0

Given an excel column containing filepaths, what excel formula returns only the file extension?

src\main\java\com\something\proj\UI.java --> java src\main\java\com\something\proj\Server.scala --> scala src\main\java\com\something\proj\include.h\someinclude.hpp --> hpp

Note 1: this formula works great for filepaths with only a single period, but not for case 3:
=IF(A1="","",RIGHT(A1,LEN(A1)-FIND(".",A1)))

Note 2: I understand that these filepaths are Windows-specific, I don't need a cross-platform solution.

Related: triming extension from filename in Excel and How to extract file name from path?

Community
  • 1
  • 1
Brad Wehrwein
  • 155
  • 2
  • 9

4 Answers4

3

With data in A1, use:

=SUBSTITUTE(RIGHT(SUBSTITUTE(A1,".",REPT(".",999)),999),".","")

enter image description here

From:

Jim Cone's old post

Gary's Student
  • 95,722
  • 10
  • 59
  • 99
  • Awesome, thanks! Trying to understand the approach - can you explain why this works? – Brad Wehrwein Nov 28 '16 at 16:51
  • 1
    @BradWehrwein It works because it separates any characters after the last period away from all other characters by 999 dots, or whatever number you choose. You can then select the rightmost 999 characters knowing that you are selecting nothing but your extension characters and a bunch of periods. Substitute out the periods and you're set to go. – u8it Feb 02 '18 at 22:51
1

This will find everything after the last .:

=MID(A1,FIND("{{{",SUBSTITUTE(A1,".","{{{",LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))+1,LEN(A1))

enter image description here

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
  • Cool, thanks! I'm trying to understand how this works - can you explain the approach? – Brad Wehrwein Nov 28 '16 at 16:46
  • @user3633993 first it counts the number of `.` and replaces the last `.` with `{{{`. Then it finds that `{{{` and splits the text string on it. – Scott Craner Nov 28 '16 at 16:50
  • 2
    It might be safer to use a disallowed character like `*` to split on. It's a corner case, but `{{{.foo` is a valid file name. – Comintern Nov 28 '16 at 17:32
0

Here's a nice long answer. :-)

=SUBSTITUTE(A1,LEFT(A1,FIND(CHAR(1),SUBSTITUTE(A1,".",CHAR(1),LEN(A1)-LEN(SUBSTITUTE(A1,".",""))))),"")
Brian
  • 2,078
  • 1
  • 15
  • 28
0

A neat trick I sometimes use for string parsing in general is to leverage the FilterXML() function (Excel 2013 and later). The basic strategy is to use Substitute() to format your string in a way that it is parsed across elements in an xml string, and then you can use xpath syntax to conveniently navigate your parsed elements. Using this strategy, getting an extension would look like this...

=FILTERXML("<A><p>" & SUBSTITUTE(A1,".","</p><p>.")&"</p></A>","//p[last()]")

If you're not familiar with xml, this can seem intimidating, but if you can grasp what's going on, I find it to be cleaner, more flexible, and easier to remember than the alternative approaches using len(), substitute(), etc. One reason why it's nicer is because there's only one cell reference.

Illegal Characters

There are two characters that are allowed in paths but not in xml: & and '

The equation above will work if these characters are not present, otherwise, they will need to be handled something like this...

=FILTERXML("<A><p>" & SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(J8,"'",""),"&",""),".","</p><p>.")&"</p></A>","//p[last()]")

Example

Suppose we have a nasty file path like this:

C:\Folder1\Folder2\(ugly characters !@#$%^()_+={};;,`)\two.dots.LongExt

1.) The Substitution() portion will convert it to an xml string like this...

<A>
    <p>
        C:\Folder1\Folder2\(ugly characters !@#$%^()_+={};;,`)\two
    </p>
    <p>
        .dots
    </p>
    <p>
        .txt
    </p>
</A>

2.) Once formatted like this, it's trivial to pick out the last p element using the xpath syntax //p[last()].

u8it
  • 3,956
  • 1
  • 20
  • 33