
I have a table that includes some domain names:

    site
1 Google.com
2 yahoo.in
3 facebook.com
4 badge.net

I want to remove everything after the "." (for example .com, .net, .in). I used the function below, but it converts my strings into numeric form.

gsub("\\..*","",df)
hrbrmstr
Sahil Desai
    The question http://stackoverflow.com/questions/14173754/splitting-a-file-name is not a good duplicate target, because here the *first* dot is found and everything after it is removed. With file path splitting, the algorithm is different. – Wiktor Stribiżew Sep 01 '16 at 09:47
    This may also be relevant: http://stackoverflow.com/questions/19020749/function-to-extract-domain-name-from-url-in-r – David Arenburg Sep 01 '16 at 13:49

1 Answer


You're working with domain names, so you may want to use tools that were designed for them:

library(urltools)

df <- data.frame(site=c("Google.com", "yahoo.in", "facebook.com", "badge.net"))

suffix_extract(df$site)
##           host subdomain   domain suffix
## 1   Google.com      <NA>   google    com
## 2     yahoo.in      <NA>    yahoo     in
## 3 facebook.com      <NA> facebook    com
## 4    badge.net      <NA>    badge    net

for @Sotos:

urltools::suffix_extract('www.bankofcyprus.com')
##                   host subdomain       domain suffix
## 1 www.bankofcyprus.com       www bankofcyprus    com
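
If you only want the text before the first dot and would rather stay in base R, a minimal sketch follows. It assumes the column is called `site`, as in the question; the new column name `name` is made up for illustration. The original call passed the whole data frame to `gsub()`, which coerces it to a single character vector, and that is likely where the numeric-looking output came from. Applying the pattern to the column avoids this.

```r
# Same example data as above; keep the column as character, not factor.
df <- data.frame(
  site = c("Google.com", "yahoo.in", "facebook.com", "badge.net"),
  stringsAsFactors = FALSE
)

# Drop the first "." and everything after it, column-wise.
# `name` is a hypothetical column name chosen for this sketch.
df$name <- sub("\\..*", "", df$site)

df$name
```

Note that this cuts at the first dot, so `www.bankofcyprus.com` would become `www`, whereas `suffix_extract()` separates the subdomain from the domain.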
hrbrmstr