How to extract root domain in C#?

www.google.co.in => google.co.in
google.co.in => google.co.in
coo.coo.coo.coo.com => coo.com
www.google.com => google.com

Do I have to hardcode all the top-level domains into my application?

The code I have is found on every topic concerning this problem:

string[] parts = host.Split('.');
string domainName = parts[parts.Length - 2] + "." + parts[parts.Length - 1];

But it doesn't work for domains like google.co.uk, where it returns co.uk instead of google.co.uk.

Edit:

What I have found working so far is making an HTTP request to "http://whois.domaintools.com/www.domain.org", which returns a 301 response code with a URL containing the root domain. This is the most reliable solution for me at the moment. Maybe there is another free API for doing this?

Darxis
  • the top levels above are `in` and `com` – Daniel A. White Jun 17 '14 at 17:38
  • I know... I want to have a piece of code that will do it for any domain – Darxis Jun 17 '14 at 17:39
  • what are you attempting to accomplish? – Daniel A. White Jun 17 '14 at 17:39
  • As written in my question: I want to extract the root domain for any subdomain given – Darxis Jun 17 '14 at 17:40
  • but maybe there's a better way to solve it if you just provided some context. – Daniel A. White Jun 17 '14 at 17:41
  • I have a C# application. Users type domains in it, and I just want to remove subdomain prefixes like "ftp." "www." to have only the root domain name – Darxis Jun 17 '14 at 17:43
  • there isn't an algorithmic way to determine that. www.google.com and google.com might not have the same IP address. – Daniel A. White Jun 17 '14 at 17:45
  • That's why I want to know if there is any way of doing it simply using built-in C# classes, or do I have to manually hardcode every top-level domain name (.com .net .co.it etc) – Darxis Jun 17 '14 at 17:46
  • The downvotes are likely for lack of research. You have shown a requirement with no attempt to solve it. Also, the complaint doesn't belong in your question (a quick question to that effect in the comments is fine, but please don't rant). I have rolled back the edit. – BradleyDotNET Jun 17 '14 at 17:56
  • The problem here is that `.co.it` is not a top-level domain. It's `.it` that is the top-level domain. You can't tell how many sub levels there are without a list that determines how different top-level domains are used. Some have multiple uses even, for example both `google.uk` and `google.co.uk` are possible domains. – Guffa Jun 17 '14 at 17:57
  • Hmm, no research you say? I have found so much code for this, but I have not found any code that works for any domain. The code on the Internet works like this: take the substring from the second-to-last dot to the end of the string. And it doesn't work for domains like google.co.uk – Darxis Jun 17 '14 at 18:01
  • That kind of information is good to include in your question, as it shows what you know doesn't work and perhaps shows where you are trying to go, etc. Knowing that **vastly** improves the question quality. Think about it from our end. Read your question again and see if *you* would think you had done any research/tried anything. – BradleyDotNET Jun 17 '14 at 18:05

1 Answer

The general problem is not at all simple. The rules for what constitutes a valid domain name are set by the authorities that control each top-level domain (e.g. .com, .uk, .au, etc.).

Mozilla has an initiative called the Public Suffix List, in which they maintain a list of currently known public suffixes for all TLDs. The list is formatted such that some fairly simple code can interpret it and extract the root domain name from a given host name.

The list itself is available from https://publicsuffix.org/. There you can learn about the format of the list, download the list, and get other information. See the Stack Overflow question "Get the subdomain from a URL" for links to implementations in many different languages, including C#.
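
To give an idea of what "fairly simple code" might look like, here is a minimal sketch of the list's matching rules in C#. It is not taken from any of the linked implementations: the class name `RootDomain`, the method `Extract`, and the handful of embedded rules are all assumptions made for illustration. A real application would load the full list from publicsuffix.org rather than hardcode a sample.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of .NET): a minimal Public Suffix List matcher.
public static class RootDomain
{
    // ASSUMPTION: a tiny hand-picked sample of rules in the list's format.
    // A real application would load the full file from https://publicsuffix.org/.
    private static readonly HashSet<string> Rules = new HashSet<string>(StringComparer.Ordinal)
    {
        "com", "in", "co.in", "uk", "co.uk",
        "*.ck",     // wildcard rule: every label directly under .ck is a public suffix
        "!www.ck"   // exception rule: www.ck is registrable despite the wildcard
    };

    // Returns the registrable ("root") domain, or an empty string when the host
    // is itself a public suffix and therefore has no root domain.
    public static string Extract(string host)
    {
        string[] labels = host.Trim().TrimEnd('.').ToLowerInvariant().Split('.');

        // Exception rules take priority over every other rule.
        for (int i = 0; i < labels.Length; i++)
        {
            string candidate = Suffix(labels, i);
            if (Rules.Contains("!" + candidate)) return candidate;
        }

        // Otherwise the matching rule with the most labels wins, so test the
        // longest candidate suffix first.
        for (int i = 0; i < labels.Length; i++)
        {
            bool exact = Rules.Contains(Suffix(labels, i));
            bool wildcard = i + 1 < labels.Length
                && Rules.Contains("*." + Suffix(labels, i + 1));
            if (exact || wildcard)
            {
                // Root domain = public suffix plus one more label to its left.
                return i == 0 ? string.Empty : Suffix(labels, i - 1);
            }
        }

        // No rule matched: the list's default rule "*" treats the last label as the suffix.
        return labels.Length >= 2 ? Suffix(labels, labels.Length - 2) : string.Empty;
    }

    // Joins labels[start..] back into a dotted name.
    private static string Suffix(string[] labels, int start)
    {
        return string.Join(".", labels, start, labels.Length - start);
    }
}
```

With the sample rules above, `RootDomain.Extract("www.google.co.in")` gives `google.co.in` and `RootDomain.Extract("coo.coo.coo.coo.com")` gives `coo.com`, matching the examples in the question. The sketch does not handle internationalized (Punycode) names or URLs with a scheme or path; it expects a bare host name.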

Jim Mischel