I have a project to extract keywords, using R, from URLs generated from a search query, then identify the most frequent keywords and compute statistics such as TF-IDF for them.
Being new to R, I tried the following approach with two different links:
STEP 1: I extracted keywords using the code from: Web Scraping and Text Mining in R. I ran this code twice, once per URL, changing the link passed to getURL() each time. RESULT: I have one dtm (document-term matrix) for each URL.
STEP 2: To compute the tf-idf, I followed Chapter 3 of this document: http://tidytextmining.com/tfidf.html. I shaped my data to match the document by:
- Converting each dtm into a dataframe
- Adding new columns "Sitename/URL" and "Total no. of Terms" to the dataframes
- Appending the dataframe of link2 to that of link1, as they have the same columns
- Using the formula in the document to compute the 'term frequency' and the function bind_tf_idf to compute the tf-idf
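For illustration, the steps in this list can be collapsed into a single pass over all pages instead of one dtm per URL. The sketch below is only that, a sketch: it uses base R, the `pages` vector is made-up toy text standing in for the scraped content, and the tf and idf definitions (term share per page, natural log of pages over pages containing the term) are my understanding of what bind_tf_idf computes rather than something taken from its source.

```r
# Toy stand-ins for the text scraped from each page (made-up data).
pages <- c(
  link1 = "tour package india holiday tour",
  link2 = "holiday package deals holiday"
)

# Tokenize every page and stack the results into one long table.
tokens <- lapply(names(pages), function(u) {
  words <- strsplit(tolower(pages[[u]]), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  data.frame(url = rep(u, length(words)), term = words,
             stringsAsFactors = FALSE)
})
tokens <- do.call(rbind, tokens)

# Count occurrences of each term per page, dropping zero counts.
counts <- as.data.frame(table(url = tokens$url, term = tokens$term),
                        responseName = "n", stringsAsFactors = FALSE)
counts <- counts[counts$n > 0, ]

# tf = share of a page's terms; idf = ln(pages / pages containing the term).
counts$tf     <- counts$n / ave(counts$n, counts$url, FUN = sum)
doc_freq      <- table(counts$term)
counts$idf    <- log(length(pages) / as.numeric(doc_freq[counts$term]))
counts$tf_idf <- counts$tf * counts$idf

# Most distinctive terms first.
counts[order(-counts$tf_idf), ]
```

Because every page lives in the same table, adding a third or tenth URL only means adding another element to `pages`; terms that appear on every page (here "package" and "holiday") get a tf-idf of zero.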
The objective is to extract keywords from the URLs generated from a search query. I have already generated the URLs using the code from: How to get google search results (see the snippet below).
Once the keywords are extracted, I want to count their occurrences, identify the most frequent ones, and compute their TF-IDF.
As a beginner, this is the best I could come up with (I really did try), but I am sure there is a better approach than repeating Step 1 and Step 2 for every URL.
Your help and/or feedback on this is greatly appreciated.
> search.term <- "tour package"
> quotes <- "FALSE"
> search.url <- getGoogleURL(search.term=search.term, quotes=quotes)
> links <- getGoogleLinks(search.url)
> links <- gsub('/url\\?q=','',sapply(strsplit(links[as.vector(grep('url',links))],split='&'),'[',1))
> links
[1] "https://www.makemytrip.com/holidays-india/"
[2] "https://www.makemytrip.com/holidays-india/"
[3] "https://www.yatra.com/india-tour-packages"
[4] "http://www.thomascook.in/tcportal/international-holidays"
[5] "https://www.yatra.com/holidays"
[6] "https://www.travelguru.com/holiday-packages/domestic-packages.shtml"
[7] "https://www.chanbrothers.com/package"
[8] "https://www.tourmyindia.com/packagetours.html"
[9] "http://traveltriangle.com/tour-packages"
[10] "http://www.coxandkings.com/bharatdeko/"
[11] "https://www.sotc.in/india-tour-packages"
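One possible way to avoid rerunning the scraping step by hand for each entry in `links` is to wrap it in a function and apply it to the whole vector. The following is a rough sketch under assumptions: `fetch_page_text` and `collect_pages` are hypothetical helper names, the tag stripping is deliberately crude (a proper HTML parser would be more robust), and duplicates such as entries [1] and [2] above are removed with unique() before downloading.

```r
# Hypothetical helper: download one page and crudely reduce it to plain text.
fetch_page_text <- function(link) {
  html <- paste(readLines(link, warn = FALSE), collapse = " ")
  # Drop script/style bodies first, then any remaining tags.
  text <- gsub("(?i)<script.*?</script>|<style.*?</style>", " ", html,
               perl = TRUE)
  text <- gsub("<[^>]+>", " ", text)
  gsub("\\s+", " ", text)
}

# Hypothetical helper: apply a fetcher to every unique link.
# Links that fail to download are skipped with a warning.
collect_pages <- function(links, fetch = fetch_page_text) {
  links <- unique(links)
  texts <- vapply(links, function(link) {
    tryCatch(fetch(link), error = function(e) {
      warning("skipping ", link, ": ", conditionMessage(e))
      NA_character_
    })
  }, character(1))
  texts[!is.na(texts)]
}
```

Calling `pages <- collect_pages(links)` would then return a character vector named by URL, with one element of text per page, which can be tokenized and counted in one go rather than one link at a time. The `fetch` argument is there so the download step can be swapped out, for example for the getURL()-based code from Step 1.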