I scraped the data with the following code:
# personal skills
skills = soup.findAll("li", {"data-ng-repeat": "xxxskillDetailsxxx"})
for i in skills:
    print(str(i.get_text()))  # The output has 5 different skills, for example.

# languages
languages = soup.findAll("li", {"data-ng-repeat": "xxxlanguagesxxx"})
for n in languages:
    print(str(n.get_text()))  # The output has 3 different languages, for instance.
The above code works well when I just print the results. However, when I use the following code to collect the data so that I can later save it as a dataframe, only the last element is saved. That is, only the last skill and the last language are saved.
data = []
for url in df.urls[:10]:
    webdriver.get(url)
    time.sleep(5)
    soup = BeautifulSoup(webdriver.page_source, 'html.parser')

    # personal skills
    skills = soup.findAll("li", {"data-ng-repeat": "xxxskillDetailsxxx"})
    for i in skills:
        data.append(str(i.get_text()))

    # languages
    languages = soup.findAll("li", {"data-ng-repeat": "xxxlanguagesxxx"})
    for n in languages:
        data.append(str(n.get_text()))

print(data)  # Output: only the last skill output and the last language output are printed.
How can I save all skills and all languages into two separate columns, with the values within each column separated by commas?
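To illustrate the result I am after, here is a minimal sketch of the target shape: one row per URL, with all skills joined into one comma-separated string and all languages into another. The `scraped_pages` data, the example URLs, and the `join_texts` helper are hypothetical stand-ins for the strings returned by `get_text()` on each page, not part of my real code.

```python
def join_texts(texts):
    """Strip each scraped string and join the non-empty ones with ', '."""
    return ", ".join(t.strip() for t in texts if t.strip())


# Hypothetical stand-in for the text scraped from each page. In the real loop
# these lists would be built from the <li> elements, for example
# [li.get_text() for li in skills] and likewise for languages.
scraped_pages = {
    "https://example.com/profile/1": {
        "skills": ["Python", " SQL ", "Excel"],
        "languages": ["English", "German"],
    },
    "https://example.com/profile/2": {
        "skills": ["Java"],
        "languages": [],
    },
}

rows = []
for url, page in scraped_pages.items():
    # One dict per URL keeps skills and languages in separate columns.
    rows.append({
        "url": url,
        "skills": join_texts(page["skills"]),
        "languages": join_texts(page["languages"]),
    })

for row in rows:
    print(row)
```

A list of dicts like `rows` could presumably be passed to `pandas.DataFrame(rows)` to get the `url`, `skills`, and `languages` columns, but I have not managed to get there from my scraping loop.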
I searched online for a while but did not find a good solution. Any suggestions are highly appreciated. Thank you.