import requests
from bs4 import BeautifulSoup
import pprint

response= requests.get("https://www.biospace.com/news/")
soup = BeautifulSoup(response.text, 'html.parser')

link= soup.select(".lister__header")
#print(link)

def list_articles(link):
    articles = []
    # Each .lister__header element holds the title text and an <a> tag whose href is a site-relative path.
    for item in link:
        title = item.getText()
        href = item.a.get('href')
        url = 'https://www.biospace.com' + href
        articles.append({'title': title, 'link': url})
    return articles

pprint.pprint(list_articles(link))

with open('Current_BioSpace_Articles.txt', mode='w') as file:
    text_file=(list_articles(link)) 
    file.write(f"{text_file}")

In PowerShell the links and titles are displayed neatly; however, in my text document the output is very messy. I tried to use a for loop to write each key and value of the dictionary on a separate line, but everything is wrapped in a list.

UPDATE: I was able to iterate over each element in the list by doing the following:

    text_file=(list_articles(link)) 
    for element in text_file:
        file.write(f"{element}\n")

And it comes out like this in the text document:

{'title': "title of article", 'link': "url"}

Each line holds the title and URL for only one article.

Is there any way to put the title and URL on separate lines, with a blank line between articles? For example:

title: "title of article"
link: "url"

title: "title of article"
etc.
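A minimal sketch of one way to get the layout above, assuming each article is a dict with 'title' and 'link' keys as built by list_articles(). The helper names format_articles and write_articles and the sample data are hypothetical. Stripping surrounding whitespace from each value is an added assumption, since getText() can return titles padded with newlines.

```python
# Sketch: write each article as one "key: value" line per dictionary entry,
# with a blank line between articles. Assumes `articles` is a list of dicts
# like the ones list_articles() returns (hypothetical helpers below).

def format_articles(articles):
    """Return the articles as 'key: value' blocks separated by blank lines."""
    blocks = []
    for article in articles:
        # Dicts keep insertion order (Python 3.7+), so 'title' comes before 'link'.
        # Assumption: surrounding whitespace in scraped values is unwanted.
        lines = [f"{key}: {str(value).strip()}" for key, value in article.items()]
        blocks.append("\n".join(lines))
    # An empty line separates articles; the text ends with a single newline.
    return "\n\n".join(blocks) + "\n" if blocks else ""


def write_articles(articles, path):
    """Write the formatted articles to a text file."""
    with open(path, mode="w", encoding="utf-8") as file:
        file.write(format_articles(articles))


if __name__ == "__main__":
    # Hypothetical sample data standing in for list_articles(link).
    sample = [
        {"title": "First article", "link": "https://www.biospace.com/article/one/"},
        {"title": "Second article", "link": "https://www.biospace.com/article/two/"},
    ]
    write_articles(sample, "Current_BioSpace_Articles.txt")
    print(format_articles(sample))
```

In the original script, the final `with` block could then become `write_articles(list_articles(link), 'Current_BioSpace_Articles.txt')`. Iterating over `items()` rather than hard-coding the two keys means any extra field added to the dicts later is written out without changing the formatter.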

Nakul Joy
  • What is the current output in the file and what is the expected output in the file? Please provide some samples. – ywbaek Mar 27 '20 at 17:22
  • Hi, thanks for replying. I posted an update above in the question, along with my desired output in the text file. – Nakul Joy Mar 27 '20 at 17:41

1 Answer


It seems you want to pretty-print to the text file so that the result resembles what you see in the PowerShell window.

This can be done by modifying the output portion of your code as follows, per this post:

import pprint

# Create result to output
text_file = list_articles(link) 

# Pretty print to console
pprint.pprint(text_file)

# Pretty print to file
with open('Current_BioSpace_Articles.txt', 'wt') as file:
    pprint.pprint(text_file, stream = file)
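One way to confirm that the file receives the same text as the console is to compare against `pprint.pformat`, which returns the string that `pprint.pprint` prints. This is a minimal sketch with hypothetical sample data. Note that pprint still writes Python reprs (braces and quotes), breaks the list across lines only when it is wider than the default width of 80 characters, and sorts dictionary keys by default, so 'link' appears before 'title'.

```python
import io
import pprint

# Hypothetical sample data standing in for list_articles(link).
articles = [
    {"title": "First article", "link": "https://www.biospace.com/article/one/"},
    {"title": "Second article", "link": "https://www.biospace.com/article/two/"},
]

# pprint.pformat returns the text pprint.pprint would print, without the trailing newline.
text = pprint.pformat(articles)

# Passing any file-like object as stream= writes that same text plus a newline.
buffer = io.StringIO()
pprint.pprint(articles, stream=buffer)
assert buffer.getvalue() == text + "\n"

# The same string can be written to the file explicitly.
with open("Current_BioSpace_Articles.txt", "w", encoding="utf-8") as file:
    file.write(text + "\n")
```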
DarrylG