I am trying to scrape a Spanish website: 'https://www.marca.com/futbol/real-madrid.html?intcmp=MENUESCU&s_kw=realmadrid'
But when parsing the headlines I receive the following text: "Rafa Mar�n y Peter, los �nicos canteranos para la Copa adem�s de los porteros Fuidias y Diego" (roughly: "Rafa Marín and Peter, the only academy players for the Cup besides goalkeepers Fuidias and Diego"). I want to get rid of the � and get the correct ó, ñ, ¿, á, é, í, ú... characters instead.
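A minimal illustration of what is probably happening, assuming (not verified) that the page is served in a single-byte Latin charset such as ISO-8859-1 or windows-1252 while the bytes are decoded as UTF-8. In those charsets each accented letter is a single byte that is not valid UTF-8 on its own, so the decoder replaces it with U+FFFD (�):

```javascript
// Assumption: the page bytes are Latin-1 / windows-1252, not UTF-8.
// Encoding 'í' (U+00ED) as latin1 produces the single byte 0xED.
const latin1Bytes = Buffer.from('Rafa Marín', 'latin1');

// Decoding those bytes as UTF-8: 0xED is not followed by a valid
// continuation byte, so it becomes the replacement character U+FFFD.
const asUtf8 = latin1Bytes.toString('utf8');

// Decoding with the matching single-byte charset restores the text.
const asLatin1 = new TextDecoder('windows-1252').decode(latin1Bytes);

console.log(asUtf8);   // Rafa Mar�n
console.log(asLatin1); // Rafa Marín
```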
I am scraping the data as follows:
```javascript
const axios = require('axios');
const cheerio = require('cheerio');

axios.get(newspaper.address)
  .then((response) => {
    const html = response.data;
    const $ = cheerio.load(html);
    // Collect every headline link, skipping headlines already in `articles`
    $('.mod-title > a').each(function () {
      const headline = $(this).text().trim();
      const link = $(this).attr('href');
      if (!articles.some((article) => article.headline === headline)) {
        articles.push({ source: newspaper.name, headline, link });
      }
    });
  })
  .catch((err) => console.log(err));
```
I do not really know how to change the encoding, or which encoding to use.
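One possible approach, sketched under assumptions: ask axios for the raw bytes instead of a decoded string, read the charset the server declares (the `Content-Type` response header, or a `<meta charset>` tag in the HTML), and decode with `TextDecoder` before passing the string to cheerio. The helper names `charsetFromContentType`, `charsetFromMeta`, `decodeHtml` and `scrapeHeadlines` are hypothetical, and the `windows-1252` fallback is a guess for when nothing is declared, not something confirmed for marca.com. Non-UTF-8 labels in `TextDecoder` require a Node.js build with full ICU (the default in current releases).

```javascript
// Extract a charset from a Content-Type value such as
// "text/html; charset=ISO-8859-15". Returns null if none is declared.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : null;
}

// Look for <meta charset="..."> or <meta http-equiv="Content-Type"
// content="...; charset=..."> in the first 4 KB. Reading the bytes as
// latin1 is safe here because the tag itself is plain ASCII.
function charsetFromMeta(buffer) {
  const head = buffer.subarray(0, 4096).toString('latin1');
  const match = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return match ? match[1].toLowerCase() : null;
}

// Decode raw HTML bytes with the declared charset (header first, then meta).
// The fallback charset is an assumption, not a detected value.
function decodeHtml(buffer, contentType, fallback = 'windows-1252') {
  const charset =
    charsetFromContentType(contentType) || charsetFromMeta(buffer) || fallback;
  try {
    return new TextDecoder(charset).decode(buffer);
  } catch {
    // TextDecoder throws a RangeError for labels it does not know.
    return new TextDecoder(fallback).decode(buffer);
  }
}

// Hypothetical replacement for the question's axios/cheerio code.
// `newspaper` and `articles` are the objects from the question.
async function scrapeHeadlines(newspaper, articles) {
  // Required here so the decoding helpers above work without these packages.
  const axios = require('axios');
  const cheerio = require('cheerio');

  // responseType 'arraybuffer' makes axios return the undecoded bytes.
  const response = await axios.get(newspaper.address, { responseType: 'arraybuffer' });
  const html = decodeHtml(Buffer.from(response.data), response.headers['content-type']);

  const $ = cheerio.load(html);
  $('.mod-title > a').each(function () {
    const headline = $(this).text().trim();
    const link = $(this).attr('href');
    if (!articles.some((article) => article.headline === headline)) {
      articles.push({ source: newspaper.name, headline, link });
    }
  });
}
```

Calling `scrapeHeadlines(newspaper, articles)` would take the place of the `.then` chain above. If you already know the page's charset, axios also has a Node-only `responseEncoding` option (for example `responseEncoding: 'latin1'`), but Node's `'latin1'` is strict ISO-8859-1, so characters that windows-1252 places in the 0x80 to 0x9F range (such as curly quotes) would still come out wrong.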