I am trying to scrape a Spanish website: 'https://www.marca.com/futbol/real-madrid.html?intcmp=MENUESCU&s_kw=realmadrid'

But when parsing the headlines I receive the following text: "Rafa Mar�n y Peter, los �nicos canteranos para la Copa adem�s de los porteros Fuidias y Diego". So I am trying to get rid of the � and get the correct ó ñ ¿ á é í ú... characters.

I am scraping the data as follows:

axios.get(newspaper.address)
.then((response) =>{
    const html = response.data;
    const $ = cheerio.load(html);
    $('.mod-title > a', html).each(function(){
        const headline = $(this).text().trim();
        const link = $(this).attr('href');
        if(!articles.some(article => article.headline == headline)){
            articles.push({source:newspaper.name, headline, link});
        }
    });
}).catch((err) => console.log(err));

I do not really know how to change the encoding or which encoding to use.

LuisOB33

1 Answer

It seems the page you are fetching uses a different encoding than the UTF-8 that axios assumes by default when it decodes the response.
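To illustrate where the � comes from, here is a minimal sketch. It assumes the page is served in a single-byte encoding such as ISO-8859-1 (not verified for this site); the byte values are hand-written for the example, not taken from the real response.

```javascript
// "Marín" encoded as ISO-8859-1: the byte 0xED stands for "í".
const latin1Bytes = Buffer.from([0x4d, 0x61, 0x72, 0xed, 0x6e]);

// Decoded as UTF-8, 0xED is an invalid sequence and becomes U+FFFD (the � character).
const decodedAsUtf8 = latin1Bytes.toString('utf8');

// Decoded with the encoding the bytes were actually written in, the text is intact.
const decodedAsLatin1 = latin1Bytes.toString('latin1');

console.log(decodedAsUtf8);
console.log(decodedAsLatin1);
```

So the fix is not to strip the � afterwards (the original byte is already lost at that point) but to decode the raw bytes with the right encoding in the first place.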

  1. What you can do is request the page with responseType and responseEncoding set as shown below:
const response = await axios.request({
  method: 'GET',
  url: 'https://www.WantedWebsite.com',
  responseType: 'arraybuffer',
  responseEncoding: 'binary'
});
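  2. You then have to decode the data so you can use it in your format:
// iso88592 is assumed to come from the 'iso-8859-2' npm package: const iso88592 = require('iso-8859-2');
let html = iso88592.decode(response.data.toString('binary'));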
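As an alternative to a dedicated decoding package, the two steps above can be sketched with the TextDecoder built into Node.js. This is only a sketch under assumptions: the helper names charsetFromContentType and decodeBody are hypothetical, the sample header value and bytes are made up, and decoding ISO-8859-15 relies on a Node build with full ICU (the default for official builds).

```javascript
// Hypothetical helper: read the charset label from a Content-Type header value,
// falling back to a default when the header does not declare one.
function charsetFromContentType(contentType, fallback = 'utf-8') {
  const match = /charset=["']?([^;"'\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : fallback;
}

// Hypothetical helper: decode raw bytes (a Buffer or Uint8Array) with TextDecoder.
function decodeBody(bytes, charset) {
  return new TextDecoder(charset).decode(bytes);
}

// Stand-in for response.data: the bytes of "además" in ISO-8859-15 (0xE1 is "á").
const sampleBytes = Buffer.from([0x61, 0x64, 0x65, 0x6d, 0xe1, 0x73]);
// Stand-in for response.headers['content-type'] (assumed value, not checked against the site).
const sampleHeader = 'text/html; charset=iso-8859-15';

const decodedHtml = decodeBody(sampleBytes, charsetFromContentType(sampleHeader));
console.log(decodedHtml);
```

With the axios request from step 1, you would pass response.data and response.headers['content-type'] instead of the sample values, then hand the decoded string to cheerio.load. If the header carries no charset, the page's meta charset tag is the other place to look.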

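You might have to adjust a few things, in particular the decoder: it must match the charset the page declares. ISO-8859-2 is a Central European encoding (it has no ñ, for example), so a Spanish page more likely needs ISO-8859-1 or ISO-8859-15. Check the Content-Type header or the page's meta charset tag to be sure.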

Hope this helps, good luck deving!