
Here is a piece of HTML to be parsed:

<td style="text-align:center; color:black; background:#ff6666; border:2px solid #8811ff;"   title="Alkali metals; Primordial; Solid">
37  
<br />  
<a title="Rubidium" href="/wiki/Rubidium">Rb</a>  
</td>

I can get the values with xmlValue. What I get is:

text   br    a   
"19"   ""  "K"   

But I want the attribute values instead: for the td, the title attribute, whose value is "Alkali metals; Primordial; Solid", and for the a, the title attribute, whose value is "Rubidium".

How can I get it?

Ari B. Friedman
Dd Pp
BTW: maybe [this](http://stackoverflow.com/questions/4393780/scraping-a-wiki-page-for-the-periodic-table-and-all-the-links) SO question helps you. – sgibb Sep 02 '12 at 10:30

1 Answer


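You could use xmlAttrs, which returns all attributes of a node as a named character vector, or xmlGetAttr, which returns a single attribute by name (see ?xmlAttrs for details).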

library(XML)

html <- '<td style="text-align:center; color:black; background:#ff6666; border:2px solid #8811ff;" title="Alkali metals; Primordial; Solid">37<br /><a title="Rubidium" href="/wiki/Rubidium">Rb</a></td>'
# asText = TRUE marks the string as markup rather than a file name
td <- xmlRoot(xmlParse(html, asText = TRUE))
# xmlAttrs returns every attribute of the node; select "title" by name
xmlAttrs(td)["title"]
# "Alkali metals; Primordial; Solid"
xmlAttrs(xmlChildren(td)$a)["title"]
# "Rubidium"

# or
xmlGetAttr(td, "title")
# "Alkali metals; Primordial; Solid"
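
If the goal is to collect the title attributes from many cells at once, one possible approach is to combine xmlGetAttr with xpathSApply. This is only a sketch: the two-cell table below is a hypothetical input modeled on the cell in the question, and the names doc, cell_titles and link_titles are made up for illustration.

```r
library(XML)

# Hypothetical two-cell row, modeled on the <td> from the question
html <- paste0(
  '<table><tr>',
  '<td title="Alkali metals; Primordial; Solid">19<br />',
  '<a title="Potassium" href="/wiki/Potassium">K</a></td>',
  '<td title="Alkali metals; Primordial; Solid">37<br />',
  '<a title="Rubidium" href="/wiki/Rubidium">Rb</a></td>',
  '</tr></table>'
)

# htmlParse tolerates real-world HTML; asText = TRUE marks the input as markup
doc <- htmlParse(html, asText = TRUE)

# Apply xmlGetAttr(node, "title") to every node matched by the XPath expression
cell_titles <- xpathSApply(doc, "//td", xmlGetAttr, "title")
link_titles <- xpathSApply(doc, "//td/a", xmlGetAttr, "title")
```

Here cell_titles should hold one title per td and link_titles one title per a, in document order, so the two vectors can be lined up cell by cell.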
sgibb