
The original YAML file is:

toplevel:
  #comment1
  hello: gut
  #comment2
  howdy: gut #horizontalcomment
  #comment3
  #comment4
  gets: gut
  #comment5

In Python, I did:

yml = ruamel.yaml.round_trip_load(yaml_input_str)
exec("del yml['toplevel']['gets']")
output_str = ruamel.yaml.round_trip_dump(yml)

output_str becomes

toplevel:
  #comment1
  hello: gut
  #comment2
  howdy: gut #horizontalcomment
  #comment5

and comment3 and comment4 disappear. Is this as designed, or is it a bug?

SuperStormer
A. Leung
    Look at this answer: http://stackoverflow.com/a/38252508/252648. It looks like the handling of comments is not fully done. – Turing Nov 29 '16 at 21:57
    @mangoDrunk The intention of `ruamel.yaml`'s round-trip mode is to be able to load a YAML file (e.g. one used for configuration), change some values, and write the file back out without losing the comments. What the author of the post you link to tries to do lies beyond its designed purpose. There just happen to be some mechanisms in `ruamel.yaml` that make it easier to achieve that goal than starting from other Python packages in which comments are not fully done (i.e. every other Python package), including that the comments are not thrown away while reading. – Anthon Nov 30 '16 at 10:20

1 Answer


That is as designed, but it is, like so much of ruamel.yaml, underdocumented, primarily as a result of laziness on the part of the package author.

The comments are associated with the keys, not with some position relative to the beginning of a mapping. Because of the way the YAML is tokenized, the comments are associated with the following key. As a result, if that key no longer exists, the comment is still available but is no longer emitted.

A side effect of this is that if you do:

import sys
import ruamel.yaml

yaml_str = """\
toplevel:
  #comment1
  hello: gut
  #comment2
  howdy: gut #horizontalcomment
  #comment3
  #comment4
  gets: gut
  #comment5
"""

data = ruamel.yaml.round_trip_load(yaml_str)
del data['toplevel']['gets']
ruamel.yaml.round_trip_dump(data, sys.stdout)

you get your output_str, and then if you follow that by:

data['toplevel']['gets'] = 42
ruamel.yaml.round_trip_dump(data, sys.stdout)

you'll get:

toplevel:
  #comment1
  hello: gut
  #comment2
  howdy: gut #horizontalcomment
  #comment3
  #comment4
  gets: 42
  #comment5

so the comments "magically" reappear.
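The disappearing and reappearing can be illustrated with a toy model. This is only a sketch of the idea described above, not how ruamel.yaml is implemented: the class ToyMapping and its attributes values and comments_before are made up for illustration, and only pure Python is used.

```python
class ToyMapping:
    """Toy stand-in for a round-trip mapping; NOT ruamel.yaml's implementation."""

    def __init__(self):
        self.values = {}           # key -> value, in insertion order
        self.comments_before = {}  # key -> comment lines that precede that key

    def dump(self):
        lines = []
        for key, value in self.values.items():
            # Comments are looked up by key, so only keys that exist emit them.
            lines.extend(self.comments_before.get(key, []))
            lines.append(f"{key}: {value}")
        return "\n".join(lines)


m = ToyMapping()
m.values.update({"howdy": "gut", "gets": "gut"})
m.comments_before["howdy"] = ["#comment2"]
m.comments_before["gets"] = ["#comment3", "#comment4"]

del m.values["gets"]       # the comments stay behind in comments_before
without_key = m.dump()     # no line mentions comment3 or comment4

m.values["gets"] = 42      # same key again, so its comments are found again
with_key = m.dump()
```

Deleting the key removes nothing from comments_before; the dump just never asks for that entry until the key is back.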

If you want to move the comments to the end of the nested mapping (with key 'toplevel'), you can do:

comment_tokens = data['toplevel'].ca.items['gets'][1]
del data['toplevel'].ca.items['gets']  # not strictly necessary
data['toplevel'].ca.end = comment_tokens

and you'll get:

toplevel:
  #comment1
  hello: gut
  #comment2
  howdy: gut #horizontalcomment
  #comment3
  #comment4
  #comment5

which is probably what you expected in the first place.
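If you need this in more than one place, the deletion and the comment move can be wrapped in a small helper. This is a sketch under assumptions: the name delete_key_keep_comments is made up, and it relies on the same internal layout as the lines above (comments preceding a key at index 1 of .ca.items[key], end-of-mapping comments in .ca.end), which is not a documented interface and may differ between ruamel.yaml versions. Unlike the lines above, it puts the moved comments in front of anything already in .ca.end instead of replacing it.

```python
def delete_key_keep_comments(mapping, key):
    """Delete `key` from a round-trip-loaded mapping and move the comments
    that preceded it to the end of the mapping.

    Assumes the layout used in the answer: mapping.ca.items[key][1] holds
    the comment tokens that precede `key`, and mapping.ca.end holds the
    comment tokens emitted at the end of the mapping.
    """
    del mapping[key]
    entry = mapping.ca.items.pop(key, None)
    if entry is None or entry[1] is None:
        return  # nothing was attached in front of this key
    existing_end = mapping.ca.end or []
    # Moved comments go before whatever was already at the end.
    mapping.ca.end = list(entry[1]) + list(existing_end)
```

With the data from above, the call would be delete_key_keep_comments(data['toplevel'], 'gets').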


That leaves me wondering why you use exec() instead of directly using:

del yml['toplevel']['gets']
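If the reason for exec() is that the key path is only known at run time (that is a guess, the question does not say), you can walk the path instead of building code as a string. A minimal sketch; delete_path is a made-up helper name, and a plain dict stands in for the loaded data:

```python
from functools import reduce
from operator import getitem


def delete_path(data, path):
    """Delete the entry at `path` (a sequence of keys or indexes)."""
    *parents, last = path
    # Walk down to the container that holds the final key.
    container = reduce(getitem, parents, data)
    del container[last]


yml = {"toplevel": {"hello": "gut", "gets": "gut"}}
delete_path(yml, ["toplevel", "gets"])
```

This only uses item access and del, so it should work the same on any nested mapping that supports those.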
Anthon