YAML
YAML, or YAML Ain’t Markup Language, is a human-readable data serialization format. It is commonly used to store configuration data for software applications.
The official site can be found at https://yaml.org/ (which is written in YAML format!!!).
YAML 1.2 was designed as a superset of JSON, which means a .json file is, in nearly all cases, a valid YAML file, and you can embed JSON syntax inside a YAML file.
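As a rough illustration of that relationship, the sketch below feeds the same JSON text to Python's json module and to PyYAML (the yaml package, introduced later in this article), then loads a YAML document that embeds JSON-style syntax. The sample data is made up for the example.

```python
import json

import yaml  # PyYAML, installed separately

# Illustrative JSON text (not from any real project).
json_text = '{"name": "app", "ports": [80, 443], "debug": false}'

# The same text, parsed once as JSON and once as YAML.
from_json = json.loads(json_text)
from_yaml = yaml.safe_load(json_text)
print(from_json == from_yaml)

# JSON-style lists and objects embedded in an otherwise ordinary YAML document.
mixed_text = """
service: web
ports: [80, 443]
labels: {"env": "prod", "tier": "frontend"}
"""
mixed = yaml.safe_load(mixed_text)
print(mixed["labels"]["env"])
```

Unusual JSON documents may still trip up a given YAML parser, so treat this as a demonstration on simple data rather than a guarantee.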
History
YAML initially stood for Yet Another Markup Language. However, it was later changed to YAML Ain’t Markup Language (a recursive acronym) to emphasize its data-oriented nature and distinguish it from document markup [1].
A List
All members of a basic list start with - (a hyphen and a space):
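A minimal sketch of such a list, with made-up fruit names, loaded through PyYAML's safe_load so the resulting Python value is visible:

```python
import yaml  # PyYAML

# Each list member sits on its own line and starts with "- ".
block_list = """
- Apple
- Orange
- Strawberry
"""

fruits = yaml.safe_load(block_list)
print(fruits)
```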
You can also specify a list in a much more condensed form:
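For instance, assuming the same illustrative fruit names, the condensed form fits on one line and loads to an ordinary Python list:

```python
import yaml  # PyYAML

# The whole list is written inline, in square brackets, separated by commas.
flow_list = "[Apple, Orange, Strawberry]"

fruits = yaml.safe_load(flow_list)
print(fruits)
```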
This terser syntax is called a Flow collection.
A Collection (Dictionary)
All items in a dictionary take the form key: value (there must be a space after the colon):
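The sketch below uses invented keys and values to show a small dictionary, and what happens when the space after the colon is left out: the parser no longer sees a key and a value, only a single string.

```python
import yaml  # PyYAML

# Three key: value pairs, each with a space after the colon.
mapping_text = """
name: example-app
version: 3
debug: true
"""
settings = yaml.safe_load(mapping_text)
print(settings)

# Without the space, the text is read as one plain string, not a dictionary.
no_space = yaml.safe_load("name:example-app")
print(type(no_space).__name__)
```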
Comments
YAML comments begin with a # (hash symbol, or octothorpe if you’re feeling fancy) and run to the end of the line.
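A short sketch with illustrative keys: comments are dropped when the document is loaded, while a # inside a quoted string stays part of the value.

```python
import yaml  # PyYAML

commented_text = """
# A full-line comment
name: example-app  # a trailing comment
color: "#ff0000"   # the hash inside the quotes belongs to the value
"""

data = yaml.safe_load(commented_text)
print(data)
```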
PyYAML
Interestingly, Python does not ship with a built-in YAML parser. The most popular library for parsing YAML in Python is PyYAML (pyyaml). You can install it via pip (pip install pyyaml).
Here is an example of how to read a YAML file:
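The following is a minimal sketch rather than the original listing. The helper name load_yaml and the file name config.yaml are hypothetical, and the file is created in a temporary directory only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML


def load_yaml(path):
    """Parse the YAML file at `path` and return plain Python objects."""
    with open(path, "r", encoding="utf-8") as handle:
        # safe_load only builds basic types (dict, list, str, int, ...).
        return yaml.safe_load(handle)


# Write a small sample file so there is something to read.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yaml"
    config_path.write_text(
        "name: example-app\nports:\n  - 80\n  - 443\n", encoding="utf-8"
    )
    config = load_yaml(config_path)

print(config)
```

safe_load is used here instead of load because it refuses to construct arbitrary Python objects, which is the safer choice for files you do not fully control.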
Line Width
By default, yaml.dump wraps long lines once they reach around 80 characters. You can change this with the width parameter:
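A small sketch of the effect, using an invented long string. The value 1000 is an arbitrary choice that is simply larger than the text being written.

```python
import yaml  # PyYAML

# One long value: forty words separated by single spaces.
data = {"text": " ".join(["word"] * 40)}

# Default settings: the long value is wrapped over several lines.
wrapped = yaml.safe_dump(data)

# A generous width keeps the value on a single line.
unwrapped = yaml.safe_dump(data, width=1000)

print(wrapped)
print(unwrapped)
```

Both strings load back to the same data; only the layout of the dumped text differs.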
Prevent Aliases
By default, PyYAML uses aliases when it comes across several references to the same object in memory while serializing (dumping). The following basic example doesn’t produce aliases, because PyYAML never emits aliases for simple scalars such as strings and numbers:
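A minimal sketch with illustrative key names, where the same string is stored under two keys:

```python
import yaml  # PyYAML

name = "example"
data = {"name": name, "name_copy": name}

output = yaml.safe_dump(data)
print(output)
# name: example
# name_copy: example
```

The value is simply written out twice, with no anchor or alias in sight.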
But things happen differently for a datetime object, where both keys end up referring to the same object in memory:
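A sketch of that situation, with an arbitrary timestamp chosen for the example:

```python
import datetime

import yaml  # PyYAML

date = datetime.datetime(2024, 1, 1, 12, 0, 0)
# Both keys hold a reference to the very same datetime object.
data = {"date": date, "date_copy": date}

output = yaml.safe_dump(data)
print(output)
# date: &id001 2024-01-01 12:00:00
# date_copy: *id001
```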
date_copy is now just *id001, which is an alias (think: reference) to the date object above, which has been given the anchor &id001 [2]. While this is valid YAML (and in fact it’s required to work this way, see below), in many cases it’s not what you want. Some parsers may not handle aliases correctly, and it also makes it hard for a user to read or update things (imagine a big file with many aliases!). I would submit that most of the time it’s preferable to copy the data. To do this, you need to tell PyYAML to ignore aliases. You can do this on a global level with:
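One common way to do this, shown here as an assumption about what the original snippet looked like, is to replace the ignore_aliases method on yaml.Dumper, the class that yaml.dump uses by default:

```python
import datetime

import yaml  # PyYAML

# Global switch: every later dump that goes through yaml.Dumper
# writes repeated objects in full instead of using anchors and aliases.
yaml.Dumper.ignore_aliases = lambda self, data: True

date = datetime.datetime(2024, 1, 1, 12, 0, 0)
data = {"date": date, "date_copy": date}

output = yaml.dump(data)
print(output)
```

Because this patches the class itself, it affects every caller of yaml.dump in the same process, including third-party code.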
Or even better, you can subclass yaml.SafeDumper and override the ignore_aliases method. This has the advantage of not being a global change that affects other parts of your code (unless that’s what you want):
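A minimal sketch of that approach. The class name NoAliasDumper is hypothetical, and the timestamp is arbitrary.

```python
import datetime

import yaml  # PyYAML


class NoAliasDumper(yaml.SafeDumper):
    """SafeDumper variant that always writes values out in full."""

    def ignore_aliases(self, data):
        # Returning True for every object disables anchors and aliases.
        return True


date = datetime.datetime(2024, 1, 1, 12, 0, 0)
data = {"date": date, "date_copy": date}

# Only this call is affected; other dumps keep PyYAML's default behaviour.
output = yaml.dump(data, Dumper=NoAliasDumper)
print(output)
```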
Either of these methods then results in yaml.dump() writing the full value under both keys, with no &id001 anchor and no *id001 alias.
Footnotes
1. Wikipedia (2024, Jul 19). YAML. Retrieved 2024-08-31, from https://en.wikipedia.org/wiki/YAML.
2. BitBucket. YAML anchors. Retrieved 2024-09-01, from https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/.
3. TTL255 - Przemek Rogala’s blog. YAML Anchors and Aliases. Retrieved 2024-09-01, from https://ttl255.com/yaml-anchors-and-aliases-and-how-to-disable-them/.