Skip to content
Published On:
Jan 4, 2019
Last Updated:
Aug 31, 2024

YAML, or YAML Ain’t Markup Language, is a human-readable data serialization format. It is commonly used to store configuration data for software applications.

The official site can be found at https://yaml.org/ (which is written in YAML format!!!).

JSON is a valid subset of the YAML specification, which means a .json file is a valid YAML file, and you can embed JSON syntax inside a YAML file.

History

YAML initially stood for Yet Another Markup Language. However, it was later changed to YAML Ain’t Markup Language (a recursive acronym) to emphasize its data-oriented nature and distinguish it from document markup1.

A List

All members of a basic list start with - (a hash and a space):

colors:
- red
- green
- blue

You can also specify a list in a much more condensed form:

colors: ["red", "green", "blue"]

This terser syntax is called a Flow collection.

A Collection (Dictionary)

All items in a dictionary take the form key: value (there must be a space after the colon):

my_dict:
date: 2019-01-01
color: red
location: north

Comments

YAML comments begin with a # (hash symbol, or Octothorpe if you’re feeling fancy).

# This is a comment

PyYAML

Interestingly, Python is not shipped with a built-in YAML parser. The most popular library for parsing YAML in Python is PyYAML (pyyaml). You can install it via pip:

Terminal window
pip install pyyaml

Here is an example of how to read a YAML file:

import yaml
with open("config.yaml", "r") as file:
config = yaml.safe_load(file)

Line Width

By default, yaml.dump inserts new lines when a line reaches around 80-100 characters. You can change this with the width parameter:

yaml.dump(data, width=200) # Longer lines
yaml.dump(data, width=float("inf")) # Infinitely long lines

Prevent Aliases

By default, PyYAML will use aliases to when it comes across references to the same object in memory when serializing (dumping). The following example shows a basic example which doesn’t use aliases, because string assignment in Python creates a new object:

data = {}
data['name'] = 'Bob'
data['name_copy'] = data['name']
print(yaml.dump(data))
# Output:
# name: Bob
# name_copy: Bob

But things happen differently for a datetime object, in where the assignment just takes a reference to the object:

data = {}
data['date'] = datetime.datetime.now().date()
data['date_copy'] = data['date']
print(yaml.dump(data))
# Output:
# date: &id001 2024-09-01
# date_copy: *id001

date_copy is now just *id001, which is an alias (think: reference) to the date object above which has been given the anchor &id0012. While this is valid YAML (and in fact it’s required to work this way, see below), in many cases it’s not what you want. Some parsers may not handle aliases correctly, and it also makes it hard to a user to read or update things (imagine a big file with many aliases!). I would submit that most of time it’s preferable to copy the data. To do this, you need to tell PyYAML to ignore aliases. You can do this on a global level with:

yaml.SafeDumper.ignore_aliases = lambda *args : True

Or even better, you can subclass yaml.SafeDumper and override the ignore_aliases method. This has the advantage of not being a global change and affecting other parts of your code (unless that’s what you want):

# As per https://github.com/yaml/pyyaml/issues/535
class VerboseSafeDumper(yaml.SafeDumper):
def ignore_aliases(self, data):
return True
yaml.dump(Dumper=VerboseSafeDumper)

Either of these methods would then result in yaml.dump() giving you a string like this:

# Output:
# date: 2024-09-01
# date_copy: 2024-09-01

Footnotes

  1. Wikipedia (2024, Jul 19). YAML. Retrieved 2024-08-31, from https://en.wikipedia.org/wiki/YAML.

  2. BitBucket. YAML anchors. Retrieved 2024-09-01, from https://support.atlassian.com/bitbucket-cloud/docs/yaml-anchors/.

  3. TTL255 - Przemek Rogala’s blog. YAML Anchors and Aliases. Retrieved 2024-09-01, from https://ttl255.com/yaml-anchors-and-aliases-and-how-to-disable-them/.