Should I use YAML or JSON to store my Perl data?

Solution 1

YAML vs JSON is very much not settled in Perl, and I will admit I tend to be in the middle of that. I would advise that either is going to get you about as much community traction. I'd make the decision based on the various pros and cons of the formats. I break down the various data serializing options like so (I'm going to community wiki this so people can add to it):

YAML Pros

  • Human friendly, people write basic YAML without even knowing it
  • WYSIWYG strings
  • Expressive (it has the TMTOWTDI nature)
  • Expandable type/metadata system
  • Perl compatible data types
  • Portable
  • Familiar (a lot of the inline and string syntax looks like Perl code)
  • Good implementations if you have a compiler (YAML::XS)
  • Good ability to dump Perl data
  • Compact use of screen space (possible, you can format to fit in one line)

YAML Cons

  • Large spec
  • Unreliable/incomplete pure Perl implementations
  • Whitespace as syntax can be contentious.

JSON Pros

  • Human readable/writable
  • Small spec
  • Good implementations
  • Portable
  • Perlish syntax
  • YAML 1.2 is a superset of JSON
  • Compact use of screen space
  • Perl friendly data types
  • Lots of things handle JSON

JSON Cons

  • Strings are not WYSIWYG
  • No expandability
  • Some Perl structures have to be expressed ad-hoc (objects & globs)
  • Lack of expressibility
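As a rough illustration of two of the JSON cons above (escaped strings and no native way to express objects), here is a minimal sketch using the core JSON::PP module. The sample data and the class name My::Class are made up for the example.

```perl
use strict;
use warnings;
use JSON::PP;    # pure-Perl JSON implementation shipped with Perl

# Hypothetical sample data: a string containing quotes, a tab and a newline.
my $data = { note => "say \"hi\"\tthen\nleave" };

# canonical() sorts hash keys so the output is stable between runs.
my $json = JSON::PP->new->canonical->encode($data);
print "$json\n";    # the tab and newline appear as escape sequences

# The special characters come back intact after decoding.
my $copy = JSON::PP->new->decode($json);
print $copy->{note} eq $data->{note} ? "round trip ok\n" : "mismatch\n";

# A blessed reference is rejected unless you opt in to some conversion.
my $object  = bless { id => 1 }, 'My::Class';    # hypothetical class name
my $encoded = eval { JSON::PP->new->encode($object) };
print "encoding an object failed\n" unless defined $encoded;
```

The string survives the round trip, but what you see in the file is the escaped form rather than the text itself, which is what "not WYSIWYG" means here.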

XML Pros

  • Widespread use
  • Syntax familiar to web developers
  • Large corpus of good XML modules
  • Schemas
  • Technologies to search and transform the data
  • Portable

XML Cons

  • Tedious for humans to read and write
  • Data structures foreign to Perl
  • Lack of expressibility
  • Large spec
  • Verbose

Perl/Data::Dumper Pros

  • No dependencies
  • Surprisingly compact (with the right flags)
  • Perl friendly
  • Can dump pretty much anything (via Data::Dump::Streamer, a.k.a. DDS)
  • Expressive
  • Compact use of screen space
  • WYSIWYG strings
  • Familiar

Perl/Data::Dumper Cons

  • Non-portable (to other languages)
  • Insecure (without heroic measures)
  • Inscrutable to non-Perl programmers
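To make the "compact with the right flags" and "insecure" points concrete, here is a small sketch of a Data::Dumper round trip. The data is hypothetical, and the flag choices are just one reasonable combination.

```perl
use strict;
use warnings;
use Data::Dumper;

# Flags that make the output compact and reproducible.
local $Data::Dumper::Indent   = 0;    # no newlines or indentation
local $Data::Dumper::Terse    = 1;    # omit the leading "$VAR1 = "
local $Data::Dumper::Sortkeys = 1;    # stable key order

my $data = { name => 'example', ports => [ 80, 443 ] };    # hypothetical data
my $text = Dumper($data);
print "$text\n";

# Loading is a string eval: whatever Perl code is in $text will run.
# Only do this with text you produced yourself.
my $copy = eval $text;
die "could not load: $@" if $@;
print "ports: @{ $copy->{ports} }\n";
```

The eval is the whole loader, which is why feeding it untrusted input amounts to running untrusted code.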

Storable Pros

  • Compact? (don't have numbers to back it up)
  • Fast? (don't have numbers to back it up)

Storable Cons

  • Human hostile
  • Incompatible across Storable versions
  • Non-portable (to other languages)
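For comparison, a minimal Storable round trip might look like the sketch below. It uses nfreeze() and thaw() from the core Storable module. The data is hypothetical.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);    # core module

my $data = { name => 'example', ports => [ 80, 443 ] };    # hypothetical data

# nfreeze() writes in network byte order, so the bytes can move between
# machines of different architectures (but still only between Perl programs).
my $frozen = nfreeze($data);
printf "serialized to %d bytes of binary data\n", length $frozen;

# thaw() rebuilds a deep copy. It should only be fed trusted input.
my $copy = thaw($frozen);
print "ports: @{ $copy->{ports} }\n";
```

The frozen value is binary, so there is nothing useful to read or edit by hand, which is the "human hostile" point above.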

Solution 2

As with most things, it depends. I think if you want speed and interoperability (with other languages), use JSON, in particular JSON::XS.

If you want something that's only ever going to be used by Perl modules, stick with YAML. It's much more common to find Perl modules on CPAN that support data description with YAML, or which depend on YAML, than JSON.

Note that I am not an authority and this opinion is based largely on hunch and conjecture. In particular, I have not profiled JSON::XS vs. YAML::XS. If I am offensively ignorant, I can only hope I will make someone irate enough to bring useful information to the discussion by correcting me.

Solution 3

It's all about human readability. If that is your main concern, choose YAML:

YAML:

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

JSON:

{
  "american": [
    "Boston Red Sox", 
    "Detroit Tigers", 
    "New York Yankees"
  ], 
  "national": [
    "New York Mets", 
    "Chicago Cubs", 
    "Atlanta Braves"
  ]
}
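Whichever text form you prefer, both describe the same Perl structure: a hash of array references. As a sketch, here is the JSON version loaded with the core JSON::PP module and walked from Perl. The variable names are my own.

```perl
use strict;
use warnings;
use JSON::PP;    # exports decode_json by default

my $json_text = <<'END_JSON';
{
  "american": ["Boston Red Sox", "Detroit Tigers", "New York Yankees"],
  "national": ["New York Mets", "Chicago Cubs", "Atlanta Braves"]
}
END_JSON

# decode_json returns a hash reference whose values are array references.
my $leagues = decode_json($json_text);

for my $league ( sort keys %$leagues ) {
    printf "%s: %s\n", $league, join ', ', @{ $leagues->{$league} };
}
```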

Solution 4

The pure-Perl YAML implementation (YAML module as opposed to YAML::Syck) seems to have some serious problems. I recently ran into issues where it could not process YAML documents with very long lines (32k characters or so).

YAML is able to store and load blessed variables and does so by default (The snippet below was copied from a *sepia-repl* buffer in Emacs):

I need user feedback!  Please send questions or comments to [email protected].
Sepia version 0.98.
Type ",h" for help, or ",q" to quit.
main @> use YAML
undef
main @> $foo = bless {}, 'asdf'
bless( {}, 'asdf' )
main @> $foo_dump = YAML::Dump $foo
'--- !!perl/hash:asdf {}
'
main @> YAML::Load $foo_dump
bless( {}, 'asdf' )

This is quite scary security-wise because untrusted data can be used to call any DESTROY method that has been defined in your application -- or any of the modules it uses.

The following short program demonstrates the problem:

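#!/usr/bin/perl
use strict;
use warnings;
use YAML;
use Data::Dumper;

# Newer YAML.pm releases may refuse to bless loaded data unless this is set
# (an assumption about your installed version; older ones bless by default).
$YAML::LoadBlessed = 1;

package My::Namespace;
# Runs when the loaded object is destroyed, with contents chosen by the input.
sub DESTROY {
    print Data::Dumper::Dumper \@_;
}

package main;
# The !!perl/hash tag makes YAML::Load bless the hash into My::Namespace.
my $var = YAML::Load '--- !!perl/hash:My::Namespace
bar: 2
foo: 1
';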

JSON does not allow this by default -- it is possible to serialize Perl "objects", but in order to do that, you have to define TO_JSON methods and enable the encoder's convert_blessed option.
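A minimal sketch of that opt-in, using the core JSON::PP module: the class My::Point and its fields are invented for the example. Note that decoding gives back a plain hash, so no class code runs on load.

```perl
use strict;
use warnings;
use JSON::PP;

package My::Point {    # hypothetical class for illustration
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

    # Called by the encoder; returns a plain, unblessed structure.
    sub TO_JSON { my ($self) = @_; return { x => $self->{x}, y => $self->{y} } }
}

# convert_blessed tells the encoder to call TO_JSON on objects it meets.
my $encoder = JSON::PP->new->canonical->convert_blessed;
my $json    = $encoder->encode( My::Point->new( x => 1, y => 2 ) );
print "$json\n";

# Decoding yields an ordinary hash reference, not a My::Point object.
my $decoded = JSON::PP->new->decode($json);
print ref($decoded), "\n";
```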

Solution 5

I use YAML for tracking the status of processes because I can read YAML in the middle of the process. You (technically) need fully formed documents to read XML or JSON. YAML is nice for tracking status because you can write lots of mini docs to a file. Otherwise, I usually go with XML or JSON. Nice summary of pros & cons above, btw.

Author: Paul Nathan


Updated on June 06, 2022

Comments

  • Paul Nathan
    Paul Nathan almost 2 years

    I've been using the YAML format with reasonable success in the last 6 months or so.

    However, the pure Perl implementation of the YAML parser is fairly fidgety to hand-write a readable file for and has (in my opinion) annoying quirks such as requiring a newline at end of the file. It's also gigantically slow compared to the rest of my program.

    I'm pondering the next evolution of my project, and I'm considering using JSON instead (a mostly strict subset of YAML, as it turns out). But which format has the most community traction and effort in Perl?

    Which appears today to be the better long-term format for simple data description in Perl, YAML or JSON, and why?

  • Telemachus
    Telemachus over 14 years
    In the context of speed for something like this, I suspect that "not pure Perl" is better.
  • Paul Nathan
    Paul Nathan over 14 years
    In this specific context, due to circumstances out of my control, pure Perl is required. But I agree that not pure Perl is almost certainly better for speed.
  • Telemachus
    Telemachus over 14 years
    @Paul Nathan: if the problem is compiling modules which involve C extensions, Storable is a built-in. So there's no need to get it later. If you have Perl, you already have Storable.
  • Paul Nathan
    Paul Nathan over 14 years
    Well then. Ignorance is exposed and reduced. :-)
  • Telemachus
    Telemachus over 14 years
    @Paul: not at all. It's a common misunderstanding. Many, many modules are core or built-ins. You can browse through them here: perldoc.perl.org/index-modules-A.html
  • mpeters
    mpeters over 14 years
    A friend recently did some benchmarks comparing Storable, JSON::XS and YAML::Syck for speed and JSON::XS came out faster by a long shot.
  • Paul Nathan
    Paul Nathan over 14 years
    +1 for good info & the security note.
  • Paul Nathan
    Paul Nathan over 14 years
    (1) The parser for that is string eval; (2) Data::Dumper load/store requires some gnarly coupling between the loading and storing routines. Also, and most importantly, 'PON' is not a standard interchange format. JSON is, and so is YAML (within Perl).
  • Jacob
    Jacob over 14 years
    There was a post somewhere on Stack Overflow about JSON being faster for a very simple, but large data structure than Storable.
  • Jacob
    Jacob over 14 years
    There is currently some work to move the pure-Perl YAML module to YAML::PP, and then to have a new YAML module that automatically chooses one of the implementations for you. (This is what the JSON module currently does.)
  • hillu
    hillu over 14 years
    Brad Gilbert: This seems a good thing. Are there any plans to break compatibility in favor of security -- at least for the Load function?
  • nick
    nick over 13 years
    Storable's output will be portable if you use nfreeze() instead of freeze()
  • Schwern
    Schwern over 13 years
    @nick I meant not portable across languages.
  • ericslaw
    ericslaw over 12 years
    Even though Storable is not 'pure perl', it is a core module, so 'should' be already available within your perl installation.
  • brian d foy
    brian d foy over 10 years
    Different people will have different answers about which one is more readable :)
  • brian d foy
    brian d foy over 10 years
    Storable has various security issues: masteringperl.org/2012/12/the-storable-security-problem
  • jmh
    jmh over 10 years
    While readability is somewhat subjective, one of those looks like something I might write in an email and one doesn't.
  • dolmen
    dolmen over 8 years
    And in YAML you can put comments. This is not possible in JSON. This is a major concern for human-writability.
  • Chris Koknat
    Chris Koknat almost 4 years
    Benchmark results for JSON vs YAML are in one of the answers here: stackoverflow.com/questions/1726802/…
  • brian d foy
    brian d foy over 3 years
    Note that both Data::Dumper and Storable have security issues because they may attempt to inflate Perl objects. That runs code and it might not be the code you expected.
  • Handsome Nerd
    Handsome Nerd over 3 years
    @dolmen yep, but that is not usually a good reason to switch from "Json" to "Yaml"
  • dolmen
    dolmen over 3 years
    @PHPst The question is not about switching data format but about choosing the more appropriate one for a task.
  • Handsome Nerd
    Handsome Nerd over 3 years
    @dolmen, I know. My point is that JSON is the default choice unless readability is important.