CSV (Comma-Separated Values) Files

Terminology

A CSV file is made up of records, where each record is a single line in the file. Each record is terminated by the record delimiter (typically a carriage return followed by a line feed, CRLF or \r\n). A record is made up of fields, which are separated from each other by the field delimiter (typically a comma or a tab character).
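As a minimal sketch of these terms, the snippet below splits CSV text into records and fields using Python's standard csv module. The helper name parse_csv and its field_delimiter parameter are made up for this illustration.

```python
import csv
import io


def parse_csv(text, field_delimiter=","):
    """Return a list of records, where each record is a list of field strings."""
    # newline="" leaves CRLF/LF untouched so the csv module can
    # handle the record delimiter itself.
    stream = io.StringIO(text, newline="")
    return list(csv.reader(stream, delimiter=field_delimiter))


# Two records of three fields each, CRLF as the record delimiter.
records = parse_csv("aaa,bbb,ccc\r\nxxx,yyy,zzz\r\n")
for record in records:
    print(record)
```

Passing field_delimiter="\t" to the same helper reads tab-separated files.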

General Rules

Each record is located on a separate line, and lines are separated by the record delimiter (a carriage return followed by a line feed).

The last record in the file may or may not have a record delimiter at the end of it.

xxx,yyy,zzz CRLF (record delimiter on last line)

xxx,yyy,zzz (no record delimiter on last line)
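The optional trailing delimiter is easy to get wrong when splitting by hand. The sketch below (variable and function names are illustrative only) shows that Python's csv module returns the same records in both cases, while a naive split on CRLF produces an extra empty entry when the last record is terminated.

```python
import csv
import io

with_trailing = "aaa,bbb,ccc\r\nxxx,yyy,zzz\r\n"
without_trailing = "aaa,bbb,ccc\r\nxxx,yyy,zzz"


def read_records(text):
    """Parse CSV text into a list of records (lists of field strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Both forms parse to the same two records.
same = read_records(with_trailing) == read_records(without_trailing)

# A naive split leaves a spurious empty "record" after the final CRLF.
naive = with_trailing.split("\r\n")
```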

There is an optional header record at the top of the file (the first line), with the same format as the rest of the records. The header contains names/descriptions of the fields in the file.
```
Field1Name,Field2Name,Field3Name
aaa,bbb,ccc
xxx,yyy,zzz
```
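Because the header is optional and looks like any other record, the reader has to know in advance whether one is present. Assuming it is, a sketch using Python's csv.DictReader maps each field to its header name:

```python
import csv
import io

text = "Field1Name,Field2Name,Field3Name\r\naaa,bbb,ccc\r\nxxx,yyy,zzz\r\n"

# DictReader treats the first record as the header (an assumption the
# caller makes; nothing in the file itself marks a header as such).
reader = csv.DictReader(io.StringIO(text, newline=""))
rows = list(reader)

for row in rows:
    print(row["Field1Name"], row["Field2Name"], row["Field3Name"])
```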
Whitespace (space and tab characters) is considered part of a field, and shall be preserved (i.e. not stripped) when parsing CSV files.
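As an illustration, Python's csv module follows this rule with its default settings. The sample record below is made up for the example.

```python
import csv
import io

# The second field has a leading and trailing space, the third a trailing tab.
text = "aaa, bbb ,ccc\t\r\n"

fields = next(csv.reader(io.StringIO(text, newline="")))

# repr() makes the preserved whitespace visible.
for field in fields:
    print(repr(field))
```

Any trimming then becomes an explicit decision by the caller rather than a side effect of parsing.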

Standards

There is no “official” standard for CSV files; however, RFC 4180 stands as the de facto standard (it also formally registers the MIME type “text/csv”).

Excel

Excel supports reading data from and writing data to CSV files. One thing to watch out for when reading CSV files is that Excel will try to deduce the type of each data field. Some deductions are non-intuitive, such as Excel's conversion of numbers within quotes to scientific notation.

For example, the value 12345678901234567890 will be shown as 1.23457E+19 (without highlighting the cell) when the file is naively opened in Excel. The number stored in the cell will be 12345678901234500000, which is still wrong: Excel keeps only 15 significant digits.
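The effect can be reproduced outside Excel. The sketch below reads the value as text (which keeps every digit), then models the loss by zeroing everything after the 15th digit. That helper is an assumption made to match the numbers above, not Excel's actual implementation.

```python
import csv
import io

text = "id\r\n12345678901234567890\r\n"
rows = list(csv.reader(io.StringIO(text, newline="")))

# The csv module returns every field as a string, so no digits are lost.
raw = rows[1][0]


def keep_15_significant_digits(digits):
    """Zero out everything after the 15th digit of a non-negative integer string.

    Hypothetical helper that models the observed result only.
    """
    return digits[:15] + "0" * max(0, len(digits) - 15)


truncated = keep_15_significant_digits(raw)

# Formatting with five decimal places gives the displayed form.
scientific = f"{float(raw):.5E}"

print(raw)         # 12345678901234567890
print(truncated)   # 12345678901234500000
print(scientific)  # 1.23457E+19
```

Keeping such identifiers as strings end to end avoids the problem; in Excel this usually means importing the column as text rather than opening the file directly.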