jim@jimkeener.com

ADF -- ASCII Data File

Date: 2013-08-03
Tags: computers protocols data programming

Because the current standards (CSV and TSV) aren’t really standardized, I propose a new one: ASCII Data File (ADF). The downside is that an ADF file won’t be easily editable in a text editor without extensions. The upside is that the format can handle more complex data than CSV or TSV. Don’t get me wrong: CSV and TSV (and delimiter-separated values in general) will always be useful for simple datatypes, and I don’t expect them to ever go away.

My proposal is as follows. Each field’s type is encoded in the header, so the record values themselves need no delimiters.

Type encodings

Types (may be combined with flags):

0   8-bit      1 byte
1   16-bit     2 bytes
2   32-bit     4 bytes
3   64-bit     8 bytes
4   128-bit    16 bytes
5   256-bit    32 bytes
6   512-bit    64 bytes
7   1024-bit   128 bytes
8   2048-bit   256 bytes
9   4096-bit   512 bytes
A   8192-bit   1024 bytes
B
C
D
E
F   Pascal string with a 32-bit length. 8-bit safe
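The size column follows a simple pattern: each code from 0 to A doubles the width of the previous one. A minimal sketch of that lookup, assuming the size code lives in the low four bits of the type byte (the function name `value_width` is made up for illustration):

```python
from typing import Optional


def value_width(type_byte: int) -> Optional[int]:
    """Return the fixed width in bytes, or None for the length-prefixed string."""
    code = type_byte & 0x0F  # low nibble: size code; high nibble: flags
    if code <= 0x0A:
        return 1 << code     # 0 -> 1 byte, 1 -> 2 bytes, ..., A -> 1024 bytes
    if code == 0x0F:
        return None          # Pascal string: the width is read from the data itself
    raise ValueError(f"type code {code:X} is not assigned")


print(value_width(0x03))  # 64-bit type
```

Codes B through E raise an error here only because the table leaves them empty.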

Flags (set/unset):

0x10   With 8-bit type:
       With 16-bit type:
       With 32-bit, 64-bit types: Unix datetime
       With 128-bit type: 128-bit UUID
       With 256-bit type:
       With 512-bit type:
       With 1024-bit type:
0x20   Floating point (IEEE 754) / integer
       For datetimes: seconds / nanoseconds
0x40   Signed / unsigned
       When the floating-point and unsigned are set, the value is text
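Since the flags occupy the high bits and the size code the low four bits, a type byte can be pulled apart with plain masking. The sketch below only splits the byte and does not interpret the combinations. The constant names and the dictionary keys are my own labels, not part of the proposal.

```python
FLAG_SPECIAL = 0x10  # hypothetical name: Unix datetime (32/64-bit) or UUID (128-bit)
FLAG_FLOAT = 0x20    # hypothetical name for the floating point / integer flag
FLAG_SIGNED = 0x40   # hypothetical name for the signed / unsigned flag


def split_type_byte(type_byte: int) -> dict:
    """Split a type byte into its size code and its three flag bits."""
    return {
        "size_code": type_byte & 0x0F,
        "special": bool(type_byte & FLAG_SPECIAL),
        "float": bool(type_byte & FLAG_FLOAT),
        "signed": bool(type_byte & FLAG_SIGNED),
    }


# 0x12 = flag 0x10 + size code 2: a 32-bit Unix datetime
print(split_type_byte(0x12))
```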

File Format

<SOH>
Metadata Fields * # of Metadata fields
<SOH>
Header Fields * # of fields
<SOT>
Record Value * # of records

Metadata header

Metadata Fields

Type Field Name Encoding
Type Field Value Encoding
Metadata Name
Metadata Value
Type Field Value 2 Encoding (if required. May be repeated as defined)
Metadata Value If required. May be repeated as defined.
<RS>

Pre-defined metadata names (encoded as 8-bit integers):

0   SHA-512 Hash. Calculated without this record or any signature records. (Not repeatable)
1   Number of fields. Must be kept up to date if used. (Not repeatable)
2   Number of records. Must be kept up to date if used. (Not repeatable)
3   RSA Signature. Calculated without any signature records. Requires 2 values: key fingerprint, signature.
4   DSA Signature. Calculated without any signature records. Requires 2 values: key fingerprint, signature.
5   ECDSA Signature. Calculated without any signature records. Requires 2 values: key fingerprint, signature.
6   Date edited. Must be a 64-bit Unix timestamp. (Not repeatable)
7   Author. Will more than likely be a string. May be in RFC 2822 format; otherwise taken as a literal.
8   Copyright. Contains the name of the copyright in value 1 and a URL to it in value 2. The URL may be a data: URL (e.g. data:text/plain,All Rights Reserved). (Not repeatable)
9   Distribution Status. E.g. “This document may not be distributed outside of XYZ Inc.” or “No restrictions”. (Not repeatable)
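To make the metadata layout concrete, here is a rough sketch that serializes one single-value metadata field with an integer value. It assumes big-endian byte order (which is what the timestamp bytes in the example file imply), an 8-bit integer name (name encoding 0x00), and ASCII RS = 0x1E as the terminator. The helper name `metadata_field` is hypothetical.

```python
RS = b"\x1e"  # ASCII record separator, assumed to be what <RS> stands for


def metadata_field(name_id: int, value_type: int, value: int) -> bytes:
    """Serialize one metadata field holding a single fixed-width integer value."""
    size_code = value_type & 0x0F
    if size_code > 0x0A:
        raise ValueError("only fixed-width types are handled in this sketch")
    width = 1 << size_code
    return (
        bytes([0x00, value_type, name_id])  # name encoding, value encoding, name
        + value.to_bytes(width, "big")      # value, big-endian (assumption)
        + RS
    )


# "Date edited" (name 6) as a 64-bit Unix datetime (type byte 0x13)
print(metadata_field(6, 0x13, 1375726204).hex())
```

Multi-value fields such as the signatures would need a further value encoding and value before the terminator; that part is left out here.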

Note on signatures: When a document is edited, all the signatures on it are removed (and may be replaced with the saver’s own signature). Non-editors can add their own signature to a document that already has zero or more signatures.

Field Header

Header Fields:

Type Field Name Encoding
Type Defined Type for field
Field Name

Records

Value Fields:

Value The value as encoded by the field type (since the type is known, there is no need to delimit it)

Record Value:

Value Fields * # of fields
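Because every field's width is known from the header, a record is just its values laid end to end. A small sketch for integer fields only, assuming big-endian byte order and that flag 0x40 set means signed (both are my reading, and the function names are made up):

```python
def pack_record(values, field_types):
    """Concatenate integer values using the width implied by each field type."""
    out = bytearray()
    for value, type_byte in zip(values, field_types):
        width = 1 << (type_byte & 0x0F)    # fixed-width size codes 0..A only
        signed = bool(type_byte & 0x40)    # assumption: flag set means signed
        out += value.to_bytes(width, "big", signed=signed)
    return bytes(out)


def unpack_record(data, field_types):
    """Inverse of pack_record: slice the bytes back into integers."""
    values, pos = [], 0
    for type_byte in field_types:
        width = 1 << (type_byte & 0x0F)
        signed = bool(type_byte & 0x40)
        values.append(int.from_bytes(data[pos:pos + width], "big", signed=signed))
        pos += width
    return values


record = pack_record([4, 16], [0x00, 0x00])  # two unsigned 8-bit fields
print(record.hex(), unpack_record(record, [0x00, 0x00]))
```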

Example

This is an example file that stores a table of squares as 8-bit integers. The field names are simply V(alue) and S(quare).

<SOH><0x00><0x12><0x06><0x51><0xFF><0xEA><0x7C><SOH><0x70><0x00>V<0x70><0x00>S<SOT><0x00><0x00><0x01><0x01><0x02><0x04><0x03><0x09><0x04><0x10><0x05><0x19><0x06><0x24><0x07><0x31><0x08><0x40><0x09><0x51><0x0A><0x64>

This says that the file was last edited at 1375726204 (2013-08-05T18:10:04Z, stored here as a 32-bit Unix datetime) and has 2 fields (both 1 character long) named V and S. The data portion holds the following table:

0 0
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
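The walk-through above can be checked mechanically. What follows is a rough decoder for this one example, not a general parser, and it leans on several assumptions: `<SOT>` is taken to be ASCII 0x02, values are big-endian, the type byte 0x70 on a name means one byte of ASCII text, and `<RS>` after a metadata field is optional (the example omits it). It also assumes no type byte collides with a control byte, which a real parser could not take for granted.

```python
SOH, SOT, RS = 0x01, 0x02, 0x1E  # SOT = 0x02 is an assumption

EXAMPLE = bytes.fromhex(
    "01" "00120651ffea7c"   # <SOH>, then one metadata field
    "01" "700056" "700053"  # <SOH>, then header fields V and S
    "02"                    # <SOT>
    "0000" "0101" "0204" "0309" "0410" "0519"
    "0624" "0731" "0840" "0951" "0a64"
)


def width(type_byte):
    return 1 << (type_byte & 0x0F)  # fixed-width size codes only


def parse(data):
    pos = 1  # skip the leading <SOH>
    metadata = []
    while data[pos] != SOH:
        name_type, value_type = data[pos], data[pos + 1]
        pos += 2
        name = int.from_bytes(data[pos:pos + width(name_type)], "big")
        pos += width(name_type)
        value = int.from_bytes(data[pos:pos + width(value_type)], "big")
        pos += width(value_type)
        metadata.append((name, value))
        if data[pos] == RS:  # tolerate a terminator if one is present
            pos += 1
    pos += 1  # skip the second <SOH>
    fields = []
    while data[pos] != SOT:
        name_type, field_type = data[pos], data[pos + 1]
        pos += 2
        name = data[pos:pos + width(name_type)].decode("ascii")
        pos += width(name_type)
        fields.append((name, field_type))
    pos += 1  # skip <SOT>
    rows = []
    while pos < len(data):
        row = []
        for _, field_type in fields:
            w = width(field_type)
            row.append(int.from_bytes(data[pos:pos + w], "big"))
            pos += w
        rows.append(row)
    return metadata, fields, rows


metadata, fields, rows = parse(EXAMPLE)
print(metadata, fields, rows)
```

Run on the example bytes, this yields the metadata pair (6, 1375726204), the fields V and S (both of type 0x00), and the eleven rows of the table.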

The following is an equivalent CSV (less the metadata):

V,S
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
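Going the other way, a CSV like this one could be converted into the ADF bytes of the example. The sketch below is deliberately narrow and makes the same assumptions as the example file: one-character ASCII field names (name encoding 0x70), unsigned 8-bit integer values (field type 0x00), a single "Date edited" metadata field stored as a 32-bit datetime (0x12), big-endian values, and `<SOT>` = 0x02. The function name `csv_to_adf` is hypothetical.

```python
import csv
import io


def csv_to_adf(text: str, edited: int) -> bytes:
    """Convert a small integer CSV into ADF bytes laid out like the example."""
    rows = list(csv.reader(io.StringIO(text)))
    names, records = rows[0], rows[1:]

    out = bytearray([0x01])                     # <SOH>
    out += bytes([0x00, 0x12, 0x06])            # 8-bit name, 32-bit datetime, "Date edited"
    out += edited.to_bytes(4, "big")
    out.append(0x01)                            # <SOH>
    for name in names:
        if len(name) != 1:
            raise ValueError("this sketch only handles 1-character names")
        out += bytes([0x70, 0x00]) + name.encode("ascii")
    out.append(0x02)                            # <SOT> (assumed to be 0x02)
    for record in records:
        out += bytes(int(v) for v in record)    # raises ValueError outside 0..255
    return bytes(out)


sample = "V,S\n" + "".join(f"{i},{i * i}\n" for i in range(11))
print(csv_to_adf(sample, 1375726204).hex())
```

With the timestamp 1375726204 this reproduces the 38 bytes of the example file.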