Date: 2013-08-03
Tags: computers protocols data programming
Because the current standards (CSV and TSV) aren’t really standardized, I propose a new one: the ASCII Data File (ADF). The downside is that it won’t be easily editable in a plain text editor without extensions. The upside is that it would handle more complex data than CSV or TSV can. Don’t get me wrong: CSV, TSV, and delimiter-separated values in general will always be useful for simple data types, and I don’t expect them to ever go away.
My proposal is to do this. The type of each field is encoded in the header, using the type codes and flags below.
Types (may be combined with flags):
Code | Width | Size |
0 | 8-bit | 1 byte |
1 | 16-bit | 2 bytes |
2 | 32-bit | 4 bytes |
3 | 64-bit | 8 bytes |
4 | 128-bit | 16 bytes |
5 | 256-bit | 32 bytes |
6 | 512-bit | 64 bytes |
7 | 1024-bit | 128 bytes |
8 | 2048-bit | 256 bytes |
9 | 4096-bit | 512 bytes |
A | 8192-bit | 1024 bytes |
B | ||
C | ||
D | ||
E | ||
F | 32-bit length, Pascal string. 8-bit safe |
Flags (set/unset):
0x10 | Type-dependent meaning. With 32-bit and 64-bit types: Unix datetime. With 128-bit type: 128-bit UUID. With 8-bit, 16-bit, 256-bit, 512-bit and 1024-bit types: nothing assigned. |
0x20 | Floating point (IEEE 754) / integer. For datetimes: seconds / nanoseconds. |
0x40 | Signed / unsigned. When the floating-point and unsigned are set, the value is text. |
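To make the type byte a bit more concrete, here is a rough sketch in Python of how one could be unpacked. It assumes the low four bits hold the type code and the higher bits hold the flags, which is how I read the tables above. The name `decode_type_byte` and the dictionary keys are made up for illustration and are not part of the proposal. Treating "0x20 and 0x40 both set" as text is also an assumption, taken from the 0x70 byte used for field names in the example file.

```python
# Sketch only: unpack an ADF type byte into a size and its flags.
# Assumption: low nibble = type code, high bits = flags.

FLAG_SPECIAL = 0x10  # Unix datetime (32/64-bit) or UUID (128-bit)
FLAG_FLOAT = 0x20    # floating point rather than integer
FLAG_SIGN = 0x40     # the signed/unsigned bit
TEXT_MASK = FLAG_FLOAT | FLAG_SIGN


def decode_type_byte(type_byte):
    code = type_byte & 0x0F
    if code == 0xF:
        size = None  # Pascal string: a 32-bit length precedes the data
    elif code <= 0xA:
        size = 1 << code  # 1, 2, 4, ... 1024 bytes
    else:
        raise ValueError(f"type code {code:X} is unassigned")
    return {
        "code": code,
        "size_bytes": size,
        "special": bool(type_byte & FLAG_SPECIAL),
        "float": bool(type_byte & FLAG_FLOAT),
        "sign_bit": bool(type_byte & FLAG_SIGN),
        # Assumption: both bits set means text, as with 0x70 in the example.
        "text": (type_byte & TEXT_MASK) == TEXT_MASK,
    }
```

For instance, `decode_type_byte(0x12)` describes a 4-byte value with the 0x10 flag set, which is the 32-bit Unix datetime used in the example file.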
File layout:
<SOH> |
Metadata Fields * # of metadata fields |
<SOH> |
Header Fields * # of fields |
<SOT> |
Record Value * # of records |
Metadata field layout:
Type | Field Name Encoding |
Type | Field Value Encoding |
Metadata Name | |
Metadata Value | |
Type | Field Value 2 Encoding (if required; may be repeated as defined) |
Metadata Value | If required. May be repeated as defined. |
<RS> |
Metadata names:
0 | SHA-512 Hash | Calculated without this record or signature records. (Not repeatable) |
1 | Number of fields | Must be kept up-to-date if used. (Not repeatable) |
2 | Number of records | Must be kept up-to-date if used. (Not repeatable) |
3 | RSA Signature | Includes no signature records. Requires 2 values: key fingerprint, signature. |
4 | DSA Signature | Includes no signature records. Requires 2 values: key fingerprint, signature. |
5 | ECDSA Signature | Includes no signature records. Requires 2 values: key fingerprint, signature. |
6 | Date edited | Must be 64-bit Unix timestamp. (Not repeatable) |
7 | Author | Will more than likely be a string. May be in RFC 2822 format, otherwise taken as a literal. |
8 | Copyright | Contains the name of the copyright used in field 1, and a URL to it in field 2. The URL may be a data: URL, e.g. data:text/plain,All Rights Reserved. (Not repeatable) |
9 | Distribution Status | e.g.: “This document may not be distributed outside of XYZ Inc.” or “No restrictions”. (Not repeatable) |
Note on signatures: When a document is edited, all the signatures on it are removed (and may be replaced with the saver’s own signature). Non-editors can add their own signature to a document that has zero or more signatures.
Header field layout:
Type | Field Name Encoding |
Type | Defined Type for field |
Field Name |
Record layout:
Value | The value as encoded by the field type (since the type is known, there is no need to delimit it) |
Value Fields * # of fields |
This is an example file that stores a table of squares as 8-bit integers. The field names are simply V(alue) and S(quare).
<SOH><0x00><0x12><0x06><0x51><0xFF><0xEA><0x7C><SOH><0x70><0x00>V<0x70><0x00>S<SOT><0x00><0x00><0x01><0x01><0x02><0x04><0x03><0x09><0x04><0x10><0x05><0x19><0x06><0x24><0x07><0x31><0x08><0x40><0x09><0x51><0x0A><0x64>
This says that the file was last edited at 1375726204 (2013-08-05T18:10:04Z) and has 2 fields (both 1 character long) named V and S. The data portion holds the following table:
0 | 0 |
1 | 1 |
2 | 4 |
3 | 9 |
4 | 16 |
5 | 25 |
6 | 36 |
7 | 49 |
8 | 64 |
9 | 81 |
10 | 100 |
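As a sanity check on the walkthrough above, here is a rough Python sketch that reads the example bytes back into the metadata, the field names and the table. It is not a full ADF reader and leans on several assumptions of mine: `<SOT>` is taken to be the ASCII STX byte (0x02), multi-byte values are big-endian (which is what makes the timestamp come out right), only fixed-width types are handled, and each section is assumed to end at the next `<SOH>` or `<SOT>` byte, since the example carries no `<RS>`. The names `parse` and `size_of` are made up for illustration.

```python
from datetime import datetime, timezone

SOH = 0x01  # ASCII start of heading
SOT = 0x02  # assumption: <SOT> means the ASCII STX byte

# The example file from this post, byte for byte.
EXAMPLE = bytes.fromhex(
    "01 00 12 06 51 FF EA 7C "   # <SOH>, then the date-edited metadata
    "01 70 00 56 70 00 53 "      # <SOH>, then header fields V and S
    "02 "                        # <SOT>
    "00 00 01 01 02 04 03 09 04 10 05 19 "
    "06 24 07 31 08 40 09 51 0A 64"
)


def size_of(type_byte):
    # Fixed-width types only: type code n occupies 2**n bytes.
    return 1 << (type_byte & 0x0F)


def parse(data):
    pos = 1  # skip the first <SOH>
    metadata = {}
    while data[pos] != SOH:
        name_type, value_type = data[pos], data[pos + 1]
        pos += 2
        n = size_of(name_type)
        name = int.from_bytes(data[pos:pos + n], "big")
        pos += n
        n = size_of(value_type)
        metadata[name] = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    pos += 1  # skip the second <SOH>
    fields = []
    while data[pos] != SOT:
        name_type, field_type = data[pos], data[pos + 1]
        pos += 2
        n = size_of(name_type)
        fields.append((data[pos:pos + n].decode("ascii"), field_type))
        pos += n
    pos += 1  # skip <SOT>
    records = []
    while pos < len(data):
        row = []
        for _, field_type in fields:
            n = size_of(field_type)
            row.append(int.from_bytes(data[pos:pos + n], "big"))
            pos += n
        records.append(row)
    return metadata, fields, records


metadata, fields, records = parse(EXAMPLE)
# Metadata name 6 is "Date edited".
edited = datetime.fromtimestamp(metadata[6], tz=timezone.utc)
```

Note that stopping at the next `<SOH>` or `<SOT>` only works here because none of the type bytes in the example happen to be 0x01 or 0x02. A real reader would need a less fragile rule.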
The following is an equivalent CSV (less the metadata):
V,S
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
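Going the other way, the same table can be written out as ADF bytes. The Python sketch below only covers the shape of this one example: a single date-edited metadata field stored as a 32-bit big-endian timestamp, 1-character field names, and 8-bit integer values. As before, treating `<SOT>` as the ASCII STX byte (0x02) is my assumption, and `encode_adf` is a made-up name, not part of the proposal.

```python
import struct

SOH = b"\x01"  # ASCII start of heading
SOT = b"\x02"  # assumption: <SOT> means the ASCII STX byte


def encode_adf(edited, names, rows):
    out = bytearray(SOH)
    # Metadata field: 8-bit name (0x00), 32-bit Unix datetime value (0x12),
    # name 6 = "Date edited", then the timestamp itself.
    out += bytes([0x00, 0x12, 0x06]) + struct.pack(">I", edited)
    out += SOH
    for name in names:
        if len(name) != 1:
            raise ValueError("this sketch only handles 1-character names")
        # 0x70 = 8-bit text field name, 0x00 = 8-bit integer values.
        out += bytes([0x70, 0x00]) + name.encode("ascii")
    out += SOT
    for row in rows:
        out += bytes(row)  # each value must fit in one byte (0..255)
    return bytes(out)


table = [(v, v * v) for v in range(11)]
adf = encode_adf(1375726204, ["V", "S"], table)
```

With these inputs, `adf` comes out as the same byte sequence as the example file above.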