This blog is about my musings and thoughts. I hope you find it useful, at most, and entertaining, at least.
Date: 2013-10-04
Tags: computers protocols data programming
Please see rev3 for a newer revision
In a previous post I described a binary file format for exchanging data. Here I would like to describe a text format instead: one limited to 7-bit ASCII and editable in a plain text editor (without plugins).
Delimiters: , and ; and :
These can be escaped by:
, | $%% | |
; | $ |
$ | $$$ | |||
% | %%% |
I’ve chosen these so that tools such as awk can still be used to parse files that contain delimiters inside the values.
Metadata-header: value
Field Name[;Type[:Type]*;Format[;Description]](,Field Name[;Type[:Type]*;Format[;Description]])*
value(,value)*
value(,value)*
…
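Here is a rough Python sketch of reading a file that follows this grammar. The names (`parse_document`, `parse_field_defs`, `FieldDef`, `HEADER_RE`) are my own, not part of the format. It assumes metadata lines look like `Name: value` and end at the first line that doesn't (an optional blank line is tolerated), that chained types are split on `:`, and it does not decode escape sequences.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: a metadata line looks like "Name: value" (RFC 2616 style, no
# folded continuation lines); the first line that doesn't is the field line.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")


@dataclass
class FieldDef:
    name: str
    types: List[str] = field(default_factory=list)  # e.g. ["Int", "Fix2"]
    fmt: Optional[str] = None                        # e.g. "Dec"
    description: Optional[str] = None


def parse_field_defs(line: str) -> List[FieldDef]:
    """Parse 'Name[;Type[:Type]*;Format[;Description]]' entries split on ','."""
    defs = []
    for entry in line.split(","):
        parts = entry.split(";")
        fd = FieldDef(name=parts[0])
        if len(parts) > 1:
            fd.types = parts[1].split(":")  # chained types, e.g. Float:CI95
        if len(parts) > 2:
            fd.fmt = parts[2]
        if len(parts) > 3:
            fd.description = parts[3]
        defs.append(fd)
    return defs


def parse_document(text: str) -> Tuple[List[Tuple[str, str]], List[FieldDef], List[List[str]]]:
    """Split a document into (headers, field definitions, rows).

    Escape sequences are NOT decoded; values come back as raw strings.
    """
    lines = text.splitlines()
    headers = []
    i = 0
    while i < len(lines):
        match = HEADER_RE.match(lines[i])
        if not match:
            break
        # A list (not a dict) keeps repeated headers such as Source.
        headers.append((match.group(1), match.group(2)))
        i += 1
    if i < len(lines) and lines[i] == "":
        i += 1  # tolerate an HTTP-style blank line after the headers
    if i >= len(lines):
        raise ValueError("missing field definition line")
    fields = parse_field_defs(lines[i])
    # Rows are split on ',' only, so a CI value like "4;2;1" stays in one cell.
    rows = [line.split(",") for line in lines[i + 1:] if line]
    return headers, fields, rows
```

With no metadata and no types, the same function reads a plain CSV file: every field simply has an empty type list and no format.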
Last-Modified: 2013-08-06T08:52:00-05:00
Value;Int;Dec,Square;Int;Dec
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
Last-Modified: 2013-08-06T08:52:00-05:00
Value;Int:Fix2;Dec,Square;Int:Fix3;Dec
00,000
01,001
02,004
03,009
04,016
05,025
06,036
07,049
08,064
09,081
10,100
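The fixed-width example pads decimal integers with leading zeros, so every Fix{n} value is exactly n characters long, and a value such as 100 needs three characters and cannot fit in Fix2. A small Python sketch of writing and reading such fields; the helper names are hypothetical, and it assumes Fix{n} on a Dec integer means left-padding with zeros (which is how the example reads), handling only non-negative values.

```python
def encode_fixed(value: int, width: int) -> str:
    """Zero-pad a non-negative integer to exactly `width` decimal digits."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    text = str(value).zfill(width)
    if len(text) != width:
        raise ValueError(f"{value} does not fit in Fix{width}")
    return text


def decode_fixed(text: str, width: int) -> int:
    """Check the field width, then read the zero-padded decimal value."""
    if len(text) != width:
        raise ValueError(f"expected {width} characters, got {len(text)}")
    return int(text, 10)


# Rebuild the rows of the fixed-width squares table.
rows = [f"{encode_fixed(n, 2)},{encode_fixed(n * n, 3)}" for n in range(11)]
```

Rejecting values that overflow the width, rather than silently widening the field, keeps every row the same length, which is the point of a fixed-length field.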
Value,Square
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100
The last one looks familiar, huh?
Metadata will be stored in RFC 2616 Header format (HTTP/1.1 header syntax).
I propose the following headers:
Header | value | Example | Comment |
---|---|---|---|
Digest | digest base64-hash | sha1 yaUxx5mrIRyXNdovreYa/PFh0PE= | Calculated without this line |
Last-Modified | ISO 8601 date | 2013-08-06T08:52:00-05:00 | |
Signature | user;key fingerprint;signature | Jim <jim@example.com>;f642a8d2552281d792b52a17cbe79f3163b296f3;MIGHAkER9CmV5WJPB3hnk9eD31oqhAKWTsXVKubdIffMM9ocjU667p5yDh8xrOuOx0T8xx2NTQgmnDgsrPaXLK8WiMEaaQJCAYn2TwWkSVpgTM7oFg3O6r9ZTSRTnqZhxyk3g7O1SDHcqxohBREITiMsIFFNjv6m6sj/M8e4ndlaHZVgv5J/T+NR | EC keys are useful for this because their signatures are short. Calculated without this line. |
Author | user | Jim <jim@example.com> | |
Description | Description | This data is awesome! | |
Source | URL or description of the source of the data (where to go to find out more, or who made it) | http://example.com/data or Jim's Lab @ HisHouseU | May be repeated |
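The Digest header is "calculated without this line". A minimal Python sketch of producing and checking it, assuming the hash covers every other line joined with "\n" (no trailing newline) as ASCII bytes; the exact byte layout and the function names are my assumptions, and the Signature line is not handled.

```python
import base64
import hashlib


def _body_without_digest(text: str) -> bytes:
    # Assumption: the digest covers every line except the Digest header itself,
    # joined with "\n" and with no trailing newline. The format is 7-bit ASCII,
    # so anything else raises UnicodeEncodeError.
    kept = [line for line in text.splitlines() if not line.startswith("Digest:")]
    return "\n".join(kept).encode("ascii")


def make_digest_header(text: str) -> str:
    """Return a 'Digest: sha1 <base64>' header line for the document."""
    raw = hashlib.sha1(_body_without_digest(text)).digest()
    return "Digest: sha1 " + base64.b64encode(raw).decode("ascii")


def verify_digest(text: str) -> bool:
    """Recompute the digest and compare it with the Digest header, if any."""
    for line in text.splitlines():
        if line.startswith("Digest:"):
            algorithm, _, expected = line[len("Digest:"):].strip().partition(" ")
            if algorithm != "sha1":
                raise ValueError("only sha1 is handled in this sketch")
            raw = hashlib.sha1(_body_without_digest(text)).digest()
            return base64.b64encode(raw).decode("ascii") == expected
    return False  # no Digest header present
```

Usage: prepend `make_digest_header(doc)` to the document; any later change to a value makes `verify_digest` return False. A SHA-1 digest in base64 is always 28 characters, like the example in the table.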
Types may be chained with a colon, e.g. Bin32:Float
Type | Comment |
---|---|
Fix{n} | Fixed length field of length n |
Bin | Arbitrary binary data |
Int | Integer (two's complement if binary) |
Float | IEEE 754 if binary |
UUID | UUID |
Text | Text |
Time | Date and/or Time |
CI{m} | The value is followed by its confidence interval, with ; as the separator. A single extra value is used for both the lower and upper interval; two extra values give them separately (min;max). m is the confidence level in percent (e.g. 90, 95, &c). The CI uses the same encoding as the field |
Geometry | Stores Geometry types |
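A CI field such as 4;2;1 can be split into a value and its bounds. This Python sketch assumes the one or two numbers after the value are distances below and above it rather than absolute bounds; that is my reading of the spectrometer example at the end (e.g. 4;2;1, where an absolute min of 2 and max of 1 would make no sense). `parse_ci` is a hypothetical helper name.

```python
from typing import Callable, Tuple


def parse_ci(field_value: str, cast: Callable[[str], float] = float) -> Tuple[float, float, float]:
    """Split 'value;width' or 'value;lower;upper' into (value, low, high).

    Assumption: the extra numbers are distances from the value, not absolute
    bounds. One number is applied both below and above; two are applied
    below and above respectively.
    """
    parts = [cast(p) for p in field_value.split(";")]
    if len(parts) == 2:
        value, width = parts
        return value, value - width, value + width
    if len(parts) == 3:
        value, below, above = parts
        return value, value - below, value + above
    raise ValueError(f"expected 2 or 3 ';'-separated parts, got {len(parts)}")
```

For example, `parse_ci("14;3;4")` gives a value of 14 with an interval from 11 to 18.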
Format | Comment |
---|---|
Dec | Decimal (default) |
Hex | Hex encoded/Base-16 Encoded |
B32 | Base-32 Encoded |
B36 | Base-36 Encoded |
B58 | Base-58 Encoded |
B64 | Base-64 Encoded |
B85 | Base-85 Encoded |
UU | UUEncoded |
XX | XXEncoded |
UTF8 | UTF-8 encoded text |
ASCII | ASCII Text |
Latin1 | Latin-1 (ISO 8859-1) text |
NT | Null Terminated (useful in a fixed-length field) |
PS{n} | Pascal/Length-Prefixed String with n-byte length (useful in a fixed-length field) |
WKT | Well-Known Text (only valid for Geometry type) |
WKB | Well-Known Binary (only valid for Geometry type) |
UDT | Unix time, seconds since 1970-01-01T00:00:00Z (only valid for time types) |
UMT | Unix time in milliseconds/JavaScript time (only valid for time types) |
UNT | Unix time in nanoseconds (only valid for time types) |
EDT | Excel Date Time (only valid for time types) |
WFT | Windows File Time (100-nanosecond ticks since 12:00 A.M. January 1, 1601 UTC) (only valid for time types) |
ISO8601 | ISO 8601 format (only valid for time types) |
FQ{m}.{f} | Fixed Point (only really useful for bin encoded) |
F{s}:{m}:{f} | Fixed Point (only really useful for bin encoded) |
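Most of the time formats are a count of units since some epoch, so decoding them is a single addition. A Python sketch using only the standard library; `decode_time` is a hypothetical name, nanosecond values are truncated to microseconds because `timedelta` cannot hold more, and treating Excel serials as UTC is my assumption since Excel stores no timezone.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
# Excel's 1900 date system counts 1900-02-29 (which never existed), so for
# serials from 61 (1900-03-01) on, day 0 is effectively 1899-12-30.
EXCEL_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)


def decode_time(text: str, fmt: str) -> datetime:
    """Decode a time value written in one of the time formats above."""
    if fmt == "UDT":
        return UNIX_EPOCH + timedelta(seconds=int(text))
    if fmt == "UMT":
        return UNIX_EPOCH + timedelta(milliseconds=int(text))
    if fmt == "UNT":
        # timedelta resolves microseconds, so sub-microsecond digits are dropped.
        return UNIX_EPOCH + timedelta(microseconds=int(text) // 1000)
    if fmt == "WFT":
        # 100-nanosecond ticks: 10 ticks per microsecond.
        return WINDOWS_EPOCH + timedelta(microseconds=int(text) // 10)
    if fmt == "EDT":
        # Assumption: serial days (with a fractional time of day), taken as UTC.
        return EXCEL_EPOCH + timedelta(days=float(text))
    if fmt == "ISO8601":
        # Needs a numeric offset such as -05:00; zone abbreviations are not ISO 8601.
        return datetime.fromisoformat(text)
    raise ValueError(f"unknown time format: {fmt}")
```

As a sanity check, the Unix epoch is 116444736000000000 in WFT and 25569 in EDT.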
Last-Modified: 2013-10-04T08:52:00-05:00
Author: Jim
Description: This data was collected with a Blah Blah Spectrometer. The procedure can be found at http://example.com/proc
Time;Int;Dec;Seconds from starting,Measurement;Float:CI95;Dec;Absorption at 520cm-1 over 4 experiments
0,0;0
10,1;2
15,4;2;1
20,9;3
23,14;3;4