TDF -- Text Data File rev 2

Date: 2013-10-04
Tags: computers protocols data programming

Please see rev3 for a newer revision

In a previous post I described a binary file format for the exchange of data. Here I would like to describe a similar format, but one limited to 7-bit ASCII and editable in a text editor (without plugins).

Format

Delimiters: , and ; and :

These can be escaped by:

, $%%
; $
$ $$$
% %%%

I’ve chosen these so that tools such as awk can still be used to parse files that contain delimiters inside the values.

In header lines, ; separates a field's name, type, format, and description, and : chains types.


Metadata-header: value

Field Name[;Type[:Type]*;Format[;Description]](,Field Name[;Type[:Type]*;Format[;Description]])*
value(,value)*
value(,value)*
….
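
The header grammar above can be sketched in a few lines of Python. This is only an illustration: the names `FieldSpec`, `parse_field_spec`, and `parse_header_line` are hypothetical, and the sketch assumes the header contains no escaped delimiters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSpec:
    """One field definition from a TDF header line (hypothetical helper type)."""
    name: str
    types: List[str] = field(default_factory=list)
    fmt: Optional[str] = None
    description: Optional[str] = None


def parse_field_spec(spec: str) -> FieldSpec:
    # Split into at most four parts: Name;Type[:Type...];Format;Description.
    # Assumption: no escaped delimiters appear in the header.
    parts = spec.strip().split(";", 3)
    name = parts[0]
    types = parts[1].split(":") if len(parts) > 1 and parts[1] else []
    fmt = parts[2] if len(parts) > 2 else None
    description = parts[3] if len(parts) > 3 else None
    return FieldSpec(name, types, fmt, description)


def parse_header_line(line: str) -> List[FieldSpec]:
    # Field definitions are separated by commas.
    return [parse_field_spec(s) for s in line.split(",")]


specs = parse_header_line("Value;Int:Fix2;Dec,Square;Int:Fix2;Dec")
for s in specs:
    print(s.name, s.types, s.fmt)
```

A header such as `Value,Square` yields fields with an empty type list and no format, which is why an untyped file needs no special handling.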

Examples


Last-Modified: 2013-08-06T08:52:00EST

Value;Int;Dec,Square;Int;Dec
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100



Last-Modified: 2013-08-06T08:52:00EST

Value;Int:Fix2;Dec,Square;Int:Fix2;Dec
00,00
01,01
02,04
03,09
04,16
05,25
06,36
07,49
08,64
09,81
10,100

Value,Square
0,0
1,1
2,4
3,9
4,16
5,25
6,36
7,49
8,64
9,81
10,100

The last one looks familiar, huh?
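
To make the layout of the examples concrete, here is a minimal Python sketch of a reader. It rests on assumptions the format does not spell out: the metadata block, when present, ends at the first blank line; a file with no blank line has no metadata; and no escaped delimiters appear in the data. The function name `read_tdf` is hypothetical.

```python
from typing import Dict, List, Tuple


def read_tdf(text: str) -> Tuple[Dict[str, List[str]], List[str], List[List[str]]]:
    """Split a TDF document into (metadata, header fields, rows)."""
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    metadata: Dict[str, List[str]] = {}
    if "" in lines:
        # Assumption: everything before the first blank line is metadata.
        split_at = lines.index("")
        for ln in lines[:split_at]:
            name, _, value = ln.partition(":")
            # Headers may repeat (e.g. Source), so collect values in a list.
            metadata.setdefault(name.strip(), []).append(value.strip())
        lines = [ln for ln in lines[split_at + 1:] if ln]
    header = lines[0].split(",")
    rows = [ln.split(",") for ln in lines[1:]]
    return metadata, header, rows


sample = """Last-Modified: 2013-08-06T08:52:00EST

Value;Int;Dec,Square;Int;Dec
0,0
1,1
2,4
"""
meta, header, rows = read_tdf(sample)
print(meta, header, rows)
```

With the metadata and the type information left out, the same function reads the last example as ordinary comma-separated values.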

Metadata

Metadata will be stored in RFC 2616 Header format (HTTP/1.1 header syntax).

I propose the following headers:

Each header below is listed with its value, an example, and a comment.

Digest
    Value: digest base64-hash
    Example: sha1 yaUxx5mrIRyXNdovreYa/PFh0PE=
    Comment: Calculated without this line.

Last-Modified
    Value: ISO 8601 date
    Example: 2013-08-06T08:52:00EST

Signature
    Value: user;key fingerprint;signature
    Example: Jim <jim@example.com>;f642a8d2552281d792b52a17cbe79f3163b296f3;MIGHAkER9CmV5WJPB3hnk9eD31oqhAKWTsXVKubdIffMM9ocjU667p5yDh8xrOuOx0T8xx2NTQgmnDgsrPaXLK8WiMEaaQJCAYn2TwWkSVpgTM7oFg3O6r9ZTSRTnqZhxyk3g7O1SDHcqxohBREITiMsIFFNjv6m6sj/M8e4ndlaHZVgv5J/T+NR
    Comment: Because of their size, EC keys are useful for this. Calculated without this line.

Author
    Value: user
    Example: Jim <jim@example.com>

Description
    Value: Description
    Example: This data is awesome!

Source
    Value: URL or description of the source of the data (where to go to find out more or who made it). May be repeated.
    Example: http://example.com/data or Jim's Lat @ HisHouseU
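
As an illustration of "calculated without this line", the following Python sketch computes a Digest value. Several details are assumptions rather than part of the proposal: the hash covers every byte of the file except the Digest header line itself, the file is pure ASCII, and the Signature line (if any) is left in place. The function name `compute_digest` is hypothetical.

```python
import base64
import hashlib


def compute_digest(text: str) -> str:
    """Return a 'sha1 <base64>' digest of the file with any Digest line removed."""
    kept = []
    in_headers = True
    for line in text.splitlines(keepends=True):
        if in_headers and line.strip() == "":
            # Assumption: the first blank line ends the metadata block.
            in_headers = False
        if in_headers and line.lower().startswith("digest:"):
            continue  # the digest is calculated without this line
        kept.append(line)
    raw = "".join(kept).encode("ascii")  # the format is limited to 7-bit ASCII
    return "sha1 " + base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")


body = "Last-Modified: 2013-08-06T08:52:00EST\n\nValue,Square\n0,0\n1,1\n"
digest = compute_digest(body)
signed = "Digest: " + digest + "\n" + body
# Recomputing over the file that now carries the header gives the same value.
print(compute_digest(signed) == digest)
```

Because the Digest line is skipped while hashing, a reader can verify a file by recomputing the value and comparing it with the header.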

Types

Types may be chained with :, e.g. Int:Fix2 or Float:CI95

Type Comment
Fix{n} Fixed length field of length n
Bin Arbitrary binary data
Integer Two's complement if binary
Float IEEE 754 if Binary
UUID UUID
Text Text
Time Date and/or Time
CI{m} Contains a confidence interval, separated from the value by ;. A single value serves as both the min and max interval; two values are given as min;max. m is the percentage of the confidence interval (e.g. 90%, 95%, &c). The CI is in the same encoding as the field
Geometry Stores Geometry types

Formats

Format Comment
Dec Decimal (default)
Hex Hex encoded/Base-16 Encoded
B32 Base-32 Encoded
B36 Base-36 Encoded
B58 Base-58 Encoded
B64 Base-64 Encoded
B85 Base-85 Encoded
UU UUEncoded
XX XXEncoded
UTF8 UTF-8 encoded text
ASCII ASCII Text
Latin1 Latin1/ISO 8859-1 text
NT Null Terminated (useful in a fixed-length field)
PS{n} Pascal/Length-Prefixed String with n-byte length (useful in a fixed-length field)
WKT Well-Known Text (only valid for Geometry type)
WKB Well-Known Binary (only valid for Geometry type)
UDT Unix Date Time (only valid for time types)
UMT Unix Date Time (Milliseconds)/JavaScript Time (only valid for time types)
UNT Unix Date Time (Nanoseconds) (only valid for time types)
EDT Excel Date Time (only valid for time types)
WFT Windows File Time (100-nanosecond ticks since 12:00 A.M. January 1, 1601 UTC) (only valid for time types)
ISO8601 ISO 8601 format (only valid for time types)
FQ{m}.{f} Fixed Point (only really useful for bin encoded)
F{s}:{m}:{f} Fixed Point (only really useful for bin encoded)
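
The integer-based time formats in the table differ only in epoch and tick size, which a short Python sketch can show. It assumes UDT counts whole seconds since the Unix epoch and that values are written in decimal; it truncates UNT and WFT to microseconds because that is the resolution of Python's `datetime`. EDT and ISO8601 are left out, and the function name `decode_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def decode_time(value: str, fmt: str) -> datetime:
    """Decode a decimal-encoded time value in one of the integer time formats."""
    n = int(value)
    if fmt == "UDT":   # seconds since 1970-01-01 UTC (assumed)
        return UNIX_EPOCH + timedelta(seconds=n)
    if fmt == "UMT":   # milliseconds since 1970-01-01 UTC
        return UNIX_EPOCH + timedelta(milliseconds=n)
    if fmt == "UNT":   # nanoseconds since 1970-01-01 UTC, truncated to microseconds
        return UNIX_EPOCH + timedelta(microseconds=n // 1000)
    if fmt == "WFT":   # 100-nanosecond ticks since 1601-01-01 UTC, truncated
        return WINDOWS_EPOCH + timedelta(microseconds=n // 10)
    raise ValueError("unsupported time format: " + fmt)


print(decode_time("1380876720", "UDT").isoformat())
```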


Last-Modified: 2013-10-04T08:52:00EST
Author: Jim
Description: This data was collected with a Blah Blah Spectrometer. The procedure can be found at http://example.com/proc

Time;Int;Dec;Seconds from starting,Measurement;Float:CI95;Dec;Absorption at 520cm-1 over 4 experiments
0,0;0
10,1;2
15,4;2;1
20,9;3
23,14;3;4
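
The Measurement column above carries a value followed by its confidence interval, which the following Python sketch splits apart. The names `Measurement` and `parse_ci_value` are hypothetical, and the sketch only separates the components; whether min and max are offsets from the value or absolute bounds is not stated in the format, so no bounds are computed.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    value: float
    ci_min: float
    ci_max: float


def parse_ci_value(cell: str) -> Measurement:
    """Parse 'value;ci' or 'value;min;max' from a Float:CI{m} field in Dec format."""
    parts = [float(p) for p in cell.split(";")]
    if len(parts) == 2:
        # A single interval value is used for both min and max.
        value, both = parts
        return Measurement(value, both, both)
    if len(parts) == 3:
        return Measurement(*parts)
    raise ValueError("expected value;ci or value;min;max, got: " + cell)


rows = ["0,0;0", "10,1;2", "15,4;2;1", "20,9;3", "23,14;3;4"]
for row in rows:
    # The first comma separates the Time field from the Measurement field.
    time_s, cell = row.split(",", 1)
    print(int(time_s), parse_ci_value(cell))
```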