-rw-r--r-- | proposals/0000-hyperdb-timestamps.md | 153 |
1 files changed, 153 insertions, 0 deletions
diff --git a/proposals/0000-hyperdb-timestamps.md b/proposals/0000-hyperdb-timestamps.md
new file mode 100644
index 0000000..1a80e32
--- /dev/null
+++ b/proposals/0000-hyperdb-timestamps.md
@@ -0,0 +1,153 @@
+
+Title: **DEP-0000: Hyperdb Timestamps**
+
+Short Name: `0000-hyperdb-timestamps`
+
+Type: Standard
+
+Status: Undefined (as of YYYY-MM-DD)
+
+Github PR: (add HTTPS link here after PR is opened)
+
+Authors: [Bryan Newbold](https://github.com/bnewbold)
+
+
+# Summary
+[summary]: #summary
+
+An optional timestamp field is added to hyperdb "entry" metadata for writers to
+self-report the date and time of modifications to the database.
+
+
+# Motivation
+[motivation]: #motivation
+
+The timestamp of changes to Dat archives is not currently captured at any layer
+of the technical stack (hyperdrive, hyperdb, or hypercore). Hyperdrive records
+the filesystem creation and modification timestamps of individual files, but
+these [POSIX][posix]-style fields store pre-existing filesystem-level metadata,
+not hyperdrive-level event metadata, and do not track changes like deletion.
+
+By adding an optional field to hyperdb entries, clients and applications will
+be able to track the (self-reported) date and time of changes, which are
+valuable to human users inspecting the history of a dat archive. Uses of this
+metadata might include surfacing recent events to users, calculating activity
+metrics over large numbers of archives, and comparing "freshness" between
+multiple archives with similar or overlapping content.
+
+This metadata is *not* intended to be trusted for any security or
+protocol-layer concerns.
+
+[POSIX]: https://en.wikipedia.org/wiki/POSIX
+
+
+# Usage Documentation
+[usage-documentation]: #usage-documentation
+
+Implementation libraries expose timestamp information at higher protocol
+levels. The Hyperdb library also exposes a method to fetch the timestamp for a
+given revision/sequence of the log, and options to control the inclusion of
+timestamps when creating databases or making modifications.
+
+It is important to note that timestamps are self-generated and reported by the
+writer's computer, which may be misconfigured or have significant local clock
+skew. This means that timestamps can end up inaccurate, out of order, or
+entirely bogus (e.g., in the case of a device with no datetime configured at
+all, which might default to the year 1970).
+
+
+# Reference Documentation
+[reference-documentation]: #reference-documentation
+
+The timestamp would be an additional `optional` field on the `Entry` and
+`InflatedEntry` protobuf messages in the hyperdb protocol, with type `uint64`.
+The value is the number of milliseconds since the UNIX epoch, in the UTC
+timezone. The timestamp would be calculated at the time the protobuf message
+is created for appending to the hypercore log.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+The marginal storage and bandwidth overhead to implement timestamps would be
+very small for Dat and hyperdrive applications, but could be non-trivial for
+some hyperdb key/value store applications with very small values. Even in this
+case, the overhead would be on the order of 5 bytes per key.
+
+See [Privacy and Security Concerns][privacy].
+
+
+# Privacy and Security Concerns
+[privacy]: #privacy
+
+Timestamps can be used as a side channel to deanonymize users, by leaking
+information about their local timezone and computer usage habits.
+High-resolution timestamps can be correlated with network traffic or real-life
+events to identify nodes.
+
+Users and applications could "fuzz" (add random error to), truncate, or
+entirely disable timestamping to mitigate these issues, but these mitigations
+depend on users and developers being aware of the issue.
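The "fuzz" and "truncate" mitigations described above could be sketched roughly as follows. This is an illustration only, not part of the proposal or of any hyperdb API: the function names `truncate_timestamp` and `fuzz_timestamp` and their parameters are hypothetical, and the sketch assumes the timestamp is an integer count of milliseconds since the UNIX epoch, as in the Reference Documentation section.

```python
import random
from typing import Optional


def truncate_timestamp(ts_ms: int, resolution_ms: int) -> int:
    """Round a millisecond timestamp down to a multiple of resolution_ms.

    Hypothetical helper: e.g. resolution_ms=60_000 keeps only minute precision.
    """
    if resolution_ms <= 0:
        raise ValueError("resolution_ms must be positive")
    return ts_ms - (ts_ms % resolution_ms)


def fuzz_timestamp(ts_ms: int, max_error_ms: int,
                   rng: Optional[random.Random] = None) -> int:
    """Add uniform random error in [-max_error_ms, +max_error_ms].

    Hypothetical helper: the result is clamped at 0 so that it still fits an
    unsigned field. Uses the OS random source unless an rng is supplied.
    """
    if max_error_ms < 0:
        raise ValueError("max_error_ms must be non-negative")
    rng = rng or random.SystemRandom()
    return max(0, ts_ms + rng.randint(-max_error_ms, max_error_ms))


# Example: blur a timestamp by up to five seconds, then keep minute precision.
original = 1521273600123
blurred = truncate_timestamp(fuzz_timestamp(original, 5_000), 60_000)
```

Applying the fuzz before truncating means the stored value never carries sub-minute digits, though neither step helps against an observer who can watch the network traffic directly.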
+
+
+# Rationale and alternatives
+[alternatives]: #alternatives
+
+The usefulness of "self-reported" (as opposed to network-verified) timestamps
+is demonstrated by the git version control software.
+
+The timestamp could have additional resolution (e.g., nanoseconds) at the cost
+of additional storage. Seconds seems like a good design trade-off between
+efficiency and value to human users. Timestamps could also be stored in
+floating point, but this removes the potential efficiency of `varint` encoding.
+
+The timestamp field is added at the hyperdb layer instead of hyperdrive (in the
+`Stat` metadata) because in the hyperdb version of hyperdrive, a deletion is
+represented as a non-existent `value` (aka, no `Stat` at all). If timestamps
+aren't stored at the hyperdb layer, then deletions can't be timestamped.
+
+The consistent use of UTC time requires implementations to convert to local
+time, but avoids the complexities of local timezone representations. The
+behavior around leap seconds is intentionally not specified here, to allow use
+of "clock smearing" or operating system default transitions to TAI or other
+leap-second-free time standards.
+
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+Should this instead live at the hypercore layer? Protocol designers have so far
+been hesitant to add additional features or complexity to that protocol layer.
+
+Should a new "wrapper" message type be added to hyperdrive to allow
+timestamping of deletions as well as insertions? This could look something
+like:
+
+```protobuf
+message DriveEvent {
+  message Stat {
+    required uint32 mode = 1;
+    optional uint32 uid = 2;
+    optional uint32 gid = 3;
+    optional uint64 size = 4;
+    optional uint64 blocks = 5;
+    optional uint64 offset = 6;
+    optional uint64 byteOffset = 7;
+    optional uint64 mtime = 8;
+    optional uint64 ctime = 9;
+  }
+
+  optional Stat stat = 1;
+  optional uint64 timestamp = 2;
+}
+```
+
+Is there a better practice today (in 2018) for dealing with leap seconds in
+specifications?
+
+
+# Changelog
+[changelog]: #changelog
+
+- 2018-03-17: First submitted for comment.
+