From 0ac571f41a2a0be84dbd360805e3c4e51686681d Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Fri, 24 Jun 2022 19:53:18 -0700
Subject: commit old notes

---
 notes/background.txt | 10 +++++++++
 notes/plan.txt       | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 notes/background.txt
 create mode 100644 notes/plan.txt

(limited to 'notes')

diff --git a/notes/background.txt b/notes/background.txt
new file mode 100644
index 0000000..9449861
--- /dev/null
+++ b/notes/background.txt
@@ -0,0 +1,10 @@
+
+## Libraries
+
+- [tablib](http://docs.python-tablib.org/en/latest/)
+- [records](https://github.com/kennethreitz-archive/records)
+
+## File Formats
+
+- column stores like parquet, arrow
+- data packages
diff --git a/notes/plan.txt b/notes/plan.txt
new file mode 100644
index 0000000..8f09fc0
--- /dev/null
+++ b/notes/plan.txt
@@ -0,0 +1,58 @@
+
+x write basic example from TSV file
+x pass-through basic thing in pipeline
+x pretty printer (using column writing, term color)
+- convert
+    to/from TSV
+    to/from JSON
+- example apps
+    ls
+    stat
+    df, mount, something like that
+    trivial web server (or other thing that logs)
+- example datasets (eg, for benchmarking): compare TSV, AFT, JSON
+    million-line CDX
+    log file
+- manpage (?)
+- build .deb and installable
+- helper library
+    => R/W trails/wrappers
+    => header struct
+    => iterate rows (from input)
+    => pretty-print output based on tty status
+    => validation/check modes
+    => stream mode/helper for subprocesses
+- tests
+- compare with xsv command (?)
+- reimplement basic commands
+    cut (accept field names)
+    cat (combining files with compatible headers)
+    head, tail
+    wc (count rows, records)
+    format (accept field names)
+    grep/match/filter by column value?
+    paste
+    uniq (by column)
+    sort (by column)
+    join
+
+- extended commands
+    parallel (with column names)
+    shuf
+    comm
+    expand/unexpand
+    nl
+    seq
+
+ideas:
+- python stuff
+- C stuff
+- log log format integration
+- rust serde integration
+- aft-header: pretty-prints header as rows
+- aft-single: pretty-print single row (first?) as rows
+- aft-format (or printf?) "this {col1} to that {col2}" "some other column"
+- aft2json, json2aft
+- aft2html
+- aft-stats: sum, mean, stddev, min, max
+- aft-sql
-- 
cgit v1.2.3