author | Bryan Newbold <bnewbold@archive.org> | 2021-02-09 20:13:05 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-02-09 20:13:05 -0800 |
commit | 260ae4ec7ee4a836b35f036ead102be8347103e8 (patch) | |
tree | aa044a914be9cb01a5fff287b558f5ff95cb59cf /notes | |
parent | 0f89be721b820fe295cacfd5e3c875d3037b9874 (diff) | |
move around notes, docs, other text files
Diffstat (limited to 'notes')
-rw-r--r-- | notes/binary_size.md | 88 |
-rw-r--r-- | notes/original_proposal.md | 124 |
-rw-r--r-- | notes/plan.txt | 146 |
3 files changed, 358 insertions, 0 deletions
diff --git a/notes/binary_size.md b/notes/binary_size.md
new file mode 100644
index 0000000..a79cf9b
--- /dev/null
+++ b/notes/binary_size.md
@@ -0,0 +1,88 @@
+
+## Binary Size
+
+As of 2020-05-24, in early development, the relative binary sizes are:
+
+    121 MB  default debug build
+     12 MB  default release build
+    8.2 MB  release build w/ LTO
+    6.6 MB  release build w/ LTO, stripped
+
+After some small changes:
+
+    5.9 MB  release build w/ LTO, size optimization, other flags
+    4.1 MB  release build w/ LTO, size optimization, other flags, stripped
+
+Replacing reqwest with minreq:
+
+    6.3 MB  release build w/ LTO, size optimization, other flags
+    4.1 MB  release build w/ LTO, size optimization, other flags, stripped
+
+    (so, not worth it, at least while using fatcat_openapi with hyper+tokio)
+
+Note that release builds with LTO take *quite* a long time (many minutes). We
+probably don't want that to be the default for `fatcatd` builds.
+
+    cargo bloat --release --crates
+
+     File  .text      Size Crate
+    12.2%  21.4% 1021.5KiB fatcat_cli
+     7.1%  12.5%  596.7KiB fatcat_openapi
+     6.3%  11.1%  529.6KiB reqwest
+     6.2%  10.9%  518.5KiB std
+     3.5%   6.1%  290.3KiB clap
+     2.5%   4.3%  205.9KiB regex
+     2.4%   4.2%  198.7KiB regex_syntax
+     2.1%   3.6%  172.8KiB h2
+     1.9%   3.4%  162.7KiB hyper
+     1.8%   3.1%  149.9KiB futures
+     1.4%   2.4%  116.9KiB serde_json
+     1.3%   2.3%  111.2KiB macaroon
+     1.0%   1.8%   85.3KiB unicode_normalization
+     0.7%   1.3%   62.4KiB http
+     0.6%   1.0%   50.1KiB serde
+     0.6%   1.0%   47.5KiB url
+     0.5%   0.9%   41.9KiB [Unknown]
+     0.4%   0.8%   36.5KiB tokio_reactor
+     0.4%   0.7%   31.8KiB env_logger
+     0.3%   0.6%   26.6KiB chrono
+     3.4%   5.9%  283.3KiB And 57 more crates. Use -n N to show more.
+    57.2% 100.0%    4.7MiB .text section size, the file size is 8.2MiB
+
+
+    bnewbold@orithena$ cargo bloat --release
+        Finished release [optimized] target(s) in 0.27s
+        Analyzing target/release/fatcat-cli
+
+     File  .text    Size Crate                 Name
+     0.4%   1.0% 53.2KiB regex                 <regex::exec::ExecNoSync as regex::re_trait::RegularExpression>::capture...
+     0.4%   0.8% 44.1KiB regex_syntax          regex_syntax::ast::parse::ParserI<P>::parse_with_comments
+     0.3%   0.7% 36.8KiB unicode_normalization unicode_normalization::tables::compatibility_fully_decomposed
+     0.3%   0.6% 30.3KiB unicode_normalization unicode_normalization::tables::canonical_fully_decomposed
+     0.2%   0.5% 25.2KiB data_encoding         data_encoding::Encoding::decode_mut
+     0.2%   0.5% 24.0KiB fatcat_openapi?       <fatcat_openapi::models::_IMPL_DESERIALIZE_FOR_ReleaseEntity::<impl serd...
+     0.2%   0.5% 23.5KiB clap                  clap::app::parser::Parser::get_matches_with
+     0.2%   0.4% 21.7KiB clap                  clap::app::validator::Validator::validate
+     0.2%   0.4% 20.6KiB http                  http::header::name::parse_hdr
+     0.2%   0.4% 19.5KiB fatcat_cli            fatcat_cli::Specifier::get_from_api
+     0.1%   0.3% 16.4KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.4KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.2KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.1KiB fatcat_cli            fatcat_cli::run
+     0.1%   0.3% 15.2KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.3KiB serde_json?           <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.2KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.0KiB regex                 regex::exec::ExecBuilder::build
+     0.1%   0.3% 13.8KiB unicode_normalization unicode_normalization::tables::composition_table
+     0.1%   0.3% 13.6KiB fatcat_cli            <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+    38.6%  89.5%  4.5MiB                       And 13832 smaller methods. Use -n N to show more.
+    43.1% 100.0%  5.1MiB                       .text section size, the file size is 11.8MiB
+
+Low hanging fruit includes:
+
+- reviewing features for reqwest, clap, regex, fatcat_openapi
+- replace reqwest with something smaller
+- use `ansi-term` (already part of clap)
+- consider removing fancy clap features? meh
+- look at graph; probably duplicate versions of things
+
diff --git a/notes/original_proposal.md b/notes/original_proposal.md
new file mode 100644
index 0000000..2a0c8fa
--- /dev/null
+++ b/notes/original_proposal.md
@@ -0,0 +1,124 @@
+
+status: prototyping, side-project
+
+
+Fatcat CLI Client
+===================
+
+    fatcat get release_awuvsvwrwzev7jcljyo34r6gem
+    fatcat get --toml release_awuvsvwrwzev7jcljyo34r6gem
+
+    fatcat search containers "elife"
+        => pretty prints in terminal/interactive; JSON rows in ES schema for non-interactive
+        => limit, offset
+
+Editing commands:
+
+    fatcat editgroup new
+
+    fatcat editgroup list
+    fatcat eg list
+
+    fatcat update container_tupsi5ep7bhhraup4irzk6tpuy publisher="Taylor and Francis"
+        => prints URL of revision, and mentioned editgroup id
+
+    fatcat get container_tupsi5ep7bhhraup4irzk6tpuy --toml > wip.toml
+        => --expand files, --hide as args or "expand==files"?
+    # edit wip.toml
+    fatcat update container_tupsi5ep7bhhraup4irzk6tpuy --toml < wip.toml
+
+    fatcat delete release_awuvsvwrwzev7jcljyo34r6gem
+
+    fatcat create release < some_release.json
+
+    fatcat create release --bulk < release_set.json
+        => makes editgroups automatically
+
+    fatcat update container --bulk < container_set.json
+        => or, "fatcat update containers"?
+
+    fatcat edit container_tupsi5ep7bhhraup4irzk6tpuy
+        => or "update"?
+        => container is fetched, $EDITOR is opened with JSON or TOML format, on
+           save tool validates/prettyprints (a diff?) and asks whether to make edit
+
+Other:
+
+    fatcat download file_wuyi7kl4njehpg3yyngaqxcqfa
+
+    fatcat status
+        => account info?
+        => current editgroup?
+
+For editgroup ergonomics, entity mutating commands (which require an
+editgroup), tool should fetch recent open editgroups for the user, filter to
+those created with the CLI tool, and use the most recent. Behavior can be tuned
+to be more or less conservative (let's start conservative).
+
+At least for prototyping, configure via environment variables (eg, API token,
+specifying alternative API endpoints).
+
+Clever but already taken names:
+
+- `fcc` (FatCat Client) is a fortran compiler. Also the name of the USA Federal
+  Communications Commission (a notable radio/internet/phone regulator)
+- `fc` (FatCat) is a bash built-in.
+
+Argument conventions:
+
+    ':' Lookup specifier for entity (eg, external identifier like `doi:10.123/abc`)
+
+    '=' Assign field to value in create or update contexts. Non-string
+        values often can be inferred by field type
+
+    ':=' Assign field to non-string value in create or update contexts
+
+Small details (mostly TODO):
+
+- pass through API warning headers to stderr
+
+## Similar Tools / Interfaces
+
+### `httpie`
+
+    ':' HTTP headers
+
+    '==' URL parameters
+
+    '=' data fields serialized into JSON, or as form data
+
+    ':=' non-string JSON data (eg, true (boolean), 42 (number), or lists)
+
+    '@' Form field
+
+Output goes to stdout (pretty-printed) unless `--download` / `-d` is specified,
+in which case the output file is inferred, or `--output` sets it explicitly.
+
+### Internet Archive `ia` Tool
+
+TODO
+
+#### `jq` / `toml`
+
+Rust `toml-cli` has a small DSL for making mutations.
+
+#### `ripgrep`
+
+## More Ideas
+
+Some sort of pretty-printer for work/release/file structure. Eg, like `tree`
+unix command. See `ptree` rust crate.
+
+## Implementation
+
+Rust libraries:
+
+- `toml`
+- `toml_edit`: format-preserving TOML loading/mutating/serializing
+- `termcolor`
+- `atty` ("are we connected to a terminal")
+- `tabwriter` for tabular CLI output
+- `human-panic`
+- `syntect` for highlighting
+- `exitcode`
+
diff --git a/notes/plan.txt b/notes/plan.txt
new file mode 100644
index 0000000..651acac
--- /dev/null
+++ b/notes/plan.txt
@@ -0,0 +1,146 @@
+
+x search release, query string, limit, dumping search doc JSON
+x search release, query string, limit, fetching API for each
+x search release, query string, scroll API, fetching API for each
+
+x handle stdout terminated
+
+x editgroup creation
+    => set agent
+x editgroup accept
+x editgroup submit
+x editgroup list
+
+x release create from json/TOML, to an editgroup
+x release delete, to an editgroup
+x release update from full json/TOML to API
+x release edit (using $EDITOR, temp file)
+
+x release update fields and submit to editgroup
+    => more fields, like 2-5 for all entity types
+x expand/hide flags for get, search
+
+- search/update/etc containers (and files?)
+
+- polish and test so actually usable for release edits from search
+    x consider moving to new repo, with copy of fatcat-openapi-client
+    x manpage
+    x .deb generation
+    => write actual manpage (and, HTML output? ronn? pandoc?)
+    => write actual README
+
+- implement @-syntax for create/update
+    => TODO: what was the proposal here?
+    => some variant of @-syntax for stream of multiple updates/creations?
+
+- get revisions for all entity types
+
+
+#### Milestones
+
+- ability (at all) to revise edits for a single entity in editgroup
+    => clobber existing edits on update
+    => edits: get entity in current edit state
+- streaming updates from search, with either pipe (jq) or field mutations
+    => syntax/commands
+    => batching (syntax? subcommand?)
+    => auto-accept mode
+- download many PDFs from search query
+    => parallelism could be GNU/parallel
+    => don't clobber existing
+
+#### Editgroup Workflow
+
+- editgroup creation outputs just editgroup_id on stdout (unless output type selected), plus "success" to stderr
+- parse editgroup specifier
+    => "auto": fetch from recent; default?
+    => "new": create
+    => editgroup_blah or blah
+- implement "delete from editgroup" for updates, edit
+    => no updates with current setup
+    => fetch editgroup helper
+    => helper function that takes editgroup (model) and expanded specifier; deletes existing edit from editgroup if necessary
+    => skip this codepath for "new" and batch creation
+
+#### File Downloads
+
+- download single file:
+    => try archive.org files, then wayback, then original URLs
+    => download to current directory as {sha1hex}.pdf.partial, then atomic move on success
+- optional directory structure: {dir}/{hex}/{hex}/{sha1hex}.pdf
+- parallelism of downloads
+
+#### Backburner
+
+- -o/--output and -i/--input for format/schema selection (including 'es-json')
+- search release, filters, scroll API, fetching API for each
+    => structopt parses: query, filter, anti-filter
+- search release, filters, scroll API, fetching API for each, verifying revision and filters for each
+
+## Design Decisions
+
+- batch/multi behavior for mutations
+    => need some option to do auto-accept batches
+- updates and create, from-file vs. args
+    => basically, could be any of specifier, input_file, mutations supplied on command-line
+    => could use httpie @file.blah syntax to load entire file
+    => "edit" as an option for reading single files from disk? meh
+    proposal:
+      create <type>
+        either reads a file from path/stdin, or has mutation args
+        optionally --new-editgroup
+      create-multi <type>
+        reads multiple JSON from file or stdin
+        optionally --auto-batch in chunks
+        optionally --new-editgroup
+      update <specifier>
+        takes a specifier
+        either reads a file from path/stdin, or has mutation args
+      update-multi <type>
+        reads multiple JSON from file or stdin
+        creates new editgroup?
+      edit <specifier>
+      delete <specifier>
+      delete-multi <type>
+        reads multiple entities from stdin
+
+    --skip-check controls whether to do a GET and validate mutations
+      => eg, don't update if equal
+- holding state about current editgroup
+    => env var, with helpful output to show how to export
+    => spawn sub-shell with FATCAT_EDITGROUP set
+    => state in a config file somewhere (user homedir?)
+    => "smart" select most recent fatcat-cli editgroup from editor's list
+- release revision checking on updates
+    => could re-fetch and check rev and/or mutations against current before making edit
+- delete edit from editgroup
+
+## Rust refactors
+
+In rust code, all entity responses could have trait object implementations,
+which would transform to either returning the entity (trait object) or error.
+
+## API refactors
+
+Could significantly reduce number of response types and endpoints by making
+many methods generic (same endpoint URL, but entity type as a keyword):
+
+- entity history
+- delete
+- get edit
+
+Should allow destructive updates in editgroups with "clobber" flag. In
+implementation, could either delete first or on conflict do upsert.
+
+More consistent use of generic success/error?
+
+## Feature Ideas
+
+- changelog (table): under editgroup command?
+- syntect coloring of output for stdout
+- cross build for OS X? homebrew?
+- shell (bash) completions from clap
+- fcid/UUID helper
+- history for all entity types
+    => pretty table, json optional
+- "edit editgroup" as a text file, `git rebase -i` style
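
The argument conventions listed in `notes/original_proposal.md` (`:` for a lookup specifier, `=` for assigning a string value, `:=` for assigning a non-string value) could be parsed roughly as follows. This is a minimal sketch and not code from the fatcat-cli repository: the `Arg` enum and the `parse_arg` function are hypothetical names, and classifying an argument by the first `:` or `=` it contains is an assumption the notes do not spell out.

```rust
/// One command-line argument, classified by the first separator it contains.
/// Hypothetical type for illustration only; not taken from fatcat-cli.
#[derive(Debug, PartialEq)]
pub enum Arg {
    /// `kind:id`, a lookup specifier such as `doi:10.123/abc`
    Lookup { kind: String, id: String },
    /// `field=value`, a string assignment
    Assign { field: String, value: String },
    /// `field:=value`, a non-string assignment; the value is kept as raw text
    AssignRaw { field: String, value: String },
    /// No `:` or `=` present, e.g. a plain entity identifier
    Plain(String),
}

/// Classify `raw` by the first `:` or `=` found (an assumed rule).
pub fn parse_arg(raw: &str) -> Arg {
    let pos = match raw.find(|c: char| c == ':' || c == '=') {
        Some(pos) => pos,
        None => return Arg::Plain(raw.to_string()),
    };
    let (left, rest) = raw.split_at(pos);
    if let Some(value) = rest.strip_prefix(":=") {
        // Check ":=" before ":" so it is not mistaken for a lookup.
        Arg::AssignRaw { field: left.to_string(), value: value.to_string() }
    } else if let Some(value) = rest.strip_prefix('=') {
        Arg::Assign { field: left.to_string(), value: value.to_string() }
    } else {
        // `rest` starts with ':' (one byte), so skipping it is safe.
        Arg::Lookup { kind: left.to_string(), id: rest[1..].to_string() }
    }
}
```

Under this rule, `parse_arg("doi:10.123/abc")` yields a `Lookup`, `parse_arg("publisher=Taylor and Francis")` yields an `Assign`, and an identifier such as `release_awuvsvwrwzev7jcljyo34r6gem` falls through to `Plain`. Keying on the first separator means a value may itself contain `:` or `=` without changing how the argument is classified.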