Diffstat (limited to 'notes')
-rw-r--r--  notes/binary_size.md          88
-rw-r--r--  notes/original_proposal.md   124
-rw-r--r--  notes/plan.txt               146
3 files changed, 358 insertions, 0 deletions
diff --git a/notes/binary_size.md b/notes/binary_size.md
new file mode 100644
index 0000000..a79cf9b
--- /dev/null
+++ b/notes/binary_size.md
@@ -0,0 +1,88 @@
+
+## Binary Size
+
+As of 2020-05-24, in early development, the relative binary sizes are:
+
+    121 MB      default debug build
+    12 MB       default release build
+    8.2 MB      release build w/ LTO
+    6.6 MB      release build w/ LTO, stripped
+
+After some small changes:
+
+    5.9 MB      release build w/ LTO, size optimization, other flags
+    4.1 MB      release build w/ LTO, size optimization, other flags, stripped
+
+Replacing reqwest with minreq:
+
+    6.3 MB      release build w/ LTO, size optimization, other flags
+    4.1 MB      release build w/ LTO, size optimization, other flags, stripped
+
+    (so, not worth it, at least while using fatcat_openapi with hyper+tokio)
+
+Note that release builds with LTO take *quite* a long time (many minutes). We
+probably don't want that to be the default for `fatcatd` builds.
+
+    cargo bloat --release --crates
+
+     File  .text      Size Crate
+    12.2%  21.4% 1021.5KiB fatcat_cli
+     7.1%  12.5%  596.7KiB fatcat_openapi
+     6.3%  11.1%  529.6KiB reqwest
+     6.2%  10.9%  518.5KiB std
+     3.5%   6.1%  290.3KiB clap
+     2.5%   4.3%  205.9KiB regex
+     2.4%   4.2%  198.7KiB regex_syntax
+     2.1%   3.6%  172.8KiB h2
+     1.9%   3.4%  162.7KiB hyper
+     1.8%   3.1%  149.9KiB futures
+     1.4%   2.4%  116.9KiB serde_json
+     1.3%   2.3%  111.2KiB macaroon
+     1.0%   1.8%   85.3KiB unicode_normalization
+     0.7%   1.3%   62.4KiB http
+     0.6%   1.0%   50.1KiB serde
+     0.6%   1.0%   47.5KiB url
+     0.5%   0.9%   41.9KiB [Unknown]
+     0.4%   0.8%   36.5KiB tokio_reactor
+     0.4%   0.7%   31.8KiB env_logger
+     0.3%   0.6%   26.6KiB chrono
+     3.4%   5.9%  283.3KiB And 57 more crates. Use -n N to show more.
+    57.2% 100.0%    4.7MiB .text section size, the file size is 8.2MiB
+
+
+    bnewbold@orithena$ cargo bloat --release
+        Finished release [optimized] target(s) in 0.27s
+        Analyzing target/release/fatcat-cli
+
+     File  .text    Size                 Crate Name
+     0.4%   1.0% 53.2KiB                 regex <regex::exec::ExecNoSync as regex::re_trait::RegularExpression>::capture...
+     0.4%   0.8% 44.1KiB          regex_syntax regex_syntax::ast::parse::ParserI<P>::parse_with_comments
+     0.3%   0.7% 36.8KiB unicode_normalization unicode_normalization::tables::compatibility_fully_decomposed
+     0.3%   0.6% 30.3KiB unicode_normalization unicode_normalization::tables::canonical_fully_decomposed
+     0.2%   0.5% 25.2KiB         data_encoding data_encoding::Encoding::decode_mut
+     0.2%   0.5% 24.0KiB       fatcat_openapi? <fatcat_openapi::models::_IMPL_DESERIALIZE_FOR_ReleaseEntity::<impl serd...
+     0.2%   0.5% 23.5KiB                  clap clap::app::parser::Parser::get_matches_with
+     0.2%   0.4% 21.7KiB                  clap clap::app::validator::Validator::validate
+     0.2%   0.4% 20.6KiB                  http http::header::name::parse_hdr
+     0.2%   0.4% 19.5KiB            fatcat_cli fatcat_cli::Specifier::get_from_api
+     0.1%   0.3% 16.4KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.4KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.2KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 16.1KiB            fatcat_cli fatcat_cli::run
+     0.1%   0.3% 15.2KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.3KiB           serde_json? <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.2KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+     0.1%   0.3% 14.0KiB                 regex regex::exec::ExecBuilder::build
+     0.1%   0.3% 13.8KiB unicode_normalization unicode_normalization::tables::composition_table
+     0.1%   0.3% 13.6KiB            fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+    38.6%  89.5%  4.5MiB                       And 13832 smaller methods. Use -n N to show more.
+    43.1% 100.0%  5.1MiB                       .text section size, the file size is 11.8MiB
+
+Low-hanging fruit includes:
+
+- reviewing features for reqwest, clap, regex, fatcat_openapi
+- replace reqwest with something smaller
+- use `ansi_term` (already part of clap)
+- consider removing fancy clap features? meh
+- look at graph; probably duplicate versions of things
+
diff --git a/notes/original_proposal.md b/notes/original_proposal.md
new file mode 100644
index 0000000..2a0c8fa
--- /dev/null
+++ b/notes/original_proposal.md
@@ -0,0 +1,124 @@
+
+status: prototyping, side-project
+
+
+Fatcat CLI Client
+===================
+
+    fatcat get release_awuvsvwrwzev7jcljyo34r6gem
+    fatcat get --toml release_awuvsvwrwzev7jcljyo34r6gem
+
+    fatcat search containers "elife"
+    => pretty prints in terminal/interactive; JSON rows in ES schema for non-interactive
+    => limit, offset
+
+Editing commands:
+
+    fatcat editgroup new
+
+    fatcat editgroup list
+    fatcat eg list
+
+    fatcat update container_tupsi5ep7bhhraup4irzk6tpuy publisher="Taylor and Francis"
+    => prints URL of revision, and mentioned editgroup id
+
+    fatcat get container_tupsi5ep7bhhraup4irzk6tpuy --toml > wip.toml
+    => --expand files, --hide as args or "expand==files"?
+    # edit wip.toml
+    fatcat update container_tupsi5ep7bhhraup4irzk6tpuy --toml < wip.toml
+
+    fatcat delete release_awuvsvwrwzev7jcljyo34r6gem
+
+    fatcat create release < some_release.json
+
+    fatcat create release --bulk < release_set.json
+    => makes editgroups automatically
+
+    fatcat update container --bulk < container_set.json
+    => or, "fatcat update containers"?
+
+    fatcat edit container_tupsi5ep7bhhraup4irzk6tpuy
+    => or "update"?
+    => container is fetched, $EDITOR is opened with JSON or TOML format, on
+       save the tool validates/pretty-prints (a diff?) and asks whether to make the edit
+
+Other:
+
+    fatcat download file_wuyi7kl4njehpg3yyngaqxcqfa
+
+    fatcat status
+    => account info?
+    => current editgroup?
+
+For editgroup ergonomics, entity mutating commands (which require an
+editgroup), the tool should fetch recent open editgroups for the user, filter to
+those created with the CLI tool, and use the most recent. Behavior can be tuned
+to be more or less conservative (let's start conservative).
+
+At least for prototyping, configure via environment variables (eg, API token,
+specifying alternative API endpoints).
+
+Clever but already taken names:
+
+- `fcc` (FatCat Client) is a Fortran compiler. Also the name of the USA Federal
+  Communications Commission (a notable radio/internet/phone regulator)
+- `fc` (FatCat) is a bash built-in.
+
+Argument conventions:
+
+    ':'     Lookup specifier for entity (eg, external identifier like `doi:10.123/abc`)
+
+    '='     Assign field to value in create or update contexts. Non-string
+            values often can be inferred by field type
+
+    ':='    Assign field to non-string value in create or update contexts
+
+Small details (mostly TODO):
+
+- pass through API warning headers to stderr
+
+## Similar Tools / Interfaces
+
+### `httpie`
+
+    ':'     HTTP headers
+
+    '=='    URL parameters
+
+    '='     data fields serialized into JSON, or as form data
+
+    ':='    non-string JSON data (eg, true (boolean), 42 (number), or lists)
+
+    '@'     file upload form field
+
+Output goes to stdout (pretty-printed) unless `--download` / `-d` is specified,
+in which case the output file name is inferred, or `--output` sets it explicitly.
+
+### Internet Archive `ia` Tool
+
+TODO
+
+#### `jq` / `toml`
+
+Rust `toml-cli` has a small DSL for making mutations.
+
+#### `ripgrep`
+
+## More Ideas
+
+Some sort of pretty-printer for work/release/file structure. Eg, like the `tree`
+Unix command. See the `ptree` Rust crate.
+
+## Implementation
+
+Rust libraries:
+
+- `toml`
+- `toml_edit`: format-preserving TOML loading/mutating/serializing
+- `termcolor`
+- `atty` ("are we connected to a terminal")
+- `tabwriter` for tabular CLI output
+- `human-panic`
+- `syntect` for highlighting
+- `exitcode`
+
diff --git a/notes/plan.txt b/notes/plan.txt
new file mode 100644
index 0000000..651acac
--- /dev/null
+++ b/notes/plan.txt
@@ -0,0 +1,146 @@
+
+x search release, query string, limit, dumping search doc JSON
+x search release, query string, limit, fetching API for each
+x search release, query string, scroll API, fetching API for each
+
+x handle stdout terminated
+
+x editgroup creation
+    => set agent
+x editgroup accept
+x editgroup submit
+x editgroup list
+
+x release create from json/TOML, to an editgroup
+x release delete, to an editgroup
+x release update from full json/TOML to API
+x release edit (using $EDITOR, temp file)
+
+x release update fields and submit to editgroup
+    => more fields, like 2-5 for all entity types
+x expand/hide flags for get, search
+
+- search/update/etc containers (and files?)
+
+- polish and test so actually usable for release edits from search
+    x  consider moving to new repo, with copy of fatcat-openapi-client
+    x  manpage
+    x  .deb generation
+    => write actual manpage (and, HTML output? ronn? pandoc?)
+    => write actual README
+
+- implement @-syntax for create/update
+    => TODO: what was the proposal here?
+    => some variant of @-syntax for stream of multiple updates/creations?
+
+- get revisions for all entity types
+
+
+#### Milestones
+
+- ability (at all) to revise edits for a single entity in editgroup
+    => clobber existing edits on update
+    => edits: get entity in current edit state
+- streaming updates from search, with either pipe (jq) or field mutations
+    => syntax/commands
+    => batching (syntax? subcommand?)
+    => auto-accept mode
+- download many PDFs from search query
+    => parallelism could be GNU parallel
+    => don't clobber existing
+
+#### Editgroup Workflow
+
+- editgroup creation outputs just editgroup_id on stdout (unless output type selected), plus "success" to stderr
+- parse editgroup specifier
+    => "auto": fetch from recent; default?
+    => "new": create
+    => editgroup_blah or blah
+- implement "delete from editgroup" for updates, edit
+    => no updates with current setup
+    => fetch editgroup helper
+    => helper function that takes editgroup (model) and expanded specifier; deletes existing edit from editgroup if necessary
+    => skip this codepath for "new" and batch creation
+
+#### File Downloads
+
+- download single file:
+    => try archive.org files, then wayback, then original URLs
+    => download to current directory as {sha1hex}.pdf.partial, then atomic move on success
+- optional directory structure: {dir}/{hex}/{hex}/{sha1hex}.pdf
+- parallelism of downloads
+
+#### Backburner
+
+- -o/--output and -i/--input for format/schema selection (including 'es-json')
+- search release, filters, scroll API, fetching API for each
+    => structopt parses: query, filter, anti-filter
+- search release, filters, scroll API, fetching API for each, verifying revision and filters for each
+
+## Design Decisions
+
+- batch/multi behavior for mutations
+    => need some option to do auto-accept batches
+- updates and create, from-file vs. args
+    => basically, could be any of specifier, input_file, mutations supplied on command-line
+    => could use httpie @file.blah syntax to load entire file
+    => "edit" as an option for reading single files from disk? meh
+    proposal:
+        create <type>
+            either reads a file from path/stdin, or has mutation args
+            optionally --new-editgroup
+        create-multi <type>
+            reads multiple JSON from file or stdin
+            optionally --auto-batch in chunks
+            optionally --new-editgroup
+        update <specifier>
+            takes a specifier
+            either reads a file from path/stdin, or has mutation args
+        update-multi <type>
+            reads multiple JSON from file or stdin
+            creates new editgroup?
+        edit <specifier>
+        delete <specifier>
+        delete-multi <type>
+            reads multiple entities from stdin
+
+        --skip-check controls whether to do a GET and validate mutations
+            => eg, don't update if equal
+- holding state about current editgroup
+    => env var, with helpful output to show how to export
+    => spawn sub-shell with FATCAT_EDITGROUP set
+    => state in a config file somewhere (user homedir?)
+    => "smart" select most recent fatcat-cli editgroup from editor's list
+- release revision checking on updates
+    => could re-fetch and check rev and/or mutations against current before making edit
+- delete edit from editgroup
+
+## Rust refactors
+
+In Rust code, all entity responses could have trait object implementations,
+which would transform to either returning the entity (trait object) or error.
+
+## API refactors
+
+Could significantly reduce the number of response types and endpoints by making
+many methods generic (same endpoint URL, but entity type as a keyword):
+
+- entity history
+- delete
+- get edit
+
+Should allow destructive updates in editgroups with a "clobber" flag. In
+implementation, could either delete first or on conflict do upsert.
+
+More consistent use of generic success/error?
+
+## Feature Ideas
+
+- changelog (table): under editgroup command?
+- syntect coloring of output for stdout
+- cross build for OS X? homebrew?
+- shell (bash) completions from clap
+- fcid/UUID helper
+- history for all entity types
+    => pretty table, json optional
+- "edit editgroup" as a text file, `git rebase -i` style
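The argument conventions in `notes/original_proposal.md` (`:` for lookup specifiers, `=` for string assignment, `:=` for non-string assignment) can be told apart by looking at the first `:` or `=` in each argument. The sketch below assumes exactly that rule, and assumes field names and lookup keys never contain either character. `CliArg` and `parse_cli_arg` are hypothetical names, and `:=` values are kept as raw text instead of being parsed as JSON so the example needs no external crates.

```rust
/// Hypothetical classification of a single command-line argument.
#[derive(Debug, PartialEq)]
pub enum CliArg {
    /// `key:value`, e.g. `doi:10.123/abc` (lookup specifier).
    Lookup { key: String, value: String },
    /// `field=value`, value taken as a string.
    Assign { field: String, value: String },
    /// `field:=value`, non-string value (number, boolean, list). Left as raw
    /// text here; a real tool might hand it to a JSON parser.
    AssignRaw { field: String, value: String },
}

/// Classify an argument by the first ':' or '=' it contains.
///
/// Assumption: keys never contain ':' or '=', so the first such character is
/// the separator and everything after it belongs to the value.
pub fn parse_cli_arg(arg: &str) -> Option<CliArg> {
    let idx = arg.find(|c: char| c == ':' || c == '=')?;
    let (key, rest) = arg.split_at(idx);
    if key.is_empty() {
        return None;
    }
    let parsed = if let Some(value) = rest.strip_prefix(":=") {
        CliArg::AssignRaw { field: key.to_string(), value: value.to_string() }
    } else if let Some(value) = rest.strip_prefix('=') {
        CliArg::Assign { field: key.to_string(), value: value.to_string() }
    } else {
        // `rest` starts with ':' (not followed by '='), so this is a lookup.
        CliArg::Lookup { key: key.to_string(), value: rest[1..].to_string() }
    };
    Some(parsed)
}
```

With this rule, `title=a:b` is an assignment whose value is `a:b`, while a bare entity ident such as `release_awuvsvwrwzev7jcljyo34r6gem` is not classified at all and would be handled by a separate specifier parser.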
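For the editgroup ergonomics in `notes/original_proposal.md` (fetch the user's recent open editgroups, keep those created with the CLI tool, use the most recent), the selection step over already-fetched data might look like the sketch below. The `Editgroup` struct, its fields, and matching on an `agent` string are assumptions standing in for whatever the API client actually returns; `notes/plan.txt` only says that editgroup creation should "set agent". Treating submitted editgroups as not open is also an assumption, chosen as the conservative option the note asks for.

```rust
/// Hypothetical, simplified view of an editgroup as returned by the API.
#[derive(Debug, Clone)]
pub struct Editgroup {
    pub editgroup_id: String,
    /// Creation time as seconds since the Unix epoch (assumed representation).
    pub created: u64,
    /// Set once the editgroup has been submitted for review.
    pub submitted: Option<u64>,
    /// Set once the editgroup has been accepted (merged).
    pub changelog_index: Option<u64>,
    /// Agent recorded at creation time (assumed field).
    pub agent: Option<String>,
}

/// Pick the most recently created editgroup that is still open (neither
/// submitted nor accepted) and was created by the given agent.
///
/// Returns None rather than guessing, so the caller can fall back to
/// creating a new editgroup.
pub fn select_auto_editgroup<'a>(recent: &'a [Editgroup], agent: &str) -> Option<&'a Editgroup> {
    recent
        .iter()
        .filter(|eg| eg.submitted.is_none() && eg.changelog_index.is_none())
        .filter(|eg| eg.agent.as_deref() == Some(agent))
        .max_by_key(|eg| eg.created)
}
```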
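`notes/plan.txt` lists three forms for the editgroup specifier: `auto`, `new`, and an explicit ID given either as `editgroup_blah` or a bare `blah`. A minimal sketch of that parse follows. `EditgroupSpecifier` and `parse_editgroup_specifier` are hypothetical names, and the only validation applied (a non-empty ASCII alphanumeric ident) is an assumption, looser than whatever format the fatcat API actually enforces.

```rust
/// Hypothetical representation of the editgroup specifier forms in plan.txt.
#[derive(Debug, PartialEq)]
pub enum EditgroupSpecifier {
    /// "auto": reuse the most recent open editgroup created by this tool.
    Auto,
    /// "new": create a fresh editgroup.
    New,
    /// An explicit editgroup ident, with any "editgroup_" prefix stripped.
    Ident(String),
}

/// Parse a command-line editgroup specifier.
///
/// Assumption: idents are non-empty ASCII alphanumeric strings; the real
/// ident format is stricter and is not checked here.
pub fn parse_editgroup_specifier(raw: &str) -> Result<EditgroupSpecifier, String> {
    match raw {
        "auto" => Ok(EditgroupSpecifier::Auto),
        "new" => Ok(EditgroupSpecifier::New),
        _ => {
            // Accept both "editgroup_<ident>" and a bare "<ident>".
            let ident = raw.strip_prefix("editgroup_").unwrap_or(raw);
            if !ident.is_empty() && ident.chars().all(|c| c.is_ascii_alphanumeric()) {
                Ok(EditgroupSpecifier::Ident(ident.to_string()))
            } else {
                Err(format!("not a valid editgroup specifier: {:?}", raw))
            }
        }
    }
}
```

Normalizing both ID spellings to the bare ident early means the later "delete from editgroup" helper described in the plan only has to handle one form.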
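The download order in `notes/plan.txt` (archive.org files, then wayback, then original URLs) amounts to a stable sort of a file entity's URLs by source. The sketch below assumes archive.org item files can be recognized by the `https://archive.org/download/` prefix and wayback captures by `https://web.archive.org/web/`; `url_priority` and `order_download_urls` are hypothetical names.

```rust
/// Rank a URL by the preference order in plan.txt (lower is tried first).
///
/// Assumption: prefix matching on these two hosts is enough to classify URLs;
/// anything else is treated as an original (publisher/repository) URL.
pub fn url_priority(url: &str) -> u8 {
    if url.starts_with("https://archive.org/download/") {
        0
    } else if url.starts_with("https://web.archive.org/web/") {
        1
    } else {
        2
    }
}

/// Return URLs in download-attempt order. `sort_by_key` is a stable sort, so
/// URLs with the same priority keep their original relative order.
pub fn order_download_urls(urls: &[&str]) -> Vec<String> {
    let mut ordered: Vec<&str> = urls.to_vec();
    ordered.sort_by_key(|u| url_priority(u));
    ordered.into_iter().map(String::from).collect()
}
```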
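`notes/plan.txt` describes downloading to `{sha1hex}.pdf.partial` and atomically moving the file into place on success, optionally under `{dir}/{hex}/{hex}/{sha1hex}.pdf`, without clobbering existing files. The sketch below assumes the two `{hex}` components are the first and second pairs of hex digits of the SHA-1 (a common sharding scheme; the note does not say). `download_paths` and `finish_download` are hypothetical helpers, and the body is passed as a byte slice for simplicity where a real download would stream. On POSIX systems `std::fs::rename` is atomic when source and destination are on the same filesystem, which is why the `.partial` file sits next to its final name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Compute (partial, final) paths for a download.
///
/// Assumption: `{dir}/{hex}/{hex}/` means the first two and next two hex
/// digits of the SHA-1. With `shard = false` the file goes directly in `dir`.
pub fn download_paths(dir: &Path, sha1hex: &str, shard: bool) -> io::Result<(PathBuf, PathBuf)> {
    let valid = sha1hex.len() == 40 && sha1hex.chars().all(|c| c.is_ascii_hexdigit());
    if !valid {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "expected 40 hex digits"));
    }
    let mut parent = dir.to_path_buf();
    if shard {
        parent.push(&sha1hex[0..2]);
        parent.push(&sha1hex[2..4]);
    }
    let final_path = parent.join(format!("{}.pdf", sha1hex));
    let partial_path = parent.join(format!("{}.pdf.partial", sha1hex));
    Ok((partial_path, final_path))
}

/// Write the body to the `.partial` path, then rename it into place.
///
/// Returns Ok(false) without writing if the final file already exists
/// ("don't clobber existing"). The exists-check is not race-free if several
/// parallel downloads target the same file.
pub fn finish_download(partial: &Path, final_path: &Path, body: &[u8]) -> io::Result<bool> {
    if final_path.exists() {
        return Ok(false);
    }
    if let Some(parent) = final_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // A failed or interrupted write leaves only the `.partial` file behind.
    fs::write(partial, body)?;
    fs::rename(partial, final_path)?;
    Ok(true)
}
```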
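The `--skip-check` idea in `notes/plan.txt` (do a GET, apply the mutations, and don't update if the result is equal) can be sketched with the entity modeled as a flat string map. The flat `BTreeMap` representation and the `apply_mutations` helper are assumptions for illustration; real entities are nested, and a real implementation would more likely compare JSON values.

```rust
use std::collections::BTreeMap;

/// Apply `field=value` style mutations to a fetched entity (modeled as a flat
/// map). Returns None when the mutations would not change anything, so the
/// caller can skip the API update entirely.
pub fn apply_mutations(
    entity: &BTreeMap<String, String>,
    mutations: &[(&str, &str)],
) -> Option<BTreeMap<String, String>> {
    let mut updated = entity.clone();
    for (field, value) in mutations {
        updated.insert(field.to_string(), value.to_string());
    }
    if &updated == entity {
        None
    } else {
        Some(updated)
    }
}
```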
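The "Rust refactors" note in `notes/plan.txt` suggests giving every entity response a trait implementation that turns it into either an entity (as a trait object) or an error. A sketch of that shape follows. `Entity`, `EntityResponse`, `ReleaseEntity`, and `GetReleaseResponse` here are hypothetical stand-ins; the generated `fatcat_openapi` types will have different names and variants, and the error type is simplified to `String`.

```rust
use std::fmt::Debug;

/// Hypothetical common interface over entity types (release, container, ...).
pub trait Entity: Debug {
    fn ident(&self) -> Option<&str>;
}

/// Hypothetical conversion, implemented once per generated response enum.
pub trait EntityResponse {
    fn into_entity(self) -> Result<Box<dyn Entity>, String>;
}

// Stand-ins for generated client types; real names and variants will differ.
#[derive(Debug)]
pub struct ReleaseEntity {
    pub ident: Option<String>,
}

impl Entity for ReleaseEntity {
    fn ident(&self) -> Option<&str> {
        self.ident.as_deref()
    }
}

#[derive(Debug)]
pub enum GetReleaseResponse {
    FoundEntity(ReleaseEntity),
    NotFound(String),
    GenericError(String),
}

impl EntityResponse for GetReleaseResponse {
    fn into_entity(self) -> Result<Box<dyn Entity>, String> {
        match self {
            GetReleaseResponse::FoundEntity(e) => Ok(Box::new(e)),
            GetReleaseResponse::NotFound(msg) => Err(format!("not found: {}", msg)),
            GetReleaseResponse::GenericError(msg) => Err(msg),
        }
    }
}
```

With one such impl per response enum, CLI code such as `get`, `update`, and `edit` could share a single code path over `Box<dyn Entity>` instead of matching on each generated response type.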
