author Bryan Newbold <bnewbold@archive.org> 2021-02-09 20:13:05 -0800
committer Bryan Newbold <bnewbold@archive.org> 2021-02-09 20:13:05 -0800
commit 260ae4ec7ee4a836b35f036ead102be8347103e8
tree aa044a914be9cb01a5fff287b558f5ff95cb59cf
parent 0f89be721b820fe295cacfd5e3c875d3037b9874
move around notes, docs, other text files
Diffstat (limited to 'notes')
-rw-r--r-- notes/binary_size.md 88
-rw-r--r-- notes/original_proposal.md 124
-rw-r--r-- notes/plan.txt 146
3 files changed, 358 insertions, 0 deletions
diff --git a/notes/binary_size.md b/notes/binary_size.md
new file mode 100644
index 0000000..a79cf9b
--- /dev/null
+++ b/notes/binary_size.md
@@ -0,0 +1,88 @@
+
+## Binary Size
+
+As of 2020-05-24, in early development, the relative binary sizes are:
+
+ 121 MB default debug build
+ 12 MB default release build
+ 8.2 MB release build w/ LTO
+ 6.6 MB release build w/ LTO, stripped
+
+After some small changes:
+
+ 5.9 MB release build w/ LTO, size optimization, other flags
+ 4.1 MB release build w/ LTO, size optimization, other flags, stripped
+
+Replacing reqwest with minreq:
+
+ 6.3 MB release build w/ LTO, size optimization, other flags
+ 4.1 MB release build w/ LTO, size optimization, other flags, stripped
+
+ (so, not worth it, at least while using fatcat_openapi with hyper+tokio)
+
+Note that release builds with LTO take *quite* a long time (many minutes). We
+probably don't want that to be the default for `fatcatd` builds.
+
+ cargo bloat --release --crates
+
+ File .text Size Crate
+ 12.2% 21.4% 1021.5KiB fatcat_cli
+ 7.1% 12.5% 596.7KiB fatcat_openapi
+ 6.3% 11.1% 529.6KiB reqwest
+ 6.2% 10.9% 518.5KiB std
+ 3.5% 6.1% 290.3KiB clap
+ 2.5% 4.3% 205.9KiB regex
+ 2.4% 4.2% 198.7KiB regex_syntax
+ 2.1% 3.6% 172.8KiB h2
+ 1.9% 3.4% 162.7KiB hyper
+ 1.8% 3.1% 149.9KiB futures
+ 1.4% 2.4% 116.9KiB serde_json
+ 1.3% 2.3% 111.2KiB macaroon
+ 1.0% 1.8% 85.3KiB unicode_normalization
+ 0.7% 1.3% 62.4KiB http
+ 0.6% 1.0% 50.1KiB serde
+ 0.6% 1.0% 47.5KiB url
+ 0.5% 0.9% 41.9KiB [Unknown]
+ 0.4% 0.8% 36.5KiB tokio_reactor
+ 0.4% 0.7% 31.8KiB env_logger
+ 0.3% 0.6% 26.6KiB chrono
+ 3.4% 5.9% 283.3KiB And 57 more crates. Use -n N to show more.
+ 57.2% 100.0% 4.7MiB .text section size, the file size is 8.2MiB
+
+
+ bnewbold@orithena$ cargo bloat --release
+ Finished release [optimized] target(s) in 0.27s
+ Analyzing target/release/fatcat-cli
+
+ File .text Size Crate Name
+ 0.4% 1.0% 53.2KiB regex <regex::exec::ExecNoSync as regex::re_trait::RegularExpression>::capture...
+ 0.4% 0.8% 44.1KiB regex_syntax regex_syntax::ast::parse::ParserI<P>::parse_with_comments
+ 0.3% 0.7% 36.8KiB unicode_normalization unicode_normalization::tables::compatibility_fully_decomposed
+ 0.3% 0.6% 30.3KiB unicode_normalization unicode_normalization::tables::canonical_fully_decomposed
+ 0.2% 0.5% 25.2KiB data_encoding data_encoding::Encoding::decode_mut
+ 0.2% 0.5% 24.0KiB fatcat_openapi? <fatcat_openapi::models::_IMPL_DESERIALIZE_FOR_ReleaseEntity::<impl serd...
+ 0.2% 0.5% 23.5KiB clap clap::app::parser::Parser::get_matches_with
+ 0.2% 0.4% 21.7KiB clap clap::app::validator::Validator::validate
+ 0.2% 0.4% 20.6KiB http http::header::name::parse_hdr
+ 0.2% 0.4% 19.5KiB fatcat_cli fatcat_cli::Specifier::get_from_api
+ 0.1% 0.3% 16.4KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 16.4KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 16.2KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 16.1KiB fatcat_cli fatcat_cli::run
+ 0.1% 0.3% 15.2KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 14.3KiB serde_json? <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 14.2KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 0.1% 0.3% 14.0KiB regex regex::exec::ExecBuilder::build
+ 0.1% 0.3% 13.8KiB unicode_normalization unicode_normalization::tables::composition_table
+ 0.1% 0.3% 13.6KiB fatcat_cli <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deser...
+ 38.6% 89.5% 4.5MiB And 13832 smaller methods. Use -n N to show more.
+ 43.1% 100.0% 5.1MiB .text section size, the file size is 11.8MiB
+
+Low hanging fruit includes:
+
+- reviewing features for reqwest, clap, regex, fatcat_openapi
+- replace reqwest with something smaller
+- use `ansi-term` (already part of clap)
+- consider removing fancy clap features? meh
+- look at graph; probably duplicate versions of things
+
diff --git a/notes/original_proposal.md b/notes/original_proposal.md
new file mode 100644
index 0000000..2a0c8fa
--- /dev/null
+++ b/notes/original_proposal.md
@@ -0,0 +1,124 @@
+
+status: prototyping, side-project
+
+
+Fatcat CLI Client
+===================
+
+ fatcat get release_awuvsvwrwzev7jcljyo34r6gem
+ fatcat get --toml release_awuvsvwrwzev7jcljyo34r6gem
+
+ fatcat search containers "elife"
+ => pretty prints in terminal/interactive; JSON rows in ES schema for non-interactive
+ => limit, offset
+
+Editing commands:
+
+ fatcat editgroup new
+
+ fatcat editgroup list
+ fatcat eg list
+
+ fatcat update container_tupsi5ep7bhhraup4irzk6tpuy publisher="Taylor and Francis"
+ => prints URL of revision, and mentioned editgroup id
+
+ fatcat get container_tupsi5ep7bhhraup4irzk6tpuy --toml > wip.toml
+ => --expand files, --hide as args or "expand==files"?
+ # edit wip.toml
+ fatcat update container_tupsi5ep7bhhraup4irzk6tpuy --toml < wip.toml
+
+ fatcat delete release_awuvsvwrwzev7jcljyo34r6gem
+
+ fatcat create release < some_release.json
+
+ fatcat create release --bulk < release_set.json
+ => makes editgroups automatically
+
+ fatcat update container --bulk < container_set.json
+ => or, "fatcat update containers"?
+
+ fatcat edit container_tupsi5ep7bhhraup4irzk6tpuy
+ => or "update"?
+ => container is fetched, $EDITOR is opened with JSON or TOML format, on
+ save tool validates/prettyprints (a diff?) and asks whether to make edit
+
+Other:
+
+ fatcat download file_wuyi7kl4njehpg3yyngaqxcqfa
+
+ fatcat status
+ => account info?
+ => current editgroup?
+
+For editgroup ergonomics, entity-mutating commands (which require an
+editgroup) should fetch recent open editgroups for the user, filter to
+those created with the CLI tool, and use the most recent. Behavior can be tuned
+to be more or less conservative (let's start conservative).
+
+At least for prototyping, configure via environment variables (eg, API token,
+specifying alternative API endpoints).
+
+Clever but already taken names:
+
+- `fcc` (FatCat Client) is a fortran compiler. Also the name of the USA Federal
+ Communications Commission (a notable radio/internet/phone regulator)
+- `fc` (FatCat) is a bash built-in.
+
+Argument conventions:
+
+ ':' Lookup specifier for entity (eg, external identifier like `doi:10.123/abc`)
+
+ '=' Assign field to value in create or update contexts. Non-string
+ values often can be inferred by field type
+
+ ':=' Assign field to non-string value in create or update contexts
+
+Small details (mostly TODO):
+
+- pass through API warning headers to stderr
+
+## Similar Tools / Interfaces
+
+### `httpie`
+
+ ':' HTTP headers
+
+ '==' URL parameters
+
+ '=' data fields serialized into JSON, or as form data
+
+ ':=' non-string JSON data (eg, true (boolean), 42 (number), or lists)
+
+ '@' Form field
+
+Output goes to stdout (pretty-printed) unless `--download` / `-d` is specified,
+in which case the output file is inferred, or `--output` sets it explicitly.
+
+### Internet Archive `ia` Tool
+
+TODO
+
+### `jq` / `toml`
+
+Rust `toml-cli` has a small DSL for making mutations.
+
+### `ripgrep`
+
+## More Ideas
+
+Some sort of pretty-printer for work/release/file structure. Eg, like `tree`
+unix command. See `ptree` rust crate.
+
+## Implementation
+
+Rust libraries:
+
+- `toml`
+- `toml_edit`: format-preserving TOML loading/mutating/serializing
+- `termcolor`
+- `atty` ("are we connected to a terminal")
+- `tabwriter` for tabular CLI output
+- `human-panic`
+- `syntect` for highlighting
+- `exitcode`
+
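The argument conventions described in notes/original_proposal.md (`:` for lookup specifiers, `=` for string assignment, `:=` for non-string assignment) could be prototyped roughly as follows. This is a minimal sketch and not code from this commit: the `Arg` enum, its variant and field names, and `parse_arg` are hypothetical, and the rule that the first `:` or `=` in the argument decides its meaning is an assumption the notes do not spell out.

```rust
/// One parsed command-line argument (hypothetical type, not from fatcat-cli).
#[derive(Debug, PartialEq)]
enum Arg {
    /// `doi:10.123/abc` => look up an entity by external identifier
    Lookup { kind: String, value: String },
    /// `publisher=Foo` => assign a string value to a field
    AssignStr { field: String, value: String },
    /// `some_flag:=true` => assign a non-string value, kept as raw text here
    AssignRaw { field: String, value: String },
}

/// Split an argument at its first ':' or '=' (assumed rule) and classify it.
/// Returns None when there is no operator or the key part is empty.
fn parse_arg(raw: &str) -> Option<Arg> {
    let idx = raw.find(|c: char| c == ':' || c == '=')?;
    let (key, rest) = raw.split_at(idx);
    if key.is_empty() {
        return None;
    }
    // ':=' has to be tested before the bare ':' case.
    if let Some(value) = rest.strip_prefix(":=") {
        Some(Arg::AssignRaw { field: key.to_string(), value: value.to_string() })
    } else if let Some(value) = rest.strip_prefix('=') {
        Some(Arg::AssignStr { field: key.to_string(), value: value.to_string() })
    } else {
        let value = rest.strip_prefix(':')?;
        Some(Arg::Lookup { kind: key.to_string(), value: value.to_string() })
    }
}
```

Because only the first operator character is inspected, a value may itself contain `:` or `=` (as DOIs and titles sometimes do) without changing how the argument is classified.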
diff --git a/notes/plan.txt b/notes/plan.txt
new file mode 100644
index 0000000..651acac
--- /dev/null
+++ b/notes/plan.txt
@@ -0,0 +1,146 @@
+
+x search release, query string, limit, dumping search doc JSON
+x search release, query string, limit, fetching API for each
+x search release, query string, scroll API, fetching API for each
+
+x handle stdout terminated
+
+x editgroup creation
+ => set agent
+x editgroup accept
+x editgroup submit
+x editgroup list
+
+x release create from json/TOML, to an editgroup
+x release delete, to an editgroup
+x release update from full json/TOML to API
+x release edit (using $EDITOR, temp file)
+
+x release update fields and submit to editgroup
+ => more fields, like 2-5 for all entity types
+x expand/hide flags for get, search
+
+- search/update/etc containers (and files?)
+
+- polish and test so actually usable for release edits from search
+ x consider moving to new repo, with copy of fatcat-openapi-client
+ x manpage
+ x .deb generation
+ => write actual manpage (and, HTML output? ronn? pandoc?)
+ => write actual README
+
+- implement @-syntax for create/update
+ => TODO: what was the proposal here?
+ => some variant of @-syntax for stream of multiple updates/creations?
+
+- get revisions for all entity types
+
+
+#### Milestones
+
+- ability (at all) to revise edits for a single entity in editgroup
+ => clobber existing edits on update
+ => edits: get entity in current edit state
+- streaming updates from search, with either pipe (jq) or field mutations
+ => syntax/commands
+ => batching (syntax? subcommand?)
+ => auto-accept mode
+- download many PDFs from search query
+ => parallelism could be GNU parallel
+ => don't clobber existing
+
+#### Editgroup Workflow
+
+- editgroup creation outputs just editgroup_id on stdout (unless output type selected), plus "success" to stderr
+- parse editgroup specifier
+ => "auto": fetch from recent; default?
+ => "new": create
+ => editgroup_blah or blah
+- implement "delete from editgroup" for updates, edit
+ => no updates with current setup
+ => fetch editgroup helper
+ => helper function that takes editgroup (model) and expanded specifier; deletes existing edit from editgroup if necessary
+ => skip this codepath for "new" and batch creation
+
+#### File Downloads
+
+- download single file:
+ => try archive.org files, then wayback, then original URLs
+ => download to current directory as {sha1hex}.pdf.partial, then atomic move on success
+- optional directory structure: {dir}/{hex}/{hex}/{sha1hex}.pdf
+- parallelism of downloads
+
+#### Backburner
+
+- -o/--output and -i/--input for format/schema selection (including 'es-json')
+- search release, filters, scroll API, fetching API for each
+ => structopt parses: query, filter, anti-filter
+- search release, filters, scroll API, fetching API for each, verifying revision and filters for each
+
+## Design Decisions
+
+- batch/multi behavior for mutations
+ => need some option to do auto-accept batches
+- updates and create, from-file vs. args
+ => basically, could be any of specifier, input_file, mutations supplied on command-line
+ => could use httpie @file.blah syntax to load entire file
+ => "edit" as an option for reading single files from disk? meh
+ proposal:
+ create <type>
+ either reads a file from path/stdin, or has mutation args
+ optionally --new-editgroup
+ create-multi <type>
+ reads multiple JSON from file or stdin
+ optionally --auto-batch in chunks
+ optionally --new-editgroup
+ update <specifier>
+ takes a specifier
+ either reads a file from path/stdin, or has mutation args
+ update-multi <type>
+ reads multiple JSON from file or stdin
+ creates new editgroup?
+ edit <specifier>
+ delete <specifier>
+ delete-multi <type>
+ reads multiple entities from stdin
+
+ --skip-check controls whether to do a GET and validate mutations
+ => eg, don't update if equal
+- holding state about current editgroup
+ => env var, with helpful output to show how to export
+ => spawn sub-shell with FATCAT_EDITGROUP set
+ => state in a config file somewhere (user homedir?)
+ => "smart" select most recent fatcat-cli editgroup from editor's list
+- release revision checking on updates
+ => could re-fetch and check rev and/or mutations against current before making edit
+- delete edit from editgroup
+
+## Rust refactors
+
+In Rust code, all entity response types could implement a common trait that
+converts the response into either the entity (as a trait object) or an error.
+
+## API refactors
+
+Could significantly reduce number of response types and endpoints by making
+many methods generic (same endpoint URL, but entity type as a keyword):
+
+- entity history
+- delete
+- get edit
+
+Should allow destructive updates in editgroups with "clobber" flag. In
+implementation, could either delete first or on conflict do upsert.
+
+More consistent use of generic success/error?
+
+## Feature Ideas
+
+- changelog (table): under editgroup command?
+- syntect coloring of output for stdout
+- cross build for OS X? homebrew?
+- shell (bash) completions from clap
+- fcid/UUID helper
+- history for all entity types
+ => pretty table, json optional
+- "edit editgroup" as a text file, `git rebase -i` style
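The "File Downloads" items in notes/plan.txt (write to `{sha1hex}.pdf.partial`, move atomically on success, optional `{dir}/{hex}/{hex}/{sha1hex}.pdf` layout, don't clobber existing files) might look roughly like the sketch below. It is not code from this commit: `sharded_path` and `write_atomically` are hypothetical names, and using the first two pairs of hex digits as directory names is an assumption, since the notes do not say which digits to use. Only the standard library is used; fetching the bytes over HTTP is left out.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

/// Build `{dir}/{hex}/{hex}/{sha1hex}.pdf` (hypothetical helper).
/// Assumes the two directory levels are the first two hex-digit pairs.
fn sharded_path(dir: &Path, sha1hex: &str) -> Option<PathBuf> {
    // A SHA-1 digest is 40 hex characters; reject anything else.
    if sha1hex.len() != 40 || !sha1hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(
        dir.join(&sha1hex[0..2])
            .join(&sha1hex[2..4])
            .join(format!("{}.pdf", sha1hex)),
    )
}

/// Write `body` to `<final_path>.partial`, then rename it into place.
/// Returns Ok(false) without writing if the final file already exists.
fn write_atomically(final_path: &Path, body: &[u8]) -> std::io::Result<bool> {
    if final_path.exists() {
        return Ok(false); // don't clobber existing downloads
    }
    if let Some(parent) = final_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Append ".partial" to the full file name (foo.pdf -> foo.pdf.partial).
    let mut partial = final_path.as_os_str().to_owned();
    partial.push(".partial");
    let partial = PathBuf::from(partial);

    let mut file = fs::File::create(&partial)?;
    file.write_all(body)?;
    file.sync_all()?;
    // Both paths share a directory, so the rename stays on one filesystem.
    fs::rename(&partial, final_path)?;
    Ok(true)
}
```

Keeping the partial file next to its final location is what lets the rename act as the atomic move the notes ask for; an interrupted download leaves only a `.partial` file behind.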