From f0aa8010401e3872f8f1dcc85c409e77c6b5a1d8 Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Mon, 17 Aug 2020 23:22:52 -0700
Subject: init repo with README, gitignore, etc

---
 .gitignore |  22 +++++++++++++
 README.md  |  43 +++++++++++++++++++++++++
 plan.txt   | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 plan.txt

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..2ead7e1
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,22 @@
+target/
+*.o
+*.a
+*.pyc
+#*#
+*~
+*.swp
+.*
+*.tmp
+*.old
+*.profile
+*.bkp
+*.bak
+[Tt]humbs.db
+*.DS_Store
+build/
+_build/
+src/build/
+*.log
+
+# Don't ignore this file itself
+!.gitignore
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5920b9c
--- /dev/null
+++ b/README.md
@@ -0,0 +1,43 @@
+
+**es-public-proxy**: Elasticsearch API proxy intended to be exposed to the
+public internet (or any non-localhost clients) for safe read-only queries
+
+This is intended as a simple alternative to other "read-only" plugins or
+authentication solutions for elasticsearch. A benefit of keeping the
+elasticsearch API itself, instead of building an application-layer wrapper, is
+that there already exist client libraries, tools, and integrations in many
+languages.
+
+Plan:
+
+- single Rust executable
+- fast and simple enough to never impact performance or latency
+- TOML configuration
+- some modern async/await framework
+- use official elasticsearch crate? or just reqwest?
+- small subset of total public API: get, search, scroll
+- per-index permissions
+- return response bodies untouched
+- parse queries with serde JSON, then re-serialize
+
+Stretch or future goals:
+
+- parsing Lucene `query_string`
+- provide an alternate simpler API
+- query caching
+- index aliases and routing
+- version mapping (eg, expose 7.x API for 6.x index)
+
+Non-features:
+
+- TLS (use a general purpose reverse proxy)
+
+## Deployment
+
+The imagined use case is that you have elasticsearch proper listening only to
+localhost connections with plain HTTP. This makes administration easy from
+authenticated local UNIX users. No non-localhost connections to elasticsearch
+are allowed, even from trusted clients. This daemon runs as a small sidecar
+proxy on localhost, listening on a public port. All non-localhost clients
+direct queries through the proxy, which parses the query, ensures it is "safe",
+then passes through to backend.
diff --git a/plan.txt b/plan.txt
new file mode 100644
index 0000000..9ab837a
--- /dev/null
+++ b/plan.txt
@@ -0,0 +1,105 @@
+
+TODO: see what other requests the default python and javascript client libraries use
+
+## basics
+
+- config: TOML, env, args
+- filter requests by method and endpoint
+- filter query parameters
+- parse request bodies (queries)
+- method/body for denied requests
+- async streaming responses
+- minimize tokio feature flags
+
+factoring:
+- validate query method (method, path, query, body)
+
+## general endpoints
+
+- ping
+    (?)
+- basic info
+    GET /
+    (?)
+- scroll
+    POST /_search/scroll
+- clear scroll
+    DELETE /_search/scroll
+
+## per-index endpoints
+
+- basic info; mapping
+    (?)
+- count
+    GET //_count
+- get document
+    GET //_doc/<_id>
+    HEAD //_doc/<_id>
+    GET //_source/<_id>
+    HEAD //_source/<_id>
+- search
+    GET //_search
+    POST //_search
+
+later:
+
+- multi-get (`_mget`)
+- multi-search (`_msearch`)
+
+## query types
+
+compound:
+- bool
+- boosting
+- constant_score
+    filter (query)
+    boost (float, optional)
+
+fulltext:
+- match
+
+    (bare str allowed)
+    query (str)
+- match_phrase
+
+    (bare str allowed)
+    value (str)
+- multi_match
+- query_string
+- simple_query_string
+
+term-level:
+- range
+
+    gt, gte, lt, lte: str or number
+- term
+
+    value: str or number
+- terms
+
+    (array of str or number)
+- wildcard
+
+    value (str)
+    boost (float, optional)
+    rewrite (str, optional)
+- exists
+    field (str)
+- ids
+    values (array of str)
+- match_all
+    boost (float, optional)
+- match_none
+    boost (float, optional)
+
+
+TODO:
+- terms_set
+- span queries
+- fuzzy (configurable)
+
+## additional stuff
+
+- HTTP content-encoding: gzip
+- content-type header; always JSON?
+- https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html
-- 
cgit v1.2.3
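
The "filter requests by method and endpoint" item in plan.txt could be sketched roughly as below. This is not code from the commit. It is a minimal, std-only illustration that assumes the first path segment of a per-index route is the index name. `Method`, `is_allowed`, and `is_plain_index_name` are hypothetical names, and the index-name rule is a deliberately conservative assumption. The endpoints marked `(?)` in the plan are left out.

```rust
/// HTTP methods that appear in plan.txt; a real proxy would map its
/// framework's method type onto something like this (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Head,
    Post,
    Delete,
}

/// Assumption: accept only a single, plain index name. Leading '_' (API
/// namespaces such as `_cat`), wildcards, and comma-separated lists are refused.
fn is_plain_index_name(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('_')
        && s.chars().all(|c| {
            c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '-' | '_' | '.')
        })
}

/// Hypothetical allowlist check: true only if (method, path) matches one of
/// the endpoints listed in plan.txt. `path` must not include a query string.
pub fn is_allowed(method: Method, path: &str) -> bool {
    // Require a leading '/', then split the remainder into segments.
    let rest = match path.strip_prefix('/') {
        Some(r) => r,
        None => return false,
    };
    let segs: Vec<&str> = if rest.is_empty() {
        Vec::new()
    } else {
        rest.split('/').collect()
    };

    match (method, segs.as_slice()) {
        // general endpoints: basic info, scroll, clear scroll
        (Method::Get, []) => true,
        (Method::Post | Method::Delete, ["_search", "scroll"]) => true,
        // per-index endpoints: count, get document, search
        (Method::Get, [index, "_count"]) => is_plain_index_name(index),
        (Method::Get | Method::Head, [index, "_doc" | "_source", id]) => {
            is_plain_index_name(index) && !id.is_empty()
        }
        (Method::Get | Method::Post, [index, "_search"]) => is_plain_index_name(index),
        // everything else is denied by default
        _ => false,
    }
}
```

Denying by default keeps the read-only guarantee independent of whatever new endpoints the backend grows. Query-parameter and request-body filtering, also listed in the plan, would be separate steps after this check.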