From 276ac2aa24166660bc6ffe7601cee44b5d848dae Mon Sep 17 00:00:00 2001
From: Bryan Newbold
Date: Wed, 4 Jan 2023 19:55:30 -0800
Subject: proposals: update status; add some old ones; consistent file names

---
 proposals/2019-09-11_search_query_parsing.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 proposals/2019-09-11_search_query_parsing.md

(limited to 'proposals/2019-09-11_search_query_parsing.md')

diff --git a/proposals/2019-09-11_search_query_parsing.md b/proposals/2019-09-11_search_query_parsing.md
new file mode 100644
index 00000000..f1fb0128
--- /dev/null
+++ b/proposals/2019-09-11_search_query_parsing.md
@@ -0,0 +1,28 @@
+
+Status: brainstorm
+
+## Search Query Parsing
+
+The default "release" search on fatcat.wiki currently uses the elasticsearch
+built-in `query_string` parser, which is explicitly not recommended for
+public/production use.
+
+The best way forward is likely a custom query parser (eg, PEG-generated parser)
+that generates a complete elasticsearch query JSON structure.
+
+A couple of search issues this would help with:
+
+- better parsing of keywords (year, year-range, DOI, ISSN, etc) in complex
+  queries and turning these into keyword term sub-queries
+- queries including terms from multiple fields which aren't explicitly tagged
+  (eg, "lovelace computer" vs. "author:lovelace title:computer")
+- avoiding unsustainably expensive queries (eg, prefix wildcard, regex)
+- handling single-character misspellings and synonyms
+- collapsing multiple releases under the same work in search results
+
+In the near future, we may also create a fulltext search index, which will have
+its own issues.
+
+## Tech Changes
+
+If we haven't already, we should also switch to using an elasticsearch client library.
-- 
cgit v1.2.3
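
Note on the proposal text in this patch: a rough sketch of what "a custom query parser that generates a complete elasticsearch query JSON structure" could look like. This is not the fatcat implementation. It uses a plain regex tokenizer rather than a PEG grammar, and every index field name (`release_year`, `doi`, `issn`, `contrib_names`, `title`) and helper name is an assumption made for illustration. It covers only the keyword, untagged-field, and wildcard items from the list; quoted phrases, misspellings, synonyms, and work-level collapsing are left out.

```python
import re

# Hypothetical index field names; the real fatcat schema may differ.
YEAR_FIELD = "release_year"
DOI_FIELD = "doi"
ISSN_FIELD = "issn"
TAG_FIELDS = {"author": "contrib_names", "title": "title"}
FREE_TEXT_FIELDS = ["title", "contrib_names"]

# "NNNN-NNNN" is ambiguous: it can be a year range or an ISSN.
FOUR_DASH_FOUR = re.compile(r"^(\d{4})-(\d{3}[\dXx])$")
YEAR = re.compile(r"^\d{4}$")
DOI = re.compile(r"^10\.\d{4,9}/\S+$")
TAGGED = re.compile(r"^(\w+):(.+)$")


def plausible_year(text):
    """Heuristic (an assumption): four digits between 1500 and 2100."""
    return text.isdigit() and 1500 <= int(text) <= 2100


def parse_query(raw):
    """Turn a raw user query string into an elasticsearch query body (dict)."""
    must, filters, free = [], [], []
    for tok in raw.split():
        # Drop leading wildcards, which make for expensive queries.
        tok = tok.lstrip("*?")
        if not tok:
            continue
        if m := FOUR_DASH_FOUR.match(tok):
            lo, hi = m.groups()
            if plausible_year(lo) and plausible_year(hi) and int(lo) <= int(hi):
                filters.append(
                    {"range": {YEAR_FIELD: {"gte": int(lo), "lte": int(hi)}}}
                )
            else:
                filters.append({"term": {ISSN_FIELD: tok.upper()}})
        elif YEAR.match(tok) and plausible_year(tok):
            filters.append({"term": {YEAR_FIELD: int(tok)}})
        elif DOI.match(tok):
            filters.append({"term": {DOI_FIELD: tok}})
        elif (m := TAGGED.match(tok)) and m.group(1) in TAG_FIELDS:
            # Explicitly tagged term, eg "author:lovelace".
            must.append({"match": {TAG_FIELDS[m.group(1)]: m.group(2)}})
        else:
            free.append(tok)

    if free:
        # Untagged terms may each match in a different field.
        must.append(
            {
                "multi_match": {
                    "query": " ".join(free),
                    "fields": FREE_TEXT_FIELDS,
                    "type": "cross_fields",
                    "operator": "and",
                }
            }
        )
    if not must and not filters:
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must, "filter": filters}}}
```

The returned dict is intended to be sent as a search request body. Keyword-like tokens go into `filter` clauses, and untagged words go into a single `cross_fields` query so that "lovelace computer" can match the author in one field and the title word in another. The year-range versus ISSN check is only a heuristic: an ISSN whose two halves both look like years would be misread as a range.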