From 276ac2aa24166660bc6ffe7601cee44b5d848dae Mon Sep 17 00:00:00 2001
From: Bryan Newbold <bnewbold@robocracy.org>
Date: Wed, 4 Jan 2023 19:55:30 -0800
Subject: proposals: update status; add some old ones; consistent file names

---
 proposals/2019-09-11_search_query_parsing.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 proposals/2019-09-11_search_query_parsing.md

(limited to 'proposals/2019-09-11_search_query_parsing.md')

diff --git a/proposals/2019-09-11_search_query_parsing.md b/proposals/2019-09-11_search_query_parsing.md
new file mode 100644
index 00000000..f1fb0128
--- /dev/null
+++ b/proposals/2019-09-11_search_query_parsing.md
@@ -0,0 +1,28 @@
+
+Status: brainstorm
+
+## Search Query Parsing
+
+The default "release" search on fatcat.wiki currently uses the elasticsearch
+built-in `query_string` parser, which is explicitly not recommended for
+public/production use.
+
+The best way forward is likely a custom query parser (eg, PEG-generated parser)
+that generates a complete elasticsearch query JSON structure.
+
+Several search issues this would help with:
+
+- better parsing of keywords (year, year-range, DOI, ISSN, etc) in complex
+  queries and turning these into keyword term sub-queries
+- queries including terms from multiple fields which aren't explicitly tagged
+  (eg, "lovelace computer" vs. "author:lovelace title:computer")
+- avoiding unsustainably expensive queries (eg, prefix wildcard, regex)
+- handling single-character misspellings and synonyms
+- collapsing multiple releases under the same work in search results
+
+In the near future, we may also create a fulltext search index, which will have
+its own issues.
+
+## Tech Changes
+
+If we haven't already, we should also switch to using an elasticsearch client library.
-- 
cgit v1.2.3