Status: brainstorm
## Search Query Parsing
The default "release" search on fatcat.wiki currently uses the elasticsearch
built-in `query_string` parser, which is explicitly not recommended for
public/production use.
The best way forward is likely a custom query parser (eg, PEG-generated parser)
that generates a complete elasticsearch query JSON structure.
A few search issues this would help with:
- better parsing of keywords (year, year-range, DOI, ISSN, etc) in complex
queries and turning these into keyword term sub-queries
- queries including terms from multiple fields which aren't explicitly tagged
(eg, "lovelace computer" vs. "author:lovelace title:computer")
- avoiding unsustainably expensive queries (eg, prefix wildcard, regex)
- handling single-character misspellings and synonyms
- collapsing multiple releases under the same work in search results
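
As a rough illustration of the direction (not a PEG parser, and not the actual implementation), the sketch below tokenizes a query string with regular expressions and builds an elasticsearch `bool` query as a Python dict. The function name `build_release_query` and the field names `doi`, `release_year`, `title`, and `contrib_names` are assumptions made for the example; the real index mapping may differ. It covers only a few of the issues above: keyword detection, untagged terms spanning several fields, rejecting wildcard and regex input, and single-edit fuzziness.

```python
import re

# Field names here are assumptions for illustration, not the real mapping.
TEXT_FIELDS = ["title", "contrib_names"]
TAG_TO_FIELD = {"author": "contrib_names", "title": "title"}

YEAR = r"(?:1[5-9]\d\d|20\d\d)"
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
YEAR_RE = re.compile(rf"^{YEAR}$")
YEAR_RANGE_RE = re.compile(rf"^({YEAR})-({YEAR})$")
TAGGED_RE = re.compile(r"^(author|title):(\S+)$")


def build_release_query(raw: str) -> dict:
    """Turn a user search string into an elasticsearch query body (a dict)."""
    must, filters = [], []
    for token in raw.split():
        # Refuse expensive query types instead of passing them through.
        if "*" in token or "?" in token or token.startswith("/"):
            raise ValueError(f"wildcard and regex queries are not supported: {token!r}")
        tagged = TAGGED_RE.match(token)
        year_range = YEAR_RANGE_RE.match(token)
        if DOI_RE.match(token):
            # Exact keyword lookup; DOIs are treated as case-insensitive.
            filters.append({"term": {"doi": token.lower()}})
        elif year_range:
            start, end = sorted(int(y) for y in year_range.groups())
            filters.append({"range": {"release_year": {"gte": start, "lte": end}}})
        elif YEAR_RE.match(token):
            filters.append({"term": {"release_year": int(token)}})
        elif tagged:
            # Explicitly tagged term: search only the named field.
            field = TAG_TO_FIELD[tagged.group(1)]
            must.append({"match": {field: {"query": tagged.group(2), "fuzziness": 1}}})
        else:
            # Untagged term: it may match in any text field, so
            # "lovelace computer" can hit an author and a title.
            must.append(
                {"multi_match": {"query": token, "fields": TEXT_FIELDS, "fuzziness": 1}}
            )
    return {"query": {"bool": {"must": must, "filter": filters}}}
```

Calling `build_release_query("lovelace computer 1840-1850")` yields two `multi_match` clauses under `must` and one `range` clause under `filter`. ISSN detection is left out on purpose: an ISSN has the same digit-hyphen-digit shape as a year range, which is the kind of ambiguity a proper grammar would need to resolve.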
In the near future, we may also create a fulltext search index, which will have
its own issues.
## Tech Changes
If we haven't already, we should also switch to using an elasticsearch client library.