author | Bryan Newbold <bnewbold@archive.org> | 2021-01-18 19:52:37 -0800 |
---|---|---|
committer | Bryan Newbold <bnewbold@archive.org> | 2021-01-19 19:49:04 -0800 |
commit | 0adb490ae2ba8f961bac559a981f89d6d264af60 (patch) | |
tree | 59e916cbd380c0b6c4d9e17c9fbc6353dd744a91 /proposals | |
parent | 78ad484db9d7deb09410e49407cd036cdc9363d2 (diff) | |
initial notes on crude query parsing
Diffstat (limited to 'proposals')
-rw-r--r-- | proposals/2021_crude_query_parse.md | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/proposals/2021_crude_query_parse.md b/proposals/2021_crude_query_parse.md
new file mode 100644
index 0000000..2a7663b
--- /dev/null
+++ b/proposals/2021_crude_query_parse.md
@@ -0,0 +1,18 @@
+
+
+Thinking of simple ways to reduce query parse errors and handle more queries as
+expected. In particular:
+
+- handle slashes in query tokens (eg, "N/A" without quotes)
+- handle semi-colons in queries, when they are not intended as filters
+- if query "looks like" a raw citation string, detect that and do citation
+  parsing in to a structured format, then do a query or fuzzy lookup from there
+
+
+## Questions/Thoughts
+
+Should we detect title lookups in addition to full citation lookups? Probably
+too complicated.
+
+Do we have a static list of colon-prefixes, or load from the schema mapping
+file itself?
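
Reviewer sketch (not part of the commit): one rough way the first two bullets could be handled is a pre-processing pass that quotes troublesome tokens before the query reaches the search backend. Everything named here is hypothetical: the function `crude_clean`, the `KNOWN_PREFIXES` set, and the assumption that filters are written as `prefix:value`, which is inferred from the "colon-prefixes" question in the notes. The static prefix list is only a placeholder for whichever option that question settles on. Citation detection is not attempted.

```python
import re

# Hypothetical static list of filter prefixes. The notes leave open whether
# this should instead be loaded from the schema mapping file.
KNOWN_PREFIXES = {"author", "title", "year", "doi"}

# A token is a run of non-space characters, where a balanced double-quoted
# phrase (which may contain spaces) counts as part of the token.
TOKEN_RE = re.compile(r'(?:[^\s"]|"[^"]*")+')


def crude_clean(raw: str) -> str:
    """Quote tokens that would likely trip up a strict query parser."""
    out = []
    for token in TOKEN_RE.findall(raw):
        if '"' in token:
            # The user already quoted (part of) this token; leave it alone.
            out.append(token)
        elif "/" in token:
            # Bare slash, eg N/A: quote so it is treated as literal text.
            out.append('"' + token + '"')
        elif ":" in token and token.split(":", 1)[0] not in KNOWN_PREFIXES:
            # Colon that does not follow a known filter prefix: quote it.
            out.append('"' + token + '"')
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(crude_clean("N/A"))
    print(crude_clean("year:2020 cats"))
    print(crude_clean("Re: cats"))
```

Quoting rather than escaping keeps the pass simple, at the cost of turning the token into a phrase match. Whether that is acceptable depends on how the backend analyzes quoted terms.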