path: root/fatcat_scholar/search.py
Commit message | Author | Age | Files | Lines
* iterate on access redirects and landing page implementation | Bryan Newbold | 2021-04-27 | 1 | -4/+7
    Small code refactors and minimal test coverage
* web: initial implementation of work landing page and citation_pdf_url access redirect | Bryan Newbold | 2021-04-23 | 1 | -1/+37
    The initial intent is to have something that can be used by indexing services to pull the citation_pdf_url meta tag and bounce to a direct IA PDF access URL. For now the landing page stubs are just formatted as SERP results. Presumably these will get re-styled at some point and include citation graph links, etc.
* search: more aggressively skip fuzzy match exceptions | Bryan Newbold | 2021-04-12 | 1 | -5/+5
* health check: use /<index>/_count endpoint; verify shards | Bryan Newbold | 2021-04-06 | 1 | -7/+12
    In actual production verification, the /_mapping endpoint didn't seem to work.
* change health check from .exists(index) to .mapping(index) | Bryan Newbold | 2021-04-06 | 1 | -4/+13
    In cases where the cluster leader node is unavailable, the health check was returning false even when the local node had full shard replicas and could return requests. A refinement of this change would be to use the /<index>/_count API endpoint to ensure that the "failed" and "skipped" shard numbers are 0 (aka, "successful == total"). However, not sure where that endpoint is exposed in the elasticsearch-py API. The CatClient method doesn't seem right.
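The shard check described in the commit message above ("successful == total", with no failed or skipped shards) could be sketched roughly as follows. This is an illustration only, not the project's code: `shards_all_successful` is a hypothetical helper, and it assumes the JSON body of a `/<index>/_count` response has already been parsed into a dict containing the usual `_shards` summary.

```python
from typing import Any, Dict


def shards_all_successful(count_resp: Dict[str, Any]) -> bool:
    """Return True only if every shard answered the _count request.

    Hypothetical helper: expects the parsed JSON body of an Elasticsearch
    /<index>/_count response, which carries a "_shards" summary.
    """
    shards = count_resp.get("_shards")
    if not isinstance(shards, dict):
        # No shard summary at all: treat as unhealthy rather than guess.
        return False
    total = shards.get("total", 0)
    return (
        total > 0
        and shards.get("successful") == total
        and shards.get("failed", 0) == 0
        and shards.get("skipped", 0) == 0
    )


# Example response bodies (made-up numbers, for illustration only)
healthy = shards_all_successful(
    {"count": 42, "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0}}
)
degraded = shards_all_successful(
    {"count": 40, "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1}}
)
```

Keeping the check as a pure function over the response body sidesteps the question of which client method exposes the endpoint.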
* make fmt | Bryan Newbold | 2021-03-29 | 1 | -0/+1
* web and API health check endpoint | Bryan Newbold | 2021-03-29 | 1 | -0/+14
    Because scholar is primarily a search service, the endpoint does a pass-through health check to the elasticsearch backend (aka, es-public-proxy).
* Revert undesirable changes | Christian Clauss | 2021-02-23 | 1 | -1/+1
* Modernize Python syntax with pyupgrade --py38-plus **/*.py | Christian Clauss | 2021-02-23 | 1 | -2/+2
* refactor ES configuration setting names | Bryan Newbold | 2021-01-25 | 1 | -2/+2
* add permalink icon/link | Bryan Newbold | 2021-01-21 | 1 | -0/+2
* add citation query feature (disabled by default) | Bryan Newbold | 2021-01-19 | 1 | -14/+69
    This is operationally complex (queries hit 3x backend services!), so not enabled by default. Will need more testing; possibly circuit-breaking. Though haproxy should provide some of that automatically at this point.
* lint: fix small bugs and type annotations | Bryan Newbold | 2021-01-18 | 1 | -1/+1
* search: parse and embed a copy of ScholarDoc object in results | Bryan Newbold | 2021-01-14 | 1 | -1/+6
    Maybe should refactor this to simply replace the object? Hrm.
* search: show fewer, shorter highlights. sort by score. | Bryan Newbold | 2021-01-14 | 1 | -1/+2
* work around mypy complaint about exception union type | Bryan Newbold | 2020-12-22 | 1 | -1/+2
* remove minor unused imports | Bryan Newbold | 2020-10-22 | 1 | -1/+0
* improve search logging and exception chaining | Bryan Newbold | 2020-10-21 | 1 | -5/+6
* refactor do_fulltext_search into smaller methods | Bryan Newbold | 2020-10-16 | 1 | -52/+70
* Upgrade Dynaconf to 3+ | Bruno Rocha | 2020-10-05 | 1 | -1/+1
    In Dynaconf 3+, using `from dynaconf import settings` is no longer recommended; the recommendation now is to create your own instance of the settings object based on the Dynaconf class.
* search: handle direct DOI and PMCID queries | Bryan Newbold | 2020-09-17 | 1 | -9/+16
    If query is a single token which looks like a valid PMCID or DOI, with no surrounding quotes, then expand scope and filter to that single external identifier.
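The single-token identifier detection described above might look something like the sketch below. Everything here is assumed for illustration: `detect_identifier`, the simplified regular expressions, and the `doi`/`pmcid` field names are hypothetical, and the actual commit may validate identifiers differently.

```python
import re
from typing import Optional, Tuple

# Simplified patterns, assumed for this sketch; real-world DOIs are messier.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMCID_RE = re.compile(r"^PMC\d+$")


def detect_identifier(raw_query: str) -> Optional[Tuple[str, str]]:
    """Return (field, value) if the query is one bare DOI or PMCID token."""
    query = raw_query.strip()
    # Must be a single token, and the user must not have quoted it.
    if not query or len(query.split()) != 1 or '"' in query:
        return None
    if DOI_RE.match(query):
        # DOIs are case-insensitive, so normalize to lowercase.
        return ("doi", query.lower())
    if PMCID_RE.match(query):
        return ("pmcid", query)
    return None
```

A caller could then swap the free-text query for a filter on the returned field when the result is not `None`, and fall through to normal search otherwise.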
* use container_name, not container_ident, in boost | Bryan Newbold | 2020-08-12 | 1 | -1/+1
    This should result in SIM page fulltext matches not getting pushed down as much, as well as things like biorxiv (*rxiv) results.
* fmt/lint tweaks | Bryan Newbold | 2020-08-12 | 1 | -5/+2
* search: include 'article' in papers filter | Bryan Newbold | 2020-08-12 | 1 | -1/+1
* search: use simplified query for highlighting | Bryan Newbold | 2020-08-12 | 1 | -1/+8
    This fixes broken phrase query highlighting. I found this issue, but it may have been unrelated: https://github.com/elastic/elasticsearch/issues/40227
* re-use ES sync API client | Bryan Newbold | 2020-08-06 | 1 | -3/+4
* report ES API query time as server-timing header | Bryan Newbold | 2020-08-06 | 1 | -0/+4
* add debug mode flag (to control json tag/link) | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* make fmt | Bryan Newbold | 2020-08-06 | 1 | -14/+14
* microfilm access filter; broader access matching | Bryan Newbold | 2020-08-06 | 1 | -3/+6
* fix acknowledgement highlighting (typo) | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* reduce title boost; use only base query for highlighting | Bryan Newbold | 2020-08-06 | 1 | -1/+2
* special case '*' queries | Bryan Newbold | 2020-08-06 | 1 | -6/+16
    More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
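As a rough idea of what special-casing `*` can mean, the sketch below returns a plain `match_all` clause for a bare wildcard and a `query_string` clause otherwise. The function name `build_base_query` and the choice of `query_string` with `default_operator` for the normal path are assumptions for illustration; the commit itself may build its query differently.

```python
from typing import Any, Dict


def build_base_query(q: str) -> Dict[str, Any]:
    """Build the main query clause, short-circuiting the match-everything case.

    Hypothetical sketch: returns plain Elasticsearch query DSL as a dict.
    """
    if q.strip() == "*":
        # Nothing to parse or score: let the filters do all the work.
        return {"match_all": {}}
    return {
        "query_string": {
            "query": q,
            "default_operator": "AND",
        }
    }
```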
* remove 'title' from poor metadata scoring | Bryan Newbold | 2020-08-06 | 1 | -1/+0
* better time ranges (don't search future) | Bryan Newbold | 2020-08-06 | 1 | -4/+7
* add title back to match query | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* query fewer fields; highlight all fulltext fields regardless of match | Bryan Newbold | 2020-08-06 | 1 | -3/+1
* search tweaks to be forwards-compatible with ES 7.x | Bryan Newbold | 2020-08-06 | 1 | -2/+10
    When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x.
* extend ES client timeout to 25 seconds | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* Revert "remove duplicate fulltext search from query" | Bryan Newbold | 2020-07-30 | 1 | -0/+1
    This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
* include container_ident in metadata completeness boost | Bryan Newbold | 2020-07-28 | 1 | -0/+1
* search: smaller default result set | Bryan Newbold | 2020-07-27 | 1 | -1/+1
* remove duplicate fulltext search from query | Bryan Newbold | 2020-07-27 | 1 | -1/+0
    May also remove the 'title' and 'abstracts' searches, though they currently help with boosting; will want to measure the actual performance difference before that change.
* search: tweak 'past week' date range to not include future | Bryan Newbold | 2020-07-27 | 1 | -2/+4
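One way to keep a "past week" range from matching future-dated records is to give the range an explicit upper bound as well as a lower one. The sketch below is hypothetical: `past_days_range` and the `date` field name are placeholders, and the real change may instead rely on Elasticsearch date math.

```python
import datetime
from typing import Any, Dict, Optional


def past_days_range(days: int, today: Optional[datetime.date] = None) -> Dict[str, Any]:
    """Range clause covering the last `days` days, capped at today."""
    today = today or datetime.date.today()
    start = today - datetime.timedelta(days=days)
    return {
        "range": {
            # "date" is a placeholder field name for this sketch
            "date": {"gte": start.isoformat(), "lte": today.isoformat()}
        }
    }
```

Passing `today` explicitly keeps the function deterministic in tests; without the `lte` bound, records with dates after today would still satisfy a lower-bound-only filter.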
* include fulltext acknowledgements in highlighting | Bryan Newbold | 2020-07-21 | 1 | -0/+1
* fix search filter bug (papers is default) | Bryan Newbold | 2020-06-29 | 1 | -2/+2
* make fmt | Bryan Newbold | 2020-06-29 | 1 | -3/+3
* note about highlight encoding in ES 7.x | Bryan Newbold | 2020-06-29 | 1 | -0/+2
* un-collapse only to same issue, not uncollapse-all-hits | Bryan Newbold | 2020-06-29 | 1 | -9/+15
    This is the user expectation, and was a lingering TODO from the initial implementation.
* fix search order default label | Bryan Newbold | 2020-06-29 | 1 | -1/+1
    Thanks for the catch Alexis R!