path: root/fatcat_scholar/search.py
Commit message | Author | Age | Files | Lines
* search: update 'Metadata' availability to 'All Records' | Bryan Newbold | 2022-04-06 | 1 | -1/+1
* bugfix: elasticsearch per-request timeout for _health (arg name) | Bryan Newbold | 2022-02-14 | 1 | -1/+1
* increase ES default timeout to 50sec, and _health specifically to 90sec | Bryan Newbold | 2022-02-14 | 1 | -2/+4
    This is because we are getting lots of alert chunder on the health check. It might be better to revisit which endpoint is being checked... 'count' is usually fast, but might be slow during bulk indexing.
* fix before_1927 query filter typo | Bryan Newbold | 2022-01-18 | 1 | -1/+1
* elasticsearch: bump query timeout to 40 seconds (from 25) | Bryan Newbold | 2022-01-10 | 1 | -1/+1
* move public domain wall to 1926 ('before 1927') | Bryan Newbold | 2022-01-05 | 1 | -3/+4
* lint: small cleanups, mostly E711 and E713 | Bryan Newbold | 2021-10-27 | 1 | -1/+1
* lint: remove all 'import *' uses | Bryan Newbold | 2021-10-27 | 1 | -1/+1
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 1 | -5/+27
* re-style imports (isort) on all core python files | Bryan Newbold | 2021-10-27 | 1 | -9/+10
* ES: add 'preference' query param; default to '_local' in prod | Bryan Newbold | 2021-08-03 | 1 | -0/+3
* update access redirect URL endpoints | Bryan Newbold | 2021-06-11 | 1 | -24/+1
* make fmt | Bryan Newbold | 2021-05-17 | 1 | -1/+4
* iterate on PDF redirect links | Bryan Newbold | 2021-05-17 | 1 | -1/+1
* web: don't clobber user input query when parsing | Bryan Newbold | 2021-04-30 | 1 | -3/+4
    This is intended to be a UX improvement, to avoid adding double quotes around the query a user has pasted in. This does make the "parsing" behavior less transparent.
* iterate on access redirects and landing page implementation | Bryan Newbold | 2021-04-27 | 1 | -4/+7
    Small code refactors and minimal test coverage
* web: initial implementation of work landing page and citation_pdf_url access redirect | Bryan Newbold | 2021-04-23 | 1 | -1/+37
    The initial intent is to have something that can be used by indexing services to pull the citation_pdf_url meta tag and bounce to a direct IA PDF access URL.
    For now the landing page stubs are just formatted as SERP results. Presumably these will get re-styled at some point and include citation graph links, etc.
* search: more aggressively skip fuzzy match exceptions | Bryan Newbold | 2021-04-12 | 1 | -5/+5
* health check: use /<index>/_count endpoint; verify shards | Bryan Newbold | 2021-04-06 | 1 | -7/+12
    In actual production verification, the /_mapping endpoint didn't seem to work.
* change health check from .exists(index) to .mapping(index) | Bryan Newbold | 2021-04-06 | 1 | -4/+13
    In cases where the cluster leader node is unavailable, the health check was returning false even when the local node had full shard replicas and could return requests.
    A refinement of this change would be to use the /<index>/_count API endpoint to ensure that the "failed" and "skipped" shard numbers are 0 (aka, "successful == total"). However, not sure where that endpoint is exposed in the elasticsearch-py API. The CatClient method doesn't seem right.
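    The shard verification described in the two health check commits above can be sketched roughly as follows. This is a minimal illustration, not the code from search.py: the function name shards_all_successful is hypothetical, and the input is assumed to be a response body shaped like the Elasticsearch Count API's, with a "_shards" object holding "total", "successful", "skipped", and "failed" counts.

```python
from typing import Any, Dict


def shards_all_successful(count_resp: Dict[str, Any]) -> bool:
    """Return True only if every shard answered the count request.

    Hypothetical helper: `count_resp` is assumed to look like the JSON body
    returned by the /<index>/_count endpoint.
    """
    shards = count_resp.get("_shards") or {}
    total = shards.get("total")
    successful = shards.get("successful")
    # A missing or empty shard summary is treated as unhealthy.
    if not isinstance(total, int) or total <= 0:
        return False
    # "successful == total", with no failed or skipped shards.
    return (
        successful == total
        and shards.get("failed", 0) == 0
        and shards.get("skipped", 0) == 0
    )


# Example response bodies (made-up numbers, for illustration only).
healthy = {"count": 12, "_shards": {"total": 6, "successful": 6, "skipped": 0, "failed": 0}}
degraded = {"count": 12, "_shards": {"total": 6, "successful": 4, "skipped": 0, "failed": 2}}
print(shards_all_successful(healthy))
print(shards_all_successful(degraded))
```

    Keeping the check as a pure function over the response body means it can be tested without a running cluster; how the response is fetched is left out here.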
* make fmt | Bryan Newbold | 2021-03-29 | 1 | -0/+1
* web and API health check endpoint | Bryan Newbold | 2021-03-29 | 1 | -0/+14
    Because scholar is primarily a search service, the endpoint does a pass-through health check to the elasticsearch backend (aka, es-public-proxy).
* Revert undesirable changes | Christian Clauss | 2021-02-23 | 1 | -1/+1
* Modernize Python syntax with pyupgrade --py38-plus **/*.py | Christian Clauss | 2021-02-23 | 1 | -2/+2
* refactor ES configuration setting names | Bryan Newbold | 2021-01-25 | 1 | -2/+2
* add permalink icon/link | Bryan Newbold | 2021-01-21 | 1 | -0/+2
* add citation query feature (disabled by default) | Bryan Newbold | 2021-01-19 | 1 | -14/+69
    This is operationally complex (queries hit 3x backend services!), so not enabled by default. Will need more testing; possibly circuit-breaking. Though haproxy should provide some of that automatically at this point.
* lint: fix small bugs and type annotations | Bryan Newbold | 2021-01-18 | 1 | -1/+1
* search: parse and embed a copy of ScholarDoc object in results | Bryan Newbold | 2021-01-14 | 1 | -1/+6
    Maybe should refactor this to simply replace the object? Hrm.
* search: show fewer, shorter highlights. sort by score. | Bryan Newbold | 2021-01-14 | 1 | -1/+2
* work around mypy complaint about exception union type | Bryan Newbold | 2020-12-22 | 1 | -1/+2
* remove minor unused imports | Bryan Newbold | 2020-10-22 | 1 | -1/+0
* improve search logging and exception chaining | Bryan Newbold | 2020-10-21 | 1 | -5/+6
* refactor do_fulltext_search into smaller methods | Bryan Newbold | 2020-10-16 | 1 | -52/+70
* Upgrade Dynaconf to 3+ | Bruno Rocha | 2020-10-05 | 1 | -1/+1
    In dynaconf 3+ it is no longer recommended to use `from dynaconf import settings`; the recommendation now is to create your own instance of the settings object based on the Dynaconf class.
* search: handle direct DOI and PMCID queries | Bryan Newbold | 2020-09-17 | 1 | -9/+16
    If query is a single token which looks like a valid PMCID or DOI, with no surrounding quotes, then expand scope and filter to that single external identifier.
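    The detection rule in the commit message above might look something like the sketch below. Everything here is an assumption made for illustration: the function name detect_identifier, the two regular expressions, and the "doi" / "pmcid" labels are hypothetical and are not taken from search.py. The patterns are deliberately loose approximations of the two identifier formats.

```python
import re
from typing import Optional, Tuple

# Loose, illustrative patterns (assumptions, not the project's own regexes).
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
PMCID_RE = re.compile(r"^PMC\d+$")


def detect_identifier(query: str) -> Optional[Tuple[str, str]]:
    """Return (identifier_type, value) if the query is one bare DOI or PMCID."""
    token = query.strip()
    # Only a single token qualifies; multi-word queries are ordinary searches.
    if not token or len(token.split()) != 1:
        return None
    # A quoted token is left alone, since the user asked for a phrase search.
    if token.startswith('"') or token.endswith('"'):
        return None
    if DOI_RE.match(token):
        return ("doi", token)
    if PMCID_RE.match(token):
        return ("pmcid", token)
    return None


for q in ["10.1234/abc.def", "PMC1234567", '"10.1234/abc.def"', "machine learning"]:
    print(repr(q), detect_identifier(q))
```

    A caller could then use a non-None result to filter on that single identifier instead of running the free-text query.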
* use container_name, not container_ident, in boost | Bryan Newbold | 2020-08-12 | 1 | -1/+1
    This should result in SIM page fulltext matches not getting pushed down as much, as well as things like biorxiv (*rxiv) results.
* fmt/lint tweaks | Bryan Newbold | 2020-08-12 | 1 | -5/+2
* search: include 'article' in papers filter | Bryan Newbold | 2020-08-12 | 1 | -1/+1
* search: use simplified query for highlighting | Bryan Newbold | 2020-08-12 | 1 | -1/+8
    This fixes broken phrase query highlighting.
    I found this issue, but it may have been unrelated:
    https://github.com/elastic/elasticsearch/issues/40227
* re-use ES sync API client | Bryan Newbold | 2020-08-06 | 1 | -3/+4
* report ES API query time as server-timing header | Bryan Newbold | 2020-08-06 | 1 | -0/+4
* add debug mode flag (to control json tag/link) | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* make fmt | Bryan Newbold | 2020-08-06 | 1 | -14/+14
* microfilm access filter; broader access matching | Bryan Newbold | 2020-08-06 | 1 | -3/+6
* fix acknowledgement highlighting (typo) | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* reduce title boost; use only base query for highlighting | Bryan Newbold | 2020-08-06 | 1 | -1/+2
* special case '*' queries | Bryan Newbold | 2020-08-06 | 1 | -6/+16
    More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
* remove 'title' from poor metadata scoring | Bryan Newbold | 2020-08-06 | 1 | -1/+0
* better time ranges (don't search future) | Bryan Newbold | 2020-08-06 | 1 | -4/+7