path: root/fatcat_scholar
Commit message · Author · Age · Files · Lines
* experiment with rescoring for metadata boost [x-attic-rescore] · Bryan Newbold · 2020-08-12 · 1 file · -1/+29
|
* use simple names, not domain names, for some platforms · Bryan Newbold · 2020-08-12 · 1 file · -3/+3
|
* fmt/lint tweaks · Bryan Newbold · 2020-08-12 · 2 files · -6/+2
|
* biblio metadata hacks at transform time · Bryan Newbold · 2020-08-12 · 1 file · -2/+98
|
* transform: more string cleaning · Bryan Newbold · 2020-08-12 · 1 file · -12/+59
|
* search: include 'article' in papers filter · Bryan Newbold · 2020-08-12 · 1 file · -1/+1
|
* search: use simplified query for highlighting · Bryan Newbold · 2020-08-12 · 1 file · -1/+8
|   This fixes broken phrase query highlighting. I found this issue, but it may have been unrelated: https://github.com/elastic/elasticsearch/issues/40227
* don't print config; make fmt · Bryan Newbold · 2020-08-06 · 1 file · -3/+7
|
* re-use ES sync API client · Bryan Newbold · 2020-08-06 · 1 file · -3/+4
|
* 'more versions' dropdown table · Bryan Newbold · 2020-08-06 · 1 file · -0/+82
|
* small HTML simplifications · Bryan Newbold · 2020-08-06 · 1 file · -6/+6
|
* report ES API query time as server-timing header · Bryan Newbold · 2020-08-06 · 2 files · -1/+13
|
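A `Server-Timing` header value follows the W3C Server Timing syntax `name;dur=<milliseconds>`, optionally with `;desc="..."`. The helper below is a hypothetical sketch of how the ES query time could be formatted; the function name and the metric name `es` are assumptions. One possible source for the duration is the `took` field (milliseconds) in an Elasticsearch search response, but whether the commit uses `took` or a client-side timer is not stated here.

```python
def server_timing_value(metric: str, duration_ms: float, desc: str = "") -> str:
    """Hypothetical helper: build a Server-Timing header value.

    Syntax per the W3C Server Timing spec: <name>;dur=<ms>[;desc="<text>"].
    `desc` is assumed not to contain double quotes.
    """
    value = f"{metric};dur={duration_ms:.1f}"
    if desc:
        value += f';desc="{desc}"'
    return value
```

For example, a handler might set `headers["Server-Timing"] = server_timing_value("es", resp["took"])`, where `resp` is the decoded search response.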
* squish collapse button in with tags · Bryan Newbold · 2020-08-06 · 1 file · -8/+7
|
* have search buttons animate after submit · Bryan Newbold · 2020-08-06 · 2 files · -3/+10
|   Extremely minimal JavaScript is used.
* add debug mode flag (to control json tag/link) · Bryan Newbold · 2020-08-06 · 3 files · -5/+11
|
* slightly more padding in SERP box at max screen size · Bryan Newbold · 2020-08-06 · 2 files · -1/+4
|
* remove javascript includes · Bryan Newbold · 2020-08-06 · 1 file · -0/+4
|
* basic placeholder thumbnail image · Bryan Newbold · 2020-08-06 · 3 files · -3/+191
|
* sort tags, and show JSTOR as a color tag · Bryan Newbold · 2020-08-06 · 1 file · -1/+3
|
* show language code as a tag · Bryan Newbold · 2020-08-06 · 2 files · -2/+7
|
* set HTML language to locale correctly · Bryan Newbold · 2020-08-06 · 1 file · -1/+1
|
* don't index sim_page without issue_item and first_page · Bryan Newbold · 2020-08-06 · 1 file · -0/+3
|
* volume_int/issue_int as actual ints · Bryan Newbold · 2020-08-06 · 1 file · -2/+2
|
* make fmt · Bryan Newbold · 2020-08-06 · 1 file · -14/+14
|
* handle integer conversion and bounding for ES schema · Bryan Newbold · 2020-08-06 · 2 files · -19/+35
|
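Elasticsearch's `integer` field type is a signed 32-bit integer, so values parsed from messy metadata (presumably fields like `volume_int` and `issue_int`) must be converted and range-checked before indexing. The sketch below is a hypothetical illustration; the helper name is an assumption, and it drops out-of-range values rather than clamping them, since the commit title does not say which approach it takes.

```python
from typing import Optional, Union

# Elasticsearch's `integer` field type is a signed 32-bit integer.
ES_INT_MIN = -(2**31)
ES_INT_MAX = 2**31 - 1


def to_es_int(raw: Union[str, int, None]) -> Optional[int]:
    """Hypothetical helper: convert a value to an int that fits an ES `integer` field.

    Returns None (so the field can be omitted) when the value is missing,
    not parseable as a plain integer, or outside the 32-bit range.
    """
    if raw is None or isinstance(raw, bool):
        return None
    if isinstance(raw, int):
        value = raw
    elif isinstance(raw, str):
        try:
            value = int(raw.strip())  # rejects "12a", "1.5", "" etc.
        except ValueError:
            return None
    else:
        return None
    if value < ES_INT_MIN or value > ES_INT_MAX:
        return None
    return value
```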
* microfilm access filter; broader access matching · Bryan Newbold · 2020-08-06 · 1 file · -3/+6
|
* handle longer query times · Bryan Newbold · 2020-08-06 · 1 file · -2/+10
|
* scrub_text: single-token strings skipped · Bryan Newbold · 2020-08-06 · 1 file · -0/+4
|
* strip ACKNOWLEDGEMENTS prefix · Bryan Newbold · 2020-08-06 · 1 file · -0/+1
|
* fix acknowledgement highlighting (typo) · Bryan Newbold · 2020-08-06 · 1 file · -1/+1
|
* reduce title boost; use only base query for highlighting · Bryan Newbold · 2020-08-06 · 1 file · -1/+2
|
* special case '*' queries · Bryan Newbold · 2020-08-06 · 1 file · -6/+16
|   More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
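A bare `*` query matches every document, so running it through full-text scoring is wasted work. A minimal sketch of the special case, assuming the search builds an Elasticsearch `bool` query with separate filter clauses (the function name and the `query_string` options are hypothetical), swaps in `match_all`. It does not attempt the broader "filter only" detection the commit message suggests.

```python
from typing import Any, Dict, List


def build_query(q: str, filters: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical sketch: build an Elasticsearch bool query body.

    An empty or bare '*' query matches everything, so use match_all
    instead of a scored query_string; the filters still narrow results.
    """
    q = q.strip()
    if q in ("", "*"):
        base: Dict[str, Any] = {"match_all": {}}
    else:
        base = {"query_string": {"query": q, "default_operator": "AND"}}
    return {"bool": {"must": base, "filter": filters}}
```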
* remove 'title' from poor metadata scoring · Bryan Newbold · 2020-08-06 · 1 file · -1/+0
|
* better time ranges (don't search future) · Bryan Newbold · 2020-08-06 · 1 file · -4/+7
|
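One way to keep named time filters from reaching into the future is to always use "today" as the upper bound of the range. The sketch below is hypothetical: the filter names (`past_week`, `this_year`, `past_year`) and the field name passed in are assumptions, while the `range` query with `gte`/`lte` bounds is standard Elasticsearch query DSL.

```python
import datetime
from typing import Optional, Tuple


def date_range(name: str, today: Optional[datetime.date] = None) -> Tuple[datetime.date, datetime.date]:
    """Hypothetical sketch: map a named time filter to (start, end) dates.

    The end is always `today`, so records with bogus future dates
    never match a "recent" filter.
    """
    today = today or datetime.date.today()
    if name == "past_week":
        start = today - datetime.timedelta(days=7)
    elif name == "this_year":
        start = datetime.date(today.year, 1, 1)
    elif name == "past_year":
        start = today - datetime.timedelta(days=365)
    else:
        raise ValueError(f"unknown time range: {name}")
    return start, today


def es_range_filter(field: str, name: str, today: Optional[datetime.date] = None) -> dict:
    """Build an Elasticsearch range filter clause for the named time range."""
    start, end = date_range(name, today)
    return {"range": {field: {"gte": start.isoformat(), "lte": end.isoformat()}}}
```

Elasticsearch date math (for example `"lte": "now"`) would be an alternative to computing dates client-side.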
* add title back to match query · Bryan Newbold · 2020-08-06 · 1 file · -0/+1
|
* query fewer fields; highlight all fulltext fields regardless of match · Bryan Newbold · 2020-08-06 · 1 file · -3/+1
|
* fix typo in SERP page macro · Bryan Newbold · 2020-08-06 · 1 file · -1/+1
|
* search tweaks to be forwards-compatible with ES 7.x · Bryan Newbold · 2020-08-06 · 1 file · -2/+10
|   When we fully commit to ES 7.x, we should upgrade the client library correspondingly, and can then remove these workarounds. But for now we have one instance of ES 6.x and one of ES 7.x.
* extend ES client timeout to 25 seconds · Bryan Newbold · 2020-08-06 · 1 file · -1/+1
|
* fix display of papers missing fulltext · Bryan Newbold · 2020-08-06 · 1 file · -1/+1
|   I think the bug appeared now that we no longer serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering.
* Revert "remove duplicate fulltext search from query" · Bryan Newbold · 2020-07-30 · 1 file · -0/+1
|   This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
* transform: catch more cases of null extra · Bryan Newbold · 2020-07-30 · 1 file · -10/+10
|   Also correctly pull issne/issnp from container.extra, not release.extra.
* include container_ident in metadata completeness boost · Bryan Newbold · 2020-07-28 · 1 file · -0/+1
|
* search: smaller default result set · Bryan Newbold · 2020-07-27 · 1 file · -1/+1
|
* pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext · Bryan Newbold · 2020-07-27 · 1 file · -1/+3
|
* remove duplicate fulltext search from query · Bryan Newbold · 2020-07-27 · 1 file · -1/+0
|   May also remove the 'title' and 'abstracts' searches, though they currently help with boosting; will want to measure the actual performance difference before making that change.
* json: exclude None in output, and sort keys · Bryan Newbold · 2020-07-27 · 3 files · -4/+4
|   These are both size/performance enhancements. Not including 'None' values will reduce document sizes on disk and over the network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on disk (gzip) and in Elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents
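The two changes can be illustrated with the standard library alone: drop `None`-valued keys before serializing, and pass `sort_keys=True` to `json.dumps` so every document lists its fields in the same order. This is a hedged sketch (the helper names are hypothetical); the actual commit may rely on its serialization library's own options instead.

```python
import json
from typing import Any


def drop_none(obj: Any) -> Any:
    """Recursively remove dict entries whose value is None (list items are kept)."""
    if isinstance(obj, dict):
        return {k: drop_none(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [drop_none(v) for v in obj]
    return obj


def to_compact_json(doc: dict) -> str:
    """Serialize with None values omitted and keys sorted.

    Sorted keys give every document the same field order, which tends to
    help gzip and Elasticsearch compress many similar documents.
    """
    return json.dumps(drop_none(doc), sort_keys=True)
```

For instance, `to_compact_json({"b": 1, "a": None})` yields `{"b": 1}` with no `"a": null` entry.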
* search: tweak 'past week' date range to not include future · Bryan Newbold · 2020-07-27 · 1 file · -2/+4
|
* abstracts: more prefixes to ignore · Bryan Newbold · 2020-07-27 · 1 file · -0/+3
|
* more careful watermark removal · Bryan Newbold · 2020-07-22 · 2 files · -0/+0
|