path: root/fatcat_scholar
Commit message | Author | Age | Files | Lines
* scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 1 | -0/+4
* strip ACKNOWLEDGEMENTS prefix | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* fix acknowledgement highlighting (typo) | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* reduce title boost; use only base query for highlighting | Bryan Newbold | 2020-08-06 | 1 | -1/+2
* special case '*' queries | Bryan Newbold | 2020-08-06 | 1 | -6/+16
    More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
* remove 'title' from poor metadata scoring | Bryan Newbold | 2020-08-06 | 1 | -1/+0
* better time ranges (don't search future) | Bryan Newbold | 2020-08-06 | 1 | -4/+7
* add title back to match query | Bryan Newbold | 2020-08-06 | 1 | -0/+1
* query fewer fields; highlight all fulltext fields regardless of match | Bryan Newbold | 2020-08-06 | 1 | -3/+1
* fix typo in SERP page macro | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* search tweaks to be forwards-compatible with ES 7.x | Bryan Newbold | 2020-08-06 | 1 | -2/+10
    When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x.
* extend ES client timeout to 25 seconds | Bryan Newbold | 2020-08-06 | 1 | -1/+1
* fix display of papers missing fulltext | Bryan Newbold | 2020-08-06 | 1 | -1/+1
    I think the bug happened now that we do not serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering.
* Revert "remove duplicate fulltext search from query" | Bryan Newbold | 2020-07-30 | 1 | -0/+1
    This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
* transform: catch more cases of null extra | Bryan Newbold | 2020-07-30 | 1 | -10/+10
    Also correctly pull issne/issnp from container.extra, not release.extra.
* include container_ident in metadata completeness boost | Bryan Newbold | 2020-07-28 | 1 | -0/+1
* search: smaller default result set | Bryan Newbold | 2020-07-27 | 1 | -1/+1
* pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext | Bryan Newbold | 2020-07-27 | 1 | -1/+3
* remove duplicate fulltext search from query | Bryan Newbold | 2020-07-27 | 1 | -1/+0
    May also remove the 'title' and 'abstracts' searches, though they currently help with boosting; will want to measure the actual performance difference before that change.
* json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 3 | -4/+4
    These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents
* search: tweak 'past week' date range to not include future | Bryan Newbold | 2020-07-27 | 1 | -2/+4
* abstracts: more prefixes to ignore | Bryan Newbold | 2020-07-27 | 1 | -0/+3
* more careful watermark removal | Bryan Newbold | 2020-07-22 | 2 | -0/+0
* hide overflow link domain text (for mobile SERPs) | Bryan Newbold | 2020-07-21 | 1 | -1/+1
* gaudy placeholder vaporwave logo | Bryan Newbold | 2020-07-21 | 4 | -12/+11
* differentiate SERP card size from other card divs | Bryan Newbold | 2020-07-21 | 2 | -2/+2
* include fulltext acknowledgements in highlighting | Bryan Newbold | 2020-07-21 | 1 | -0/+1
* ensure SIM release date parses before assigning | Bryan Newbold | 2020-07-21 | 1 | -1/+6
* strip <em> tags explicitly | Bryan Newbold | 2020-07-21 | 1 | -0/+1
* display Szczepanski as an OA quality label | Bryan Newbold | 2020-07-21 | 1 | -1/+1
* load issue rows: handle empty metadata | Bryan Newbold | 2020-07-21 | 1 | -0/+2
* skip partial/stub issue items | Bryan Newbold | 2020-07-01 | 1 | -0/+2
* tweak CSS of last commit so it works | Bryan Newbold | 2020-06-29 | 1 | -1/+1
* at full screen width, show full thumbnails | Bryan Newbold | 2020-06-29 | 1 | -0/+3
* fix search filter bug (papers is default) | Bryan Newbold | 2020-06-29 | 1 | -2/+2
* handle large/bad 'first_page' metadata | Bryan Newbold | 2020-06-29 | 1 | -0/+3
    This was causing elasticsearch indexing errors
* more conservative container_original_name | Bryan Newbold | 2020-06-29 | 1 | -0/+2
* fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 5 | -27/+28
* seaweedfs for S3 API; pull config from dynaconf | Bryan Newbold | 2020-06-29 | 1 | -11/+2
* make fmt | Bryan Newbold | 2020-06-29 | 4 | -13/+22
* fixes to schema parsing from prod | Bryan Newbold | 2020-06-29 | 1 | -9/+13
* include GROBID-extracted abstracts in search documents | Bryan Newbold | 2020-06-29 | 2 | -10/+23
* Search Inside -> Search | Bryan Newbold | 2020-06-29 | 1 | -1/+1
* fix SIM highlight HTML escapes | Bryan Newbold | 2020-06-29 | 1 | -3/+7
    Thanks to Merlijn for finding the broken examples in QA.
* recommend search filter changes on no hits page | Bryan Newbold | 2020-06-29 | 1 | -0/+18
* note about highlight encoding in ES 7.x | Bryan Newbold | 2020-06-29 | 1 | -0/+2
* OA logo SVG file (small) (unused) | Bryan Newbold | 2020-06-29 | 1 | -0/+19
    via wikimedia commons. Public Domain.
* small improvements to SIM metadata maps | Bryan Newbold | 2020-06-29 | 1 | -6/+11
* update stage and withdrawn display; tweak other result styles | Bryan Newbold | 2020-06-29 | 1 | -10/+12
* remove confusing unlock logo from OA tag | Bryan Newbold | 2020-06-29 | 1 | -1/+1
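
Note on the "json: exclude None in output, and sort keys" commit (2020-07-27): the idea described in that message can be sketched with the Python standard library alone. This is an illustrative sketch, not the code from the commit; the helper names `drop_none` and `compact_json` and the sample document are hypothetical.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict entries whose value is None.

    None items inside lists are kept, so list positions do not shift.
    (Hypothetical helper, not taken from the repository.)
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def compact_json(doc: dict) -> str:
    """Serialize without None-valued keys and with keys in sorted order."""
    return json.dumps(drop_none(doc), sort_keys=True)


# Hypothetical sample document, for illustration only.
doc = {"title": "x", "abstract": None, "biblio": {"volume": None, "issue": "2"}}
print(compact_json(doc))
```

Dropping the None entries shrinks each document, and a stable key order means that similar documents serialize to similar byte sequences, which is the property the commit message expects to help compression.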