Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | microfilm access filter; broader access matching | Bryan Newbold | 2020-08-06 | 1 | -3/+6 | |
| | ||||||
* | handle longer query times | Bryan Newbold | 2020-08-06 | 1 | -2/+10 | |
| | ||||||
* | scrub_text: single-token strings skipped | Bryan Newbold | 2020-08-06 | 2 | -1/+5 | |
| | ||||||
* | strip ACKNOWLEDGEMENTS prefix | Bryan Newbold | 2020-08-06 | 1 | -0/+1 | |
| | ||||||
* | fix acknowledgement highlighting (typo) | Bryan Newbold | 2020-08-06 | 1 | -1/+1 | |
| | ||||||
* | more notes on scaling | Bryan Newbold | 2020-08-06 | 1 | -0/+363 | |
| | ||||||
* | reduce title boost; use only base query for highlighting | Bryan Newbold | 2020-08-06 | 1 | -1/+2 | |
| | ||||||
* | special case '*' queries | Bryan Newbold | 2020-08-06 | 1 | -6/+16 | |
| | | | | | More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization. | |||||
* | remove 'title' from poor metadata scoring | Bryan Newbold | 2020-08-06 | 1 | -1/+0 | |
| | ||||||
* | better time ranges (don't search future) | Bryan Newbold | 2020-08-06 | 1 | -4/+7 | |
| | ||||||
* | add title back to match query | Bryan Newbold | 2020-08-06 | 1 | -0/+1 | |
| | ||||||
* | enable index_phrases on everything, biblio_all, title_all | Bryan Newbold | 2020-08-06 | 1 | -5/+3 | |
| | | | | | Want phrase queries to be faster. Expect this to increase term index size, requiring more disk space. | |||||
* | ES schema: do not index fulltext.body or fulltext.annex separately from 'everything' | Bryan Newbold | 2020-08-06 | 1 | -3/+2 | |
| | | | | | | | | The goal here is to reduce term index size. This means that querying/matching only on these fields (distinct from "everything") will not work. | |||||
* | ES schema: use smaller integer size (short) for most numbers | Bryan Newbold | 2020-08-06 | 1 | -5/+5 | |
| | ||||||
* | ES schema: copy_to titles into single title_all field | Bryan Newbold | 2020-08-06 | 1 | -4/+4 | |
| | ||||||
* | query fewer fields; highlight all fulltext fields regardless of match | Bryan Newbold | 2020-08-06 | 1 | -3/+1 | |
| | ||||||
* | fix typo in SERP page macro | Bryan Newbold | 2020-08-06 | 1 | -1/+1 | |
| | ||||||
* | search tweaks to be forwards-compatible with ES 7.x | Bryan Newbold | 2020-08-06 | 1 | -2/+10 | |
| | | | | | | When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x. | |||||
* | extend ES client timeout to 25 seconds | Bryan Newbold | 2020-08-06 | 1 | -1/+1 | |
| | ||||||
* | fix display of papers missing fulltext | Bryan Newbold | 2020-08-06 | 1 | -1/+1 | |
| | | | | | | I think the bug happened now that we do not serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering. | |||||
* | Revert "remove duplicate fulltext search from query" | Bryan Newbold | 2020-07-30 | 1 | -0/+1 | |
| | | | | | | This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work. | |||||
* | transform: catch more cases of null extra | Bryan Newbold | 2020-07-30 | 1 | -10/+10 | |
| | | | | Also correctly pull issne/issnp from container.extra, not release.extra. | |||||
* | include container_ident in metadata completeness boost | Bryan Newbold | 2020-07-28 | 1 | -0/+1 | |
| | ||||||
* | search: smaller default result set | Bryan Newbold | 2020-07-27 | 2 | -1/+4 | |
| | ||||||
* | pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext | Bryan Newbold | 2020-07-27 | 1 | -1/+3 | |
| | ||||||
* | scaling notes (ES) | Bryan Newbold | 2020-07-27 | 1 | -1/+71 | |
| | ||||||
* | remove duplicate fulltext search from query | Bryan Newbold | 2020-07-27 | 1 | -1/+0 | |
| | | | | | | may also remove the 'title' and 'abstracts' searches, though they currently help with boosting, and will want to measure actual performance difference before that change | |||||
* | json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 3 | -4/+4 | |
| | | | | | | | | | | These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents | |||||
* | search: tweak 'past week' date range to not include future | Bryan Newbold | 2020-07-27 | 1 | -2/+4 | |
| | ||||||
* | schema: 12 shards, 0 replicas, more compression | Bryan Newbold | 2020-07-27 | 1 | -0/+3 | |
| | ||||||
* | abstracts: more prefixes to ignore | Bryan Newbold | 2020-07-27 | 1 | -0/+3 | |
| | ||||||
* | more careful watermark removal | Bryan Newbold | 2020-07-22 | 2 | -0/+0 | |
| | ||||||
* | hide overflow link domain text (for mobile SERPs) | Bryan Newbold | 2020-07-21 | 1 | -1/+1 | |
| | ||||||
* | gaudy placeholder vaporwave logo | Bryan Newbold | 2020-07-21 | 4 | -12/+11 | |
| | ||||||
* | differentiate SERP card size from other card divs | Bryan Newbold | 2020-07-21 | 2 | -2/+2 | |
| | ||||||
* | include fulltext acknowledgements in highlighting | Bryan Newbold | 2020-07-21 | 1 | -0/+1 | |
| | ||||||
* | ensure SIM release date parses before assigning | Bryan Newbold | 2020-07-21 | 1 | -1/+6 | |
| | ||||||
* | strip <em> tags explicitly | Bryan Newbold | 2020-07-21 | 1 | -0/+1 | |
| | ||||||
* | display Szczepanski as an OA quality label | Bryan Newbold | 2020-07-21 | 1 | -1/+1 | |
| | ||||||
* | load issue rows: handle empty metadata | Bryan Newbold | 2020-07-21 | 1 | -0/+2 | |
| | ||||||
* | scale-up notes | Bryan Newbold | 2020-07-21 | 1 | -0/+26 | |
| | ||||||
* | TODO items | Bryan Newbold | 2020-07-21 | 1 | -0/+4 | |
| | ||||||
* | more notes on SIM/fatcat intersections | Bryan Newbold | 2020-07-21 | 1 | -1/+77 | |
| | ||||||
* | schema: access as object (list), not nested | Bryan Newbold | 2020-07-21 | 1 | -1/+1 | |
| | | | | | | Nested allows more precise filter queries, but it seems that simple "dot notation" filters/queries don't work. We don't have anything doing the sophisticated queries yet, so keep it simple. | |||||
* | update README instructions for issue_db generation | Bryan Newbold | 2020-07-01 | 1 | -2/+3 | |
| | ||||||
* | skip partial/stub issue items | Bryan Newbold | 2020-07-01 | 1 | -0/+2 | |
| | ||||||
* | tweak CSS of last commit so it works | Bryan Newbold | 2020-06-29 | 1 | -1/+1 | |
| | ||||||
* | at full screen width, show full thumbnails | Bryan Newbold | 2020-06-29 | 1 | -0/+3 | |
| | ||||||
* | fix search filter bug (papers is default) | Bryan Newbold | 2020-06-29 | 1 | -2/+2 | |
| | ||||||
* | update COVID-19 ingest for refactors | Bryan Newbold | 2020-06-29 | 1 | -2/+2 | |
| |
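
The "json: exclude None in output, and sort keys" commit above describes two size/performance tweaks: dropping `None` values and emitting keys in a stable order. A minimal sketch of that idea follows, using only the Python standard library. The helper names (`drop_none`, `to_compact_json`) and the sample document are hypothetical and are not taken from the repository, which appears to serialize pydantic structures rather than plain dicts.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict keys whose value is None.

    List elements are recursed into but never removed, so list
    positions stay stable.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def to_compact_json(doc: dict) -> str:
    """Serialize without None values and with keys sorted.

    Sorted keys give every document the same field order, which is
    what the commit message expects to help compression.
    """
    return json.dumps(drop_none(doc), sort_keys=True)


# Hypothetical document, for illustration only.
doc = {"title": "Example", "abstract": None, "biblio": {"volume": None, "issue": "3"}}
print(to_compact_json(doc))
```

One side effect, noted in the "fix display of papers missing fulltext" commit, is that consumers can no longer assume every key is present, so readers of the output need to tolerate missing fields.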