fatcat-scholar

	Commit message	Author	Age	Files	Lines
*	set HTML language to locale correctly	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	don't index sim_page without issue_item and first_page	Bryan Newbold	2020-08-06	1	-0/+3
\|
*	volume_int/issue_int as actual ints	Bryan Newbold	2020-08-06	1	-2/+2
\|
*	make fmt	Bryan Newbold	2020-08-06	1	-14/+14
\|
*	ES schema: access_type should be any option, not just 'best'	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	handle integer conversion and bounding for ES schema	Bryan Newbold	2020-08-06	2	-19/+35
\|
*	microfilm access filter; broader access matching	Bryan Newbold	2020-08-06	1	-3/+6
\|
*	handle longer query times	Bryan Newbold	2020-08-06	1	-2/+10
\|
*	scrub_text: single-token strings skipped	Bryan Newbold	2020-08-06	2	-1/+5
\|
*	strip ACKNOWLEDGEMENTS prefix	Bryan Newbold	2020-08-06	1	-0/+1
\|
*	fix acknowledgement highlighting (typo)	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	more notes on scaling	Bryan Newbold	2020-08-06	1	-0/+363
\|
*	reduce title boost; use only base query for highlighting	Bryan Newbold	2020-08-06	1	-1/+2
\|
*	special case '*' queries	Bryan Newbold	2020-08-06	1	-6/+16
\| \| \| \| \|	More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
*	remove 'title' from poor metadata scoring	Bryan Newbold	2020-08-06	1	-1/+0
\|
*	better time ranges (don't search future)	Bryan Newbold	2020-08-06	1	-4/+7
\|
*	add title back to match query	Bryan Newbold	2020-08-06	1	-0/+1
\|
*	enable index_phrases on everything, biblio_all, title_all	Bryan Newbold	2020-08-06	1	-5/+3
\| \| \| \| \|	Want phrase queries to be faster. Expect this to increase term index size, requiring more disk space.
*	ES schema: do not index fulltext.body or fulltext.annex separately from 'everything'	Bryan Newbold	2020-08-06	1	-3/+2
\| \| \| \| \| \| \| \|	The goal here is to reduce term index size. This means that querying/matching only on these fields (distinct from "everything") will not work.
*	ES schema: use smaller integer size (short) for most numbers	Bryan Newbold	2020-08-06	1	-5/+5
\|
*	ES schema: copy_to titles into single title_all field	Bryan Newbold	2020-08-06	1	-4/+4
\|
*	query fewer fields; highlight all fulltext fields regardless of match	Bryan Newbold	2020-08-06	1	-3/+1
\|
*	fix typo in SERP page macro	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	search tweaks to be forwards-compatible with ES 7.x	Bryan Newbold	2020-08-06	1	-2/+10
\| \| \| \| \| \|	When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x.
*	extend ES client timeout to 25 seconds	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	fix display of papers missing fulltext	Bryan Newbold	2020-08-06	1	-1/+1
\| \| \| \| \| \|	I think the bug happened now that we do not serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering.
*	Revert "remove duplicate fulltext search from query"	Bryan Newbold	2020-07-30	1	-0/+1
\| \| \| \| \| \|	This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
*	transform: catch more cases of null extra	Bryan Newbold	2020-07-30	1	-10/+10
\| \| \| \|	Also correctly pull issne/issnp from container.extra, not release.extra.
*	include container_ident in metadata completeness boost	Bryan Newbold	2020-07-28	1	-0/+1
\|
*	search: smaller default result set	Bryan Newbold	2020-07-27	2	-1/+4
\|
*	pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext	Bryan Newbold	2020-07-27	1	-1/+3
\|
*	scaling notes (ES)	Bryan Newbold	2020-07-27	1	-1/+71
\|
*	remove duplicate fulltext search from query	Bryan Newbold	2020-07-27	1	-1/+0
\| \| \| \| \| \|	may also remove the 'title' and 'abstracts' searches, though they currently help with boosting, and will want to measure actual performance difference before that change
*	json: exclude None in output, and sort keys	Bryan Newbold	2020-07-27	3	-4/+4
\| \| \| \| \| \| \| \| \| \|	These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents
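The two output changes described in the commit above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the helper names `drop_none` and `compact_json` and the sample document are hypothetical, and the real project may get the same effect through its pydantic models instead.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict entries whose value is None (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def compact_json(doc: dict) -> str:
    """Serialize without None values and with keys in sorted order."""
    return json.dumps(drop_none(doc), sort_keys=True)


# Illustrative document; field names are made up for the example.
doc = {"title": "x", "abstract": None, "biblio": {"volume": None, "year": 2020}}
print(compact_json(doc))
```

Sorting keys gives every serialized document the same field order, which is what the linked Elasticsearch guidance suggests for better compression across documents.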
*	search: tweak 'past week' date range to not include future	Bryan Newbold	2020-07-27	1	-2/+4
\|
*	schema: 12 shards, 0 replicas, more compression	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	abstracts: more prefixes to ignore	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	more careful watermark removal	Bryan Newbold	2020-07-22	2	-0/+0
\|
*	hide overflow link domain text (for mobile SERPs)	Bryan Newbold	2020-07-21	1	-1/+1
\|
*	gaudy placeholder vaporwave logo	Bryan Newbold	2020-07-21	4	-12/+11
\|
*	differentiate SERP card size from other card divs	Bryan Newbold	2020-07-21	2	-2/+2
\|
*	include fulltext acknowledgements in highlighting	Bryan Newbold	2020-07-21	1	-0/+1
\|
*	ensure SIM release date parses before assigning	Bryan Newbold	2020-07-21	1	-1/+6
\|
*	strip <em> tags explicitly	Bryan Newbold	2020-07-21	1	-0/+1
\|
*	display Szczepanski as an OA quality label	Bryan Newbold	2020-07-21	1	-1/+1
\|
*	load issue rows: handle empty metadata	Bryan Newbold	2020-07-21	1	-0/+2
\|
*	scale-up notes	Bryan Newbold	2020-07-21	1	-0/+26
\|
*	TODO items	Bryan Newbold	2020-07-21	1	-0/+4
\|
*	more notes on SIM/fatcat intersections	Bryan Newbold	2020-07-21	1	-1/+77
\|
*	schema: access as object (list), not nested	Bryan Newbold	2020-07-21	1	-1/+1
\| \| \| \| \| \|	Nested allows more precise filter queries, but it seems that simple "dot notation" filters/queries don't work. We don't have anything doing the sophisticated queries yet, so keep it simple.