fatcat-scholar

	Commit message (Collapse)	Author	Age	Files	Lines
*	experiment with rescoring for metadata boost [x-attic-rescore]	Bryan Newbold	2020-08-12	1	-1/+29
\|
*	use simple names, not domain names, for some platforms	Bryan Newbold	2020-08-12	1	-3/+3
\|
*	fmt/lint tweaks	Bryan Newbold	2020-08-12	2	-6/+2
\|
*	biblio metadata hacks at transform time	Bryan Newbold	2020-08-12	1	-2/+98
\|
*	transform: more string cleaning	Bryan Newbold	2020-08-12	1	-12/+59
\|
*	search: include 'article' in papers filter	Bryan Newbold	2020-08-12	1	-1/+1
\|
*	search: use simplified query for highlighting	Bryan Newbold	2020-08-12	1	-1/+8
\| \| \| \| \| \| \| \|	This fixes broken phrase query highlighting. I found this issue, but it may have been unrelated: https://github.com/elastic/elasticsearch/issues/40227
*	don't print config; make fmt	Bryan Newbold	2020-08-06	1	-3/+7
\|
*	re-use ES sync API client	Bryan Newbold	2020-08-06	1	-3/+4
\|
*	'more versions' dropdown table	Bryan Newbold	2020-08-06	1	-0/+82
\|
*	small HTML simplifications	Bryan Newbold	2020-08-06	1	-6/+6
\|
*	report ES API query time as server-timing header	Bryan Newbold	2020-08-06	2	-1/+13
\|
*	squish collapse button in with tags	Bryan Newbold	2020-08-06	1	-8/+7
\|
*	have search buttons animate after submit	Bryan Newbold	2020-08-06	2	-3/+10
\| \| \| \|	Extremely minimal javascript used
*	add debug mode flag (to control json tag/link)	Bryan Newbold	2020-08-06	3	-5/+11
\|
*	slightly more padding in SERP box at max screen size	Bryan Newbold	2020-08-06	2	-1/+4
\|
*	remove javascript includes	Bryan Newbold	2020-08-06	1	-0/+4
\|
*	basic placeholder thumbnail image	Bryan Newbold	2020-08-06	3	-3/+191
\|
*	sort tags, and show JSTOR as a color tag	Bryan Newbold	2020-08-06	1	-1/+3
\|
*	show language code as a tag	Bryan Newbold	2020-08-06	2	-2/+7
\|
*	set HTML language to locale correctly	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	don't index sim_page without issue_item and first_page	Bryan Newbold	2020-08-06	1	-0/+3
\|
*	volume_int/issue_int as actual ints	Bryan Newbold	2020-08-06	1	-2/+2
\|
*	make fmt	Bryan Newbold	2020-08-06	1	-14/+14
\|
*	handle integer conversion and bounding for ES schema	Bryan Newbold	2020-08-06	2	-19/+35
\|
*	microfilm access filter; broader access matching	Bryan Newbold	2020-08-06	1	-3/+6
\|
*	handle longer query times	Bryan Newbold	2020-08-06	1	-2/+10
\|
*	scrub_text: single-token strings skipped	Bryan Newbold	2020-08-06	1	-0/+4
\|
*	strip ACKNOWLEDGEMENTS prefix	Bryan Newbold	2020-08-06	1	-0/+1
\|
*	fix acknowledgement highlighting (typo)	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	reduce title boost; use only base query for highlighting	Bryan Newbold	2020-08-06	1	-1/+2
\|
*	special case '*' queries	Bryan Newbold	2020-08-06	1	-6/+16
\| \| \| \| \|	More/better query parsing in the client could detect if this was a "filter only" query and do the same kind of optimization.
*	remove 'title' from poor metadata scoring	Bryan Newbold	2020-08-06	1	-1/+0
\|
*	better time ranges (don't search future)	Bryan Newbold	2020-08-06	1	-4/+7
\|
*	add title back to match query	Bryan Newbold	2020-08-06	1	-0/+1
\|
*	query fewer fields; highlight all fulltext fields regardless of match	Bryan Newbold	2020-08-06	1	-3/+1
\|
*	fix typo in SERP page macro	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	search tweaks to be forwards-compatible with ES 7.x	Bryan Newbold	2020-08-06	1	-2/+10
\| \| \| \| \| \|	When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x.
*	extend ES client timeout to 25 seconds	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	fix display of papers missing fulltext	Bryan Newbold	2020-08-06	1	-1/+1
\| \| \| \| \| \|	I think the bug happened now that we do not serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering.
*	Revert "remove duplicate fulltext search from query"	Bryan Newbold	2020-07-30	1	-0/+1
\| \| \| \| \| \|	This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
*	transform: catch more cases of null extra	Bryan Newbold	2020-07-30	1	-10/+10
\| \| \| \|	Also correctly pull issne/issnp from container.extra, not release.extra.
*	include container_ident in metadata completeness boost	Bryan Newbold	2020-07-28	1	-0/+1
\|
*	search: smaller default result set	Bryan Newbold	2020-07-27	1	-1/+1
\|
*	pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext	Bryan Newbold	2020-07-27	1	-1/+3
\|
*	remove duplicate fulltext search from query	Bryan Newbold	2020-07-27	1	-1/+0
\| \| \| \| \| \|	may also remove the 'title' and 'abstracts' searches, though they currently help with boosting; will want to measure the actual performance difference before making that change
*	json: exclude None in output, and sort keys	Bryan Newbold	2020-07-27	3	-4/+4
\| \| \| \| \| \| \| \| \| \|	These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents
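A minimal sketch of the two enhancements described in the commit above, using only the Python standard library. The helper names `drop_none` and `compact_json` and the sample document are hypothetical, chosen for illustration; the repository's actual change may rely on its pydantic models instead.

```python
import json


def drop_none(value):
    """Recursively remove dict keys whose value is None.

    None entries inside lists are left in place, so list positions
    do not shift.
    """
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def compact_json(doc):
    """Serialize without None-valued keys and with keys in sorted order."""
    return json.dumps(drop_none(doc), sort_keys=True)


# Hypothetical sample document, not taken from the repository.
doc = {
    "title": "Example",
    "volume": None,
    "biblio": {"issue": None, "doi": "10.123/abc"},
}
print(compact_json(doc))
```

Emitting keys in the same order for every document means similar documents share longer byte runs, which is what helps gzip and the Elasticsearch storage layer compress them.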
*	search: tweak 'past week' date range to not include future	Bryan Newbold	2020-07-27	1	-2/+4
\|
*	abstracts: more prefixes to ignore	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	more careful watermark removal	Bryan Newbold	2020-07-22	2	-0/+0
\|