fatcat-scholar

	Commit message	Author	Age	Files	Lines
*	enable index_phrases on everything, biblio_all, title_all	Bryan Newbold	2020-08-06	1	-5/+3
\|	Want phrase queries to be faster. Expect this to increase term index size, requiring more disk space.
*	ES schema: do not index fulltext.body or fulltext.annex separately from 'everything'	Bryan Newbold	2020-08-06	1	-3/+2
\|	The goal here is to reduce term index size. This means that querying/matching only on these fields (distinct from "everything") will not work.
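A minimal sketch of what this kind of mapping change could look like, written as a Python dict. The field names `fulltext.body`, `fulltext.annex`, and `everything` come from the commit message; the use of `"index": False` together with `copy_to`, and the helper `searchable_fields`, are assumptions for illustration and not the repository's actual schema.

```python
# Hypothetical mapping fragment (assumption, not the real fatcat-scholar schema).
# The source fields are not indexed on their own; their text is only copied
# into the combined "everything" field, which remains searchable.
mapping_properties = {
    "everything": {"type": "text"},
    "fulltext": {
        "properties": {
            "body": {"type": "text", "index": False, "copy_to": "everything"},
            "annex": {"type": "text", "index": False, "copy_to": "everything"},
        }
    },
}


def searchable_fields(properties, prefix=""):
    """Return dotted names of leaf fields that are indexed on their own."""
    names = []
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            names.extend(searchable_fields(spec["properties"], path + "."))
        elif spec.get("index", True):
            names.append(path)
    return names


fields = searchable_fields(mapping_properties)
```

Under these assumptions only `everything` can be matched directly, which is the trade-off the commit message describes.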
*	ES schema: use smaller integer size (short) for most numbers	Bryan Newbold	2020-08-06	1	-5/+5
\|
*	ES schema: copy_to titles into single title_all field	Bryan Newbold	2020-08-06	1	-4/+4
\|
*	query fewer fields; highlight all fulltext fields regardless of match	Bryan Newbold	2020-08-06	1	-3/+1
\|
*	fix typo in SERP page macro	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	search tweaks to be forwards-compatible with ES 7.x	Bryan Newbold	2020-08-06	1	-2/+10
\|	When we fully commit to ES 7.x we should upgrade the client library correspondingly, and then can remove these work-arounds. But for now we have one instance of ES 6.x and one ES 7.x.
*	extend ES client timeout to 25 seconds	Bryan Newbold	2020-08-06	1	-1/+1
\|
*	fix display of papers missing fulltext	Bryan Newbold	2020-08-06	1	-1/+1
\|	I think the bug happened now that we do not serialize the pydantic structures with empty values. A better solution might be to deserialize search hits into pydantic objects before rendering.
*	Revert "remove duplicate fulltext search from query"	Bryan Newbold	2020-07-30	1	-0/+1
\|	This reverts commit 0d3fd83493c7307a2b9593c7add90b8b6f4b4152. Seems like we do need to query on this field for highlighting to work.
*	transform: catch more cases of null extra	Bryan Newbold	2020-07-30	1	-10/+10
\|	Also correctly pull issne/issnp from container.extra, not release.extra.
*	include container_ident in metadata completeness boost	Bryan Newbold	2020-07-28	1	-0/+1
\|
*	search: smaller default result set	Bryan Newbold	2020-07-27	2	-1/+4
\|
*	pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext	Bryan Newbold	2020-07-27	1	-1/+3
\|
*	scaling notes (ES)	Bryan Newbold	2020-07-27	1	-1/+71
\|
*	remove duplicate fulltext search from query	Bryan Newbold	2020-07-27	1	-1/+0
\|	May also remove the 'title' and 'abstracts' searches, though they currently help with boosting; will want to measure the actual performance difference before making that change.
*	json: exclude None in output, and sort keys	Bryan Newbold	2020-07-27	3	-4/+4
\|	These are both size/performance enhancements. Not including 'None' values will reduce document sizes on-disk and over network, particularly for intermediate objects. Sorting by key should improve compression ratios across multiple documents, both on-disk (gzip) and in elasticsearch itself: https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html#_put_fields_in_the_same_order_in_documents
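The effect of dropping 'None' values and sorting keys can be sketched with the standard library alone. This is an illustration, not the code from the commit: the helper names `drop_none` and `compact_json` and the sample document are hypothetical.

```python
import json
from typing import Any


def drop_none(value: Any) -> Any:
    """Recursively remove dict entries whose value is None."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_none(v) for v in value]
    return value


def compact_json(doc: dict) -> str:
    """Serialize without None values and with keys in sorted order."""
    return json.dumps(drop_none(doc), sort_keys=True)


# Hypothetical document with empty fields at two nesting levels.
doc = {
    "title": "Example",
    "abstract": None,
    "biblio": {"volume": None, "doi": "10.123/abc"},
}
line = compact_json(doc)
# line == '{"biblio": {"doi": "10.123/abc"}, "title": "Example"}'
```

Because every document then emits its keys in the same order, adjacent documents share longer common byte runs, which is what helps gzip and the index's own compression.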
*	search: tweak 'past week' date range to not include future	Bryan Newbold	2020-07-27	1	-2/+4
\|
*	schema: 12 shards, 0 replicas, more compression	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	abstracts: more prefixes to ignore	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	more careful watermark removal	Bryan Newbold	2020-07-22	2	-0/+0
\|
*	hide overflow link domain text (for mobile SERPs)	Bryan Newbold	2020-07-21	1	-1/+1
\|
*	gaudy placeholder vaporwave logo	Bryan Newbold	2020-07-21	4	-12/+11
\|
*	differentiate SERP card size from other card divs	Bryan Newbold	2020-07-21	2	-2/+2
\|
*	include fulltext acknowledgements in highlighting	Bryan Newbold	2020-07-21	1	-0/+1
\|
*	ensure SIM release date parses before assigning	Bryan Newbold	2020-07-21	1	-1/+6
\|
*	strip <em> tags explicitly	Bryan Newbold	2020-07-21	1	-0/+1
\|
*	display Szczepanski as an OA quality label	Bryan Newbold	2020-07-21	1	-1/+1
\|
*	load issue rows: handle empty metadata	Bryan Newbold	2020-07-21	1	-0/+2
\|
*	scale-up notes	Bryan Newbold	2020-07-21	1	-0/+26
\|
*	TODO items	Bryan Newbold	2020-07-21	1	-0/+4
\|
*	more notes on SIM/fatcat intersections	Bryan Newbold	2020-07-21	1	-1/+77
\|
*	schema: access as object (list), not nested	Bryan Newbold	2020-07-21	1	-1/+1
\|	Nested allows more precise filter queries, but it seems that simple "dot notation" filters/queries don't work. We don't have anything doing the sophisticated queries yet, so keep it simple.
*	update README instructions for issue_db generation	Bryan Newbold	2020-07-01	1	-2/+3
\|
*	skip partial/stub issue items	Bryan Newbold	2020-07-01	1	-0/+2
\|
*	tweak CSS of last commit so it works	Bryan Newbold	2020-06-29	1	-1/+1
\|
*	at full screen width, show full thumbnails	Bryan Newbold	2020-06-29	1	-0/+3
\|
*	fix search filter bug (papers is default)	Bryan Newbold	2020-06-29	1	-2/+2
\|
*	update COVID-19 ingest for refactors	Bryan Newbold	2020-06-29	1	-2/+2
\|
*	handle large/bad 'first_page' metadata	Bryan Newbold	2020-06-29	1	-0/+3
\|	This was causing elasticsearch indexing errors
*	update plan doc	Bryan Newbold	2020-06-29	1	-67/+2
\|
*	more conservative container_original_name	Bryan Newbold	2020-06-29	1	-0/+2
\|
*	fix lint errors (and some small bugs)	Bryan Newbold	2020-06-29	5	-27/+28
\|
*	seaweedfs for S3 API; pull config from dynaconf	Bryan Newbold	2020-06-29	2	-11/+4
\|
*	make fmt	Bryan Newbold	2020-06-29	4	-13/+22
\|
*	fixes to schema parsing from prod	Bryan Newbold	2020-06-29	1	-9/+13
\|
*	include GROBID-extracted abstracts in search documents	Bryan Newbold	2020-06-29	2	-10/+23
\|
*	update TODO.txt	Bryan Newbold	2020-06-29	1	-28/+6
\|
*	Search Inside -> Search	Bryan Newbold	2020-06-29	1	-1/+1
\|
*	fix SIM highlight HTML escapes	Bryan Newbold	2020-06-29	1	-3/+7
\|	Thanks to Merlijn for finding the broken examples in QA.