fatcat-scholar

	Commit message	Author	Age	Files	Lines
*	refactor use of grobid_tei_xml	Bryan Newbold	2021-10-27	1	-4/+5
\|
*	scrub_text: remove unused mimetype arg	Bryan Newbold	2021-10-27	1	-1/+1
\|	To resolve a warning caught by pytype
*	replace classmethods with staticmethods	Bryan Newbold	2021-10-27	1	-2/+2
\|
*	lint: small cleanups, mostly E711 and E713	Bryan Newbold	2021-10-27	1	-10/+10
\|
*	make fmt (black 21.9b0)	Bryan Newbold	2021-10-27	1	-1/+0
\|
*	re-style imports (isort) on all core python files	Bryan Newbold	2021-10-27	1	-6/+7
\|
*	better parsing of year as integer in refs pipeline	Bryan Newbold	2021-07-26	1	-2/+6
\|
*	bibref: add version field; isbn13 -> isbn	Bryan Newbold	2021-07-25	1	-1/+2
\|
*	refs transform: 1-index refs.index, not 0-index	Bryan Newbold	2021-07-25	1	-1/+1
\|	This was not matching expectations/schema of downstream refs pipeline (cgraph), and wasn't matching documented schema. Note care required when checking if the index is set, to distinguish between '0' and 'None' values.
*	refs: include (source) release_stage in output	Bryan Newbold	2021-06-30	1	-0/+1
\|
*	schema: add 'crossref' to bundle schema, and add from_json() helper	Bryan Newbold	2021-06-02	1	-1/+20
\|	from_json() refactor was an earlier TODO, to reduce duplication when updating fields on this class
*	indexing: defer to creator.display_name over contrib.raw_name	Bryan Newbold	2021-04-12	1	-1/+3
\|
*	catch HTML parsing error from within html (via bs4)	Bryan Newbold	2021-02-01	1	-2/+9
\|
*	bugfix: container_sherpa_color not defined	Bryan Newbold	2021-01-29	1	-1/+1
\|
*	make fmt	Bryan Newbold	2021-01-25	1	-1/+3
\|
*	basic support for excluding web content from index	Bryan Newbold	2021-01-22	1	-0/+14
\|	Based on particular patterns in metadata, or exclusion lists in settings
*	add container_sherpa_color field, and populate it	Bryan Newbold	2021-01-22	1	-18/+18
\|
*	refactor DOI domain lookup into python code; expand table	Bryan Newbold	2021-01-21	1	-0/+14
\|
*	citation: fixes to generic hack; remove bibtex hack	Bryan Newbold	2021-01-21	1	-31/+6
\|
*	fixup: check for container.extra in indexing pipeline	Bryan Newbold	2021-01-21	1	-1/+3
\|
*	fix indexing bug (false-y publisher_type?)	Bryan Newbold	2021-01-18	1	-0/+2
\|
*	lint: fix small bugs and type annotations	Bryan Newbold	2021-01-18	1	-1/+2
\|
*	small corrections to schema/transform	Bryan Newbold	2021-01-16	1	-1/+4
\|
*	make fmt	Bryan Newbold	2021-01-15	1	-6/+6
\|
*	crude bibtex and citation formatting, as a demo	Bryan Newbold	2021-01-14	1	-0/+49
\|
*	schema: make fulltext body optional (eg, for search results)	Bryan Newbold	2021-01-14	1	-1/+1
\|
*	add support for new identifiers and size_bytes schema fields	Bryan Newbold	2021-01-14	1	-4/+13
\|
*	add basic html fulltext support to fetch pipeline	Bryan Newbold	2020-11-18	1	-0/+1
\|
*	schema: optional 'fetched' field on bundles	Bryan Newbold	2020-10-16	1	-0/+2
\|
*	make fmt	Bryan Newbold	2020-09-13	1	-6/+12
\|
*	ref transform: support more GROBID fields	Bryan Newbold	2020-09-13	1	-1/+4
\|
*	URL cleanup helper	Bryan Newbold	2020-09-13	1	-0/+28
\|
*	heavy to refs command	Bryan Newbold	2020-09-04	1	-0/+36
\|
*	handle small ints better (signed/unsigned abs size)	Bryan Newbold	2020-08-12	1	-1/+2
\|
*	transform: more string cleaning	Bryan Newbold	2020-08-12	1	-12/+59
\|
*	volume_int/issue_int as actual ints	Bryan Newbold	2020-08-06	1	-2/+2
\|
*	handle integer conversion and bounding for ES schema	Bryan Newbold	2020-08-06	1	-9/+22
\|
*	scrub_text: single-token strings skipped	Bryan Newbold	2020-08-06	1	-0/+4
\|
*	strip ACKNOWLEDGEMENTS prefix	Bryan Newbold	2020-08-06	1	-0/+1
\|
*	transform: catch more cases of null extra	Bryan Newbold	2020-07-30	1	-10/+10
\|	Also correctly pull issne/issnp from container.extra, not release.extra.
*	abstracts: more prefixes to ignore	Bryan Newbold	2020-07-27	1	-0/+3
\|
*	strip <em> tags explicitly	Bryan Newbold	2020-07-21	1	-0/+1
\|
*	handle large/bad 'first_page' metadata	Bryan Newbold	2020-06-29	1	-0/+3
\|	This was causing elasticsearch indexing errors
*	more conservative container_original_name	Bryan Newbold	2020-06-29	1	-0/+2
\|
*	fix lint errors (and some small bugs)	Bryan Newbold	2020-06-29	1	-2/+1
\|
*	fixes to schema parsing from prod	Bryan Newbold	2020-06-29	1	-9/+13
\|
*	include GROBID-extracted abstracts in search documents	Bryan Newbold	2020-06-29	1	-0/+8
\|
*	fetch pdftotext and pdf_meta from blobs, postgrest	Bryan Newbold	2020-06-29	1	-4/+5
\|	This replaces the temporary COVID-19 content hack with production content (text, thumbnail URLs) stored in postgrest and seaweedfs.
*	commit production work-around (temporarily)	Bryan Newbold	2020-06-04	1	-1/+2
\|
*	collapse pages by SIM issue	Bryan Newbold	2020-06-04	1	-0/+1
\|