fatcat

	Commit message	Author	Age	Files	Lines
...
*	refactor release and container search	Bryan Newbold	2020-07-24	6	-136/+235
\|	Based on fatcat-scholar refactoring. This doesn't include refactoring of stats, aggregates, or histograms yet, just the direct queries. Don't have any test coverage yet; intend to try elasticmock or figuring out how to ingest mock JSON results directly.
*	web search: fix pylint error	Bryan Newbold	2020-07-24	1	-2/+2
\|
*	WIP: refactoring search to use elasticsearch-dsl	Bryan Newbold	2020-07-24	2	-153/+137
\|
*	Merge branch 'bnewbold-more-lint-fixes' into 'master'	Martin Czygan	2020-07-24	14	-34/+26
\|\	more lint fixes See merge request webgroup/fatcat!69
\| *	fix issnl typo in pubmed	Bryan Newbold	2020-07-23	1	-1/+1
\| \|	Oh no! This bug may actually have had significant negative impact on metadata in fatcat, in terms of missing container_id associations with pubmed entities. There are about 500k release entities with a PMID but no container_id. Of those, 89k have at least a container_name. Unclear how many would have matched to ISSN-L and thus to a container.
\| *	remove isascii() workaround definition in importers/datacite.py	Bryan Newbold	2020-07-23	1	-7/+1
\| \|	We are on Python 3.7 now, so this isn't needed.
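The removed helper is not shown in this log, so the function below is a hypothetical pre-3.7 shim (`isascii_compat` is an invented name), included only to show why such a workaround became redundant: Python 3.7 added the built-in `str.isascii()`, which gives the same answer.

```python
import sys


def isascii_compat(s: str) -> bool:
    """Hypothetical pre-3.7 shim: True if every code point is below 128."""
    return all(ord(ch) < 128 for ch in s)


# On Python 3.7+ the built-in method agrees with the shim, so the shim can go.
if sys.version_info >= (3, 7):
    for sample in ["fatcat", "Müller", ""]:
        assert isascii_compat(sample) == sample.isascii()
```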
\| *	simple lint (flake8) fixes over python codebase	Bryan Newbold	2020-07-23	7	-19/+18
\| \|	These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements.
\| *	fix actual typo in tests (caught by lint)	Bryan Newbold	2020-07-23	1	-2/+2
\| \|
\| *	simple lint (flake8) fixes in tests	Bryan Newbold	2020-07-23	5	-5/+4
\| \|	The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/
* \|	simplify in_kbart check statement	Bryan Newbold	2020-07-23	1	-1/+1
\| \|	Thanks @martin
* \|	make in_kbart transform inclusive of last year	Bryan Newbold	2020-07-23	2	-0/+55
\|/	Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, eg CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of date (eg, they don't show coverage for the current year). For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over. This patch means we will count Portico/LOCKSS/etc coverage for "last year" as coverage of publications dated "this year". Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year.
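A minimal sketch of the rule described above, assuming a container's KBART coverage is available as a list of inclusive `[start_year, end_year]` spans. The function signature and the `this_year` parameter are hypothetical; this is not fatcat's actual transform code.

```python
import datetime
from typing import List, Optional


def in_kbart(year: Optional[int], kbart_spans: List[List[int]],
             this_year: Optional[int] = None) -> bool:
    """Return True if `year` is covered by any inclusive KBART year span.

    A span that ends last year is treated as also covering this year,
    because KBART reports usually lag behind the current year.
    """
    if year is None:
        return False
    if this_year is None:
        this_year = datetime.date.today().year
    for start, end in kbart_spans:
        if end == this_year - 1:
            end = this_year  # stretch "last year" coverage to "this year"
        if start <= year <= end:
            return True
    return False
```

Only spans ending exactly last year are stretched, so older gaps in coverage still show up as gaps; this is also why the approach depends on refreshing container year spans and re-indexing releases at least yearly.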
*	Merge branch 'martin-datacite-duplicated-author-gh-59' into 'master'	bnewbold	2020-07-11	13	-251/+619
\|\	datacite: address duplicated contributor issue See merge request webgroup/fatcat!65
\| *	datacite: resolve formatting issues in tests	Martin Czygan	2020-07-10	96	-340/+182
\| \|\
\| * \|	datacite: adjust tests	Martin Czygan	2020-07-10	4	-10/+6
\| \| \|
\| * \|	datacite: there should be no index gaps	Martin Czygan	2020-07-10	1	-2/+8
\| \| \|
\| * \|	datacite: document contributor types	Martin Czygan	2020-07-10	1	-0/+25
\| \| \|
\| * \|	wip: contrib, GH59	Martin Czygan	2020-07-10	2	-245/+383
\| \| \|
\| * \|	wip: contrib, GH59	Martin Czygan	2020-07-10	5	-3/+105
\| \| \|
\| * \|	datacite: address duplicated contributor issue	Martin Czygan	2020-07-07	6	-11/+110
\| \| \|	Use string comparison. * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs * https://api.datacite.org/dois/10.25940/roper-31098406
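A minimal sketch of two ideas from this branch, exact string comparison for spotting duplicate contributors and gap-free contributor indices, assuming contributors are plain dicts with `raw_name` and `index` keys. `dedupe_contribs` is a hypothetical helper, not the DataCite importer's code, and it ignores contributor roles entirely.

```python
from typing import Any, Dict, List


def dedupe_contribs(contribs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop contributors whose raw name string was already seen, then
    renumber `index` so it runs 0, 1, 2, ... without gaps."""
    seen = set()
    result: List[Dict[str, Any]] = []
    for contrib in contribs:
        name = contrib.get("raw_name")
        if name is not None:
            if name in seen:
                continue  # exact string match: treat as a duplicate
            seen.add(name)
        # Copy the dict with a fresh, gap-free index
        result.append(dict(contrib, index=len(result)))
    return result
```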
* \| \|	Merge branch 'martin-datacite-bugfix-sentry-44035' into 'master'	bnewbold	2020-07-11	1	-0/+4
\|\ \ \	datacite: mitigate sentry #44035 See merge request webgroup/fatcat!66
\| * \|	datacite: mitigate sentry #44035	Martin Czygan	2020-07-10	1	-0/+4
\| \| \|	According to sentry, running `c.get('nameIdentifiers', []) or []` on a c with value: ``` {'affiliation': [], 'familyName': 'Guidon', 'givenName': 'Manuel', 'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683', 'nameIdentifierScheme': 'ORCID', 'schemeUri': 'https://orcid.org'}, 'nameType': 'Personal'} ``` results in a string, which I cannot reproduce. The document in question at: https://api.datacite.org/dois/10.26275/kuw1-fdls seems fine, too.
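One possible explanation, offered as an assumption rather than something the commit confirms: in the value quoted above, `nameIdentifiers` is a single dict rather than a list, and iterating over a dict yields its keys, which are strings. A defensive normalization could look like the sketch below; `name_identifiers` is a hypothetical helper and not the four-line mitigation from the commit.

```python
from typing import Any, Dict, List


def name_identifiers(contributor: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return `nameIdentifiers` as a list of dicts, whatever shape it arrives in."""
    value = contributor.get("nameIdentifiers") or []
    if isinstance(value, dict):
        # A lone dict: iterating it directly would yield its (string) keys
        return [value]
    if isinstance(value, list):
        return [item for item in value if isinstance(item, dict)]
    return []  # unexpected type, e.g. a bare string
```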
* \| \|	Merge branch 'martin-arxiv-fix-http-503' into 'master'	bnewbold	2020-07-10	1	-1/+1
\|\ \ \	arxiv: address 503, "Retry after specified interval" error See merge request webgroup/fatcat!64
\| * \| \|	arxiv: retry five times on HTTP 503	Martin Czygan	2020-07-10	1	-1/+1
\| \| \| \|
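The actual change here is a one-line tweak whose code is not shown. The following is a stdlib-only sketch of the general technique (retrying a request while the server answers 503, honoring a numeric `Retry-After` header), assuming a response is represented as a `(status, headers, body)` tuple. `fetch_with_retry` and its parameters are hypothetical; HTTP libraries such as urllib3 offer a `Retry` helper with a `status_forcelist` parameter for the same purpose.

```python
import time
from typing import Callable, Dict, Tuple

# (status code, response headers, body); a stand-in for a real HTTP response
Response = Tuple[int, Dict[str, str], bytes]


def fetch_with_retry(fetch: Callable[[], Response], max_retries: int = 5,
                     default_wait: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `fetch`, retrying up to `max_retries` times while it returns 503.

    Waits for a numeric Retry-After header when one is present, otherwise
    `default_wait` seconds. After the last retry the 503 is returned as-is.
    """
    attempt = 0
    while True:
        status, headers, body = fetch()
        if status != 503 or attempt >= max_retries:
            return status, headers, body
        attempt += 1
        retry_after = headers.get("Retry-After", "")
        sleep(float(retry_after) if retry_after.isdigit() else default_wait)
```

Injecting `fetch` and `sleep` keeps the retry policy testable without network access or real waiting.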
* \| \| \|	get mediawiki username creation working with spaces	Bryan Newbold	2020-07-09	1	-1/+2
\| \|/ / \|/\| \|
* \| \|	datacite: fix attribute error	Martin Czygan	2020-07-07	1	-1/+1
\|/ /	refs: #44035
* \|	tweak flake8 params	Bryan Newbold	2020-07-01	1	-2/+8
\| \|
* \|	lint (flake8) python test files	Bryan Newbold	2020-07-01	45	-168/+71
\| \|
* \|	lint (flake8) tool python files	Bryan Newbold	2020-07-01	33	-130/+46
\| \|
* \|	lint (flake8) web interface python files	Bryan Newbold	2020-07-01	7	-26/+16
\| \|
* \|	lint (flake8) top-level python files	Bryan Newbold	2020-07-01	8	-25/+11
\| \|
* \|	updates to Makefile	Bryan Newbold	2020-07-01	2	-5/+32
\| \|
* \|	reviewer: fix bugs in common code found by mypy	Bryan Newbold	2020-07-01	1	-2/+3
\| \|
* \|	update TODO with some old examples	Bryan Newbold	2020-07-01	1	-0/+10
\|/
*	add new license mappings	Bryan Newbold	2020-06-30	2	-0/+27
\|
*	datacite: improve license mapping	Martin Czygan	2020-06-30	2	-9/+29
\|	via "missed potential license", refs #58
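A minimal sketch of URL-based license mapping, the kind of lookup these commits extend. The table entries, the slugs, and the `lookup_license_slug` name are hypothetical; the actual mappings are not shown in this log.

```python
from typing import Dict, Optional

# Hypothetical subset of a license table, keyed by normalized URL
LICENSE_SLUGS: Dict[str, str] = {
    "creativecommons.org/licenses/by/4.0": "CC-BY",
    "creativecommons.org/publicdomain/zero/1.0": "CC-0",
}


def lookup_license_slug(raw: Optional[str]) -> Optional[str]:
    """Normalize a license URL (scheme, www, trailing slash, legalcode page)
    and look it up in LICENSE_SLUGS."""
    if not raw:
        return None
    url = raw.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    if url.startswith("www."):
        url = url[len("www."):]
    url = url.rstrip("/")
    for suffix in ("/legalcode", "/deed.en"):
        if url.endswith(suffix):
            url = url[: -len(suffix)]
    return LICENSE_SLUGS.get(url)
```

Normalizing before lookup means URL variants that differ only in scheme or trailing path map to the same entry, which is one way "missed potential license" cases can arise.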
*	datacite: hard cast possible date value to string	Martin Czygan	2020-06-29	2	-1/+2
\|
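Why a hard cast can help, sketched under an assumption: if a DataCite date field sometimes arrives as a JSON number (e.g. `2019`) rather than a string, string methods on it fail. `parse_year` is a hypothetical helper, not the importer's code.

```python
from typing import Any, Optional


def parse_year(value: Any) -> Optional[int]:
    """Return the leading four-digit year of a date value, or None."""
    if value is None:
        return None
    # Hard cast first: an int like 2019 supports neither .strip() nor slicing
    text = str(value).strip()
    if len(text) >= 4 and text[:4].isdigit():
        return int(text[:4])
    return None
```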
*	disallow a specific unicode character from DOIs	Bryan Newbold	2020-06-26	1	-0/+6
\|
*	make fulltext-only label clickable	Martin Czygan	2020-06-16	1	-2/+2
\|
*	Merge branch 'bnewbold-better-button-links' into 'master'	Martin Czygan	2020-06-05	4	-4/+18
\|\	better download button links See merge request webgroup/fatcat!57
\| *	use ES 'best_url' in file download pages	Bryan Newbold	2020-06-04	2	-2/+4
\| \|	Similar to recent change for release download pages.
\| *	ES schema: add best_url to file schema	Bryan Newbold	2020-06-04	1	-0/+12
\| \|	This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. Initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyway, so it didn't seem worthwhile.
\| *	re-use 'best pdf url' for release green button	Bryan Newbold	2020-06-04	1	-2/+2
\| \|	I thought this was the existing behavior, but it looks like we were just taking the first link from the first file. In the future may refactor this out even further.
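A minimal sketch of picking one "best" PDF URL across all of a release's files instead of taking the first link of the first file. It assumes file entities are dicts with `mimetype` and `urls` (each a dict with a `url` key), and the ranking (web.archive.org first, then other archive.org hosts, then anything) is a hypothetical preference, not fatcat's exact `best_url` logic.

```python
from typing import Any, Dict, List, Optional


def best_pdf_url(files: List[Dict[str, Any]]) -> Optional[str]:
    """Return the highest-ranked PDF URL across all files, or None."""
    def rank(url: str) -> int:
        if "://web.archive.org/" in url:
            return 0
        if ".archive.org/" in url or "://archive.org/" in url:
            return 1
        return 2

    candidates = [
        u["url"]
        for f in files
        if f.get("mimetype") == "application/pdf"
        for u in f.get("urls", [])
    ]
    if not candidates:
        return None
    # min() keeps the first URL among equally ranked candidates
    return min(candidates, key=rank)
```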
* \|	fix 'dev' target in python makefile	Bryan Newbold	2020-06-04	1	-1/+1
\|/
*	Merge remote-tracking branch 'origin/martin-harvest-fail-on-400'	Bryan Newbold	2020-05-29	1	-4/+0
\|\	Manually resolved conflicts: python/fatcat_tools/harvest/doi_registrars.py
\| *	harvest: fail on HTTP 400	Martin Czygan	2020-05-29	1	-4/+0
\| \|	In the past, harvesting datacite resulted in occasional HTTP 400 errors. Since then, various API bugs have been fixed (most recently: https://github.com/datacite/lupo/pull/537, https://github.com/datacite/datacite/issues/1038). The downside of ignoring this error was that state lives in kafka, which has limited support for deletion of arbitrary messages from a topic.
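A minimal sketch of the behavior these harvest commits move toward: no special case for HTTP 400, so any error response is logged with its URL and then aborts the harvest instead of being skipped. `fetch_page`, `HarvestError`, and the injected `fetch` callable are hypothetical stand-ins for the real code in `doi_registrars.py`, and the comment about Kafka assumes that a skipped page would otherwise be recorded as harvest progress.

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger(__name__)


class HarvestError(Exception):
    """Raised when the upstream API returns an error response."""


def fetch_page(fetch: Callable[[str], Tuple[int, Dict[str, Any]]],
               url: str) -> Dict[str, Any]:
    """Fetch one page of harvest results, failing loudly on any HTTP error."""
    status, body = fetch(url)
    if status != 200:
        # No special case for 400: log the URL and stop, rather than skip the
        # page and record progress past it in Kafka-held harvest state.
        logger.error("harvest request failed (HTTP %d): %s", status, url)
        raise HarvestError(f"HTTP {status} for {url}")
    return body
```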
* \|	Merge branch 'martin-datacite-harvest-log-output' into 'master'	Martin Czygan	2020-05-29	1	-1/+1
\|\ \	harvest: log the failed url See merge request webgroup/fatcat!55
\| * \|	harvest: log the failed url	Martin Czygan	2020-05-29	1	-1/+1
\| \|/
* /	datacite: fix test docs	Martin Czygan	2020-05-29	1	-3/+3
\|/
*	Merge branch 'bnewbold-ingest-stage' into 'master'	Martin Czygan	2020-05-28	3	-7/+46
\|\	verify release_stage in ingest importer See merge request webgroup/fatcat!52
\| *	ingest importer: check that stage is consistent with release	Bryan Newbold	2020-05-26	1	-0/+5
\| \|
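A minimal sketch of a consistency check between an ingest request and the release it refers to, assuming both are dicts that may carry a `release_stage` key. `want_ingest_result` is a hypothetical name, and treating a missing stage on either side as "consistent" is a design assumption, not necessarily what the importer does.

```python
from typing import Any, Dict


def want_ingest_result(request: Dict[str, Any], release: Dict[str, Any]) -> bool:
    """Reject the result only when both stages are known and disagree."""
    req_stage = request.get("release_stage")
    rel_stage = release.get("release_stage")
    if req_stage and rel_stage and req_stage != rel_stage:
        return False  # e.g. a preprint's PDF attached to the published version
    return True
```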