Commit message  Author  Age  Files  Lines
* refactor release and container searchBryan Newbold2020-07-246-136/+235
| Based on the fatcat-scholar refactoring. This doesn't yet include refactoring of stats, aggregates, or histograms, just the direct queries. There is no test coverage yet; the plan is to try elasticmock or to figure out how to ingest mock JSON results directly.
* web search: fix pylint errorBryan Newbold2020-07-241-2/+2
|
* WIP: refactoring search to use elasticsearch-dslBryan Newbold2020-07-242-153/+137
|
* Merge branch 'bnewbold-more-lint-fixes' into 'master'Martin Czygan2020-07-2414-34/+26
|\ | | | | | | | | more lint fixes See merge request webgroup/fatcat!69
| * fix issnl typo in pubmedBryan Newbold2020-07-231-1/+1
| | Oh no! This bug may actually have had a significant negative impact on metadata in fatcat, in terms of missing container_id associations for pubmed entities. There are about 500k release entities with a PMID but no container_id. Of those, 89k have at least a container_name. It is unclear how many would have matched to an ISSN-L and thus to a container.
| * remove isascii() work around definition in importers/datacite.pyBryan Newbold2020-07-231-7/+1
| | We are on python3.7 now, so this isn't needed.
| * simple lint (flake8) fixes over python codebaseBryan Newbold2020-07-237-19/+18
| | | | | | | | | | | | These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements.
| * fix actual typo in tests (caught by lint)Bryan Newbold2020-07-231-2/+2
| |
| * simple lint (flake8) fixes in testsBryan Newbold2020-07-235-5/+4
| | | | | | | | | | | | The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/
* | Merge branch 'bnewbold-preservation-year-offset' into 'master'bnewbold2020-07-242-0/+55
|\ \ | |/ |/| | | | | preservation year offset See merge request webgroup/fatcat!67
| * simplify in_kbart check statementBryan Newbold2020-07-231-1/+1
| | | | | | | | Thanks @martin
| * make in_kbart transform inclusive of last yearBryan Newbold2020-07-232-0/+55
|/ Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, eg CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of date (eg, they don't show coverage for the current year). For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over. This patch means we will count Portico/LOCKSS/etc coverage for "last year" as coverage of publications dated "this year". Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year.
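The offset described above can be sketched roughly as follows. This is an illustration only, assuming coverage is stored as a list of `[start, end]` year spans; the function names, the span layout, and the `this_year` parameter are hypothetical and not taken from the patch.

```python
import datetime
from typing import List, Optional, Sequence


def year_in_spans(year: int, spans: List[Sequence[int]]) -> bool:
    """True if `year` falls inside any inclusive [start, end] span."""
    return any(start <= year <= end for start, end in spans)


def in_kbart(release_year: int, spans: List[Sequence[int]],
             this_year: Optional[int] = None) -> bool:
    """Check KBART coverage, counting last year's coverage for this year.

    `this_year` is injectable (an assumption made here for testability);
    it defaults to the current calendar year.
    """
    if this_year is None:
        this_year = datetime.date.today().year
    if year_in_spans(release_year, spans):
        return True
    # Fudge: reports lag behind, so a release dated "this year" counts as
    # covered when "last year" is covered.
    if release_year == this_year:
        return year_in_spans(this_year - 1, spans)
    return False
```

Only releases dated in the current year get the benefit of the offset in this sketch; older years are still checked against the spans exactly.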
* example bad MAG matchBryan Newbold2020-07-231-0/+6
|
* update table/database size statsBryan Newbold2020-07-222-0/+48
|
* Merge branch 'martin-datacite-duplicated-author-gh-59' into 'master'bnewbold2020-07-1113-251/+619
|\ | | | | | | | | datacite: address duplicated contributor issue See merge request webgroup/fatcat!65
| * datacite: resolve formatting issues in testsMartin Czygan2020-07-10103-341/+319
| |\
| * | datacite: adjust testsMartin Czygan2020-07-104-10/+6
| | |
| * | datacite: there should be no index gapsMartin Czygan2020-07-101-2/+8
| | |
| * | datacite: document contributor typesMartin Czygan2020-07-101-0/+25
| | |
| * | wip: contrib, GH59Martin Czygan2020-07-102-245/+383
| | |
| * | wip: contrib, GH59Martin Czygan2020-07-105-3/+105
| | |
| * | datacite: address duplicated contributor issueMartin Czygan2020-07-076-11/+110
| | | Use string comparison.
| | |
| | | * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
| | | * https://api.datacite.org/dois/10.25940/roper-31098406
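A minimal sketch of what de-duplicating contributors by string comparison might look like, also renumbering the survivors so there are no index gaps. The dict keys `raw_name` and `index`, the normalization (lowercasing and whitespace collapsing), and the function name are assumptions for illustration; the importer's actual logic may differ.

```python
from typing import Any, Dict, List


def dedupe_contribs(contribs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop contributors whose normalized name string was already seen."""
    seen = set()
    unique = []
    for contrib in contribs:
        # Normalize: lowercase and collapse runs of whitespace
        key = " ".join((contrib.get("raw_name") or "").lower().split())
        if key and key in seen:
            continue
        seen.add(key)
        unique.append(dict(contrib))  # copy, so the input is not mutated
    # Renumber so the indexes are contiguous (no gaps)
    for i, contrib in enumerate(unique):
        contrib["index"] = i
    return unique
```

Entries without any name are kept as-is in this sketch, since an empty string is not a reliable basis for calling two contributors duplicates.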
* | | Merge branch 'martin-datacite-bugfix-sentry-44035' into 'master'bnewbold2020-07-111-0/+4
|\ \ \ | |_|/ |/| | | | | | | | datacite: mitigate sentry #44035 See merge request webgroup/fatcat!66
| * | datacite: mitigate sentry #44035Martin Czygan2020-07-101-0/+4
| | | According to sentry, running `c.get('nameIdentifiers', []) or []` on a c with value:
```
{'affiliation': [],
 'familyName': 'Guidon',
 'givenName': 'Manuel',
 'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683',
                     'nameIdentifierScheme': 'ORCID',
                     'schemeUri': 'https://orcid.org'},
 'nameType': 'Personal'}
```
| | | results in a string, which I cannot reproduce. The document in question at: https://api.datacite.org/dois/10.26275/kuw1-fdls seems fine, too.
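One possible reading of the report above: when `nameIdentifiers` is a single dict rather than a list, iterating over it yields the dict's string keys, which might account for the string that sentry saw. The sketch below normalizes the field to a list before use. The helper name `name_identifiers` is hypothetical, and this is not necessarily the mitigation applied in the patch.

```python
from typing import Any, Dict, List


def name_identifiers(contributor: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return `nameIdentifiers` as a list of dicts, whatever shape it had."""
    value = contributor.get("nameIdentifiers") or []
    if isinstance(value, dict):
        # A single identifier given as a bare dict: wrap it in a list
        return [value]
    if isinstance(value, list):
        # Keep only well-formed entries
        return [item for item in value if isinstance(item, dict)]
    # Anything else (eg, a stray string) is ignored
    return []
```

Callers can then loop over the result without checking the type of each element first.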
* | | Merge branch 'martin-arxiv-fix-http-503' into 'master'bnewbold2020-07-101-1/+1
|\ \ \ | | | | | | | | | | | | | | | | arxiv: address 503, "Retry after specified interval" error See merge request webgroup/fatcat!64
| * | | arxiv: do retry five times of HTTP 503Martin Czygan2020-07-101-1/+1
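A generic sketch of retrying on HTTP 503, for illustration only. It assumes a caller-supplied `fetch` callable returning a `(status, body)` tuple; the function name, the fixed wait, and the injectable `sleep` are hypothetical, and the harvester may instead rely on a retry option of its HTTP or OAI client library.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(fetch: Callable[[], Tuple[int, str]],
                     max_retries: int = 5,
                     wait_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[int, str]:
    """Call `fetch`, retrying up to `max_retries` times on HTTP 503.

    With the default of five retries, `fetch` runs at most six times.
    """
    status, body = fetch()
    for _ in range(max_retries):
        if status != 503:
            break
        # A real client would likely honor the Retry-After header here
        sleep(wait_seconds)
        status, body = fetch()
    return status, body
```

If every attempt returns 503, the last response is returned unchanged so the caller can decide how to handle it.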
| | | |
* | | | get mediawiki username creation working with spacesBryan Newbold2020-07-091-1/+2
| | | |
* | | | Merge branch 'martin-datacite-bugfix-sentry-44035' into 'master'Martin Czygan2020-07-061-1/+1
|\ \ \ \ | |/ / / |/| / / | |/ / | | | datacite: fix attribute error See merge request webgroup/fatcat!63
| * / datacite: fix attribute errorMartin Czygan2020-07-071-1/+1
|/ / | | | | | | refs: #44035
* | Merge branch 'bnewbold-lint' into 'master'Martin Czygan2020-07-0694-351/+152
|\ \ | | | | | | | | | | | | lint cleanups See merge request webgroup/fatcat!62
| * | tweak flake8 paramsBryan Newbold2020-07-011-2/+8
| | |
| * | lint (flake8) python test filesBryan Newbold2020-07-0145-168/+71
| | |
| * | lint (flake8) tool python filesBryan Newbold2020-07-0133-130/+46
| | |
| * | lint (flake8) web interface python filesBryan Newbold2020-07-017-26/+16
| | |
| * | lint (flake8) top-level python filesBryan Newbold2020-07-018-25/+11
|/ /
* | updates to MakefileBryan Newbold2020-07-013-6/+33
| |
* | reviewer: fix bugs in common code found by mypyBryan Newbold2020-07-011-2/+3
| |
* | update TODO with some old examplesBryan Newbold2020-07-011-0/+10
| |
* | commit old example notesBryan Newbold2020-07-013-0/+65
| |
* | JALC bulk edit notes from 2020-03-23Bryan Newbold2020-07-011-0/+23
| |
* | commit example of an elasticsearch SQL queryBryan Newbold2020-07-011-0/+8
| |
* | commit old README about bulk downloadsBryan Newbold2020-07-011-0/+40
|/
* CLI proposalBryan Newbold2020-06-301-0/+124
|
* add new license mappingsBryan Newbold2020-06-302-0/+27
|
* datacite: improve license mappingMartin Czygan2020-06-302-9/+29
| | | | via "missed potential license", refs #58
* Merge branch 'martin-datacite-fix-strptime-36559' into 'master'bnewbold2020-06-292-1/+2
|\ | | | | | | | | datacite: hard cast possible date value to string See merge request webgroup/fatcat!59
| * datacite: hard cast possible date value to stringMartin Czygan2020-06-292-1/+2
|/
* remove accidentally-committed lines from rust MakefileBryan Newbold2020-06-261-3/+0
|
* disallow a specific unicode character from DOIsBryan Newbold2020-06-261-0/+6
|
* Merge branch 'martin-fulltext-checkbox-label' into 'master'bnewbold2020-06-171-2/+2
|\ | | | | | | | | make fulltext-only label clickable See merge request webgroup/fatcat!58