path: root/python
Commit message (Author, Age, Files, Lines)
* generic API error page (Bryan Newbold, 2020-07-28, 2 files, -0/+50)
|   This error handler and view page currently work much better than the "flash()" infrastructure built into Flask, which uses cookies and mostly does not work with our views and layouts. I would like to gradually migrate almost all API errors in the web interface to just raising errors that get rendered on an error page, instead of calling `abort(ae.status)`.
* search: catch ES errors and display better (Bryan Newbold, 2020-07-28, 5 files, -20/+46)
|
* refactor search macros into new file (Bryan Newbold, 2020-07-28, 4 files, -45/+72)
|
* include container_id as a query boost term (Bryan Newbold, 2020-07-28, 1 file, -0/+1)
|
* re-order search params to satisfy pylint (Bryan Newbold, 2020-07-24, 1 file, -6/+6)
|   Moved all the request_cache=True param calls to just before ES request execution. The former ordering "just worked", but pylint didn't like it, and I suppose it was not as idiomatic as it should have been.
* small lint fixes (Bryan Newbold, 2020-07-24, 2 files, -3/+1)
|
* finish backend refactoring of search code (Bryan Newbold, 2020-07-24, 2 files, -135/+185)
|
* update web_search tests to mock ES client (Bryan Newbold, 2020-07-24, 2 files, -45/+47)
|   Instead of using the 'responses' mock of the 'requests' library. Tried using the 'elasticmock' helper, but it didn't work.
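As a minimal sketch of mocking the ES client itself (not the actual fatcat test code), the client can be replaced with a `unittest.mock.MagicMock` whose `search()` returns canned JSON, so no HTTP-level mocking is needed. The `release_search()` helper, the `fatcat_release` index name, and the response shape are assumptions for illustration; the real search code goes through elasticsearch-dsl.

```python
from unittest import mock


def release_search(es_client, query: str) -> list:
    """Hypothetical search helper: run a query and return the hit sources."""
    resp = es_client.search(
        index="fatcat_release",  # assumed index name, for illustration only
        body={"query": {"query_string": {"query": query}}},
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]


def test_release_search_with_mocked_client():
    # Canned ES response; the client object is mocked, so no HTTP traffic happens.
    fake_response = {
        "hits": {"total": 1, "hits": [{"_source": {"ident": "abc", "title": "Example"}}]}
    }
    es_client = mock.MagicMock()
    es_client.search.return_value = fake_response

    results = release_search(es_client, "example")

    assert results == [{"ident": "abc", "title": "Example"}]
    es_client.search.assert_called_once()
```

Mocking at the client boundary keeps the test independent of the HTTP library the ES client happens to use internally.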
* refactor release and container search (Bryan Newbold, 2020-07-24, 6 files, -136/+235)
|   Based on the fatcat-scholar refactoring. This doesn't include refactoring of stats, aggregates, or histograms yet, just the direct queries. There is no test coverage yet; I intend to try elasticmock, or to figure out how to ingest mock JSON results directly.
* web search: fix pylint error (Bryan Newbold, 2020-07-24, 1 file, -2/+2)
|
* WIP: refactoring search to use elasticsearch-dsl (Bryan Newbold, 2020-07-24, 2 files, -153/+137)
|
* Merge branch 'bnewbold-more-lint-fixes' into 'master' (Martin Czygan, 2020-07-24, 14 files, -34/+26)
|\    more lint fixes
| |   See merge request webgroup/fatcat!69
| * fix issnl typo in pubmed (Bryan Newbold, 2020-07-23, 1 file, -1/+1)
| |   Oh no! This bug may actually have had significant negative impact on metadata in fatcat, in terms of missing container_id associations with pubmed entities. There are about 500k release entities with a PMID but no container_id. Of those, 89k have at least a container_name. Unclear how many would have matched to ISSN-L and thus to a container.
| * remove isascii() workaround definition in importers/datacite.py (Bryan Newbold, 2020-07-23, 1 file, -7/+1)
| |   We are on Python 3.7 now, so this isn't needed.
| * simple lint (flake8) fixes over python codebase (Bryan Newbold, 2020-07-23, 7 files, -19/+18)
| |   These should not have any behavior changes, though a number of exception catches are now more general, and there may be long-tail exceptions getting thrown in these statements.
| * fix actual typo in tests (caught by lint) (Bryan Newbold, 2020-07-23, 1 file, -2/+2)
| |
| * simple lint (flake8) fixes in tests (Bryan Newbold, 2020-07-23, 5 files, -5/+4)
| |   The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/
* | simplify in_kbart check statement (Bryan Newbold, 2020-07-23, 1 file, -1/+1)
| |   Thanks @martin
* | make in_kbart transform inclusive of last year (Bryan Newbold, 2020-07-23, 2 files, -0/+55)
|/   Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, eg CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of date (eg, they don't show coverage for the current year). For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over. This patch means we will count Portico/LOCKSS/etc coverage for "last year" as coverage of publications dated "this year". Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year.
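A minimal sketch of the inclusive check described above, assuming KBART coverage is stored as inclusive `(start, end)` year pairs. The `in_kbart()` signature and the `current_year` parameter here are hypothetical, not the actual fatcat transform.

```python
import datetime
from typing import Iterable, Optional, Tuple


def in_kbart(year: Optional[int],
             kbart_spans: Iterable[Tuple[int, int]],
             current_year: Optional[int] = None) -> bool:
    """Hypothetical: is a release dated `year` covered by any KBART span?

    Spans are inclusive (start, end) year pairs. A span ending last year also
    counts for releases dated this year, because KBART reports tend to lag.
    """
    if year is None:
        return False
    if current_year is None:
        current_year = datetime.date.today().year
    for start, end in kbart_spans:
        if start <= year <= end:
            return True
        # The "last year counts as this year" fudge from the commit message.
        if year == current_year and end == current_year - 1:
            return True
    return False
```

Only releases dated in the current year get the one-year extension; an older year missing from the spans is still reported as uncovered, matching the decision not to fudge over the larger time span.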
* Merge branch 'martin-datacite-duplicated-author-gh-59' into 'master' (bnewbold, 2020-07-11, 13 files, -251/+619)
|\    datacite: address duplicated contributor issue
| |   See merge request webgroup/fatcat!65
| * datacite: resolve formatting issues in tests (Martin Czygan, 2020-07-10, 96 files, -340/+182)
| |\
| * | datacite: adjust tests (Martin Czygan, 2020-07-10, 4 files, -10/+6)
| | |
| * | datacite: there should be no index gaps (Martin Czygan, 2020-07-10, 1 file, -2/+8)
| | |
| * | datacite: document contributor types (Martin Czygan, 2020-07-10, 1 file, -0/+25)
| | |
| * | wip: contrib, GH59 (Martin Czygan, 2020-07-10, 2 files, -245/+383)
| | |
| * | wip: contrib, GH59 (Martin Czygan, 2020-07-10, 5 files, -3/+105)
| | |
| * | datacite: address duplicated contributor issue (Martin Czygan, 2020-07-07, 6 files, -11/+110)
| | |   Use string comparison.
| | |
| | |   * https://fatcat.wiki/release/spjysmrnsrgyzgq6ise5o44rlu/contribs
| | |   * https://api.datacite.org/dois/10.25940/roper-31098406
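One way to read "use string comparison" is sketched below: build a plain string key from each creator's name fields and keep only the first occurrence. The `dedupe_contributors()` helper and the choice of `name`/`givenName`/`familyName` as key fields are assumptions, not the merged implementation.

```python
from typing import Any, Dict, List


def dedupe_contributors(creators: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical: drop repeated creators by comparing plain name strings."""
    seen = set()
    result = []
    for creator in creators:
        # Join the name fields into one string key (assumed DataCite field names).
        key = "|".join(
            str(creator.get(field) or "").strip()
            for field in ("name", "givenName", "familyName")
        )
        if key in seen:
            continue  # exact string match with an earlier creator: skip it
        seen.add(key)
        result.append(creator)
    return result
```

Exact string comparison will not merge variants such as "J. Doe" and "Jane Doe"; the trade-off is that it is unlikely to merge two distinct people by mistake.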
* | | Merge branch 'martin-datacite-bugfix-sentry-44035' into 'master' (bnewbold, 2020-07-11, 1 file, -0/+4)
|\ \ \
| |_|/
|/| |    datacite: mitigate sentry #44035
| | |    See merge request webgroup/fatcat!66
| * | datacite: mitigate sentry #44035 (Martin Czygan, 2020-07-10, 1 file, -0/+4)
| | |   According to Sentry, running `c.get('nameIdentifiers', []) or []` on a `c` with value:
| | |
| | |   ```
| | |   {'affiliation': [], 'familyName': 'Guidon', 'givenName': 'Manuel', 'nameIdentifiers': {'nameIdentifier': 'https://orcid.org/0000-0003-3543-6683', 'nameIdentifierScheme': 'ORCID', 'schemeUri': 'https://orcid.org'}, 'nameType': 'Personal'}
| | |   ```
| | |
| | |   results in a string, which I cannot reproduce. The document in question (https://api.datacite.org/dois/10.26275/kuw1-fdls) seems fine, too.
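One plausible explanation, inferred from the sample value and not confirmed by the commit: there `nameIdentifiers` is a single dict rather than a list, so `c.get('nameIdentifiers', []) or []` returns the dict unchanged, and iterating over a dict yields its string keys. A defensive normalizer might look like this; `name_identifiers()` is a hypothetical helper.

```python
from typing import Any, Dict, List


def name_identifiers(creator: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical: return a creator's nameIdentifiers as a list of dicts.

    A bare dict would otherwise pass through `... or []` untouched, and
    looping over it would produce its keys (strings) instead of dicts.
    """
    value = creator.get("nameIdentifiers") or []
    if isinstance(value, dict):
        value = [value]  # wrap a lone identifier object
    # Drop anything that is not an identifier object (e.g. stray strings).
    return [v for v in value if isinstance(v, dict)]
```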
* | | Merge branch 'martin-arxiv-fix-http-503' into 'master' (bnewbold, 2020-07-10, 1 file, -1/+1)
|\ \ \    arxiv: address 503, "Retry after specified interval" error
| | | |   See merge request webgroup/fatcat!64
| * | | arxiv: retry five times on HTTP 503 (Martin Czygan, 2020-07-10, 1 file, -1/+1)
| | | |
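A generic sketch of retrying on HTTP 503 up to five times, honoring a numeric `Retry-After` header when present. The `fetch` callable (returning `(status, headers, body)`), the five-second default wait, and `fetch_with_retry()` itself are assumptions; this is not the arXiv harvester's code.

```python
import time
from typing import Callable, Dict, Tuple

FetchResult = Tuple[int, Dict[str, str], bytes]


def fetch_with_retry(fetch: Callable[[], FetchResult],
                     max_tries: int = 5,
                     default_wait: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Hypothetical: call `fetch` until it returns a non-503 status.

    Tries at most `max_tries` times; between 503 responses it waits for the
    server's Retry-After value (in seconds) if present, else `default_wait`.
    """
    for attempt in range(1, max_tries + 1):
        status, headers, body = fetch()
        if status != 503:
            if status >= 400:
                raise RuntimeError(f"HTTP {status}")
            return body
        if attempt == max_tries:
            break  # no point sleeping after the final attempt
        retry_after = headers.get("Retry-After")
        try:
            wait = float(retry_after) if retry_after is not None else default_wait
        except ValueError:
            wait = default_wait  # Retry-After may be an HTTP date; not parsed here
        sleep(wait)
    raise RuntimeError(f"still HTTP 503 after {max_tries} tries")
```

Passing `sleep` in as a parameter keeps the retry loop testable without real delays.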
* | | | get mediawiki username creation working with spaces (Bryan Newbold, 2020-07-09, 1 file, -1/+2)
| |/ /
|/| |
* | | datacite: fix attribute error (Martin Czygan, 2020-07-07, 1 file, -1/+1)
|/ /
| |   refs: #44035
* | tweak flake8 params (Bryan Newbold, 2020-07-01, 1 file, -2/+8)
| |
* | lint (flake8) python test files (Bryan Newbold, 2020-07-01, 45 files, -168/+71)
| |
* | lint (flake8) tool python files (Bryan Newbold, 2020-07-01, 33 files, -130/+46)
| |
* | lint (flake8) web interface python files (Bryan Newbold, 2020-07-01, 7 files, -26/+16)
| |
* | lint (flake8) top-level python files (Bryan Newbold, 2020-07-01, 8 files, -25/+11)
| |
* | updates to Makefile (Bryan Newbold, 2020-07-01, 2 files, -5/+32)
| |
* | reviewer: fix bugs in common code found by mypy (Bryan Newbold, 2020-07-01, 1 file, -2/+3)
| |
* | update TODO with some old examples (Bryan Newbold, 2020-07-01, 1 file, -0/+10)
|/
* add new license mappings (Bryan Newbold, 2020-06-30, 2 files, -0/+27)
|
* datacite: improve license mapping (Martin Czygan, 2020-06-30, 2 files, -9/+29)
|   via "missed potential license", refs #58
* datacite: hard cast possible date value to string (Martin Czygan, 2020-06-29, 2 files, -1/+2)
|
* disallow a specific unicode character from DOIs (Bryan Newbold, 2020-06-26, 1 file, -0/+6)
|
* make fulltext-only label clickable (Martin Czygan, 2020-06-16, 1 file, -2/+2)
|
* Merge branch 'bnewbold-better-button-links' into 'master' (Martin Czygan, 2020-06-05, 4 files, -4/+18)
|\    better download button links
| |   See merge request webgroup/fatcat!57
| * use ES 'best_url' in file download pages (Bryan Newbold, 2020-06-04, 2 files, -2/+4)
| |   Similar to the recent change for release download pages.
| * ES schema: add best_url to file schema (Bryan Newbold, 2020-06-04, 1 file, -0/+12)
| |   This will increase index size (URLs are often long in our corpus, and we have many file entities), but seems worth it. I initially added `ia_url` as a second field, guaranteed to always be an *.archive.org URL, but `best_url` defaults to that anyway, so it didn't seem worthwhile.
| * re-use 'best pdf url' for release green button (Bryan Newbold, 2020-06-04, 1 file, -2/+2)
| |   I thought this was the existing behavior, but it looks like we were just taking the first link from the first file. In the future, we may refactor this out even further.
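A hedged sketch of choosing a single "best" PDF URL across all of a release's files, rather than the first link of the first file, preferring `*.archive.org` hosts. The file dict shape (`mimetype`, and `urls` entries with a `url` key) and `best_pdf_url()` are assumptions for illustration.

```python
from typing import List, Optional
from urllib.parse import urlparse


def best_pdf_url(files: List[dict]) -> Optional[str]:
    """Hypothetical: pick one download URL across all files of a release.

    Prefers *.archive.org URLs for PDF files, then any other PDF URL.
    Returns None when no PDF URL is found.
    """
    archive_urls: List[str] = []
    other_urls: List[str] = []
    for f in files:
        if f.get("mimetype") != "application/pdf":
            continue  # only consider PDF files
        for u in f.get("urls") or []:
            url = u.get("url")
            if not url:
                continue
            host = urlparse(url).hostname or ""
            if host == "archive.org" or host.endswith(".archive.org"):
                archive_urls.append(url)
            else:
                other_urls.append(url)
    candidates = archive_urls + other_urls
    return candidates[0] if candidates else None
```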