path: root/python
Commit message [Author, Age, Files, Lines]
* address spammy datacite titles  [Martin Czygan, 2020-09-23, 2 files, -0/+25]
|
|   seemingly from zenodo:
|
|   * https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi
|   * https://doi.org/10.5281/zenodo.4041777
|
|   About 3400 records with "FULL MOVIE" in title, currently.
* homepage: small grammar tweaks (The/the)  [Bryan Newbold, 2020-09-11, 1 file, -3/+3]
|
* ingest: default to crawl protocols.io DOIs  [Bryan Newbold, 2020-09-10, 1 file, -0/+2]
|
* datacite: handle case of empty-string version  [Bryan Newbold, 2020-09-10, 3 files, -2/+3]
|
|   Includes a tiny tweak to the datacite import sample file to test this code path.
* remove spurious print statement  [Bryan Newbold, 2020-09-03, 1 file, -1/+0]
|
* generic file entity clean-ups as part of file_meta importer  [Bryan Newbold, 2020-09-02, 3 files, -0/+149]
|
* Merge branch 'bnewbold-filemeta'  [Bryan Newbold, 2020-08-27, 5 files, -0/+162]
|\
| * fix comment typo (thanks martin)  [Bryan Newbold, 2020-08-27, 1 file, -1/+1]
| |
| * fixes and test coverage for file_meta importer  [Bryan Newbold, 2020-08-21, 4 files, -6/+82]
| |
| * initial implementation of file_meta importer  [Bryan Newbold, 2020-08-21, 3 files, -0/+86]
| |
* | remove typo (isbn:) from metadata DC.language field  [Bryan Newbold, 2020-08-21, 1 file, -1/+1]
| |
* | remove placeholder description meta tag  [Bryan Newbold, 2020-08-20, 1 file, -1/+0]
|/
* fix SearchAction nesting in WebSite (schema.org)  [Bryan Newbold, 2020-08-20, 1 file, -5/+2]
|
|   This is not related to sitemap changes, but I was reminded in google search tools when validating site.
* sitemap fixes from testing  [Bryan Newbold, 2020-08-19, 1 file, -5/+5]
|
* update robots.txt and sitemap.xml  [Bryan Newbold, 2020-08-19, 4 files, -2/+52]
|
|   - show minimal robots/sitemap if not in prod environment
|   - default to allow all in robots.txt; link to sitemap index files
|   - basic sitemap.xml without entity-level links
* entity updater: handle doi=None case better  [Bryan Newbold, 2020-08-14, 1 file, -1/+1]
|
* entity updater: es['publisher_type'] not always set  [Bryan Newbold, 2020-08-14, 1 file, -1/+1]
|
|   This is a small bugfix for a production issue.
* Merge branch 'bnewbold-ingest-improvements' into 'master'  [Martin Czygan, 2020-08-13, 8 files, -38/+120]
|\
| |   ingest behavior changes; some datacite metadata tweaks
| |
| |   See merge request webgroup/fatcat!78
| * entity update: change big5 ingest behavior  [Bryan Newbold, 2020-08-11, 1 file, -9/+15]
| |
| |   In addition to changing the OA default, this was the main intended behavior change in this group of commits: want to ingest fewer attempts that we *expect* to fail, but default to ingest/crawl attempt if we are uncertain. This is because there is a long tail of journals that register DOIs and are defacto OA (fulltext is available), but we don't have metadata indicating them as such.
| * datacite importer: update test cases for 'Additional file' as component, not stub  [Bryan Newbold, 2020-08-11, 5 files, -5/+5]
| |
| * entity update: default to ingest non-OA works  [Bryan Newbold, 2020-08-11, 1 file, -9/+10]
| |
| * entity update: skip ingest of figshare+zenodo 'group' DOIs  [Bryan Newbold, 2020-08-11, 1 file, -0/+15]
| |
| * datacite import: figshare-specific hacks  [Bryan Newbold, 2020-08-11, 2 files, -3/+4]
| |
| * datacite import: refactor release_type detection into static method  [Bryan Newbold, 2020-08-11, 1 file, -14/+51]
| |
| * datacite import: refactor publisher-specific hacks into static method  [Bryan Newbold, 2020-08-11, 1 file, -15/+29]
| |
| |   Also tweak title/publisher detection to use DOI prefixes
| * update crawl blocklist for SPNv2 requests which mostly fail  [Bryan Newbold, 2020-08-10, 1 file, -2/+10]
| |
* | harvest: datacite API yields HTTP 200 with broken JSON  [Martin Czygan, 2020-08-10, 1 file, -1/+8]
| |
| |   As a first step: log response body for debugging.
|/
* release ES transform tweaks  [Bryan Newbold, 2020-08-07, 1 file, -3/+5]
|
|   pass-through publisher_type from container extra metadata (ES field already existed; this is from newer chocula metadata)
|
|   count arxiv and PMCID papers which haven't been crawled (by IA) as "dark", not "bright"
* chocula import update tweaks  [Bryan Newbold, 2020-08-04, 1 file, -10/+14]
|
* more update keys and cases for chocula importer  [Bryan Newbold, 2020-08-04, 1 file, -5/+11]
|
* fix key name mismatch in chocula importer  [Bryan Newbold, 2020-08-04, 1 file, -1/+1]
|
|   chocula 'export-fatcat' uses 'ident', not 'fatcat_ident'
* web: add links to deletion pages from edit pages  [Bryan Newbold, 2020-07-31, 4 files, -0/+13]
|
* editing: withdrawn_status, release_year  [Bryan Newbold, 2020-07-31, 2 files, -24/+44]
|
* release form validators and tweak labels  [Bryan Newbold, 2020-07-31, 1 file, -8/+37]
|
* fix typo bug resulting in lost/bad ext_id web edits  [Bryan Newbold, 2020-07-31, 2 files, -2/+16]
|
* implement webface entity deletion  [Bryan Newbold, 2020-07-31, 3 files, -27/+308]
|
* routes: handle case of viewing deleted entity in editgroup context  [Bryan Newbold, 2020-07-30, 4 files, -8/+35]
|
|   Eg, consider deleting an entity. When viewing the editgroup, want to be able to click the deleted entity and see the "deleted entity" page instead of a generic 404.
* remove some meta-fields from TOML form (all entities)  [Bryan Newbold, 2020-07-30, 1 file, -1/+5]
|
* fix search redirect codes in new tests  [Bryan Newbold, 2020-07-30, 1 file, -4/+4]
|
* wire up new TOML views  [Bryan Newbold, 2020-07-30, 14 files, -83/+256]
|
* generic HTML views for TOML editing  [Bryan Newbold, 2020-07-30, 4 files, -0/+80]
|
* editing: more 'raise' status instead of 'abort()'  [Bryan Newbold, 2020-07-30, 1 file, -1/+1]
|
* generic helpers for TOML editing routes  [Bryan Newbold, 2020-07-30, 2 files, -10/+201]
|
* basic toml transform helper  [Bryan Newbold, 2020-07-30, 3 files, -4/+42]
|
* pipenv: lock pycountry to 19.10 version  [Bryan Newbold, 2020-07-30, 2 files, -7/+7]
|
|   datacite importer had errors otherwise
* pipenv: add toml library (and update lock)  [Bryan Newbold, 2020-07-30, 2 files, -276/+327]
|
* lock loginpass version to prevent conflicting authlib version  [Bryan Newbold, 2020-07-30, 1 file, -1/+1]
|
|   May be possible to upgrade both of these libraries together, but that isn't the purpose of current development.
* simple search route increased coverage  [Bryan Newbold, 2020-07-30, 1 file, -0/+27]
|
* comments documenting tuple/dict types in graphics.py  [Bryan Newbold, 2020-07-30, 1 file, -0/+11]
|
* minor lint fixes  [Bryan Newbold, 2020-07-30, 2 files, -3/+1]
|