Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | python: isort everything | Bryan Newbold | 2021-11-02 | 39 | -73/+98 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 17 | -125/+125 |
| | | | | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc | ||||
* | lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | |||||
* | hacks to work around new pylint false positives | Bryan Newbold | 2021-11-02 | 1 | -9/+15 |
| | |||||
* | cleanup imports after fatcat_tools.transforms change | Bryan Newbold | 2021-11-02 | 4 | -16/+33 |
| | |||||
* | temporary hack around filesets.manifest order instability | Bryan Newbold | 2021-11-02 | 1 | -3/+4 |
| | | | | | | May need some change in fatcatd or the schema? This isn't a new issue: that part of the schema has been around for a long time and is only being detected now by these tests. | ||||
* | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 2 | -0/+60 |
| | |||||
* | web: editor username /u/<username> helper | Bryan Newbold | 2021-10-13 | 1 | -0/+8 |
| | |||||
* | python: additional test coverage for v0.4 changes | Bryan Newbold | 2021-10-13 | 2 | -2/+19 |
| | |||||
* | python: test coverage of rust schema changes | Bryan Newbold | 2021-10-13 | 4 | -2/+59 |
| | |||||
* | datacite: skip empty abstracts | Martin Czygan | 2021-10-01 | 3 | -1/+91 |
| | | | | | Do not add abstracts where `clean` results in the empty string; an empty abstract violates the constraint `either abstract_sha1 or content is required` | ||||
* | trivial blank line lint | Bryan Newbold | 2021-09-08 | 1 | -1/+0 |
| | |||||
* | refs: web UI tweaks for iterated CSL schema | Bryan Newbold | 2021-08-03 | 1 | -3/+7 |
| | |||||
* | refs: start the most basic/minimal web refs test coverage ('integration' level) | Bryan Newbold | 2021-07-27 | 4 | -0/+1094 |
| | |||||
* | tests: small citeproc style changes (to match Pipfile.lock update) | Bryan Newbold | 2021-06-23 | 2 | -3/+4 |
| | |||||
* | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 3 | -1/+96 |
| | | | | | Caused by a partial "title entry without title" coming *first* (e.g. an entry holding only a language, like {'lang': 'da'}) | ||||
* | dblp tests: skip redundant seek(0) | Bryan Newbold | 2021-06-03 | 1 | -6/+1 |
| | |||||
* | ingest: add per-container ingest type overrides | Bryan Newbold | 2021-05-21 | 1 | -0/+6 |
| | |||||
* | fix arabesque sqlite3 examples to have 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -0/+0 |
| | |||||
* | make dblp tests more robust | Bryan Newbold | 2021-04-12 | 1 | -2/+11 |
| | | | | | | These were causing a lot of spurious errors in local development. Not sure these tweaks will entirely fix the problem. | ||||
* | transform tool: container transform stats lookup support | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
| | |||||
* | search container stats: changes to be called from index code path | Bryan Newbold | 2021-04-06 | 1 | -0/+10 |
| | | | | Eg, allowing injection of more config values | ||||
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -5/+42 |
| | | | | Includes transform code updates and partial test coverage. | ||||
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 2 | -2/+0 |
| | | | | refs sentry #77700 | ||||
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -3/+4 |
| | |||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 4 | -5/+77 |
| | |||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 4 | -0/+503 |
| | |||||
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | |||||
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 |
| | |||||
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 3 | -2/+99 |
|\ | | | | | | | | | DOAJ import fuzzy match filter. See merge request webgroup/fatcat!92 | ||||
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | | | | | | | | | | The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases. | ||||
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+14 |
| | | | | | | | | | | | | | | | | | | | | | | In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity. | ||||
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 2 | -0/+85 |
| | | | | | | | | Using fuzzycat. Add basic test coverage. | ||||
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 |
|/ | |||||
* | DOAJ: remove accidentally committed 'skip' of a test | Bryan Newbold | 2020-11-20 | 1 | -1/+0 |
| | |||||
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 2 | -11/+67 |
| | | | | Also add missing code coverage for update path (disabled by default). | ||||
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -11/+6 |
| | |||||
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 2 | -0/+97 |
| | | | | Several things to finish implementing and polish. | ||||
* | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 2 | -2/+44 |
| | |||||
* | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 2 | -0/+18 |
| | |||||
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 2 | -1/+7 |
| | |||||
* | Merge branch 'bnewbold-202009-polish' into 'master' | Martin Czygan | 2020-09-29 | 2 | -6/+6 |
|\ | | | | | | | | | fatcat.wiki 2020-09 polish. See merge request webgroup/fatcat!84 | ||||
| * | lint cleanups | Bryan Newbold | 2020-09-17 | 1 | -2/+0 |
| | | |||||
| * | web: route constraints on fcids and UUIDs | Bryan Newbold | 2020-09-17 | 1 | -4/+6 |
| | | | | | | | | | | | | | | | | | | | | | | Instead of accepting any string for these parameters and throwing a 400 error if not the correct type, implement better route matching at the framework level and return more 404s. This resolves several outstanding sentry exceptions. The "flask-uuid" was imported and seems to have been configured for this purpose previously, but I guess I never finished configuring it. | ||||
* | | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+6 |
|/ | | | | | | | | About 3400 records currently have "FULL MOVIE" in the title, seemingly from zenodo; see https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi and https://doi.org/10.5281/zenodo.4041777 | ||||
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 2 | -1/+2 |
| | | | | | Includes a tiny tweak to the datacite import sample file to test this code path. | ||||
* | generic file entity clean-ups as part of file_meta importer | Bryan Newbold | 2020-09-02 | 1 | -0/+99 |
| | |||||
* | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 2 | -0/+68 |
| | |||||
* | datacite importer: update test cases for 'Additional file' as component, not stub | Bryan Newbold | 2020-08-11 | 5 | -5/+5 |
| | |||||