Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | single-file variant of fileset importer for dataset attempts | Bryan Newbold | 2022-03-23 | 2 | -0/+80 |
| | |||||
* | fix typo in fileset comparison helper | Bryan Newbold | 2022-03-23 | 1 | -3/+2 |
| | |||||
* | ingest fileset fixes, and some test coverage | Bryan Newbold | 2022-03-23 | 2 | -3/+80 |
| | |||||
* | initial implementation of container merger | Bryan Newbold | 2021-11-24 | 1 | -0/+116 |
| | |||||
* | initial file merger, with tests | Bryan Newbold | 2021-11-23 | 1 | -0/+160 |
| | |||||
* | Merge branch 'bnewbold-content-scope' | Bryan Newbold | 2021-11-22 | 3 | -0/+6 |
|\ | |||||
| * | minimal python test coverage of content_scope fields | Bryan Newbold | 2021-11-17 | 3 | -0/+6 |
| | | |||||
* | | polish editgroup diff view | Bryan Newbold | 2021-11-18 | 1 | -0/+8 |
|/ | | | | Still not as great as it could be, but useful in this state. | ||||
* | update datacite tests for license slug changes | Bryan Newbold | 2021-11-10 | 2 | -8/+7 |
| | | | | | Use datacite-specific wrapper function, and remove a couple non-OA/TDM-limited licenses. | ||||
* | remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 5 | -22/+6 |
| | | | | | | | | This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall. | ||||
* | python tests: verify array sort order | Bryan Newbold | 2021-11-05 | 4 | -20/+18 |
| | | | | | | | In a couple cases (eg, filesets), had made tests agnostic to sort order, because the sort order was not stable. In other cases, simply small cleanups and comment improvements. | ||||
* | typing: first batch of python bulk type annotations | Bryan Newbold | 2021-11-03 | 2 | -4/+5 |
| | | | | | | While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was *relatively* simple, and resulted in few code changes other than function signature additions. | ||||
* | fmt (black): tests/ | Bryan Newbold | 2021-11-02 | 55 | -1430/+1852 |
| | |||||
* | python: isort everything | Bryan Newbold | 2021-11-02 | 39 | -73/+98 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 17 | -125/+125 |
| | | | | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc | ||||
* | lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | |||||
* | hacks to work around new pylint false positives | Bryan Newbold | 2021-11-02 | 1 | -9/+15 |
| | |||||
* | cleanup imports after fatcat_tools.transforms change | Bryan Newbold | 2021-11-02 | 4 | -16/+33 |
| | |||||
* | temporary hack around filesets.manifest order instability | Bryan Newbold | 2021-11-02 | 1 | -3/+4 |
| | | | | | | May need some change in fatcatd or schema? This isn't a new issue; that part of the schema has been around for a long time and is only being detected now with these tests. | ||||
* | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 2 | -0/+60 |
| | |||||
* | web: editor username /u/<username> helper | Bryan Newbold | 2021-10-13 | 1 | -0/+8 |
| | |||||
* | python: additional test coverage for v0.4 changes | Bryan Newbold | 2021-10-13 | 2 | -2/+19 |
| | |||||
* | python: test coverage of rust schema changes | Bryan Newbold | 2021-10-13 | 4 | -2/+59 |
| | |||||
* | datacite: skip empty abstracts | Martin Czygan | 2021-10-01 | 3 | -1/+91 |
| | | | | | Do not add abstracts where `clean` results in the empty string - this violates a constraint: `either abstract_sha1 or content is required` | ||||
* | trivial blank line lint | Bryan Newbold | 2021-09-08 | 1 | -1/+0 |
| | |||||
* | refs: web UI tweaks for iterated CSL schema | Bryan Newbold | 2021-08-03 | 1 | -3/+7 |
| | |||||
* | refs: start the most basic/minimal web refs test coverage ('integration' level) | Bryan Newbold | 2021-07-27 | 4 | -0/+1094 |
| | |||||
* | tests: small citeproc style changes (to match Pipfile.lock update) | Bryan Newbold | 2021-06-23 | 2 | -3/+4 |
| | |||||
* | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 3 | -1/+96 |
| | | | | | Caused by a partial "title entry without title" coming *first* (e.g., one holding only a language, like {'lang': 'da'}). | ||||
* | dblp tests: skip redundant seek(0) | Bryan Newbold | 2021-06-03 | 1 | -6/+1 |
| | |||||
* | ingest: add per-container ingest type overrides | Bryan Newbold | 2021-05-21 | 1 | -0/+6 |
| | |||||
* | fix arabesque sqlite3 examples to have 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -0/+0 |
| | |||||
* | make dblp tests more robust | Bryan Newbold | 2021-04-12 | 1 | -2/+11 |
| | | | | | | These were causing a lot of spurious errors in local development. Not sure these tweaks will entirely fix the problem. | ||||
* | transform tool: container transform stats lookup support | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
| | |||||
* | search container stats: changes to be called from index code path | Bryan Newbold | 2021-04-06 | 1 | -0/+10 |
| | | | | Eg, allowing injection of more config values | ||||
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -5/+42 |
| | | | | Includes transform code updates and partial test coverage. | ||||
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 2 | -2/+0 |
| | | | | refs sentry #77700 | ||||
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -3/+4 |
| | |||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 4 | -5/+77 |
| | |||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 4 | -0/+503 |
| | |||||
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | |||||
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 |
| | |||||
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 3 | -2/+99 |
|\ | | | | | | | | | DOAJ import fuzzy match filter. See merge request webgroup/fatcat!92 | ||||
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | | | | | | | | | | The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases. | ||||
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+14 |
| | | | | | | | | | | | | | | | | | | | | | | In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity. | ||||
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 2 | -0/+85 |
| | | | | | | | | Using fuzzycat. Add basic test coverage. | ||||
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 |
|/ | |||||
* | DOAJ: remove accidentally committed 'skip' of a test | Bryan Newbold | 2020-11-20 | 1 | -1/+0 |
| | |||||
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 2 | -11/+67 |
| | | | | Also add missing code coverage for update path (disabled by default). | ||||
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -11/+6 |
| |