Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | remove deprecated extid sqlite3 lookup table feature from importers | Bryan Newbold | 2021-11-09 | 5 | -22/+6 |
| | | | | | | | | This was used during initial bulk imports, but is no longer used and could create serious metadata problems if used accidentally. In retrospect, it also made metadata provenance less transparent, and may have done more harm than good overall. | ||||
* | python tests: verify array sort order | Bryan Newbold | 2021-11-05 | 4 | -20/+18 |
| | | | | | | | In a couple of cases (e.g., filesets), the tests had been made agnostic to sort order because the sort order was not stable. In other cases, these are simply small cleanups and comment improvements. | ||||
* | typing: first batch of python bulk type annotations | Bryan Newbold | 2021-11-03 | 2 | -4/+5 |
| | | | | | | While these changes are more delicate than simple lint changes, this specific batch of edits and annotations was *relatively* simple, and resulted in few code changes other than function signature additions. | ||||
* | fmt (black): tests/ | Bryan Newbold | 2021-11-02 | 55 | -1430/+1852 |
| | |||||
* | python: isort everything | Bryan Newbold | 2021-11-02 | 39 | -73/+98 |
| | |||||
* | lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 17 | -125/+125 |
| | | | | '==' vs 'is'; 'not a in b' vs 'a not in b'; etc | ||||
* | lint/fmt: remove all 'import *' | Bryan Newbold | 2021-11-02 | 1 | -2/+2 |
| | |||||
* | hacks to work around new pylint false positives | Bryan Newbold | 2021-11-02 | 1 | -9/+15 |
| | |||||
* | cleanup imports after fatcat_tools.transforms change | Bryan Newbold | 2021-11-02 | 4 | -16/+33 |
| | |||||
* | temporary hack around filesets.manifest order instability | Bryan Newbold | 2021-11-02 | 1 | -3/+4 |
| | | | | | | May need some change in fatcatd or the schema? This isn't a new issue: that part of the schema has been around for a long time and is only being detected now by these tests. | ||||
* | generic fileset importer class, with test coverage | Bryan Newbold | 2021-10-14 | 2 | -0/+60 |
| | |||||
* | web: editor username /u/<username> helper | Bryan Newbold | 2021-10-13 | 1 | -0/+8 |
| | |||||
* | python: additional test coverage for v0.4 changes | Bryan Newbold | 2021-10-13 | 2 | -2/+19 |
| | |||||
* | python: test coverage of rust schema changes | Bryan Newbold | 2021-10-13 | 4 | -2/+59 |
| | |||||
* | datacite: skip empty abstracts | Martin Czygan | 2021-10-01 | 3 | -1/+91 |
| | | | | | Do not add abstracts where `clean` results in the empty string, since this violates a constraint: `either abstract_sha1 or content is required` | ||||
* | trivial blank line lint | Bryan Newbold | 2021-09-08 | 1 | -1/+0 |
| | |||||
* | refs: web UI tweaks for iterated CSL schema | Bryan Newbold | 2021-08-03 | 1 | -3/+7 |
| | |||||
* | refs: start the most basic/minimal web refs test coverage ('integration' level) | Bryan Newbold | 2021-07-27 | 4 | -0/+1094 |
| | |||||
* | tests: small citeproc style changes (to match Pipfile.lock update) | Bryan Newbold | 2021-06-23 | 2 | -3/+4 |
| | |||||
* | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 3 | -1/+96 |
| | | | | | Caused by a partial "title entry without title" coming *first* (e.g. one holding just a language, like {'lang': 'da'}). | ||||
* | dblp tests: skip redundant seek(0) | Bryan Newbold | 2021-06-03 | 1 | -6/+1 |
| | |||||
* | ingest: add per-container ingest type overrides | Bryan Newbold | 2021-05-21 | 1 | -0/+6 |
| | |||||
* | fix arabesque sqlite3 examples to have 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -0/+0 |
| | |||||
* | make dblp tests more robust | Bryan Newbold | 2021-04-12 | 1 | -2/+11 |
| | | | | | | These were causing a lot of spurious errors in local development. Not sure these tweaks will entirely fix the problem. | ||||
* | transform tool: container transform stats lookup support | Bryan Newbold | 2021-04-06 | 1 | -0/+1 |
| | |||||
* | search container stats: changes to be called from index code path | Bryan Newbold | 2021-04-06 | 1 | -0/+10 |
| | | | | Eg, allowing injection of more config values | ||||
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -5/+42 |
| | | | | Includes transform code updates and partial test coverage. | ||||
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 2 | -2/+0 |
| | | | | refs sentry #77700 | ||||
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -3/+4 |
| | |||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 4 | -5/+77 |
| | |||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 4 | -0/+503 |
| | |||||
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | |||||
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 |
| | |||||
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 3 | -2/+99 |
|\ | | | | | | | | | DOAJ import fuzzy match filter. See merge request webgroup/fatcat!92 | ||||
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | | | | | | | | | | The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases. | ||||
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+14 |
| | | | | | | | | | | | | | | | | | | | | | | In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity. | ||||
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 2 | -0/+85 |
| | | | | | | | | Using fuzzycat. Add basic test coverage. | ||||
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 |
|/ | |||||
* | DOAJ: remove accidentally committed 'skip' of a test | Bryan Newbold | 2020-11-20 | 1 | -1/+0 |
| | |||||
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 2 | -11/+67 |
| | | | | Also add missing code coverage for update path (disabled by default). | ||||
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -11/+6 |
| | |||||
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 2 | -0/+97 |
| | | | | Several things to finish implementing and polish. | ||||
* | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 2 | -2/+44 |
| | |||||
* | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 2 | -0/+18 |
| | |||||
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 2 | -1/+7 |
| | |||||
* | Merge branch 'bnewbold-202009-polish' into 'master' | Martin Czygan | 2020-09-29 | 2 | -6/+6 |
|\ | | | | | | | | | fatcat.wiki 2020-09 polish. See merge request webgroup/fatcat!84 | ||||
| * | lint cleanups | Bryan Newbold | 2020-09-17 | 1 | -2/+0 |
| | | |||||
| * | web: route constraints on fcids and UUIDs | Bryan Newbold | 2020-09-17 | 1 | -4/+6 |
| | | | | | | | | | | | | | | | | | | | | | | Instead of accepting any string for these parameters and throwing a 400 error if not the correct type, implement better route matching at the framework level and return more 404s. This resolves several outstanding sentry exceptions. The "flask-uuid" was imported and seems to have been configured for this purpose previously, but I guess I never finished configuring it. | ||||
* | | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+6 |
|/ | | | | | | | | Seemingly from zenodo: https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi and https://doi.org/10.5281/zenodo.4041777. About 3400 records currently have "FULL MOVIE" in the title. |