Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -3/+4 |
| | |||||
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 4 | -5/+77 |
| | |||||
* | basic test coverage of dblp release importer | Bryan Newbold | 2020-12-17 | 4 | -0/+503 |
| | |||||
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | |||||
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 |
| | |||||
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 3 | -2/+99 |
|\ | | | | | | | | | DOAJ import fuzzy match filter See merge request webgroup/fatcat!92 | ||||
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
| | | | | | | | | | | The motivation for this change is to enable passing the 'reason' through to edit extra metadata, in cases where we merge or cluster releases. | ||||
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+14 |
| | | | | | | | | | | | | | | | | | | | | | | In this default configuration, any entities with a fuzzy match (even "ambiguous") will be skipped at import time, to prevent creating duplicates. This is conservative towards not creating new/duplicate entities. In the future, as we get more confidence in fuzzy match/verification, we can start to ignore AMBIGUOUS, handle EXACT as same release, and merge STRONG (and WEAK?) matches under the same work entity. | ||||
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 2 | -0/+85 |
| | | | | | | | | Using fuzzycat. Add basic test coverage. | ||||
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 |
|/ | |||||
* | DOAJ: remove accidentally committed 'skip' of a test | Bryan Newbold | 2020-11-20 | 1 | -1/+0 |
| | |||||
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 2 | -11/+67 |
| | | | | Also add missing code coverage for update path (disabled by default). | ||||
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -11/+6 |
| | |||||
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 2 | -0/+97 |
| | | | | Several things to finish implementing and polish. | ||||
* | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
| | |||||
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 2 | -2/+44 |
| | |||||
* | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 2 | -0/+18 |
| | |||||
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 2 | -1/+7 |
| | |||||
* | Merge branch 'bnewbold-202009-polish' into 'master' | Martin Czygan | 2020-09-29 | 2 | -6/+6 |
|\ | | | | | | | | | fatcat.wiki 2020-09 polish See merge request webgroup/fatcat!84 | ||||
| * | lint cleanups | Bryan Newbold | 2020-09-17 | 1 | -2/+0 |
| | | |||||
| * | web: route constraints on fcids and UUIDs | Bryan Newbold | 2020-09-17 | 1 | -4/+6 |
| | | | | | | | | | | | | | | | | | | | | | | Instead of accepting any string for these parameters and throwing a 400 error if not the correct type, implement better route matching at the framework level and return more 404s. This resolves several outstanding sentry exceptions. The "flask-uuid" extension was imported and seems to have been configured for this purpose previously, but I guess I never finished configuring it. | ||||
* | | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+6 |
|/ | | | | | | | | | seemingly from zenodo: * https://fatcat.wiki/release/rzcpjwukobd4pj36ipla22cnoi * https://doi.org/10.5281/zenodo.4041777 About 3400 records with "FULL MOVIE" in title, currently. | ||||
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 2 | -1/+2 |
| | | | | | Includes a tiny tweak to the datacite import sample file to test this code path. | ||||
* | generic file entity clean-ups as part of file_meta importer | Bryan Newbold | 2020-09-02 | 1 | -0/+99 |
| | |||||
* | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 2 | -0/+68 |
| | |||||
* | datacite importer: update test cases for 'Additional file' as component, not stub | Bryan Newbold | 2020-08-11 | 5 | -5/+5 |
| | |||||
* | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -0/+1 |
| | |||||
* | fix typo bug resulting in lost/bad ext_id web edits | Bryan Newbold | 2020-07-31 | 1 | -0/+14 |
| | |||||
* | implement webface entity deletion | Bryan Newbold | 2020-07-31 | 1 | -0/+57 |
| | |||||
* | fix search redirect codes in new tests | Bryan Newbold | 2020-07-30 | 1 | -4/+4 |
| | |||||
* | wire up new TOML views | Bryan Newbold | 2020-07-30 | 2 | -20/+62 |
| | |||||
* | basic toml transform helper | Bryan Newbold | 2020-07-30 | 1 | -0/+22 |
| | |||||
* | simple search route increased coverage | Bryan Newbold | 2020-07-30 | 1 | -0/+27 |
| | |||||
* | minor lint fixes | Bryan Newbold | 2020-07-30 | 1 | -1/+0 |
| | |||||
* | coverage search: 'recent' endpoint test (minimal) | Bryan Newbold | 2020-07-30 | 1 | -1/+32 |
| | |||||
* | expand test coverage of new preservation views | Bryan Newbold | 2020-07-30 | 1 | -15/+122 |
| | |||||
* | refactor coverage tests/mocks | Bryan Newbold | 2020-07-30 | 5 | -39/+80 |
| | |||||
* | coverage test mock fixes | Bryan Newbold | 2020-07-30 | 1 | -14/+51 |
| | |||||
* | lint coverage changes (so far) | Bryan Newbold | 2020-07-30 | 2 | -15/+3 |
| | |||||
* | include new-style preservation+release_type aggs in container stats | Bryan Newbold | 2020-07-30 | 1 | -1/+12 |
| | |||||
* | add regression test for broken container coverage | Bryan Newbold | 2020-07-30 | 2 | -57/+98 |
| | | | | also shuffle around search/coverage test files | ||||
* | small lint fixes | Bryan Newbold | 2020-07-24 | 1 | -1/+0 |
| | |||||
* | finish backend refactoring of search code | Bryan Newbold | 2020-07-24 | 1 | -2/+77 |
| | |||||
* | update web_search tests to mock ES client | Bryan Newbold | 2020-07-24 | 2 | -45/+47 |
| | | | | | | Instead of using 'responses' mock of 'requests' library. Tried using 'elasticmock' helper but it didn't work. | ||||
* | refactor release and container search | Bryan Newbold | 2020-07-24 | 1 | -5/+2 |
| | | | | | | | | | | Based on fatcat-scholar refactoring. This doesn't include refactoring of stats, aggregates, or histograms yet, just the direct queries. Don't have any test coverage yet; intend to try elasticmock or figure out how to ingest mock JSON results directly. | ||||
* | Merge branch 'bnewbold-more-lint-fixes' into 'master' | Martin Czygan | 2020-07-24 | 5 | -6/+5 |
|\ | | | | | | | | | more lint fixes See merge request webgroup/fatcat!69 | ||||
| * | fix actual typo in tests (caught by lint) | Bryan Newbold | 2020-07-23 | 1 | -2/+2 |
| | | |||||
| * | simple lint (flake8) fixes in tests | Bryan Newbold | 2020-07-23 | 4 | -4/+3 |
| | | | | | | | | | | | | The pytest fixture syntax interacts weirdly with flake8 tests, so ignore the "redefinition" and "unused variable" errors more carefully for .py files under ./tests/ | ||||
* | | make in_kbart transform inclusive of last year | Bryan Newbold | 2020-07-23 | 1 | -0/+46 |
|/ | | | | | | | | | | | | | | | | | Frequently when looking at preservation coverage of journals, the current year shows as "un-preserved" when in fact there is robust KBART (keepers, eg CLOCKSS/Portico) coverage. This is partially because we don't update containers with KBART year spans very frequently (which is on us), and partially because KBART reports are often a bit out of date (eg, they don't show coverage for the current year). For that matter, they probably take a few months to update the previous year as well, but that is a larger time span to fudge over. With this patch, Portico/LOCKSS/etc coverage for "last year" counts as coverage of publications dated "this year". Note that for this to be effective/correct, it is assumed that we will update containers with coverage year spans at least once a year, and that we will re-index all releases at least once a year. | ||||
* | datacite: resolve formatting issues in tests | Martin Czygan | 2020-07-10 | 45 | -150/+54 |
|\ |