| Commit message | Author | Age | Files | Lines |
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 3 | -19/+31 |
* | fix sloppy is_preserved ES transform test failure | Bryan Newbold | 2020-12-17 | 1 | -1/+1 |
* | add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | bug fix: is_preserved should always be bool | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 7 | -267/+544 |
|\ |
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 2 | -5/+5 |
| * | pipenv: bump fuzzycat to 0.1.9 | Bryan Newbold | 2020-12-17 | 2 | -5/+5 |
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 2 | -4/+23 |
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 3 | -2/+147 |
| * | pipenv: add fuzzycat dependency | Bryan Newbold | 2020-12-16 | 2 | -261/+374 |
* | | entity update worker: treat fileset and webcapture updates like file updates | Bryan Newbold | 2020-12-16 | 1 | -3/+25 |
* | | fix indentation | Bryan Newbold | 2020-12-16 | 1 | -2/+2 |
* | | have release elasticsearch transform count webcaptures and filesets towards p... | Bryan Newbold | 2020-12-16 | 1 | -26/+57 |
* | | improve release elasticsearch transform test coverage | Bryan Newbold | 2020-12-16 | 3 | -11/+86 |
* | | small release_to_elasticsearch refactors | Bryan Newbold | 2020-12-16 | 1 | -7/+12 |
* | | refactor release_to_elasticsearch transform | Bryan Newbold | 2020-12-16 | 1 | -131/+148 |
|/ |
* | html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5 |
* | HACK: squash intermittent failure of detect_text_lang() test | Bryan Newbold | 2020-12-11 | 1 | -1/+2 |
* | DOAJ: remove accidentally committed 'skip' of a test | Bryan Newbold | 2020-11-20 | 1 | -1/+0 |
* | langdetect: more text for 'zh' test case | Bryan Newbold | 2020-11-20 | 1 | -1/+1 |
* | DOAJ: update importer README with example invocation | Bryan Newbold | 2020-11-20 | 1 | -0/+7 |
* | crossref+datacite: remove confusing early update bail | Bryan Newbold | 2020-11-20 | 2 | -4/+0 |
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 3 | -15/+70 |
* | DOAJ: handle empty identifier 'id' case | Bryan Newbold | 2020-11-20 | 1 | -0/+2 |
* | clean DOI: ban all non-ASCII characters | Bryan Newbold | 2020-11-19 | 1 | -1/+4 |
* | normal: handle langdetect of 'zh-cn' (not len=2) | Bryan Newbold | 2020-11-19 | 1 | -0/+3 |
* | tweak DOAJ importer class args and default for do_updates | Bryan Newbold | 2020-11-19 | 1 | -2/+2 |
* | show DOAJ (and dblp) identifiers in release view | Bryan Newbold | 2020-11-19 | 1 | -1/+7 |
* | if a release has DOAJ article id, count as OA | Bryan Newbold | 2020-11-19 | 1 | -0/+3 |
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 3 | -68/+168 |
* | handle more non-ASCII DOI cases | Bryan Newbold | 2020-11-19 | 1 | -1/+3 |
* | more python normalizers, and move from importer common | Bryan Newbold | 2020-11-19 | 2 | -154/+326 |
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 4 | -0/+387 |
* | html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2 |
* | ingest tool: support for setting ingest type | Bryan Newbold | 2020-11-06 | 2 | -6/+10 |
* | html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19 |
* | ingest: fix XML ingest test file | Bryan Newbold | 2020-11-05 | 1 | -1/+1 |
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 3 | -16/+74 |
* | ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 3 | -67/+301 |
* | refactor: white/black -> allow/block | Bryan Newbold | 2020-11-05 | 1 | -4/+4 |
* | ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 2 | -6/+6 |
* | ingest: tests for basic XML ingest | Bryan Newbold | 2020-11-05 | 2 | -0/+18 |
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 3 | -4/+36 |
* | normalizer: filter out a specific non-ASCII character in DOI | Bryan Newbold | 2020-11-04 | 1 | -1/+3 |
* | entity updates: don't ingest JSTOR DOI prefixes | Bryan Newbold | 2020-10-23 | 1 | -0/+2 |
* | entity updater: new work update feed (ident and changelog metadata only) | Bryan Newbold | 2020-10-16 | 2 | -2/+26 |
* | container coverage: add keeper link and KBART holdings list | Bryan Newbold | 2020-10-13 | 1 | -0/+11 |
* | release view: remove ambiguous OA status indicator | Bryan Newbold | 2020-10-13 | 1 | -4/+0 |
* | container view: fix non-OA empty box | Bryan Newbold | 2020-10-13 | 1 | -3/+3 |