| Commit message | Author | Age | Files | Lines |
* | web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3 |
* | dblp release import: skip arxiv_id releases | Bryan Newbold | 2020-12-24 | 1 | -0/+9 |
* | dblp import: fix arxiv_id typo | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | fuzzy: set 120 second timeout on ES lookups | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | dblp: polish HTML scrape/extract pipeline | Bryan Newbold | 2020-12-17 | 1 | -0/+14 |
* | dblp: flesh out update code path (especially to add container_id linkage) | Bryan Newbold | 2020-12-17 | 1 | -2/+6 |
* | dblp: run fuzzy matching at try_update time (same as DOAJ) | Bryan Newbold | 2020-12-17 | 1 | -1/+8 |
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 2 | -0/+145 |
* | dblp release importer: container_id lookup TSV, and dump JSON mode | Bryan Newbold | 2020-12-17 | 1 | -10/+66 |
* | initial implementation of dblp release importer (in progress) | Bryan Newbold | 2020-12-17 | 2 | -0/+445 |
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -15/+28 |
* | add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -3/+3 |
* | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+9 |
* | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 1 | -2/+62 |
* | html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5 |
* | crossref+datacite: remove confusing early update bail | Bryan Newbold | 2020-11-20 | 2 | -4/+0 |
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 1 | -4/+3 |
* | DOAJ: handle empty identifier 'id' case | Bryan Newbold | 2020-11-20 | 1 | -0/+2 |
* | tweak DOAJ importer class args and default for do_updates | Bryan Newbold | 2020-11-19 | 1 | -2/+2 |
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -57/+125 |
* | more python normalizers, and move from importer common | Bryan Newbold | 2020-11-19 | 1 | -154/+4 |
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 2 | -0/+290 |
* | html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2 |
* | html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19 |
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -14/+30 |
* | ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 2 | -67/+259 |
* | refactor: white/black -> allow/block | Bryan Newbold | 2020-11-05 | 1 | -4/+4 |
* | ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 1 | -3/+3 |
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -3/+29 |
* | chocula importer: small tweaks to update behavior | Bryan Newbold | 2020-10-08 | 1 | -8/+6 |
* | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+19 |
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 1 | -1/+1 |
* | remove spurious print statement | Bryan Newbold | 2020-09-03 | 1 | -1/+0 |
* | generic file entity clean-ups as part of file_meta importer | Bryan Newbold | 2020-09-02 | 2 | -0/+50 |
* | fix comment typo (thanks martin) | Bryan Newbold | 2020-08-27 | 1 | -1/+1 |
* | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 1 | -5/+10 |
* | initial implementation of file_meta importer | Bryan Newbold | 2020-08-21 | 2 | -0/+71 |
* | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -3/+3 |
* | datacite import: refactor release_type detection into static method | Bryan Newbold | 2020-08-11 | 1 | -14/+51 |
* | datacite import: refactor publisher-specific hacks into static method | Bryan Newbold | 2020-08-11 | 1 | -15/+29 |
* | chocula import update tweaks | Bryan Newbold | 2020-08-04 | 1 | -10/+14 |
* | more update keys and cases for chocula importer | Bryan Newbold | 2020-08-04 | 1 | -5/+11 |
* | fix key name mismatch in chocula importer | Bryan Newbold | 2020-08-04 | 1 | -1/+1 |
* | fix issnl typo in pubmed | Bryan Newbold | 2020-07-23 | 1 | -1/+1 |
* | remove isascii() work around definition in importers/datacite.py | Bryan Newbold | 2020-07-23 | 1 | -7/+1 |
* | simple lint (flake8) fixes over python codebase | Bryan Newbold | 2020-07-23 | 5 | -17/+16 |