| Commit message | Author | Age | Files | Lines |
* | HACK: squash intermittent failure of detect_text_lang() test | Bryan Newbold | 2020-12-11 | 1 | -1/+2 |
* | langdetect: more text for 'zh' test case | Bryan Newbold | 2020-11-20 | 1 | -1/+1 |
* | crossref+datacite: remove confusing early update bail | Bryan Newbold | 2020-11-20 | 2 | -4/+0 |
* | doaj: fix update code path (getattr not __dict__) | Bryan Newbold | 2020-11-20 | 1 | -4/+3 |
* | DOAJ: handle empty identifier 'id' case | Bryan Newbold | 2020-11-20 | 1 | -0/+2 |
* | clean DOI: ban all non-ASCII characters | Bryan Newbold | 2020-11-19 | 1 | -1/+4 |
* | normal: handle langdetect of 'zh-cn' (not len=2) | Bryan Newbold | 2020-11-19 | 1 | -0/+3 |
* | tweak DOAJ importer class args and default for do_updates | Bryan Newbold | 2020-11-19 | 1 | -2/+2 |
* | if a release has DOAJ article id, count as OA | Bryan Newbold | 2020-11-19 | 1 | -0/+3 |
* | implement remainder of DOAJ article importer | Bryan Newbold | 2020-11-19 | 1 | -57/+125 |
* | handle more non-ASCII DOI cases | Bryan Newbold | 2020-11-19 | 1 | -1/+3 |
* | more python normalizers, and move from importer common | Bryan Newbold | 2020-11-19 | 2 | -154/+326 |
* | initial implementation of DOAJ importer | Bryan Newbold | 2020-11-19 | 2 | -0/+290 |
* | html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2 |
* | ingest tool: support for setting ingest type | Bryan Newbold | 2020-11-06 | 1 | -6/+6 |
* | html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19 |
* | ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -14/+30 |
* | ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 2 | -67/+259 |
* | refactor: white/black -> allow/block | Bryan Newbold | 2020-11-05 | 1 | -4/+4 |
* | ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 1 | -3/+3 |
* | ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -3/+29 |
* | normalizer: filter out a specific non-ASCII character in DOI | Bryan Newbold | 2020-11-04 | 1 | -1/+3 |
* | entity updates: don't ingest JSTOR DOI prefixes | Bryan Newbold | 2020-10-23 | 1 | -0/+2 |
* | entity updater: new work update feed (ident and changelog metadata only) | Bryan Newbold | 2020-10-16 | 1 | -2/+24 |
* | chocula importer: small tweaks to update behavior | Bryan Newbold | 2020-10-08 | 1 | -8/+6 |
* | elastic transform: more preservation keepers | Bryan Newbold | 2020-10-08 | 1 | -1/+2 |
* | address spammy datacite titles | Martin Czygan | 2020-09-23 | 1 | -0/+19 |
* | ingest: default to crawl protocols.io DOIs | Bryan Newbold | 2020-09-10 | 1 | -0/+2 |
* | datacite: handle case of empty-string version | Bryan Newbold | 2020-09-10 | 1 | -1/+1 |
* | remove spurious print statement | Bryan Newbold | 2020-09-03 | 1 | -1/+0 |
* | generic file entity clean-ups as part of file_meta importer | Bryan Newbold | 2020-09-02 | 2 | -0/+50 |
* | fix comment typo (thanks martin) | Bryan Newbold | 2020-08-27 | 1 | -1/+1 |
* | fixes and test coverage for file_meta importer | Bryan Newbold | 2020-08-21 | 1 | -5/+10 |
* | initial implementation of file_meta importer | Bryan Newbold | 2020-08-21 | 2 | -0/+71 |
* | entity updater: handle doi=None case better | Bryan Newbold | 2020-08-14 | 1 | -1/+1 |
* | entity updater: es['publisher_type'] not always set | Bryan Newbold | 2020-08-14 | 1 | -1/+1 |
* | Merge branch 'bnewbold-ingest-improvements' into 'master' | Martin Czygan | 2020-08-13 | 2 | -33/+114 |
|\ |
| * | entity update: change big5 ingest behavior | Bryan Newbold | 2020-08-11 | 1 | -9/+15 |
| * | entity update: default to ingest non-OA works | Bryan Newbold | 2020-08-11 | 1 | -9/+10 |
| * | entity update: skip ingest of figshare+zenodo 'group' DOIs | Bryan Newbold | 2020-08-11 | 1 | -0/+15 |
| * | datacite import: figshare-specific hacks | Bryan Newbold | 2020-08-11 | 1 | -3/+3 |
| * | datacite import: refactor release_type detection into static method | Bryan Newbold | 2020-08-11 | 1 | -14/+51 |
| * | datacite import: refactor publisher-specific hacks into static method | Bryan Newbold | 2020-08-11 | 1 | -15/+29 |
| * | update crawl blocklist for SPNv2 requests which mostly fail | Bryan Newbold | 2020-08-10 | 1 | -2/+10 |
* | | harvest: datacite API yields HTTP 200 with broken JSON | Martin Czygan | 2020-08-10 | 1 | -1/+8 |
|/ |
* | release ES transform tweaks | Bryan Newbold | 2020-08-07 | 1 | -3/+5 |
* | chocula import update tweaks | Bryan Newbold | 2020-08-04 | 1 | -10/+14 |
* | more update keys and cases for chocula importer | Bryan Newbold | 2020-08-04 | 1 | -5/+11 |
* | fix key name mismatch in chocula importer | Bryan Newbold | 2020-08-04 | 1 | -1/+1 |
* | basic toml transform helper | Bryan Newbold | 2020-07-30 | 2 | -4/+20 |