| Commit message | Author | Age | Files | Lines |
* | datacite: more careful title string access; fixes sentry #88350 | Martin Czygan | 2021-06-11 | 1 | -1/+1 |
* | clean_doi() should lower-case returned DOI | Bryan Newbold | 2021-06-07 | 1 | -1/+4 |
* | ingest: swap ingest and file checks, to result in clearer stats/counts of ski... | Bryan Newbold | 2021-06-03 | 1 | -2/+2 |
* | ingest: don't accept mag and s2 URLs | Bryan Newbold | 2021-06-03 | 1 | -4/+4 |
* | changelog worker: fix file/fileset typo, caught by lint | Bryan Newbold | 2021-05-25 | 1 | -1/+1 |
* | small python lint fixes (no behavior change) | Bryan Newbold | 2021-05-25 | 3 | -4/+2 |
* | ingest: add per-container ingest type overrides | Bryan Newbold | 2021-05-21 | 1 | -1/+17 |
* | arabesque importer: ensure full 14-digit timestamps | Bryan Newbold | 2021-05-21 | 1 | -1/+3 |
* | transforms: fix 'display_ame' typo | Bryan Newbold | 2021-04-19 | 1 | -2/+2 |
* | prefer contrib.creator.display_name over contrib.raw_name | Bryan Newbold | 2021-04-12 | 2 | -4/+7 |
* | es worker: ensure kafka messages get cleared | Bryan Newbold | 2021-04-12 | 1 | -0/+2 |
* | es indexing: more 'wip' fixes | Bryan Newbold | 2021-04-12 | 1 | -1/+5 |
* | ES indexing: skip 'wip' entities with a warning | Bryan Newbold | 2021-04-12 | 1 | -11/+16 |
* | container ES index worker: support for querying status | Bryan Newbold | 2021-04-06 | 1 | -5/+32 |
* | ES schema updates: doc_index_ts as a str, not datetime | Bryan Newbold | 2021-04-06 | 1 | -4/+4 |
* | container search schema: preservation stats, new fields | Bryan Newbold | 2021-04-06 | 1 | -2/+18 |
* | release ES: add discipline field | Bryan Newbold | 2021-04-06 | 1 | -0/+2 |
* | ES schemas: add doc_index_ts to all mappings | Bryan Newbold | 2021-04-06 | 1 | -0/+4 |
* | indexing: don't use document names | Bryan Newbold | 2021-04-06 | 1 | -14/+4 |
* | datacite: a missing surname should be None, not the empty string | Martin Czygan | 2021-04-02 | 1 | -2/+1 |
* | elasticsearch: simple new dblp and doaj fields | Bryan Newbold | 2021-01-20 | 1 | -0/+4 |
* | web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3 |
* | dblp release import: skip arxiv_id releases | Bryan Newbold | 2020-12-24 | 1 | -0/+9 |
* | normalizer: test for un-versioned arxiv_id | Bryan Newbold | 2020-12-24 | 1 | -0/+4 |
* | dblp import: fix arxiv_id typo | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | fuzzy: set 120 second timeout on ES lookups | Bryan Newbold | 2020-12-23 | 1 | -1/+1 |
* | dblp: polish HTML scrape/extract pipeline | Bryan Newbold | 2020-12-17 | 1 | -0/+14 |
* | dblp: flesh out update code path (especially to add container_id linkage) | Bryan Newbold | 2020-12-17 | 1 | -2/+6 |
* | dblp: run fuzzy matching at try_update time (same as DOAJ) | Bryan Newbold | 2020-12-17 | 1 | -1/+8 |
* | improve dblp release import | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | very simple dblp container importer | Bryan Newbold | 2020-12-17 | 2 | -0/+145 |
* | dblp release importer: container_id lookup TSV, and dump JSON mode | Bryan Newbold | 2020-12-17 | 1 | -10/+66 |
* | wikidata QID normalize helper | Bryan Newbold | 2020-12-17 | 1 | -2/+24 |
* | initial implementation of dblp release importer (in progress) | Bryan Newbold | 2020-12-17 | 2 | -0/+445 |
* | add 'lxml' mode for large XML file import, and multi-tags | Bryan Newbold | 2020-12-17 | 1 | -15/+28 |
* | add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2 |
* | bug fix: is_preserved should always be bool | Bryan Newbold | 2020-12-17 | 1 | -2/+2 |
* | Merge branch 'bnewbold-doaj-fuzzy' into 'master' | bnewbold | 2020-12-18 | 2 | -4/+71 |
|\ |
| * | update fuzzy helper to pass 'reason' through to import code | Bryan Newbold | 2020-12-17 | 1 | -3/+3 |
| * | add fuzzy match filtering to DOAJ importer | Bryan Newbold | 2020-12-16 | 1 | -2/+9 |
| * | add fuzzy matching helper to importer base class | Bryan Newbold | 2020-12-16 | 1 | -2/+62 |
* | | entity update worker: treat fileset and webcapture updates like file updates | Bryan Newbold | 2020-12-16 | 1 | -3/+25 |
* | | fix indentation | Bryan Newbold | 2020-12-16 | 1 | -2/+2 |
* | | have release elasticsearch transform count webcaptures and filesets towards p... | Bryan Newbold | 2020-12-16 | 1 | -26/+57 |
* | | small release_to_elasticsearch refactors | Bryan Newbold | 2020-12-16 | 1 | -7/+12 |
* | | refactor release_to_elasticsearch transform | Bryan Newbold | 2020-12-16 | 1 | -131/+148 |
|/ |
* | html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5 |
* | HACK: squash intermitent failure of detect_text_lang() test | Bryan Newbold | 2020-12-11 | 1 | -1/+2 |