Commit message  Author  Date  Files  Lines (-/+)
* dblp: polish HTML scrape/extract pipeline  Bryan Newbold  2020-12-17  4  -3/+30
* dblp: flesh out update code path (especially to add container_id linkage)  Bryan Newbold  2020-12-17  1  -2/+6
* dblp: run fuzzy matching at try_update time (same as DOAJ)  Bryan Newbold  2020-12-17  1  -1/+8
* small dblp proposal updates  Bryan Newbold  2020-12-17  1  -5/+2
* dblp: script and notes on container metadata generation  Bryan Newbold  2020-12-17  4  -0/+134
* improve dblp release import  Bryan Newbold  2020-12-17  3  -4/+17
* very simple dblp container importer  Bryan Newbold  2020-12-17  7  -7/+256
* dblp release importer: container_id lookup TSV, and dump JSON mode  Bryan Newbold  2020-12-17  2  -13/+73
* commit DBLP proposal progress  Bryan Newbold  2020-12-17  1  -7/+10
* dblp import proposal  Bryan Newbold  2020-12-17  1  -0/+159
* basic test coverage of dblp release importer  Bryan Newbold  2020-12-17  4  -0/+503
* wikidata QID normalize helper  Bryan Newbold  2020-12-17  1  -2/+24
* initial implementation of dblp release importer (in progress)  Bryan Newbold  2020-12-17  3  -0/+474
* add 'lxml' mode for large XML file import, and multi-tags  Bryan Newbold  2020-12-17  3  -19/+31
* rust: fix malformed ext id error type  Bryan Newbold  2020-12-17  1  -2/+2
* rust: rename and improve dblp key (id) syntax check  Bryan Newbold  2020-12-17  2  -9/+17
* fix sloppy is_preserved ES transfom test failure  Bryan Newbold  2020-12-17  1  -1/+1
* DOAJ import notes  Bryan Newbold  2020-12-17  2  -2/+23
* add dblp as an ingest source and identifier  Bryan Newbold  2020-12-17  1  -1/+2
* ingest: allow doaj ingest responses  Bryan Newbold  2020-12-17  1  -1/+2
* bug fix: is_preserved should always be bool  Bryan Newbold  2020-12-17  1  -2/+2
* Merge branch 'bnewbold-doaj-fuzzy' into 'master'  bnewbold  2020-12-18  7  -267/+544
|\
| * update fuzzy helper to pass 'reason' through to import code  Bryan Newbold  2020-12-17  2  -5/+5
| * pipenv: bump fuzzycat to 0.1.9  Bryan Newbold  2020-12-17  2  -5/+5
| * add fuzzy match filtering to DOAJ importer  Bryan Newbold  2020-12-16  2  -4/+23
| * add fuzzy matching helper to importer base class  Bryan Newbold  2020-12-16  3  -2/+147
| * pipenv: add fuzzycat dependency  Bryan Newbold  2020-12-16  2  -261/+374
* | Merge pull request #65 from ibnesayeed/patch-1  bnewbold  2020-12-17  1  -1/+1
|\ \
| * | Improve status counting efficiency  Sawood Alam  2020-12-17  1  -1/+1
* | | Merge branch 'bnewbold-es-transform-html' into 'master'  Martin Czygan  2020-12-17  5  -146/+296
|\ \ \
| |_|/
|/| |
| * | entity update worker: treat fileset and webcapture updates like file updates  Bryan Newbold  2020-12-16  1  -3/+25
| * | fix indentation  Bryan Newbold  2020-12-16  1  -2/+2
| * | have release elasticsearch transform count webcaptures and filesets towards p...  Bryan Newbold  2020-12-16  1  -26/+57
| * | improve release elasticsearch transform test coverage  Bryan Newbold  2020-12-16  3  -11/+86
| * | small release_to_elasticsearch refactors  Bryan Newbold  2020-12-16  1  -7/+12
| * | refactor release_to_elasticsearch transform  Bryan Newbold  2020-12-16  1  -131/+148
|/ /
* | html ingest: small fixes to try_update() code path  Bryan Newbold  2020-12-15  1  -5/+5
* | notes on partial-progress DOAJ release metadata import  Bryan Newbold  2020-12-14  1  -0/+105
* | bulk import notes on ORCID  Bryan Newbold  2020-12-14  1  -0/+55
* | Revert "gitlab CI: explicitly use xenial tag of image"  Bryan Newbold  2020-12-11  1  -1/+1
* | Revert "docker xenial base image: include python3.8"  Bryan Newbold  2020-12-11  1  -6/+1
* | gitlab CI: explicitly use xenial tag of image  Bryan Newbold  2020-12-11  1  -1/+1
* | docker xenial base image: include python3.8  Bryan Newbold  2020-12-11  1  -1/+6
* | HACK: squash intermitent failure of detect_text_lang() test  Bryan Newbold  2020-12-11  1  -1/+2
* | guide: small updates to container extra schema notes (from dblp work)  Bryan Newbold  2020-12-11  1  -2/+7
* | bulk edits: note ORCID update  Bryan Newbold  2020-12-11  1  -1/+5
* | docker: how to push to dockerhub  Bryan Newbold  2020-12-11  1  -0/+4
* | Merge branch 'bnewbold-doaj-metadata' into 'master'  Martin Czygan  2020-11-24  37  -1549/+2845
|\ \
| * | cargo: update sentry to fix memory initialization issue  Bryan Newbold  2020-11-20  2  -274/+332
| * | DOAJ: remove accidentally commited 'skip' of a test  Bryan Newbold  2020-11-20  1  -1/+0