Commit message  Author  Age  Files  Lines
* gitlab CI: explicitly use xenial tag of image  Bryan Newbold  2020-12-11  1  -1/+1
* docker xenial base image: include python3.8  Bryan Newbold  2020-12-11  1  -1/+6
* HACK: squash intermitent failure of detect_text_lang() test  Bryan Newbold  2020-12-11  1  -1/+2
* guide: small updates to container extra schema notes (from dblp work)  Bryan Newbold  2020-12-11  1  -2/+7
* bulk edits: note ORCID update  Bryan Newbold  2020-12-11  1  -1/+5
* docker: how to push to dockerhub  Bryan Newbold  2020-12-11  1  -0/+4
* Merge branch 'bnewbold-doaj-metadata' into 'master'  Martin Czygan  2020-11-24  37  -1549/+2845
|\
| * cargo: update sentry to fix memory initialization issue  Bryan Newbold  2020-11-20  2  -274/+332
| * DOAJ: remove accidentally commited 'skip' of a test  Bryan Newbold  2020-11-20  1  -1/+0
| * langdetect: more text for 'zh' test case  Bryan Newbold  2020-11-20  1  -1/+1
| * DOAJ: update importer README with example invocation  Bryan Newbold  2020-11-20  1  -0/+7
| * crossref+datacite: remove confusing early update bail  Bryan Newbold  2020-11-20  2  -4/+0
| * doaj: fix update code path (getattr not __dict__)  Bryan Newbold  2020-11-20  3  -15/+70
| * DOAJ: handle empty identifier 'id' case  Bryan Newbold  2020-11-20  1  -0/+2
| * clean DOI: ban all non-ASCII characters  Bryan Newbold  2020-11-19  1  -1/+4
| * normal: handle langdetect of 'zh-cn' (not len=2)  Bryan Newbold  2020-11-19  1  -0/+3
| * update fatcatd rust code for 'oai' external identifier  Bryan Newbold  2020-11-19  4  -11/+189
| * codegen rust schema crate  Bryan Newbold  2020-11-19  6  -3/+20
| * codegen python openapi client  Bryan Newbold  2020-11-19  2  -4/+36
| * schema: also add 'oai' identifer (OAI-PMH) for releases  Bryan Newbold  2020-11-19  2  -2/+9
| * tweak DOAJ importer class args and default for do_updates  Bryan Newbold  2020-11-19  1  -2/+2
| * show DOAJ (and dblp) identifiers in release view  Bryan Newbold  2020-11-19  1  -1/+7
| * if a release has DOAJ article id, count as OA  Bryan Newbold  2020-11-19  1  -0/+3
| * implement remainder of DOAJ article importer  Bryan Newbold  2020-11-19  3  -68/+168
| * handle more non-ASCII DOI cases  Bryan Newbold  2020-11-19  1  -1/+3
| * more python normalizers, and move from importer common  Bryan Newbold  2020-11-19  2  -154/+326
| * initial implementation of DOAJ importer  Bryan Newbold  2020-11-19  4  -0/+387
| * python API client: resolve warning about '\d' in string  Bryan Newbold  2020-11-19  1  -2/+2
| * rustfmt  Bryan Newbold  2020-11-19  5  -87/+138
| * rust: fatcatd changes for DOAJ+dblp identifiers  Bryan Newbold  2020-11-19  6  -949/+1062
| * codegen rust crate for v0.3.3  Bryan Newbold  2020-11-19  8  -227/+244
| * codegen python client library for v0.3.3  Bryan Newbold  2020-11-19  7  -16/+80
| * schema: DOAJ+dblp ext_ids; bump to v0.3.3  Bryan Newbold  2020-11-19  2  -1/+25
|/
* ingest and proposal updates  Bryan Newbold  2020-11-19  2  -0/+45
* Merge branch 'bnewbold-xml-html-ingest' into 'master'  Martin Czygan  2020-11-19  10  -66/+409
|\
| * html ingest: actual xhtml mimetype  Bryan Newbold  2020-11-16  1  -2/+2
| * ingest tool: support for setting ingest type  Bryan Newbold  2020-11-06  2  -6/+10
| * html ingest: remaining implementation  Bryan Newbold  2020-11-06  1  -22/+19
| * ingest: fix XML ingest test file  Bryan Newbold  2020-11-05  1  -1/+1
| * ingest: progress on HTML ingest  Bryan Newbold  2020-11-05  3  -16/+74
| * ingest: initial 'web' worker implementation  Bryan Newbold  2020-11-05  3  -67/+301
| * refactor: white/black -> allow/block  Bryan Newbold  2020-11-05  1  -4/+4
| * ingest: whitelist -> allowlist  Bryan Newbold  2020-11-05  2  -6/+6
| * ingest: tests for basic XML ingest  Bryan Newbold  2020-11-05  2  -0/+18
| * ingest: basic checks for ingest_type  Bryan Newbold  2020-11-05  3  -4/+36
|/
* normalizer: filter out a specific non-ASCII character in DOI  Bryan Newbold  2020-11-04  1  -1/+3
* entity updates: don't ingest JSTOR DOI prefixes  Bryan Newbold  2020-10-23  1  -0/+2
* Merge branch 'bnewbold-scholar-pipeline' into 'master'  bnewbold  2020-10-20  2  -2/+26
|\
| * entity updater: new work update feed (ident and changelog metadata only)  Bryan Newbold  2020-10-16  2  -2/+26
|/
* bulk citation graph workflow proposal  Bryan Newbold  2020-10-15  1  -0/+160