Commit message | Author | Age | Files | Lines
...
* start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 30 | -149/+86
* make fmt | Bryan Newbold | 2021-10-26 | 59 | -1225/+1582
* tweak lint/fmt settings | Bryan Newbold | 2021-10-26 | 2 | -4/+6
* update pytest warning filters (they are pretty expansive) | Bryan Newbold | 2021-10-26 | 1 | -0/+3
* ingest_html: update trafilatura TEI-XML output kwarg | Bryan Newbold | 2021-10-26 | 1 | -1/+1
* python: isort all imports | Bryan Newbold | 2021-10-26 | 57 | -178/+207
* add pyproject.toml (for isort and yapf config), and update 'lint' and 'fmt' m... | Bryan Newbold | 2021-10-26 | 2 | -3/+13
* pipenv: general update; add isort, yapf (over black), grobid_tei_xml | Bryan Newbold | 2021-10-26 | 2 | -730/+880
* kafka monitoring commands | Bryan Newbold | 2021-10-26 | 1 | -0/+4
* more small fileset ingest tweaks | Bryan Newbold | 2021-10-26 | 2 | -6/+21
* commit SPN account changes | Bryan Newbold | 2021-10-15 | 1 | -0/+14
* commit old ingest domain summary | Bryan Newbold | 2021-10-15 | 1 | -0/+345
* python: more aggressive gitignore | Bryan Newbold | 2021-10-15 | 1 | -0/+3
* persist support for ingest platform table, using existing persist worker | Bryan Newbold | 2021-10-15 | 3 | -4/+131
* sql fileset ingest table iteration | Bryan Newbold | 2021-10-15 | 1 | -12/+11
* document passing back platform_base_url | Bryan Newbold | 2021-10-15 | 1 | -0/+1
* improve fileset ingest integration with file ingest | Bryan Newbold | 2021-10-15 | 4 | -5/+25
* more fileset iteration | Bryan Newbold | 2021-10-15 | 5 | -45/+81
* move SPNv2 'simple_get' logic to SPN client | Bryan Newbold | 2021-10-15 | 3 | -52/+31
* filesets: iteration of implementation and docs | Bryan Newbold | 2021-10-15 | 5 | -96/+167
* updates to fileset ingest proposal | Bryan Newbold | 2021-10-15 | 2 | -239/+337
* fileset ingest notes | Bryan Newbold | 2021-10-15 | 1 | -3/+23
* fileset ingest: improve platform parsing | Bryan Newbold | 2021-10-15 | 1 | -12/+196
* fileset ingest: improve error handling | Bryan Newbold | 2021-10-15 | 4 | -48/+106
* initial implementation of zenodo platform import | Bryan Newbold | 2021-10-15 | 1 | -0/+100
* initial figshare platform helper | Bryan Newbold | 2021-10-15 | 1 | -0/+95
* improvements to platform helpers | Bryan Newbold | 2021-10-15 | 3 | -34/+44
* component ingest support for dataverse files (individual) | Bryan Newbold | 2021-10-15 | 2 | -13/+31
* progress on web ingest strategy | Bryan Newbold | 2021-10-15 | 3 | -12/+121
* fileset ingest progress for dataverse | Bryan Newbold | 2021-10-15 | 4 | -23/+291
* local-file version of gen_file_metadata | Bryan Newbold | 2021-10-15 | 3 | -3/+56
* progress on dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -122/+333
* dataset ingest: start enumerating examples | Bryan Newbold | 2021-10-15 | 1 | -0/+34
* ingest tool: always require ingest type as part of 'single' command | Bryan Newbold | 2021-10-15 | 1 | -3/+3
* wrap up previous renaming work | Bryan Newbold | 2021-10-15 | 4 | -6/+4
* progress on fileset/dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -0/+403
* scripts: example archiveorg-to-fileset importer | Bryan Newbold | 2021-10-15 | 1 | -0/+138
* initial dataset/fileset ingest proposal | Bryan Newbold | 2021-10-15 | 1 | -0/+185
* sql: initial ingest fileset table | Bryan Newbold | 2021-10-15 | 1 | -0/+38
* sql: fix typo in CHECK statement | Bryan Newbold | 2021-10-15 | 1 | -1/+1
* refactoring; progress on filesets | Bryan Newbold | 2021-10-15 | 3 | -9/+27
* rename some python files for clarity | Bryan Newbold | 2021-10-15 | 3 | -0/+0
* pdf ingest: journals.uchicago.edu pattern | Bryan Newbold | 2021-10-11 | 1 | -0/+8
* spn: avoid 'None' job_id | Bryan Newbold | 2021-10-11 | 1 | -2/+2
* Merge branch 'bnewbold-backfill' into 'master' | bnewbold | 2021-10-04 | 3 | -0/+384
|\
| * temporary please option for scala backfill | Bryan Newbold | 2018-07-24 | 1 | -0/+22
| * small CdxBackfillJob refactor (code quality) | Bryan Newbold | 2018-07-24 | 1 | -5/+5
| * do sha1 pattern match correctly | Bryan Newbold | 2018-07-24 | 2 | -3/+18
| * more PDF mimetypes; fix return refactor | Bryan Newbold | 2018-07-24 | 1 | -2/+5
| * CdxBackfillJob: comment cleanup | Bryan Newbold | 2018-07-24 | 1 | -6/+0