path: root/python/scripts
  Commit message                                               Author         Age         Files  Lines
* update oai-pmh ingest request transform script               Bryan Newbold  2022-09-28  1      -2/+38
|
* doaj and unpaywall transforms: more domains to skip          Bryan Newbold  2022-07-20  2      -3/+1
|
* row2json script: fix argument type                           Bryan Newbold  2022-07-15  1      -1/+1
|
* row2json script: add flag to enable recrawling               Bryan Newbold  2022-07-15  1      -1/+8
|
* more sentry config changes                                   Bryan Newbold  2022-02-25  3      -3/+3
|
* switch from 'raven' to 'sentry-sdk'                          Bryan Newbold  2022-02-24  3      -17/+11
|
* add CDX sha1hex lookup/fetch helper script                   Bryan Newbold  2021-11-30  1      -0/+170
|
* remove grobid2json helper file, replace with grobid_tei_xml  Bryan Newbold  2021-10-27  1      -2/+4
|
* make fmt (black 21.9b0)                                      Bryan Newbold  2021-10-27  18     -525/+601
|
* make fmt                                                     Bryan Newbold  2021-10-26  19     -186/+230
|
* python: isort all imports                                    Bryan Newbold  2021-10-26  19     -43/+51
|
* scripts: example archiveorg-to-fileset importer              Bryan Newbold  2021-10-15  1      -0/+138
|
* cdx_collection.py: minor lint issue                          Bryan Newbold  2021-10-04  1      -1/+1
|
* another lowercase DOI in an (unused?) script                 Bryan Newbold  2021-07-13  1      -1/+1
|
* add cdx_collection.py python script (from scratch repo)      Bryan Newbold  2021-05-04  1      -0/+80
|
* doaj ingest request updates (from prod)                      Bryan Newbold  2021-01-05  1      -1/+5
|
* blacklist -> denylist                                        Bryan Newbold  2020-11-10  1      -9/+9
|
* DOAJ and HTML ingest tweaks from QA run                      Bryan Newbold  2020-11-10  1      -1/+1
|
* basic DOAJ ingest request conversion script                  Bryan Newbold  2020-11-08  1      -0/+139
|
* poppler: correct RGBA buffer endian-ness                     Bryan Newbold  2020-06-25  1      -1/+1
|
* pdf_thumbnail script: demonstrate PDF thumbnail generation   Bryan Newbold  2020-06-16  1      -0/+35
|
* first iteration of oai2ingestrequest script                  Bryan Newbold  2020-05-05  1      -0/+137
|
* COVID-19 chinese paper ingest                                Bryan Newbold  2020-04-15  1      -0/+83
|
* unpaywall2ingestrequest: canonicalize URL                    Bryan Newbold  2020-04-07  1      -1/+9
|
* use local env in python scripts                              Bryan Newbold  2020-03-10  3      -3/+3
|   Without this correct/canonical shebang invocation, virtualenvs (pipenv) don't work.
* ingestrequest_row2json: skip on unicode errors               Bryan Newbold  2020-03-05  1      -1/+4
|
* unpaywall2ingestrequest transform script                     Bryan Newbold  2020-02-18  1      -0/+103
|
* add ingestrequest_row2json.py                                Bryan Newbold  2020-02-05  1      -0/+48
|
* arabesque2ingestrequest: ingest type flag                    Bryan Newbold  2020-01-14  1      -1/+4
|
* basic arabesque2ingestrequest script                         Bryan Newbold  2019-12-24  1      -0/+69
|
* grobid_affiliations fix from prod, and usage example         Bryan Newbold  2019-10-02  1      -0/+5
|
* deliver_dumpgrobid_to_s3: typo fix from old prod             Bryan Newbold  2019-10-02  1      -3/+4
|
* grobid affiliation extractor (script)                        Bryan Newbold  2019-10-02  1      -0/+47
|
* move a bunch of random old scripts to subdir                 Bryan Newbold  2019-09-25  9      -0/+1088