path: root/python/scripts
Commit message | Author | Date | Files | Lines
* update oai-pmh ingest request transform script | Bryan Newbold | 2022-09-28 | 1 | -2/+38
* doaj and unpaywall transforms: more domains to skip | Bryan Newbold | 2022-07-20 | 2 | -3/+1
* row2json script: fix argument type | Bryan Newbold | 2022-07-15 | 1 | -1/+1
* row2json script: add flag to enable recrawling | Bryan Newbold | 2022-07-15 | 1 | -1/+8
* more sentry config changes | Bryan Newbold | 2022-02-25 | 3 | -3/+3
* switch from 'raven' to 'sentry-sdk' | Bryan Newbold | 2022-02-24 | 3 | -17/+11
* add CDX sha1hex lookup/fetch helper script | Bryan Newbold | 2021-11-30 | 1 | -0/+170
* remove grobid2json helper file, replace with grobid_tei_xml | Bryan Newbold | 2021-10-27 | 1 | -2/+4
* make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 18 | -525/+601
* make fmt | Bryan Newbold | 2021-10-26 | 19 | -186/+230
* python: isort all imports | Bryan Newbold | 2021-10-26 | 19 | -43/+51
* scripts: example archiveorg-to-fileset importer | Bryan Newbold | 2021-10-15 | 1 | -0/+138
* cdx_collection.py: minor lint issue | Bryan Newbold | 2021-10-04 | 1 | -1/+1
* another lowercase DOI in an (unused?) script | Bryan Newbold | 2021-07-13 | 1 | -1/+1
* add cdx_collection.py python script (from scratch repo) | Bryan Newbold | 2021-05-04 | 1 | -0/+80
* doaj ingest request updates (from prod) | Bryan Newbold | 2021-01-05 | 1 | -1/+5
* blacklist -> denylist | Bryan Newbold | 2020-11-10 | 1 | -9/+9
* DOAJ and HTML ingest tweaks from QA run | Bryan Newbold | 2020-11-10 | 1 | -1/+1
* basic DOAJ ingest request conversion script | Bryan Newbold | 2020-11-08 | 1 | -0/+139
* poppler: correct RGBA buffer endian-ness | Bryan Newbold | 2020-06-25 | 1 | -1/+1
* pdf_thumbnail script: demonstrate PDF thumbnail generation | Bryan Newbold | 2020-06-16 | 1 | -0/+35
* first iteration of oai2ingestrequest script | Bryan Newbold | 2020-05-05 | 1 | -0/+137
* COVID-19 chinese paper ingest | Bryan Newbold | 2020-04-15 | 1 | -0/+83
* unpaywall2ingestrequest: canonicalize URL | Bryan Newbold | 2020-04-07 | 1 | -1/+9
* use local env in python scripts | Bryan Newbold | 2020-03-10 | 3 | -3/+3
* ingestrequest_row2json: skip on unicode errors | Bryan Newbold | 2020-03-05 | 1 | -1/+4
* unpaywall2ingestrequest transform script | Bryan Newbold | 2020-02-18 | 1 | -0/+103
* add ingestrequest_row2json.py | Bryan Newbold | 2020-02-05 | 1 | -0/+48
* arabesque2ingestrequest: ingest type flag | Bryan Newbold | 2020-01-14 | 1 | -1/+4
* basic arabesque2ingestrequest script | Bryan Newbold | 2019-12-24 | 1 | -0/+69
* grobid_affiliations fix from prod, and usage example | Bryan Newbold | 2019-10-02 | 1 | -0/+5
* deliver_dumpgrobid_to_s3: typo fix from old prod | Bryan Newbold | 2019-10-02 | 1 | -3/+4
* grobid affiliation extractor (script) | Bryan Newbold | 2019-10-02 | 1 | -0/+47
* move a bunch of random old scripts to subdir | Bryan Newbold | 2019-09-25 | 9 | -0/+1088