index : sandcrawler
path: root/python/scripts
Age         Author         Files  Lines      Commit message
2022-09-28  Bryan Newbold      1  -2/+38     update oai-pmh ingest request transform script
2022-07-20  Bryan Newbold      2  -3/+1      doaj and unpaywall transforms: more domains to skip
2022-07-15  Bryan Newbold      1  -1/+1      row2json script: fix argument type
2022-07-15  Bryan Newbold      1  -1/+8      row2json script: add flag to enable recrawling
2022-02-25  Bryan Newbold      3  -3/+3      more sentry config changes
2022-02-24  Bryan Newbold      3  -17/+11    switch from 'raven' to 'sentry-sdk'
2021-11-30  Bryan Newbold      1  -0/+170    add CDX sha1hex lookup/fetch helper script
2021-10-27  Bryan Newbold      1  -2/+4      remove grobid2json helper file, replace with grobid_tei_xml
2021-10-27  Bryan Newbold     18  -525/+601  make fmt (black 21.9b0)
2021-10-26  Bryan Newbold     19  -186/+230  make fmt
2021-10-26  Bryan Newbold     19  -43/+51    python: isort all imports
2021-10-15  Bryan Newbold      1  -0/+138    scripts: example archiveorg-to-fileset importer
2021-10-04  Bryan Newbold      1  -1/+1      cdx_collection.py: minor lint issue
2021-07-13  Bryan Newbold      1  -1/+1      another lowercase DOI in an (unused?) script
2021-05-04  Bryan Newbold      1  -0/+80     add cdx_collection.py python script (from scratch repo)
2021-01-05  Bryan Newbold      1  -1/+5      doaj ingest request updates (from prod)
2020-11-10  Bryan Newbold      1  -9/+9      blacklist -> denylist
2020-11-10  Bryan Newbold      1  -1/+1      DOAJ and HTML ingest tweaks from QA run
2020-11-08  Bryan Newbold      1  -0/+139    basic DOAJ ingest request conversion script
2020-06-25  Bryan Newbold      1  -1/+1      poppler: correct RGBA buffer endian-ness
2020-06-16  Bryan Newbold      1  -0/+35     pdf_thumbnail script: demonstrate PDF thumbnail generation
2020-05-05  Bryan Newbold      1  -0/+137    first iteration of oai2ingestrequest script
2020-04-15  Bryan Newbold      1  -0/+83     COVID-19 chinese paper ingest
2020-04-07  Bryan Newbold      1  -1/+9      unpaywall2ingestrequest: canonicalize URL
2020-03-10  Bryan Newbold      3  -3/+3      use local env in python scripts
                                             Without this correct/canonical shebang invocation, virtualenvs (pipenv) don't work.
2020-03-05  Bryan Newbold      1  -1/+4      ingestrequest_row2json: skip on unicode errors
2020-02-18  Bryan Newbold      1  -0/+103    unpaywall2ingestrequest transform script
2020-02-05  Bryan Newbold      1  -0/+48     add ingestrequest_row2json.py
2020-01-14  Bryan Newbold      1  -1/+4      arabesque2ingestrequest: ingest type flag
2019-12-24  Bryan Newbold      1  -0/+69     basic arabesque2ingestrequest script
2019-10-02  Bryan Newbold      1  -0/+5      grobid_affiliations fix from prod, and usage example
2019-10-02  Bryan Newbold      1  -3/+4      deliver_dumpgrobid_to_s3: typo fix from old prod
2019-10-02  Bryan Newbold      1  -0/+47     grobid affiliation extractor (script)
2019-09-25  Bryan Newbold      9  -0/+1088   move a bunch of random old scripts to subdir
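
Note on the 2020-03-10 commit ("use local env in python scripts"): its message says that virtualenvs (pipenv) do not work without the correct/canonical shebang invocation. The log does not show the shebang line itself, so the following is only a minimal sketch that assumes the canonical form is `#!/usr/bin/env python3`, which resolves `python3` through `PATH` and therefore picks up an activated virtualenv. The helper name `interpreter_info` is hypothetical and is not taken from the repository.

```python
#!/usr/bin/env python3
# Assumption: this env-based shebang is the "canonical" form the commit refers to.
# A hard-coded interpreter path (e.g. /usr/bin/python3) would bypass the virtualenv.
"""Report which interpreter is running this script."""
import sys


def interpreter_info() -> dict:
    """Return the interpreter path and whether it belongs to a virtualenv."""
    return {
        "executable": sys.executable,
        # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    info = interpreter_info()
    print(f"interpreter: {info['executable']}")
    print(f"virtualenv active: {info['in_virtualenv']}")
```

Running such a file directly (after `chmod +x`) inside `pipenv shell` or via `pipenv run` should report the environment's interpreter rather than the system one.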