index : sandcrawler
path: root/python/sandcrawler
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| ... | | | | |
| IA (wayback): actually use an HTTP session for replay fetches | Bryan Newbold | 2021-11-03 | 1 | -2/+3 |
| remove grobid2json helper file, replace with grobid_tei_xml | Bryan Newbold | 2021-10-27 | 2 | -4/+5 |
| small type annotation things from additional packages | Bryan Newbold | 2021-10-27 | 2 | -5/+14 |
| make fmt (black 21.9b0) | Bryan Newbold | 2021-10-27 | 18 | -1840/+2332 |
| fileset: refactor out tables of helpers | Bryan Newbold | 2021-10-27 | 3 | -21/+19 |
| fix type annotations for petabox body fetch helper | Bryan Newbold | 2021-10-26 | 5 | -8/+11 |
| small type annotation hack | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| fileset: fix field renaming bug (caught by mypy) | Bryan Newbold | 2021-10-26 | 1 | -2/+2 |
| fileset ingest: fix table name typo (via mypy) | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| update 'XXX' notes from fileset ingest development | Bryan Newbold | 2021-10-26 | 2 | -9/+6 |
| bugfix: setting html_biblio on ingest results | Bryan Newbold | 2021-10-26 | 2 | -2/+2 |
| lint collection membership (last lint for now) | Bryan Newbold | 2021-10-26 | 7 | -32/+32 |
| ingest fileset: fix silly import typo | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| type annotations for persist workers; required some work | Bryan Newbold | 2021-10-26 | 1 | -66/+59 |
| ingest file HTTP API: fixes from type checking | Bryan Newbold | 2021-10-26 | 1 | -3/+3 |
| more progress on type annotations | Bryan Newbold | 2021-10-26 | 8 | -34/+55 |
| grobid: fix a bug with consolidate_mode header, exposed by type annotations | Bryan Newbold | 2021-10-26 | 1 | -1/+2 |
| grobid: type annotations | Bryan Newbold | 2021-10-26 | 1 | -9/+19 |
| type annotations on SandcrawlerWorker | Bryan Newbold | 2021-10-26 | 1 | -46/+57 |
| more progress on type annotations and linting | Bryan Newbold | 2021-10-26 | 8 | -49/+80 |
| ia: more tweaks to delicate code to satisfy type checker | Bryan Newbold | 2021-10-26 | 1 | -10/+12 |
| ia helpers: enforce max_redirects count correctly | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| set CDX request params are str, not int or datetime | Bryan Newbold | 2021-10-26 | 1 | -3/+6 |
| bugfix: was setting 'from' parameter as a tuple, not a string | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| start type annotating IA helper code | Bryan Newbold | 2021-10-26 | 1 | -37/+65 |
| start adding python type annotations to db and persist code | Bryan Newbold | 2021-10-26 | 2 | -97/+124 |
| flake8 clean (with current settings) | Bryan Newbold | 2021-10-26 | 7 | -24/+22 |
| start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 15 | -97/+57 |
| make fmt | Bryan Newbold | 2021-10-26 | 19 | -571/+741 |
| ingest_html: update trafilatura TEI-XML output kwarg | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| python: isort all imports | Bryan Newbold | 2021-10-26 | 18 | -99/+108 |
| more small fileset ingest tweaks | Bryan Newbold | 2021-10-26 | 2 | -6/+21 |
| persist support for ingest platform table, using existing persist worker | Bryan Newbold | 2021-10-15 | 2 | -2/+129 |
| improve fileset ingest integration with file ingest | Bryan Newbold | 2021-10-15 | 3 | -5/+24 |
| more fileset iteration | Bryan Newbold | 2021-10-15 | 4 | -45/+80 |
| move SPNv2 'simple_get' logic to SPN client | Bryan Newbold | 2021-10-15 | 3 | -52/+31 |
| filesets: iteration of implementation and docs | Bryan Newbold | 2021-10-15 | 4 | -82/+148 |
| fileset ingest: improve platform parsing | Bryan Newbold | 2021-10-15 | 1 | -12/+196 |
| fileset ingest: improve error handling | Bryan Newbold | 2021-10-15 | 4 | -48/+106 |
| initial implementation of zenodo platform import | Bryan Newbold | 2021-10-15 | 1 | -0/+100 |
| initial figshare platform helper | Bryan Newbold | 2021-10-15 | 1 | -0/+95 |
| improvements to platform helpers | Bryan Newbold | 2021-10-15 | 3 | -34/+44 |
| component ingest support for dataverse files (individual) | Bryan Newbold | 2021-10-15 | 2 | -13/+31 |
| progress on web ingest strategy | Bryan Newbold | 2021-10-15 | 3 | -12/+121 |
| fileset ingest progress for dataverse | Bryan Newbold | 2021-10-15 | 4 | -23/+291 |
| local-file version of gen_file_metadata | Bryan Newbold | 2021-10-15 | 2 | -2/+43 |
| progress on dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -122/+333 |
| wrap up previous renaming work | Bryan Newbold | 2021-10-15 | 3 | -5/+3 |
| progress on fileset/dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -0/+403 |
| refactoring; progress on filesets | Bryan Newbold | 2021-10-15 | 2 | -1/+7 |