sandcrawler
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| set CDX request params are str, not int or datetime | Bryan Newbold | 2021-10-26 | 1 | -3/+6 |
| This might be a bugfix, changing CDX lookup behavior? | | | | |
| bugfix: was setting 'from' parameter as a tuple, not a string | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| start type annotating IA helper code | Bryan Newbold | 2021-10-26 | 1 | -37/+65 |
| start adding python type annotations to db and persist code | Bryan Newbold | 2021-10-26 | 2 | -97/+124 |
| Makefile: don't fail on isort error (consider these minor) | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| tweak flake8 config | Bryan Newbold | 2021-10-26 | 1 | -2/+11 |
| flake8 clean (with current settings) | Bryan Newbold | 2021-10-26 | 9 | -25/+24 |
| pipenv: import type annotations for requests and dateparser | Bryan Newbold | 2021-10-26 | 2 | -1/+19 |
| start handling trivial lint cleanups: unused imports, 'is None', etc | Bryan Newbold | 2021-10-26 | 30 | -149/+86 |
| make fmt | Bryan Newbold | 2021-10-26 | 59 | -1225/+1582 |
| tweak lint/fmt settings | Bryan Newbold | 2021-10-26 | 2 | -4/+6 |
| update pytest warning filters (they are pretty expansive) | Bryan Newbold | 2021-10-26 | 1 | -0/+3 |
| ingest_html: update trafilatura TEI-XML output kwarg | Bryan Newbold | 2021-10-26 | 1 | -1/+1 |
| python: isort all imports | Bryan Newbold | 2021-10-26 | 57 | -178/+207 |
| add pyproject.toml (for isort and yapf config), and update 'lint' and 'fmt' make targets | Bryan Newbold | 2021-10-26 | 2 | -3/+13 |
| pipenv: general update; add isort, yapf (over black), grobid_tei_xml | Bryan Newbold | 2021-10-26 | 2 | -730/+880 |
| kafka monitoring commands | Bryan Newbold | 2021-10-26 | 1 | -0/+4 |
| more small fileset ingest tweaks | Bryan Newbold | 2021-10-26 | 2 | -6/+21 |
| commit SPN account changes | Bryan Newbold | 2021-10-15 | 1 | -0/+14 |
| commit old ingest domain summary | Bryan Newbold | 2021-10-15 | 1 | -0/+345 |
| python: more aggressive gitignore | Bryan Newbold | 2021-10-15 | 1 | -0/+3 |
| persist support for ingest platform table, using existing persist worker | Bryan Newbold | 2021-10-15 | 3 | -4/+131 |
| sql fileset ingest table iteration | Bryan Newbold | 2021-10-15 | 1 | -12/+11 |
| document passing back platform_base_url | Bryan Newbold | 2021-10-15 | 1 | -0/+1 |
| improve fileset ingest integration with file ingest | Bryan Newbold | 2021-10-15 | 4 | -5/+25 |
| more fileset iteration | Bryan Newbold | 2021-10-15 | 5 | -45/+81 |
| move SPNv2 'simple_get' logic to SPN client | Bryan Newbold | 2021-10-15 | 3 | -52/+31 |
| filesets: iteration of implementation and docs | Bryan Newbold | 2021-10-15 | 5 | -96/+167 |
| updates to fileset ingest proposal | Bryan Newbold | 2021-10-15 | 2 | -239/+337 |
| fileset ingest notes | Bryan Newbold | 2021-10-15 | 1 | -3/+23 |
| fileset ingest: improve platform parsing | Bryan Newbold | 2021-10-15 | 1 | -12/+196 |
| fileset ingest: improve error handling | Bryan Newbold | 2021-10-15 | 4 | -48/+106 |
| initial implementation of zenodo platform import | Bryan Newbold | 2021-10-15 | 1 | -0/+100 |
| initial figshare platform helper | Bryan Newbold | 2021-10-15 | 1 | -0/+95 |
| improvements to platform helpers | Bryan Newbold | 2021-10-15 | 3 | -34/+44 |
| component ingest support for dataverse files (individual) | Bryan Newbold | 2021-10-15 | 2 | -13/+31 |
| progress on web ingest strategy | Bryan Newbold | 2021-10-15 | 3 | -12/+121 |
| fileset ingest progress for dataverse | Bryan Newbold | 2021-10-15 | 4 | -23/+291 |
| local-file version of gen_file_metadata | Bryan Newbold | 2021-10-15 | 3 | -3/+56 |
| progress on dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -122/+333 |
| dataset ingest: start enumerating examples | Bryan Newbold | 2021-10-15 | 1 | -0/+34 |
| ingest tool: always require ingest type as part of 'single' command | Bryan Newbold | 2021-10-15 | 1 | -3/+3 |
| wrap up previous renaming work | Bryan Newbold | 2021-10-15 | 4 | -6/+4 |
| progress on fileset/dataset ingest | Bryan Newbold | 2021-10-15 | 4 | -0/+403 |
| scripts: example archiveorg-to-fileset importer | Bryan Newbold | 2021-10-15 | 1 | -0/+138 |
| initial dataset/fileset ingest proposal | Bryan Newbold | 2021-10-15 | 1 | -0/+185 |
| sql: initial ingest fileset table | Bryan Newbold | 2021-10-15 | 1 | -0/+38 |
| sql: fix typo in CHECK statement | Bryan Newbold | 2021-10-15 | 1 | -1/+1 |
| refactoring; progress on filesets | Bryan Newbold | 2021-10-15 | 3 | -9/+27 |
| rename some python files for clarity | Bryan Newbold | 2021-10-15 | 3 | -0/+0 |