index : sandcrawler
    Age         Author         Files  Lines      Commit message
...
*   2021-10-26  Bryan Newbold      1  -1/+1      ingest_html: update trafilatura TEI-XML output kwarg
*   2021-10-26  Bryan Newbold     57  -178/+207  python: isort all imports
*   2021-10-26  Bryan Newbold      2  -3/+13     add pyproject.toml (for isort and yapf config), and update 'lint' and 'fmt' m...
*   2021-10-26  Bryan Newbold      2  -730/+880  pipenv: general update; add isort, yapf (over black), grobid_tei_xml
*   2021-10-26  Bryan Newbold      1  -0/+4      kafka monitoring commands
*   2021-10-26  Bryan Newbold      2  -6/+21     more small fileset ingest tweaks
*   2021-10-15  Bryan Newbold      1  -0/+14     commit SPN account changes
*   2021-10-15  Bryan Newbold      1  -0/+345    commit old ingest domain summary
*   2021-10-15  Bryan Newbold      1  -0/+3      python: more aggressive gitignore
*   2021-10-15  Bryan Newbold      3  -4/+131    persist support for ingest platform table, using existing persist worker
*   2021-10-15  Bryan Newbold      1  -12/+11    sql fileset ingest table iteration
*   2021-10-15  Bryan Newbold      1  -0/+1      document passing back platform_base_url
*   2021-10-15  Bryan Newbold      4  -5/+25     improve fileset ingest integration with file ingest
*   2021-10-15  Bryan Newbold      5  -45/+81    more fileset iteration
*   2021-10-15  Bryan Newbold      3  -52/+31    move SPNv2 'simple_get' logic to SPN client
*   2021-10-15  Bryan Newbold      5  -96/+167   filesets: iteration of implementation and docs
*   2021-10-15  Bryan Newbold      2  -239/+337  updates to fileset ingest proposal
*   2021-10-15  Bryan Newbold      1  -3/+23     fileset ingest notes
*   2021-10-15  Bryan Newbold      1  -12/+196   fileset ingest: improve platform parsing
*   2021-10-15  Bryan Newbold      4  -48/+106   fileset ingest: improve error handling
*   2021-10-15  Bryan Newbold      1  -0/+100    initial implementation of zenodo platform import
*   2021-10-15  Bryan Newbold      1  -0/+95     initial figshare platform helper
*   2021-10-15  Bryan Newbold      3  -34/+44    improvements to platform helpers
*   2021-10-15  Bryan Newbold      2  -13/+31    component ingest support for dataverse files (individual)
*   2021-10-15  Bryan Newbold      3  -12/+121   progress on web ingest strategy
*   2021-10-15  Bryan Newbold      4  -23/+291   fileset ingest progress for dataverse
*   2021-10-15  Bryan Newbold      3  -3/+56     local-file version of gen_file_metadata
*   2021-10-15  Bryan Newbold      4  -122/+333  progress on dataset ingest
*   2021-10-15  Bryan Newbold      1  -0/+34     dataset ingest: start enumerating examples
*   2021-10-15  Bryan Newbold      1  -3/+3      ingest tool: always require ingest type as part of 'single' command
*   2021-10-15  Bryan Newbold      4  -6/+4      wrap up previous renaming work
*   2021-10-15  Bryan Newbold      4  -0/+403    progress on fileset/dataset ingest
*   2021-10-15  Bryan Newbold      1  -0/+138    scripts: example archiveorg-to-fileset importer
*   2021-10-15  Bryan Newbold      1  -0/+185    initial dataset/fileset ingest proposal
*   2021-10-15  Bryan Newbold      1  -0/+38     sql: initial ingest fileset table
*   2021-10-15  Bryan Newbold      1  -1/+1      sql: fix typo in CHECK statement
*   2021-10-15  Bryan Newbold      3  -9/+27     refactoring; progress on filesets
*   2021-10-15  Bryan Newbold      3  -0/+0      rename some python files for clarity
*   2021-10-11  Bryan Newbold      1  -0/+8      pdf ingest: journals.uchicago.edu pattern
*   2021-10-11  Bryan Newbold      1  -2/+2      spn: avoid 'None' job_id
*   2021-10-04  bnewbold           3  -0/+384    Merge branch 'bnewbold-backfill' into 'master'
|\
| * 2018-07-24  Bryan Newbold      1  -0/+22     temporary please option for scala backfill
| * 2018-07-24  Bryan Newbold      1  -5/+5      small CdxBackfillJob refactor (code quality)
| * 2018-07-24  Bryan Newbold      2  -3/+18     do sha1 pattern match correctly
| * 2018-07-24  Bryan Newbold      1  -2/+5      more PDF mimetypes; fix return refactor
| * 2018-07-24  Bryan Newbold      1  -6/+0      CdxBackfillJob: comment cleanup
| * 2018-07-24  Bryan Newbold      1  -22/+14    CdxBackfillJob: scalastyle
| * 2018-07-24  Bryan Newbold      1  -20/+21    address some (but not all) review comments
| * 2018-07-24  Bryan Newbold      1  -0/+16     reference TDsl note in docs
| * 2018-07-24  Bryan Newbold      2  -6/+13     fix CdxBackfillJob tests