fatcat
Commit log for python/fatcat_tools/importers/ingest.py
Commit message | Author | Date | Files | Lines
* fileset ingest: handle missing/partial file-level metadata | Bryan Newbold | 2022-04-05 | 1 | -3/+3
* ingest importer: improved extra/edit_extra code flow | Bryan Newbold | 2022-04-05 | 1 | -20/+13
* fileset ingest: remove a TODO | Bryan Newbold | 2022-04-04 | 1 | -1/+0
* filesets: typo bugfix, and test 'mimetype' on entity, not extra | Bryan Newbold | 2022-04-04 | 1 | -1/+1
* fileset ingest: fix mimetype handling | Bryan Newbold | 2022-03-31 | 1 | -4/+5
* bugfix: logic flow in fileset release checking | Bryan Newbold | 2022-03-23 | 1 | -3/+6
* single-file variant of fileset importer for dataset attempts | Bryan Newbold | 2022-03-23 | 1 | -0/+201
* ingest fileset fixes, and some test coverage | Bryan Newbold | 2022-03-23 | 1 | -13/+19
* dataset ingest: JSON object fixes | Bryan Newbold | 2022-03-22 | 1 | -5/+5
* typing: relatively simple type check fixes | Bryan Newbold | 2021-11-03 | 1 | -3/+4
* typing: initial annotations on importers | Bryan Newbold | 2021-11-03 | 1 | -35/+46
* fmt (black): fatcat_tools/ | Bryan Newbold | 2021-11-02 | 1 | -319/+374
* python: isort everything | Bryan Newbold | 2021-11-02 | 1 | -0/+1
* lint: simple, safe inline lint fixes | Bryan Newbold | 2021-11-02 | 1 | -6/+6
* fix missing variable in fileset ingest | Bryan Newbold | 2021-11-02 | 1 | -2/+1
* WIP: more fileset ingest | Bryan Newbold | 2021-10-18 | 1 | -13/+21
* WIP: rel fixes | Bryan Newbold | 2021-10-14 | 1 | -6/+6
* fileset ingest small tweaks | Bryan Newbold | 2021-10-14 | 1 | -21/+36
* initial implementation of fileset ingest importers | Bryan Newbold | 2021-10-14 | 1 | -2/+223
* new SPN web (html) importer | Bryan Newbold | 2021-10-01 | 1 | -26/+80
* ingest importer behavior tweaks | Bryan Newbold | 2021-10-01 | 1 | -8/+8
* more consistent and defensive lower-casing of DOIs | Bryan Newbold | 2021-06-23 | 1 | -0/+4
* ingest: swap ingest and file checks, to result in clearer stats/counts of ski... | Bryan Newbold | 2021-06-03 | 1 | -2/+2
* ingest: don't accept mag and s2 URLs | Bryan Newbold | 2021-06-03 | 1 | -4/+4
* web ingest: terminal URL mismatch as skip, not assert | Bryan Newbold | 2020-12-30 | 1 | -1/+3
* ingest: allow dblp imports | Bryan Newbold | 2020-12-23 | 1 | -1/+1
* add dblp as an ingest source and identifier | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* ingest: allow doaj ingest responses | Bryan Newbold | 2020-12-17 | 1 | -1/+2
* html ingest: small fixes to try_update() code path | Bryan Newbold | 2020-12-15 | 1 | -5/+5
* html ingest: actual xhtml mimetype | Bryan Newbold | 2020-11-16 | 1 | -2/+2
* html ingest: remaining implementation | Bryan Newbold | 2020-11-06 | 1 | -22/+19
* ingest: progress on HTML ingest | Bryan Newbold | 2020-11-05 | 1 | -14/+30
* ingest: initial 'web' worker implementation | Bryan Newbold | 2020-11-05 | 1 | -66/+258
* ingest: whitelist -> allowlist | Bryan Newbold | 2020-11-05 | 1 | -3/+3
* ingest: basic checks for ingest_type | Bryan Newbold | 2020-11-05 | 1 | -3/+29
* lint (flake8) tool python files | Bryan Newbold | 2020-07-01 | 1 | -6/+1
* ingest importer: check that stage is consistent with release | Bryan Newbold | 2020-05-26 | 1 | -0/+5
* importers: clarify handling of ApiException | Bryan Newbold | 2020-05-22 | 1 | -0/+1
* ingest importer: don't use glutton matches | Bryan Newbold | 2020-05-22 | 1 | -3/+3
* ingest import: fix edit_extra path | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest importer: edit_extra is a top-level key | Bryan Newbold | 2020-02-18 | 1 | -1/+1
* ingest import: allow short version of corpus names | Bryan Newbold | 2020-02-18 | 1 | -0/+3
* ingest importer: pass through link rel | Bryan Newbold | 2020-02-18 | 1 | -1/+6
* check ingest_request_source existance for SPN as well as ingest | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* additional trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* add mag and s2 as trusted link sources | Bryan Newbold | 2020-02-06 | 1 | -1/+1
* ingest worker: handle missing ingest_request_source | Bryan Newbold | 2020-02-06 | 1 | -0/+3
* fix trivial typo in file importer | Bryan Newbold | 2020-01-20 | 1 | -1/+1
* ingest: improve tests, support old ingest results | Bryan Newbold | 2020-01-15 | 1 | -3/+12
* update ingest worker for schema tweaks | Bryan Newbold | 2020-01-15 | 1 | -8/+15