index : fatcat
path: root/python/fatcat_tools/importers
Commit message [Author, Age, Files, Lines]

* refactor importer metadata tables into separate file; move some helpers around [Bryan Newbold, 2021-11-10, files: 8, lines: -621/+25]
* importers: refactor imports of clean() and other normalization helpers [Bryan Newbold, 2021-11-10, files: 12, lines: -95/+104]
* remove cdl_dash_dat and wayback_static importers [Bryan Newbold, 2021-11-10, files: 3, lines: -510/+0]
* datacite import: store less subject metadata [Bryan Newbold, 2021-11-10, files: 1, lines: -1/+7]
* importers: use clean_doi() in many more (all?) importers [Bryan Newbold, 2021-11-09, files: 6, lines: -12/+29]
* remove deprecated extid sqlite3 lookup table feature from importers [Bryan Newbold, 2021-11-09, files: 3, lines: -160/+0]
* datacite importer: remove unused 'year_only' variable [Bryan Newbold, 2021-11-03, files: 1, lines: -2/+3]
* datacite: add comment about potential date parsing bug [Bryan Newbold, 2021-11-03, files: 1, lines: -0/+1]
* datacite importer: dateparser.date.DateDataParser() [Bryan Newbold, 2021-11-03, files: 1, lines: -1/+1]
* more involved type wrangling and fixes for importers [Bryan Newbold, 2021-11-03, files: 3, lines: -12/+14]
* typing: relatively simple type check fixes [Bryan Newbold, 2021-11-03, files: 14, lines: -87/+82]
* typing: initial annotations on importers [Bryan Newbold, 2021-11-03, files: 22, lines: -274/+443]
* importers: remove unused __main__ routine [Bryan Newbold, 2021-11-03, files: 4, lines: -19/+0]
* lint: resolve existing mypy type errors [Bryan Newbold, 2021-11-02, files: 3, lines: -22/+27]
* re-fix some lint issues after big 'fmt' [Bryan Newbold, 2021-11-02, files: 1, lines: -2/+2]
* fmt (black): fatcat_tools/ [Bryan Newbold, 2021-11-02, files: 22, lines: -2115/+2578]
* python: isort everything [Bryan Newbold, 2021-11-02, files: 17, lines: -41/+70]
* arabesque import 'hit' field is 1/0, not true/false [Bryan Newbold, 2021-11-02, files: 1, lines: -2/+2]
* lint: simple, safe inline lint fixes [Bryan Newbold, 2021-11-02, files: 12, lines: -22/+21]
* lint/fmt: remove all 'import *' [Bryan Newbold, 2021-11-02, files: 5, lines: -21/+41]
* re-fmt all the fatcat_tools __init__ files for readability [Bryan Newbold, 2021-11-02, files: 1, lines: -17/+39]
* small python tweaks for annotations, imports [Bryan Newbold, 2021-11-02, files: 2, lines: -2/+6]
* try some type annotations [Bryan Newbold, 2021-11-02, files: 2, lines: -55/+63]
* fix missing variable in fileset ingest [Bryan Newbold, 2021-11-02, files: 1, lines: -2/+1]
* WIP: more fileset ingest [Bryan Newbold, 2021-10-18, files: 1, lines: -13/+21]
* WIP: rel fixes [Bryan Newbold, 2021-10-14, files: 1, lines: -6/+6]
* fileset ingest small tweaks [Bryan Newbold, 2021-10-14, files: 1, lines: -21/+36]
* initial implementation of fileset ingest importers [Bryan Newbold, 2021-10-14, files: 2, lines: -3/+224]
* generic fileset importer class, with test coverage [Bryan Newbold, 2021-10-14, files: 3, lines: -0/+88]
* dblp import: basic support for handles as identifiers [Bryan Newbold, 2021-10-13, files: 1, lines: -1/+5]
* dblp import: fix typos in identifier parsing [Bryan Newbold, 2021-10-13, files: 1, lines: -2/+1]
* python: partial importer utilization of new schema changes [Bryan Newbold, 2021-10-13, files: 3, lines: -6/+18]
* Merge branch 'bnewbold-ingest-tweaks' into 'master' [bnewbold, 2021-10-02, files: 3, lines: -39/+106]
|\
| * kafka import: optional 'force-flush' mode for some importers [Bryan Newbold, 2021-10-01, files: 1, lines: -0/+13]
| * new SPN web (html) importer [Bryan Newbold, 2021-10-01, files: 2, lines: -27/+81]
| * ingest importer behavior tweaks [Bryan Newbold, 2021-10-01, files: 1, lines: -8/+8]
| * importer common: more verbose logging (with counts) [Bryan Newbold, 2021-10-01, files: 1, lines: -4/+4]
* | datacite: skip empty abstracts [Martin Czygan, 2021-10-01, files: 1, lines: -1/+4]
|/
* more consistent and defensive lower-casing of DOIs [Bryan Newbold, 2021-06-23, files: 2, lines: -1/+6]
* datacite: more careful title string access; fixes sentry #88350 [Martin Czygan, 2021-06-11, files: 1, lines: -1/+1]
* ingest: swap ingest and file checks, to result in clearer stats/counts of ski... [Bryan Newbold, 2021-06-03, files: 1, lines: -2/+2]
* ingest: don't accept mag and s2 URLs [Bryan Newbold, 2021-06-03, files: 1, lines: -4/+4]
* small python lint fixes (no behavior change) [Bryan Newbold, 2021-05-25, files: 1, lines: -2/+0]
* arabesque importer: ensure full 14-digit timestamps [Bryan Newbold, 2021-05-21, files: 1, lines: -1/+3]
* datacite: a missing surname should be None, not the empty string [Martin Czygan, 2021-04-02, files: 1, lines: -2/+1]
* web ingest: terminal URL mismatch as skip, not assert [Bryan Newbold, 2020-12-30, files: 1, lines: -1/+3]
* dblp release import: skip arxiv_id releases [Bryan Newbold, 2020-12-24, files: 1, lines: -0/+9]
* dblp import: fix arxiv_id typo [Bryan Newbold, 2020-12-23, files: 1, lines: -1/+1]
* ingest: allow dblp imports [Bryan Newbold, 2020-12-23, files: 1, lines: -1/+1]
* fuzzy: set 120 second timeout on ES lookups [Bryan Newbold, 2020-12-23, files: 1, lines: -1/+1]