fatcat-scholar
path: root/fatcat_scholar/work_pipeline.py
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| catch/ignore ChunkedEncoding errors in fetches | Bryan Newbold | 2021-06-11 | 1 | -0/+3 |
| lint fixes, and run fmt | Bryan Newbold | 2021-06-02 | 1 | -7/+7 |
| add 'crossref' hydration to work pipeline | Bryan Newbold | 2021-06-02 | 1 | -0/+35 |
| schema: add 'crossref' to bundle schema, and add from_json() helper | Bryan Newbold | 2021-06-02 | 1 | -0/+1 |
| Modernize Python syntax with pyupgrade --py38-plus **/*.py | Christian Clauss | 2021-02-23 | 1 | -1/+1 |
| fmt and lint fixes (including one actual bug) | Bryan Newbold | 2021-02-15 | 1 | -1/+1 |
| more seaweedfs hacks | Bryan Newbold | 2021-02-12 | 1 | -0/+8 |
| enable sentry exceptions for workers and pipelines | Bryan Newbold | 2021-01-30 | 1 | -1/+10 |
| work pipeline: hack to skip seaweedfs errors for now | Bryan Newbold | 2021-01-26 | 1 | -0/+5 |
| sort keys in work pipeline (fix typo) | Bryan Newbold | 2021-01-22 | 1 | -1/+1 |
| bug fix: actually fetch/include HTML fulltext | Bryan Newbold | 2021-01-22 | 1 | -1/+1 |
| add basic html fulltext support to fetch pipeline | Bryan Newbold | 2020-11-18 | 1 | -2/+46 |
| commands: show usage on empty command | Bryan Newbold | 2020-11-02 | 1 | -1/+1 |
| work pipeline comparison fix | Bryan Newbold | 2020-10-28 | 1 | -0/+3 |
| Upgrade Dynaconf to 3+ | Bruno Rocha | 2020-10-05 | 1 | -1/+1 |
| pipeline: skip grobid/pdftext lookups when no URL; prefer GROBID to pdftext | Bryan Newbold | 2020-07-27 | 1 | -1/+3 |
| json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 1 | -2/+2 |
| fix lint errors (and some small bugs) | Bryan Newbold | 2020-06-29 | 1 | -6/+8 |
| seaweedfs for S3 API; pull config from dynaconf | Bryan Newbold | 2020-06-29 | 1 | -11/+2 |
| make fmt | Bryan Newbold | 2020-06-29 | 1 | -1/+3 |
| fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -18/+45 |
| flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -5/+2 |
| reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -68/+120 |
| more petabox timeout handling | Bryan Newbold | 2020-05-21 | 1 | -0/+3 |
| handle petabox read timeouts a bit | Bryan Newbold | 2020-05-21 | 1 | -1/+6 |
| fix typo with UnicodeDecodeError catch | Bryan Newbold | 2020-05-21 | 1 | -1/+1 |
| skip pdftotext loading on unicode error | Bryan Newbold | 2020-05-20 | 1 | -0/+2 |
| skip SIM items w/o page_numbers (instead of asserting) | Bryan Newbold | 2020-05-20 | 1 | -1/+3 |
| fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -8/+13 |
| local pdftotext cache dir hack | Bryan Newbold | 2020-05-20 | 1 | -1/+18 |
| fixes to release+sim pipeline | Bryan Newbold | 2020-05-20 | 1 | -10/+16 |
| first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -16/+1 |
| WIP on SIM pipeline | Bryan Newbold | 2020-05-19 | 1 | -2/+2 |
| WIP on release-to-sim fetching | Bryan Newbold | 2020-05-19 | 1 | -12/+49 |
| initial progress on work pipeline | Bryan Newbold | 2020-05-16 | 1 | -0/+305 |