fatcat-scholar
path: root/fatcat_scholar/transform.py
Commit message | Author | Age | Files | Lines
reduce max body size to 0.5M characters | Bryan Newbold | 2021-02-24 | 1 | -1/+1
fix body size limit | Bryan Newbold | 2021-02-24 | 1 | -4/+4
fmt and lint fixes (including one actual bug) | Bryan Newbold | 2021-02-15 | 1 | -2/+3
truncate indexed fulltext body at 1 MByte | Bryan Newbold | 2021-02-15 | 1 | -2/+13
catch TEI-XML parsing exception | Bryan Newbold | 2021-01-30 | 1 | -12/+17
enable sentry exceptions for workers and pipelines | Bryan Newbold | 2021-01-30 | 1 | -1/+12
bigfix: try resolving lang_code list issue again | Bryan Newbold | 2021-01-30 | 1 | -5/+4
bugfix: lang_code sometimes a list | Bryan Newbold | 2021-01-29 | 1 | -2/+7
make fmt | Bryan Newbold | 2021-01-25 | 1 | -1/+4
basic support for excluding web content from index | Bryan Newbold | 2021-01-22 | 1 | -6/+45
bug fix: more html_fulltext not getting processed | Bryan Newbold | 2021-01-22 | 1 | -0/+2
add container_sherpa_color field, and populate it | Bryan Newbold | 2021-01-22 | 1 | -0/+1
improve 'oa' tag calculation | Bryan Newbold | 2021-01-16 | 1 | -4/+4
small corrections to schema/transform | Bryan Newbold | 2021-01-16 | 1 | -2/+4
add support for new identifiers and size_bytes schema fields | Bryan Newbold | 2021-01-14 | 1 | -0/+3
basic HTML transform/index support | Bryan Newbold | 2020-11-18 | 1 | -2/+46
refs: extract fatcat crossref pages metadata | Bryan Newbold | 2020-11-13 | 1 | -1/+1
commands: show usage on empty command | Bryan Newbold | 2020-11-02 | 1 | -1/+1
more SIM metadata mappings | Bryan Newbold | 2020-10-19 | 1 | -3/+31
SIM pipeline: more language conversions | Bryan Newbold | 2020-10-16 | 1 | -2/+5
transform: refactor tag generation out of transform heavy method | Bryan Newbold | 2020-10-16 | 1 | -28/+37
Upgrade Dynaconf to 3+ | Bruno Rocha | 2020-10-05 | 1 | -1/+1
refs and grobid2json bugfixes from testing | Bryan Newbold | 2020-09-14 | 1 | -3/+10
bugfix: release_year | Bryan Newbold | 2020-09-13 | 1 | -2/+2
refs transform: both GROBID and fatcat refs | Bryan Newbold | 2020-09-13 | 1 | -5/+12
ref transform: support more GROBID fields | Bryan Newbold | 2020-09-13 | 1 | -10/+16
fixes to refs transform (for non-str author fields) | Bryan Newbold | 2020-09-04 | 1 | -2/+6
heavy to refs command | Bryan Newbold | 2020-09-04 | 1 | -2/+142
use simple names, not domain names, for some platforms | Bryan Newbold | 2020-08-12 | 1 | -3/+3
biblio metadata hacks at transform time | Bryan Newbold | 2020-08-12 | 1 | -2/+98
don't index sim_page without issue_item and first_page | Bryan Newbold | 2020-08-06 | 1 | -0/+3
handle integer conversion and bounding for ES schema | Bryan Newbold | 2020-08-06 | 1 | -10/+13
json: exclude None in output, and sort keys | Bryan Newbold | 2020-07-27 | 1 | -1/+1
ensure SIM release date parses before assigning | Bryan Newbold | 2020-07-21 | 1 | -1/+6
make fmt | Bryan Newbold | 2020-06-29 | 1 | -8/+13
include GROBID-extracted abstracts in search documents | Bryan Newbold | 2020-06-29 | 1 | -10/+15
small improvements to SIM metadata maps | Bryan Newbold | 2020-06-29 | 1 | -6/+11
fixes for pdf_meta dict | Bryan Newbold | 2020-06-29 | 1 | -1/+2
remove old COVID19 thumbnail hack | Bryan Newbold | 2020-06-29 | 1 | -1/+2
fetch pdftotext and pdf_meta from blobs, postgrest | Bryan Newbold | 2020-06-29 | 1 | -21/+13
collapse pages by SIM issue | Bryan Newbold | 2020-06-04 | 1 | -0/+3
flake8-annotation linting | Bryan Newbold | 2020-06-03 | 1 | -3/+3
flake8 fixes (partial) | Bryan Newbold | 2020-06-03 | 1 | -11/+2
reformat python code with black | Bryan Newbold | 2020-06-03 | 1 | -109/+158
fixes from running pipeline | Bryan Newbold | 2020-06-03 | 1 | -1/+2
compute and use tags | Bryan Newbold | 2020-06-03 | 1 | -0/+41
fixes from manual testing | Bryan Newbold | 2020-05-20 | 1 | -5/+4
fixes to release+sim pipeline | Bryan Newbold | 2020-05-20 | 1 | -1/+2
indexing tweaks | Bryan Newbold | 2020-05-20 | 1 | -3/+4
first pass transform from pipelines to ES schema | Bryan Newbold | 2020-05-20 | 1 | -0/+306